
Weekly Arxiv Summary - Written by AI (May 29, 2023)

Large Language Models (LLMs) hold great potential for automated planning, as shown by recent research from Vishal Pallagani et al. 1. In their paper, the authors explore the use of LLMs for automated planning by investigating four key questions: the extent to which LLMs can be used for plan generation, which pre-training data is most helpful for generating plans, whether fine-tuning or prompting is more effective, and whether LLMs can generalize the plans they produce. Ultimately, this research provides valuable insight into the application of LLMs to automated planning and identifies the most effective ways to use such models in this context.
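
To make the fine-tuning-versus-prompting question concrete, here is a minimal sketch of the prompting route on a toy Blocksworld-style task. This is not the authors' code; `query_llm`, the prompt format, and the example states are illustrative assumptions standing in for whichever model and benchmark are being evaluated.

```python
# Minimal sketch (not the authors' code): prompting an LLM for plan generation
# on a toy Blocksworld-style task. `query_llm` is a hypothetical stand-in for
# whichever model or API is being evaluated.

def query_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real model or API client."""
    raise NotImplementedError

def make_plan_prompt(init: list[str], goal: list[str]) -> str:
    # One possible zero-shot prompt format; fine-tuning would instead train
    # the model directly on (problem, plan) pairs.
    return (
        "You are a planner. Given an initial state and a goal, "
        "output a sequence of actions, one per line.\n"
        f"Initial state: {', '.join(init)}\n"
        f"Goal: {', '.join(goal)}\n"
        "Plan:"
    )

prompt = make_plan_prompt(
    init=["on(A, table)", "on(B, A)", "clear(B)"],
    goal=["on(A, B)"],
)
# plan_text = query_llm(prompt)
# e.g. "unstack(B, A)\nputdown(B)\npickup(A)\nstack(A, B)"
```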

Recent advances in language models have brought about exciting possibilities for AI automation. In a paper titled “SPRING: GPT-4 Out-performs RL Algorithms by Studying Papers and Reasoning” 2, Yue Wu et al. explore the use of large language models for automated planning in the context of open-world survival games such as Crafter and Minecraft. By prompting the model with the LaTeX source of the game’s original paper and a description of the agent’s current environment, they form a directed acyclic graph (DAG) of game-related questions and traverse it, directly translating the answers into environment actions. Experiments reveal that GPT-4 was able to outperform existing reinforcement learning (RL) algorithms with no training, and the authors also propose games as a promising test bed for large language models.
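
A rough sketch of the question-DAG idea as I read it (a paraphrase, not the authors' implementation): questions are answered in topological order, each prompt carries the environment context plus the answers to the question's parents, and the final answer is translated into an action. The question set, graph structure, and `query_llm` stub below are illustrative assumptions.

```python
# Hedged sketch of a SPRING-style question DAG (a paraphrase, not the authors'
# code). Questions are asked in topological order; each prompt includes the
# context plus the answers to its parent questions, and the final answer is
# mapped to an environment action.

from graphlib import TopologicalSorter

def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for GPT-4 or another model."""
    raise NotImplementedError

# Example questions; the real question set is game-specific.
QUESTIONS = {
    "top_requirement": "What is the top requirement for survival right now?",
    "threats": "What threats are present in the current observation?",
    "best_action": "Given the answers above, what single action should the agent take?",
}
# Maps each question to the questions it depends on (its parents in the DAG).
DAG = {"best_action": {"top_requirement", "threats"}}

def spring_step(context: str) -> str:
    answers = {}
    for node in TopologicalSorter(DAG).static_order():
        parent_answers = "\n".join(f"{p}: {answers[p]}" for p in DAG.get(node, ()))
        prompt = f"{context}\n{parent_answers}\n{QUESTIONS[node]}"
        answers[node] = query_llm(prompt)
    # The answer at the final node is what gets translated into an action.
    return answers["best_action"]
```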

Recent research from Andy Shih et al. 3 demonstrates the efficacy of parallelizing the sampling of diffusion models, reducing the time needed to run 1000 sequential denoising steps to as little as 0.2 seconds. This is accomplished in their paper, Parallel Sampling of Diffusion Models, via a technique called ParaDiGMS that guesses the solutions of future denoising steps and then iteratively refines them until the process converges. Experiments show ParaDiGMS to be effective: it improves sampling speed by 2-4x while producing samples with no measurable differences in task reward, FID score, or CLIP score. Consequently, ParaDiGMS could substantially accelerate diffusion-based generation without sacrificing sample quality.
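
The core idea can be illustrated with a toy fixed-point iteration (a simplification of ParaDiGMS, not the authors' implementation): guess the whole denoising trajectory, recompute every step from the current guesses in one parallel pass, and repeat until the trajectory stops changing. The `denoise_step` placeholder below is an illustrative assumption.

```python
# Toy fixed-point sketch of parallel sampling (a simplification, not the
# ParaDiGMS implementation). Instead of running T denoising steps one after
# another, guess the whole trajectory, recompute every step from the current
# guesses in parallel, and repeat until convergence.

import numpy as np

def denoise_step(x: np.ndarray, t: int) -> np.ndarray:
    """Hypothetical single denoising update x_t -> x_{t-1}."""
    return 0.95 * x  # placeholder dynamics for illustration only

def parallel_sample(x_T: np.ndarray, T: int, tol: float = 1e-6, max_iters: int = 100):
    # traj[i] is the current guess for the state after i denoising steps;
    # traj[0] is the known noise sample x_T.
    traj = [x_T.copy() for _ in range(T + 1)]
    for _ in range(max_iters):
        # Each step is recomputed from the previous iteration's guesses,
        # so all T updates could run as one batched model call.
        new_traj = [traj[0]] + [denoise_step(traj[i], T - i) for i in range(T)]
        delta = max(np.abs(a - b).max() for a, b in zip(traj, new_traj))
        traj = new_traj
        if delta < tol:
            break
    return traj[-1]  # the fully denoised sample x_0

sample = parallel_sample(np.random.randn(4), T=10)
```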

In a paper titled “Diversify Your Vision Datasets with Automatic Diffusion-Based Augmentation” 4, Lisa Dunlap et al. explore how natural language descriptions can be used with large vision models to generate useful variations of training data. They introduce ALIA (Automated Language-guided Image Augmentation) and use a model trained on the original dataset to filter out minimal image edits as well as edits that corrupt class-relevant information, yielding an augmented dataset that is visually consistent with the original while offering significantly enhanced diversity. Experiments show that ALIA can outperform traditional data augmentation and text-to-image generated data by up to 15%, often even surpassing the improvements observed from adding real data. This research could have far-reaching implications for fine-grained classification tasks and datasets with limited training data.
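
One plausible reading of that filtering step is sketched below (not the released ALIA code): a classifier trained on the original dataset scores each generated edit, and edits it still recognizes with near-perfect confidence (suspected minimal edits) or barely recognizes at all (suspected corrupted class content) are discarded. The thresholds and the scikit-learn-style `classifier` object are assumptions.

```python
# Hedged sketch of confidence-based filtering of generated edits (my reading
# of the idea, not the released ALIA code). A classifier trained on the
# original dataset scores each edited image; edits that look unchanged or
# unrecognizable are dropped.

import numpy as np

def filter_edits(edited_images, labels, classifier,
                 min_conf: float = 0.5, max_conf: float = 0.95):
    """Keep edits whose true-class confidence lies strictly between the thresholds."""
    kept = []
    for image, label in zip(edited_images, labels):
        # Assumed scikit-learn-style API returning class probabilities.
        probs = classifier.predict_proba(image[None])[0]
        conf = probs[label]
        if conf >= max_conf:
            continue  # edit is probably too minimal to add diversity
        if conf <= min_conf:
            continue  # class-relevant content was probably corrupted
        kept.append((image, label))
    return kept
```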

Learning safety constraints for reinforcement learning (RL) from demonstrations has recently been proposed in a paper by David Lindner et al. 5. In the paper, titled “Learning Safety Constraints from Demonstrations with Unknown Rewards”, the authors introduce Convex Constraint Learning for Reinforcement Learning (CoCoRL), a novel approach for inferring shared constraints in a Constrained Markov Decision Process (CMDP) from a set of safe demonstrations with potentially different reward functions. Unlike previous approaches, which are limited to demonstrations with known rewards or fully known environment dynamics, CoCoRL can learn constraints from demonstrations with different, unknown rewards and without knowledge of the environment dynamics.

The authors test CoCoRL in tabular environments and a continuous driving simulation with multiple constraints and find that it successfully learns constraints that lead to safe behavior and can be transferred to different tasks and environments. In contrast to alternative methods based on Inverse Reinforcement Learning (IRL), which often exhibit poor performance and learn unsafe policies, CoCoRL converges to the true safe set with no policy regret, even for potentially sub-optimal (but safe) demonstrations. As such, this research could provide a valuable contribution to tasks that require safety considerations.
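
The convexity argument can be made concrete with a small sketch (my illustration, not the authors' code): if constraints are linear in feature expectations, any convex combination of safe demonstrations' feature expectations is also safe, so a candidate policy can be certified by checking whether its feature expectations fall inside the convex hull of the demonstrations'. The feature dimensions and example points below are made up.

```python
# Hedged sketch of the convex-hull intuition behind CoCoRL (not the authors'
# code). With constraints linear in feature expectations, any point in the
# convex hull of safe demonstrations' feature expectations is also safe.

import numpy as np
from scipy.optimize import linprog

def in_demo_hull(candidate: np.ndarray, demo_features: np.ndarray) -> bool:
    """Is `candidate` a convex combination of the rows of `demo_features`?

    Solved as a feasibility LP: find weights w >= 0 with sum(w) = 1 and
    demo_features.T @ w = candidate.
    """
    n = demo_features.shape[0]
    A_eq = np.vstack([demo_features.T, np.ones((1, n))])
    b_eq = np.concatenate([candidate, [1.0]])
    res = linprog(c=np.zeros(n), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n)
    return res.success

# Example with 2-D feature expectations from three safe demonstrations:
demos = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
print(in_demo_hull(np.array([0.7, 0.7]), demos))  # True: certified safe
print(in_demo_hull(np.array([2.0, 2.0]), demos))  # False: no safety guarantee
```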

This post is licensed under CC BY 4.0 by the author.