Weekly Arxiv Summary - Written by AI (June 5, 2023)
In a recent paper, researchers from the University of Toronto introduced STEVE-1, a generative model that follows text instructions in Minecraft. Building on the pretrained Video Pretraining (VPT) model and MineCLIP, the team demonstrated that the unCLIP approach is both effective and efficient for training instruction-following AI agents. Self-supervised behavioral cloning and hindsight relabeling were used to finetune VPT without the need for costly human annotations. Tests revealed that STEVE-1 surpasses previous baselines while operating with low-level controls and raw pixel inputs, and that it can follow a wide range of text and visual instructions. The team has released the model weights, training scripts, and evaluation tools to facilitate further research in this space. 1
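To make the hindsight relabeling idea above more concrete, here is a minimal sketch; it is not the authors' code. It assumes a trajectory is a list of (observation, action) pairs, and the names `hindsight_relabel`, `embed`, and `horizon` are hypothetical. The `embed` callable stands in for a goal encoder such as MineCLIP. The point is that the goal for each segment is taken from an observation the agent actually reached, so no human labels are needed.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Obs = Sequence[float]   # placeholder observation type (an assumption)
Action = int            # placeholder action type (an assumption)
Goal = Sequence[float]  # placeholder goal-embedding type (an assumption)


def hindsight_relabel(
    trajectory: List[Tuple[Obs, Action]],
    embed: Callable[[Obs], Goal],
    horizon: int = 4,
    rng: Optional[random.Random] = None,
) -> List[Tuple[Obs, Goal, Action]]:
    """Turn an unlabeled trajectory into goal-conditioned training examples."""
    rng = rng or random.Random(0)
    examples: List[Tuple[Obs, Goal, Action]] = []
    start = 0
    while start < len(trajectory):
        # Pick where this segment ends, at most `horizon` steps ahead.
        end = min(len(trajectory) - 1, start + rng.randint(1, horizon))
        # The observation actually reached at `end` becomes the goal.
        goal = embed(trajectory[end][0])
        # Every step in the segment is labeled with that achieved goal.
        for obs, action in trajectory[start:end + 1]:
            examples.append((obs, goal, action))
        start = end + 1
    return examples
```

Each resulting (observation, goal, action) triple can then be used for ordinary behavioral cloning, with the policy conditioned on the goal embedding.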
In a recent paper, researchers Shengran Hu and Jeff Clune proposed an approach for training AI agents to think like humans. Their method, Thought Cloning, is an Imitation Learning framework that goes beyond simply cloning human behaviors and instead also clones the thoughts humans have while performing these behaviors. To demonstrate its capabilities, the team conducted experiments in a synthetically generated domain. Results revealed that Thought Cloning is substantially more effective than conventional Behavioral Cloning in out-of-distribution situations. The researchers suggest Thought Cloning can help with AI Safety and Interpretability, making it easier to debug and improve AI agents. 2
In a recent paper, researchers Robert J. Moss, Anthony Corso, Jef Caers and Mykel J. Kochenderfer proposed BetaZero, a belief-state planning algorithm for partially observable Markov decision processes (POMDPs). BetaZero aims to solve high-dimensional POMDPs in practical settings by combining online Monte Carlo tree search with offline neural network approximations of the optimal policy and value function. The team tested BetaZero on various benchmark POMDPs and a real-world geological problem of critical mineral exploration. Results showed that BetaZero outperformed state-of-the-art POMDP solvers, suggesting that it can be applied across a wide range of domains. 3
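For readers unfamiliar with belief-state planning, the sketch below shows the textbook discrete Bayes filter that turns a belief, an action, and an observation into an updated belief. It is an illustration of what a "belief state" is, not BetaZero's implementation; the summary above does not say how BetaZero represents beliefs. The names `update_belief`, `transition`, and `obs_prob` are hypothetical.

```python
from typing import Callable, Dict, Hashable

State = Hashable
Belief = Dict[State, float]  # maps each state to its probability


def update_belief(
    belief: Belief,
    action: Hashable,
    observation: Hashable,
    transition: Callable[[State, Hashable], Dict[State, float]],
    obs_prob: Callable[[State, Hashable, Hashable], float],
) -> Belief:
    """Discrete Bayes filter: b'(s') is proportional to O(o|s',a) * sum_s T(s'|s,a) b(s)."""
    # Prediction step: push the belief through the transition model.
    predicted: Belief = {}
    for s, p in belief.items():
        for s_next, t in transition(s, action).items():
            predicted[s_next] = predicted.get(s_next, 0.0) + p * t
    # Correction step: weight each state by the observation likelihood.
    unnormalized = {
        s: obs_prob(s, action, observation) * p for s, p in predicted.items()
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    # Normalize so the new belief sums to one.
    return {s: w / total for s, w in unnormalized.items()}
```

As a toy check, with two states `left` and `right`, a uniform prior, a transition model that leaves the state unchanged, and a sensor that reports the true state with probability 0.85, hearing `left` once raises the belief in `left` from 0.5 to 0.85. A belief-state planner searches over such beliefs rather than over raw states.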
In their paper “StyleDrop: Text-to-Image Generation in Any Style”, Kihyuk Sohn and his co-authors propose a method for taking a text prompt and generating an image in a specific style. This method, known as StyleDrop, utilizes a pre-trained text-to-image model and fine-tunes it with a single image to set the desired style. StyleDrop is highly versatile as it can capture nuances and details of a specific style, such as color schemes, shading, design patterns, and local and global effects. Experiments on two text-to-image models, Muse and Imagen, show that StyleDrop outperforms other methods and is even able to produce impressive results with just one image specifying the desired style. 4
ViCo is an approach to personalized text-to-image generation that seeks to preserve the fine visual details of a novel concept. Developed by researchers Shaozhe Hao, Kai Han, Shihao Zhao, and Kwan-Yee K. Wong, the method uses an attention module to condition the diffusion process on patch-wise visual semantics. A simple regularization is introduced to handle the common overfitting degradation. Tests show that ViCo achieves comparable or even better results than existing models both qualitatively and quantitatively, with only 6% of the diffusion U-Net being trained. 5