Weekly arXiv Summary - Written by AI (July 3, 2023)
Exciting Recent Papers in Machine Learning and AI
I scoured 404 arXiv papers on machine learning and AI published last week and selected a handful that I believe will be of particular interest to the community. Let’s dive in!
Cooperative Team Communication using GPT-3
In a paper by Lance Ying et al. [1], the authors model a cooperative team in which a principal agent may give natural language instructions about the team’s shared plan to an assistant, with GPT-3 serving as the likelihood function for those instructions. They show that a third-person observer can infer the team’s goal through multi-modal Bayesian inverse planning over both actions and instructions, and that these inferences correlate highly with human judgments (R = 0.96). The study highlights the importance of verbal communication for cooperative agents and was accepted to the ICML 2023 Workshop on Theory of Mind in Communicating Agents.
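To make "multi-modal Bayesian inverse planning" concrete, here is a minimal sketch of the core update: an observer combines evidence from actions and from an instruction into a posterior over goals. The goal names and likelihood numbers are hypothetical, and the sketch assumes actions and utterance are conditionally independent given the goal. In the paper the utterance likelihood comes from GPT-3 and the action likelihood from a planning model, neither of which is reproduced here.

```python
# Toy multi-modal Bayesian inverse planning over a hypothetical goal set.
# Both likelihood tables below are made-up numbers for illustration; the
# paper obtains the utterance likelihood from GPT-3 instead.

GOALS = ["red_gem", "blue_gem", "green_gem"]  # hypothetical goals
PRIOR = {g: 1 / len(GOALS) for g in GOALS}    # uniform prior over goals


def posterior_over_goals(prior, action_lik, utterance_lik):
    """Bayes' rule, assuming actions and utterance are conditionally
    independent given the goal: P(g | a, u) ∝ P(g) P(a | g) P(u | g)."""
    unnormalized = {g: prior[g] * action_lik[g] * utterance_lik[g] for g in prior}
    total = sum(unnormalized.values())
    return {g: p / total for g, p in unnormalized.items()}


# The observed actions alone are ambiguous between red and blue...
action_lik = {"red_gem": 0.5, "blue_gem": 0.4, "green_gem": 0.1}
# ...but an instruction such as "go get the blue gem" strongly favors blue.
utterance_lik = {"red_gem": 0.05, "blue_gem": 0.6, "green_gem": 0.05}

posterior = posterior_over_goals(PRIOR, action_lik, utterance_lik)
print(max(posterior, key=posterior.get))  # blue_gem
```

Actions alone would slightly favor the red goal; adding the instruction flips the inference to blue, which is the intuition behind treating verbal communication as evidence.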
ConKI: Enhancing Multimodal Sentiment Analysis
Yakun Yu et al. [2] introduced ConKI (Contrastive Knowledge Injection), a model for multimodal sentiment analysis. ConKI uses an adapter architecture to learn domain-specific knowledge representations and combines them with the general knowledge representations of a pretrained model. These representations are learned through a hierarchical contrastive learning procedure, which improves multimodal sentiment predictions on popular benchmark datasets. The paper was accepted to Findings of ACL 2023, and ConKI outperformed prior state-of-the-art methods on a variety of performance metrics.
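Contrastive learning is the engine behind ConKI’s representation learning. The sketch below shows a single-level InfoNCE loss, a common building block of contrastive methods, with in-batch negatives. It is a generic illustration and does not reproduce ConKI’s hierarchical scheme; the feature vectors are hypothetical stand-ins for two views (say, text and audio) of the same samples.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss: anchors[i] should be closer to positives[i]
    than to every other positives[j] in the batch (in-batch negatives)."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]  # equals -log softmax(logits)[i]
    return total / len(anchors)


# Hypothetical 2-D features for two samples seen through two modalities.
text_feats = [[1.0, 0.0], [0.0, 1.0]]
audio_feats = [[0.9, 0.1], [0.1, 0.9]]

aligned = info_nce(text_feats, audio_feats)          # matching pairs
shuffled = info_nce(text_feats, audio_feats[::-1])   # mismatched pairs
print(aligned < shuffled)  # True
```

Minimizing this loss pulls matching pairs together and pushes mismatched pairs apart; the temperature controls how sharply the softmax separates them.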
UTRNet: Advancements in Urdu OCR
Abdur Rahman et al. [3] presented UTRNet, an approach to printed Urdu text recognition built on high-resolution, multi-scale semantic feature extraction. The hybrid CNN-RNN model achieved state-of-the-art performance on benchmark datasets. The authors also introduced two datasets, UTRSet-Real and UTRSet-Synth, to improve generalization to real-world data, and developed an online tool for end-to-end Urdu OCR on printed documents. This work, accepted at the 17th International Conference on Document Analysis and Recognition (ICDAR 2023), is a significant step forward for Urdu OCR technology.
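CNN-RNN text recognizers of this kind commonly emit a per-frame distribution over characters plus a special "blank" symbol, decoded with Connectionist Temporal Classification (CTC). Assuming such a CTC output layer, here is a minimal greedy (best-path) decoder. The label set and frame probabilities are hypothetical toy values using Latin letters rather than Urdu script.

```python
def ctc_greedy_decode(frame_probs, labels, blank=0):
    """Greedy CTC decoding: take the most likely label per frame,
    collapse consecutive repeats, then drop blanks."""
    best_path = [max(range(len(p)), key=lambda k: p[k]) for p in frame_probs]
    decoded = []
    prev = None
    for idx in best_path:
        # Emit a symbol only if it differs from the previous frame and is not blank
        if idx != prev and idx != blank:
            decoded.append(labels[idx])
        prev = idx
    return "".join(decoded)


labels = ["_", "a", "b"]  # hypothetical label set; index 0 is the CTC blank
frame_probs = [
    [0.1, 0.8, 0.1],    # a
    [0.2, 0.7, 0.1],    # a (repeat, collapsed)
    [0.9, 0.05, 0.05],  # blank separates the two a's
    [0.1, 0.6, 0.3],    # a
    [0.1, 0.2, 0.7],    # b
    [0.2, 0.1, 0.7],    # b (repeat, collapsed)
]
print(ctc_greedy_decode(frame_probs, labels))  # aab
```

The blank is what lets CTC output genuinely doubled characters: without the blank frame, the two a’s would collapse into one.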
SparseOptimizer: Inducing Sparsity in Language Models
Fu-Ming Guo [4] introduced SparseOptimizer, a deep learning optimizer that uses Moreau-Yosida regularization to induce sparsity in large language models, with experiments on BERT, ALBERT, and GPT. Its plug-and-play design means it can be applied to a language model without code modifications, and the resulting sparse models perform comparably to their dense counterparts while using far fewer nonzero parameters. The paper also proposes an optimizer-compiler co-design strategy that accelerates inference of the sparsified BERT model by up to 7.15x, measured against generic LLVM compilation. This work is a notable step toward efficient, high-performing language models.
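Moreau-Yosida regularization is closely tied to the proximal operator: for an L1 penalty, that operator is soft-thresholding, which is what drives small weights exactly to zero. The sketch below shows the L1 prox, the Moreau envelope it defines, and a generic proximal gradient step. This is a textbook illustration of the mechanism, not SparseOptimizer’s actual update rule; the function names and toy numbers are hypothetical.

```python
def soft_threshold(x, t):
    """Proximal operator of t*|x|: shrink x toward 0 by t, zeroing |x| <= t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def moreau_envelope_l1(x, lam, mu):
    """Moreau-Yosida regularization of lam*|x| with parameter mu:
    min_y lam*|y| + (x - y)**2 / (2*mu), whose minimizer is the prox."""
    y = soft_threshold(x, lam * mu)
    return lam * abs(y) + (x - y) ** 2 / (2 * mu)


def proximal_step(weights, grads, lr, lam):
    """One proximal gradient step: a plain gradient step on the loss,
    followed by the L1 prox, which sets small weights exactly to zero."""
    return [soft_threshold(w - lr * g, lr * lam) for w, g in zip(weights, grads)]


# Hypothetical weights and gradients; the prox threshold is lr * lam = 0.05.
weights = [0.5, -0.02, 0.01, -0.8]
grads = [0.1, 0.0, 0.0, -0.1]
new_weights = proximal_step(weights, grads, lr=0.1, lam=0.5)
print([round(w, 4) for w in new_weights])  # [0.44, 0.0, 0.0, -0.74]
```

For |x| ≤ lam·mu the envelope equals x²/(2·mu), and beyond that it grows linearly (a Huber-like shape), so it is a smooth surrogate for the non-differentiable L1 penalty.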
Decision-Pretrained Transformer: Enhancing Decision-Making Abilities
Jonathan N. Lee et al. [5] introduced the Decision-Pretrained Transformer (DPT), a supervised pretraining method in which a transformer learns to predict an optimal action given a query state and an in-context dataset of interactions, across a diverse set of tasks. The authors show that the pretrained model can solve a range of reinforcement learning problems both online and offline, and that it generalizes to new tasks and structures. The paper also provides theoretical guarantees, showing that the model can learn faster than the algorithms used to generate its pretraining data. This opens new avenues for instilling strong in-context decision-making abilities in transformers.
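The authors relate DPT’s in-context behavior to Bayesian posterior sampling. For reference, here is classic Beta-Bernoulli Thompson sampling, the posterior-sampling algorithm a DPT model is argued to approximate in bandit settings. This is the baseline algorithm only, not DPT itself, and the arm probabilities are hypothetical.

```python
import random


def thompson_sampling(arm_probs, steps, seed=0):
    """Beta-Bernoulli Thompson sampling; returns how often each arm was pulled."""
    rng = random.Random(seed)
    alphas = [1] * len(arm_probs)  # Beta(1, 1) prior: 1 + observed successes
    betas = [1] * len(arm_probs)   # Beta(1, 1) prior: 1 + observed failures
    pulls = [0] * len(arm_probs)
    for _ in range(steps):
        # Draw a plausible mean reward for each arm from its posterior
        samples = [rng.betavariate(a, b) for a, b in zip(alphas, betas)]
        arm = max(range(len(arm_probs)), key=lambda k: samples[k])
        # Simulate a Bernoulli reward and update that arm's posterior
        if rng.random() < arm_probs[arm]:
            alphas[arm] += 1
        else:
            betas[arm] += 1
        pulls[arm] += 1
    return pulls


pulls = thompson_sampling([0.2, 0.5, 0.8], steps=2000)
print(pulls)  # the 0.8 arm should receive the large majority of pulls
```

Acting greedily with respect to a posterior sample balances exploration and exploitation automatically; DPT’s claim is that a transformer pretrained on optimal-action labels can reproduce this kind of behavior from its context alone.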
These papers showcase ongoing advances in machine learning and AI, spanning cooperative team communication, multimodal sentiment analysis, Urdu OCR, sparse language models, and in-context decision-making. Stay tuned for further updates on these exciting developments!
References
[1] Inferring the Goals of Communicating Agents from Actions and Instructions
[2] ConKI: Contrastive Knowledge Injection for Multimodal Sentiment Analysis
[3] UTRNet: High-Resolution Urdu Text Recognition In Printed Documents
[4] SparseOptimizer: Sparsify Language Models through Moreau-Yosida Regularization and Accelerate through Compiler Co-design
[5] Supervised Pretraining Can Learn In-Context Reinforcement Learning