The new LAM model can perform tasks in Word.
- honeywellholding
- Jan 4, 2025
Microsoft researchers have developed the LAM (Large Action Model), an AI capable of controlling Windows applications, including Microsoft Office. Unlike traditional language models such as GPT-4, which primarily generate text, LAM turns user requests into real actions by creating step-by-step plans. It processes input from text, voice, and images, and adjusts its actions in real time.
In a test with Word, LAM completed tasks successfully 71% of the time, compared with 63% for GPT-4 when no visual information was provided. LAM was also faster, taking just 30 seconds per task against GPT-4's 86 seconds. When visual data was included, however, GPT-4 reached a higher success rate of 75.5%.
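To put those figures side by side, here is a small Python sketch that computes the gaps between them. The numbers come straight from the paragraph above; the dictionary layout and function names are assumptions made for illustration, not anything from the researchers' tooling.

```python
# Figures as reported in the article; the structure and names are illustrative.
RESULTS = {
    "LAM": {"success_pct": 71.0, "seconds_per_task": 30.0},
    "GPT-4 (no visual input)": {"success_pct": 63.0, "seconds_per_task": 86.0},
}
GPT4_WITH_VISUAL_SUCCESS_PCT = 75.5  # no timing is given for this setup


def point_gap(a_pct: float, b_pct: float) -> float:
    """Difference between two success rates, in percentage points."""
    return a_pct - b_pct


def speedup(slower_seconds: float, faster_seconds: float) -> float:
    """How many times faster the quicker system is per task."""
    return slower_seconds / faster_seconds


lam = RESULTS["LAM"]
gpt4 = RESULTS["GPT-4 (no visual input)"]

lead_over_text_only = point_gap(lam["success_pct"], gpt4["success_pct"])
deficit_vs_visual = point_gap(GPT4_WITH_VISUAL_SUCCESS_PCT, lam["success_pct"])
time_ratio = speedup(gpt4["seconds_per_task"], lam["seconds_per_task"])

print(f"LAM lead over GPT-4 without visuals: {lead_over_text_only:.1f} points")
print(f"GPT-4 with visuals lead over LAM: {deficit_vs_visual:.1f} points")
print(f"LAM speed advantage: {time_ratio:.2f}x")
```

In other words, LAM leads by 8 percentage points and is a little under three times faster in the text-only comparison, while GPT-4 with visual input is ahead by 4.5 points.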
Training involved 29,000 "task-plan" pairs, later expanded to 76,000 examples with the help of GPT-4. Despite challenges like AI errors and technical limitations, researchers believe LAM marks a significant step toward Artificial General Intelligence (AGI).
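The article does not describe what a "task-plan" pair looks like, so the following Python sketch is only a guess at one plausible shape: a user request paired with an ordered list of steps. The class name, field names, text format, and the sample Word task are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskPlanPair:
    """Hypothetical record: a user request and the plan that fulfils it."""
    task: str
    plan: tuple[str, ...]


def to_training_text(pair: TaskPlanPair) -> str:
    """Flatten a pair into a single text example (format is an assumption)."""
    steps = "\n".join(f"{i}. {step}" for i, step in enumerate(pair.plan, start=1))
    return f"Task: {pair.task}\nPlan:\n{steps}"


# Invented example for illustration; not taken from the actual dataset.
example = TaskPlanPair(
    task="Make the document title bold",
    plan=(
        "Select the title text",
        "Open the Home tab",
        "Click the Bold button",
    ),
)

print(to_training_text(example))

# Dataset sizes as reported in the article.
INITIAL_PAIRS = 29_000
EXPANDED_EXAMPLES = 76_000
growth_factor = EXPANDED_EXAMPLES / INITIAL_PAIRS
print(f"Expansion factor: {growth_factor:.2f}x")
```

Going from 29,000 to 76,000 examples means the GPT-4-assisted expansion grew the dataset to roughly 2.6 times its original size.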
The creation of LAM follows four main stages: breaking tasks into logical steps, using advanced AI like GPT-4 to convert plans into actions, developing new solutions independently, and refining the system through reinforcement learning.