The most capable open-source AI model with visual abilities yet could enable more developers, researchers, and startups to build AI agents that carry out useful chores on your computer for you.
Released today by the Allen Institute for AI (Ai2), the Multimodal Open Language Model, or Molmo, can interpret images as well as converse through a chat interface. This means it can make sense of a computer screen, potentially helping an AI agent perform tasks such as browsing the web, navigating through file directories, and drafting documents.
“With this release, many more people can deploy a multimodal model,” says Ali Farhadi, CEO of Ai2.