Thrilled to have Ai2’s VP of Engineering Jeremy Tryba on stage at GeekWire's Agents of Transformation event last week. He painted a vivid picture of what agentic AI can do for science, and cancer research in particular. 👇

"When you have an agent building this tree of surprising results, you can have a human oncologist wake up in the morning and say, 'Hey, if that's true, that's actually pretty interesting.' The kinds of things that potentially lead to changes in treatment for different types of cancer."

Asta AutoDiscovery is already impacting how oncologists think about cancer treatment. It works by autonomously generating & testing hypotheses on your data, guided by Bayesian surprise to surface the unexpected, not the obvious.

🔬 Researchers have already run 35K+ hypotheses across social science, climate science, marine ecology, & more.

🧪 Try it: https://lnkd.in/ep6yu7kR
📺 Watch the panel: https://lnkd.in/erwremtZ
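The post credits Bayesian surprise with steering which results get surfaced. As a rough illustration (Asta AutoDiscovery's internals are not spelled out here, so this is a generic sketch, not its implementation), Bayesian surprise is typically quantified as the KL divergence between beliefs before and after seeing new data:

```python
import math

def bayesian_surprise(prior, posterior):
    """KL divergence D(posterior || prior) over a discrete hypothesis space.

    Higher values mean the new evidence shifted beliefs more, i.e. the
    result is more 'surprising' and worth flagging to a human reviewer.
    """
    return sum(
        post * math.log(post / pri)
        for post, pri in zip(posterior, prior)
        if post > 0
    )

# A result that barely moves beliefs is unsurprising...
low = bayesian_surprise([0.5, 0.5], [0.55, 0.45])
# ...while one that nearly flips them is highly surprising.
high = bayesian_surprise([0.5, 0.5], [0.05, 0.95])
```

The larger the divergence, the more a result upends prior expectations, which is the kind of finding the post suggests surfacing to a human expert.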
Ai2
Non-profit Organizations
Seattle, WA 62,350 followers
Breakthrough AI to solve the world's biggest problems.
About us
We are a Seattle-based non-profit AI research institute founded in 2014 by the late Paul Allen. We pursue foundational AI research and innovation to deliver real-world impact through large-scale open models, data, robotics, conservation, and beyond.
- Website
- http://allenai.org
- Industry
- Non-profit Organizations
- Company size
- 201-500 employees
- Headquarters
- Seattle, WA
- Type
- Nonprofit
- Founded
- 2014
- Specialties
- Artificial Intelligence, Deep Learning, Natural Language Processing, Computer Vision, Machine Reading, Machine Learning, Knowledge Extraction, Common Sense AI, Machine Reasoning, Information Extraction, and Language Modeling
Locations
- Primary: Seattle, WA 98013, US
Updates
Today we're releasing the full MolmoBot code, data, and eval suite—everything needed to train and evaluate robotic manipulation policies, with model checkpoints ranging from high-performance to lightweight. 🤖

The training data, MolmoBot-Data, comprises 1.7M expert manipulation trajectories spanning 11K+ unique objects, 94K+ environments, and 8 task types across two robot platforms (Franka FR3 and Rainbow Robotics RB-Y1). The open pipeline behind it, MolmoBot-Engine, handles environment sampling, domain randomization, and trajectory generation, so researchers can generate training data for their own robots and tasks. MolmoBot-Engine is now part of MolmoSpaces, our open platform for training and evaluating robot policies.

On the eval side, we've added MolmoBot to MolmoSpaces-Bench and updated the leaderboard, with a toggle to split results by whether MolmoBot-Data was included during training. The updated MolmoBot tech report covers the new benchmarks, and the new technical website includes real-world videos of every trajectory underlying our evals.

💻 Code: https://lnkd.in/gRPcGv2s
🤗 Models: https://lnkd.in/gM7Dy_3d
🤗 Data: https://lnkd.in/giEWHCEN
🔧 Data pipeline: https://lnkd.in/ehyzZYbq
📊 Leaderboard: https://lnkd.in/gyAvjD9P
📄 Tech report: https://lnkd.in/gAGp-68u
🌐 Website: https://lnkd.in/gkQZ8-Uk
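As a loose illustration of what such a generation pipeline does (all function and field names below are hypothetical, not MolmoBot-Engine's actual API), each episode samples a randomized variant of the environment and records an expert rollout in it:

```python
import random

def randomize_scene(base_scene):
    # Domain randomization: perturb lighting, textures, and object poses so
    # policies trained in simulation transfer to varied real-world conditions.
    scene = dict(base_scene)
    scene["light_intensity"] = random.uniform(0.3, 1.5)
    scene["table_texture"] = random.choice(["wood", "metal", "cloth"])
    scene["object_offset_cm"] = (random.uniform(-2, 2), random.uniform(-2, 2))
    return scene

def generate_trajectories(base_scene, expert_policy, n_episodes):
    # Sample a fresh randomized environment per episode, then record the
    # expert's rollout as one training trajectory.
    return [expert_policy(randomize_scene(base_scene)) for _ in range(n_episodes)]
```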
Today we're releasing MolmoWeb, an open-source agent that can navigate and complete tasks in a web browser on your behalf. 🖥️ Built on Molmo 2 in 4B/8B sizes, MolmoWeb sets a new open-weight SOTA across four major web-agent benchmarks and even surpasses strong agents built on proprietary models.

MolmoWeb works by looking at the same screen you do. Given a task and a live webpage, it views the screenshot, decides what to do next, and takes action: clicking, typing, scrolling, switching tabs, or returning information back to you. It can handle everyday tasks like navigating websites, filling out forms, searching and filtering product listings, and finding information, all without needing specialized APIs for each site.

MolmoWeb outperforms all open-weight models on every benchmark we tested, and even beats visual agents built on much larger models like GPT-4o-based SoM Agents. It also beats OpenAI CUA on 3 out of 4 benchmarks. Performance improves further when the model gets multiple attempts at a task: on both WebVoyager and Online-Mind2Web, MolmoWeb with 4 parallel attempts surpasses the best single-attempt performance of every model we evaluated, including agents powered by GPT-5 and Gemini CU Preview.

We're also releasing MolmoWebMix, a dataset for training web agents with 160K+ trajectories, 30K+ human demonstrations, 7M GUI grounding examples, and 2.2M screenshot QA pairs. Everything needed to inspect, reproduce, and fine-tune MolmoWeb is openly available.

🤖 Models: https://lnkd.in/gwnpWUcX
🎮 Demo: https://molmoweb.allen.ai
📊 Data: https://lnkd.in/grGvQD4E
💻 Code: https://lnkd.in/g54rTz8d
📄 Tech report: https://lnkd.in/gVEjkuBj
📝 Blog: https://lnkd.in/g9t3Aemu
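The screenshot-driven loop this describes can be sketched in a few lines. This is a hedged sketch: the real action space and model interface live in the MolmoWeb code, and `browser` and `model` here are hypothetical stand-ins.

```python
def run_web_agent(task, browser, model, max_steps=20):
    """Minimal observe-decide-act loop for a screenshot-based web agent.

    `browser` is any object exposing screenshot()/execute(action);
    `model` is any object exposing decide(task, screenshot) -> action dict.
    """
    for _ in range(max_steps):
        screenshot = browser.screenshot()        # 1. observe the live page
        action = model.decide(task, screenshot)  # 2. pick the next action
        if action["type"] == "done":             # 3a. return info to the user
            return action.get("answer")
        browser.execute(action)                  # 3b. click/type/scroll/switch tab
    return None  # step budget exhausted without finishing
```

Parallel attempts, as in the benchmark numbers above, would amount to running several independent copies of this loop and keeping the best outcome.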
We were at #NVIDIAGTC last week! Across panels, livestreams, and expo floor demos, we shared our work on Olmo Hybrid, SERA, Asta AutoDiscovery, MolmoBot, and more, all grounded in the same idea: truly open AI means sharing the full pipeline, not just the weights.

Some of the highlights: Lambda ran live fine-tuning of Olmo Hybrid at their booth, we demoed Asta AutoDiscovery at the Cirrascale Cloud Services booth, and we joined panels on open models, coding agents, and robotics, including how simulation is closing the data gap for embodied AI.

📝 Here's the full recap on our blog: https://lnkd.in/eq5GHsjG
🎥 Watch the video: https://lnkd.in/gZ55qfwi
Ai2 at NVIDIA GTC 2026
📢 Introducing vla-evaluation-harness—a unified, fully open framework to evaluate any VLA model on any robot simulation benchmark.

Today, every VLA research team maintains private eval forks per benchmark, each with its own dependencies, observation formats, and evaluation protocols. Results diverge subtly, bug fixes don't propagate, and reproducing someone else's numbers is a multi-day ordeal.

vla-evaluation-harness decouples model inference from benchmark execution. Benchmarks run inside Docker for exact reproducibility. Model servers are single-file uv scripts with zero manual setup, and the two sides communicate via a WebSocket + msgpack binary protocol. A complete evaluation requires just two commands: vla-eval serve and vla-eval run. The framework currently supports 13 simulation benchmarks and 6 model servers, with community integrations expanding coverage.

Parallel eval with episode sharding + batched inference makes a big difference: 2,000 LIBERO episodes drop from ~14 hours to ~18 minutes on 1× H100 (47× faster), with 16× speedups on CALVIN and 12× on SimplerEnv.

We also ran a reproducibility audit of a published VLA model across three benchmarks—closely matching reported results while surfacing undocumented requirements that can quietly distort evaluations, like ambiguous termination rules in SimplerEnv and undocumented normalization stats in CALVIN.

Finally, we're releasing a VLA leaderboard aggregating 657 published results across 17 benchmarks and 509+ configurations from 1,704 papers.

Open source, Apache 2.0. Built for reproducibility and new experiments.

🔗 Code: https://lnkd.in/gT8YNSrt
🏆 Leaderboard: https://lnkd.in/g-WeneJQ
📝 Paper: https://lnkd.in/gGq8rN87
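The parallel speedup rests on a simple idea: split the episode list into disjoint shards, evaluate each shard on its own worker, and merge the results. A minimal sketch of that idea, not the harness's actual scheduler:

```python
def shard_episodes(episode_ids, num_workers):
    # Round-robin sharding: worker w evaluates episodes w, w+n, w+2n, ...
    # The slices are disjoint and together cover every episode exactly once,
    # so merged per-episode results are identical to a sequential run.
    return [episode_ids[w::num_workers] for w in range(num_workers)]

# e.g. 2,000 LIBERO-style episode ids spread across 16 parallel workers
shards = shard_episodes(list(range(2000)), 16)
```

Batched inference then amortizes model forward passes across the episodes each worker holds.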
It was an incredible panel at #NVIDIAGTC yesterday! Here’s Hanna Hajishirzi sharing the stage with Jensen Huang, explaining how our open model flow enables infinite customization.
🎯 Introducing MolmoPoint: A better way for models to point

Grounding lets vision-language models do more than describe what they see. They can point to where a robot should grasp, which button to click, or which object to track across video frames. But most VLMs point by generating text coordinates—essentially dictating numbers. It works, but it wastes tokens, breaks at high resolutions, and forces models to learn an abstract numbering system that has nothing to do with how they actually perceive.

MolmoPoint takes a different approach. Instead of writing coordinates, the model points by selecting from the visual tokens it's already looking at—like the difference between reading out "position 347, 582" and tapping directly on a touchscreen. It works in three steps using special grounding tokens: first, pick a rough region that contains the target; then zoom in to a smaller area using finer visual features; finally, pinpoint the exact pixel-level location.

MolmoPoint sets a new state of the art on image pointing (70.7% on PointBench, 89.2 F1 on PixMo-Points), achieves the best GUI grounding among fully open models on ScreenSpot-Pro and OSWorldG, and is preferred by human evaluators 59.1% of the time on video. It's also easier to train—with just 8K examples, it outperforms coordinate-based models by ~20 F1 points, and reaches peak performance faster during full pretraining. These grounding gains don't come at a cost—question-answering, captioning, and other tasks all stay on par.
We're releasing everything openly, including three models and two datasets:

🖼️ MolmoPoint-8B—general-purpose pointing across images & video
🖥️ MolmoPoint-GUI-8B—specialized for apps, websites, & software interfaces
🎥 MolmoPoint-Vid-4B—optimized for counting & tracking in video
📦 MolmoPoint-GUISyn (used to train our GUI model)—36K high-res screenshots spanning desktop, web, & mobile, with 2M+ annotated points
📦 MolmoPoint-TrackData (used to train our video model)—human-annotated & synthetic tracks with complex occlusion + motion

VLMs already have visual tokens. Letting them point by selecting those tokens turns out to be simpler, faster, and better.

🤖 Models: https://lnkd.in/ganXR4YK
📦 Data: https://lnkd.in/gnYXdgaj
💻 Code: https://lnkd.in/e6twqpKa
📖 Blog: https://lnkd.in/gH7MWQTK
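To make "selecting a visual token instead of dictating coordinates" concrete, here is a hypothetical sketch of just the coarse stage: choose the image patch with the highest grounding score and map it back to pixel space. Names and shapes are illustrative; the real model follows this with the finer zoom-in stage.

```python
def point_by_token_selection(patch_scores, grid_w, grid_h, img_w, img_h):
    # patch_scores: one grounding score per visual patch, row-major order
    # over a grid_h x grid_w grid covering an img_w x img_h image.
    best = max(range(len(patch_scores)), key=lambda i: patch_scores[i])
    row, col = divmod(best, grid_w)            # which patch was selected
    patch_w, patch_h = img_w / grid_w, img_h / grid_h
    x = (col + 0.5) * patch_w                  # centre of that patch,
    y = (row + 0.5) * patch_h                  # in pixel coordinates
    return x, y
```

Selecting among tokens the model already computed is why this needs no abstract coordinate vocabulary and stays robust as image resolution grows.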
Wednesday at #NVIDIAGTC, and we've got a packed day.

First up, Hanna Hajishirzi joins Jensen Huang + industry leaders for a conversation on where open models are headed. This is the one everyone will be talking about.
📅 12:30–2:00 PM PT
🔗 https://lnkd.in/gsArXxBs

Then, open-source AI and scientific trust. Hanna Hajishirzi + Percy Liang dig into how auditable, reproducible models are changing the way research gets done, and how open science scales across labs.
📅 2:00–2:40 PM PT
🔗 https://lnkd.in/evby88-V

After the panels, come see Asta AutoDiscovery in action. Bodhisattwa Majumder will be at the Cirrascale Cloud Services booth (238) demoing our AI-powered research tool that explores your datasets autonomously: generating hypotheses, running experiments, and surfacing findings you didn't know to look for.
📅 3:00–5:00 PM PT

Two panels. One demo. One throughline: the best AI gets built in the open. Come find us, and see it in action. #GTC2026 #OpenSource

Miss anything? We've got you covered.
Full GTC breakdown: https://lnkd.in/eq5GHsjG
Discord for BTS: https://discord.gg/ai2 🔬
"We trained our very first [Molmo] model and were surprised to find that it outperformed GPT. Scale wasn't everything in vision language — clearly there was a key role for data," Ranjay Krishna explained on today's open model panel at #NVIDIAGTC. Then "last week we released a robotics model called MolmoBot, that is trained completely in simulation, and it outperforms a lot of these foundation models that have been trained with real world data. And, again, we do release everything so that others can build on top of it.” Learn more about MolmoBot: https://lnkd.in/gns6VzbE
Tuesday at #NVIDIAGTC is here and we're kicking things off with our first big open-source panel. Ranjay Krishna joins NVIDIA's Jonathan Cohen to discuss The State of Open Source AI at 4:00 PM PT.
🔗 https://lnkd.in/eKNHa2nr

Stop by Lambda's booth (1507) all week to watch them run supervised fine-tuning on Olmo Hybrid, with real-time observability metrics streaming on screen. A fascinating peek into how SFT works in real time.

Excited to share our research and connect with the open-source community this week at #NVIDIAGTC.

More: https://lnkd.in/eq5GHsjG
BTS updates: https://discord.gg/ai2