Join us in Davies Auditorium, New Haven, CT, USA, on April 10, 2026, for a deep dive into multimodal representation learning and foundation models.
Machine learning's progress with isolated modalities—text, vision, audio—masks a fundamental gap: intelligence emerges from integrating diverse data sources. Developing unified models that reason coherently across modalities remains one of AI's defining challenges.
Multimodal learning bridges this gap by enabling systems to jointly learn from and align information across heterogeneous data sources. This paradigm promises richer representations, stronger generalization, and more robust reasoning, with implications for science, healthcare, robotics, and human-centered AI.
This workshop brings together researchers and practitioners exploring the foundations and frontiers of multimodal intelligence. We seek to spark cross-disciplinary discussion on architectures, alignment strategies, and applications that will shape the future of AI systems that understand the world as we do.
We are accepting papers for oral and poster presentations. See the Call for Contributions.
Interested in reviewing? Fill out the reviewer nomination form.
Submission site: OpenReview. Join our Slack.
Papers ≤ 9 pages (excl. refs/appendix), double‑blind, non‑archival by default.
| Time | Session |
|---|---|
| 8:30–8:50 | Poster setup |
| 8:50–9:00 | Opening remarks |
| 9:00–9:50 | Keynote - Ruslan Salakhutdinov (Carnegie Mellon University) |
| 9:50–10:40 | Keynote - Xia "Ben" Hua (Shanghai AI Lab) |
| 10:40–11:00 | Discussions & coffee break |
| 11:00–11:50 | Highlighted talks: Manling Li, Ruohan Gao |
| 11:50–1:40 | Poster session & lunch break |
| 1:40–2:10 | Keynote - Alan Yuille (Johns Hopkins University) |
| 2:10–3:00 | Highlighted talks: Kayhan Batmanghelich, Grant Van Horn |
| 3:00–3:50 | Keynote - Atlas Wang (UT Austin) |
| 3:50–4:00 | Discussions & coffee break |
| 4:00–4:25 | Highlighted talks: Yunzhu Li |
| 4:25–5:15 | Keynote - Danai Koutra (University of Michigan, Ann Arbor) |
| 5:15–6:00 | Panel: All keynote speakers |
| 6:00–6:10 | Concluding remarks |
| 6:10–8:00 | Social (optional) |
Extended organizers shown where noted.
All talks will be live‑streamed. Posters are in‑person only; we recommend attending in person if possible.
Your submission will not appear in formal proceedings or be indexed; you retain the right to publish elsewhere.
Email us: hiren.madhu@yale.edu
Proudly sponsored by the Artificial Intelligence Journal (AIJ), established in 1970.