Symposium on Machine Learning across Modalities

Machine learning's progress with isolated modalities—text, vision, audio—masks a fundamental gap: intelligence emerges from integrating diverse data sources. Developing unified models that reason coherently across modalities remains one of AI's defining challenges.

Multimodal learning bridges this gap by enabling systems to jointly learn from and align information across heterogeneous data sources. This paradigm promises richer representations, stronger generalization, and more robust reasoning, with implications for science, healthcare, robotics, and human-centered AI.

This workshop brings together researchers and practitioners exploring the foundations and frontiers of multimodal intelligence. We seek to spark cross-disciplinary discussion on architectures, alignment strategies, and applications that will shape the future of AI systems that understand the world as we do.

News

TBD
We are accepting papers for oral and poster presentations. See the Call for Contributions.
TBD
Interested in reviewing? Fill out the reviewer nomination form.
—
Submission site: OpenReview. Join our Slack.

2. Call for Contributions (Topics & Scope)

Theoretical Foundations

Principles of multimodal alignment and representation learning
Information-theoretic perspectives on cross-modal fusion
Modeling inter-modal dependencies and disentanglement
Foundations of transfer, generalization, and scaling across modalities

Architectures & Algorithms

Multimodal transformers, diffusion models, and joint encoders
Cross-attention, alignment, and fusion mechanisms
Contrastive and generative approaches for multi-sensor data
Learning shared latent spaces across modalities

Applications

Vision–language and audio–language understanding
Multimodal reasoning in healthcare, robotics, and science
Cross-modal retrieval, grounding, and embodied AI
Learning with missing or weakly paired modalities

Single-Modality Research

Works focusing on a single modality (e.g., vision, language, or audio)
Advances that push the state of the art in individual modalities
Building blocks for multimodal integration
Inspirations for cross-modal reasoning, alignment, or transfer

Trustworthiness & Tools

Bias, fairness, and interpretability in multimodal models
Evaluation benchmarks and standardized datasets
Toolkits for multimodal foundation model training and visualization
Efficient and sustainable multimodal learning at scale

LaTeX Template ↗ OpenReview ↗

3. Submissions & Important Dates

Papers ≤ 9 pages (excl. refs/appendix), double‑blind, non‑archival by default.

Abstract deadline: TBD
Paper deadline: TBD
Author notification: TBD
Camera‑ready: TBD
Workshop early registration: TBD
Workshop date: 2026-02-13

4. Tentative Schedule

Time	Session
8:30–8:50	Poster setup
8:50–9:00	Opening remarks
9:00–9:50	Keynote - Ruslan Salakhutdinov (Carnegie Mellon University)
9:50–10:40	Keynote - Xia "Ben" Hua (Shanghai AI Lab)
10:40–11:00	Discussions & coffee break
11:00–11:50	Highlighted talks: Manling Li, Ruohan Gao
11:50–1:40	Poster session & lunch break
1:40–2:10	Keynote - Alan Yuille (Johns Hopkins University)
2:10–3:00	Highlighted talks: Kayhan Batmanghelich, Grant Van Horn
3:00–3:50	Keynote - Atlas Wang (UT Austin)
3:50–4:00	Discussions & coffee break
4:00–4:25	Highlighted talks: Yunzhu Li
4:25–5:15	Keynote - Danai Koutra (University of Michigan, Ann Arbor)
5:15–6:00	Panel: All keynote speakers
6:00–6:10	Concluding remarks
6:10–8:00	Social (optional)