EMMA: Extracting Multiple physical parameters from Multimodal Data

CVPR 2026

Farhat Shaikh, Ayan Banerjee, Sandeep K. S. Gupta

IMPACT Lab, School of Computing & Augmented Intelligence (SCAI), Arizona State University

Project page · Demo video


Overview

EMMA is a physics-informed multimodal framework that recovers all identifiable dynamical parameters of a system directly from raw video, audio, and image-based time-series observations. Prior video-only approaches struggle with occluded states and hidden actuation inputs, and they often assume known initial conditions. EMMA instead jointly infers explicit parameters, implicit dynamical components, and calibration invariants within a unified continuous-time model.

The user supplies the parametric structure of the governing ODE; EMMA solves the inverse problem of recovering its parameters, along with any latent forcing and invariants, from multimodal observations.
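
As a rough illustration of this kind of inverse problem, the sketch below recovers two parameters of a damped pendulum from a noise-free synthetic trajectory. Everything in it is an assumption made for illustration: the ODE form (theta'' = -(g/L) sin(theta) - c theta'), the parameter ranges, and the brute-force grid search, which stands in for EMMA's learned, gradient-based inference and is not the method used in the repository.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def simulate(length, damping, theta0=0.5, omega0=0.0, dt=0.01, steps=300):
    """Integrate theta'' = -(G/length)*sin(theta) - damping*theta' with RK4."""
    def deriv(theta, omega):
        return omega, -(G / length) * math.sin(theta) - damping * omega

    theta, omega = theta0, omega0
    trajectory = [theta]
    for _ in range(steps):
        k1t, k1o = deriv(theta, omega)
        k2t, k2o = deriv(theta + 0.5 * dt * k1t, omega + 0.5 * dt * k1o)
        k3t, k3o = deriv(theta + 0.5 * dt * k2t, omega + 0.5 * dt * k2o)
        k4t, k4o = deriv(theta + dt * k3t, omega + dt * k3o)
        theta += dt * (k1t + 2 * k2t + 2 * k3t + k4t) / 6
        omega += dt * (k1o + 2 * k2o + 2 * k3o + k4o) / 6
        trajectory.append(theta)
    return trajectory


def loss(params, observed):
    """Sum of squared differences between simulated and observed angles."""
    simulated = simulate(*params)
    return sum((s - o) ** 2 for s, o in zip(simulated, observed))


def recover(observed):
    """Pick the (length, damping) pair on a coarse grid that best fits the data."""
    candidates = [(0.5 + 0.05 * i, 0.05 * j) for i in range(21) for j in range(11)]
    return min(candidates, key=lambda p: loss(p, observed))


# Synthetic "observation": a trajectory generated with known parameters.
observed = simulate(0.90, 0.10)
length_hat, damping_hat = recover(observed)
print(f"recovered length = {length_hat:.2f} m, damping = {damping_hat:.2f} 1/s")
```

The structure (a user-chosen ODE, a simulator, and a data-fit loss) is the part that carries over; real observations are noisy and partially hidden, which is where the multimodal sensing and learned dynamics come in.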

Key contributions

Architecture

Figure: EMMA architecture.

EMMA follows a three-step pipeline: Sense · Learn · Verify.

  1. Sense. Video, audio, and chart images are converted into time-aligned signals through modality-specific pipelines.
  2. Learn. A Liquid Time-Constant (LTC) network models the system's latent dynamics in continuous time.
  3. Verify. A differentiable ODE solver simulates the recovered parameters and checks them against the observations under a physics-informed loss.
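
For readers unfamiliar with LTC networks, the sketch below shows a single LTC unit advanced with the fused semi-implicit Euler update commonly used for this model class. It is only an illustration under stated assumptions: the weights, time constant, and sine-wave input are hypothetical, and this README does not describe the size, wiring, or solver of the network EMMA actually uses.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def ltc_fused_step(x, inp, dt, tau=1.0, A=1.0, w_x=0.5, w_in=2.0, bias=-1.0):
    """One fused semi-implicit Euler step of a single LTC unit.

    Continuous-time model: dx/dt = -(1/tau + f) * x + f * A,
    with gate f = sigmoid(w_x * x + w_in * inp + bias).
    All weights here are hypothetical placeholders, not trained values.
    """
    f = sigmoid(w_x * x + w_in * inp + bias)
    return (x + dt * f * A) / (1.0 + dt * (1.0 / tau + f))


def run(inputs, dt=0.05, x0=0.0):
    """Roll the unit forward over a sequence of scalar inputs."""
    x, states = x0, []
    for inp in inputs:
        x = ltc_fused_step(x, inp, dt)
        states.append(x)
    return states


# Hypothetical input: a sine wave standing in for a sensed signal.
inputs = [math.sin(0.1 * k) for k in range(200)]
states = run(inputs)
print(f"final state: {states[-1]:.4f}, max state: {max(states):.4f}")
```

Because the gate f multiplies the state, the unit's effective time constant tau / (1 + tau * f) changes with the input, which is what "liquid" refers to. With a positive A and a start in [0, A], the fused update also keeps the state inside that range.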

Results

The table below summarizes EMMA's multi-parameter recovery on the video benchmarks, the real-world platforms, and the simulated chart data. Full tables and ablations are in the paper.

System | Parameters recovered | EMMA result (GT = ground truth) | Best baseline
--- | --- | --- | ---
Pendulum (90 cm) | Length L, damping τ | L = 0.86 ± 0.07 m (GT 0.90) | Delfys, PySINDy
Torricelli (med.) | Drainage k | 0.0132 ± 0.0008 (GT 0.0128) | matches Delfys
Sliding block (med.) | Angle α, friction μ | α = 24.72°, μ = 0.205 (GT 25°, 0.20) | Delfys, PySINDy
LED decay (med.) | γ | 0.91 ± 0.0 (GT 0.92) | matches Delfys
Rover | 9 params (5 with known ground truth) | 8.8 % ± 1.7 % mean error | first work under hidden forcing
Quadrotor | 12 params (7 with known ground truth) | 15.9 % ± 7.4 % mean error | first work under hidden forcing
Simulation charts | Lotka-Volterra, Lorenz, F8 Crusader, HIV, AID | >10× lower error than PySINDy on implicit dynamics | PySINDy

EMMA is compared against PAIG, NIRPI, and Delfys on the video benchmarks and against PySINDy on the chart-based simulations.
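
To put the benchmark rows on the same footing as the percentage errors reported for the rover and quadrotor, the estimates above can be converted to relative errors against their ground-truth values. The snippet below does that arithmetic for the rows that list a ground truth; whether the paper computes its reported errors with exactly this metric is an assumption.

```python
# (estimate, ground truth) pairs copied from the results table above.
results = {
    "Pendulum length L (m)": (0.86, 0.90),
    "Torricelli drainage k": (0.0132, 0.0128),
    "Sliding block angle alpha (deg)": (24.72, 25.0),
    "Sliding block friction mu": (0.205, 0.20),
    "LED decay gamma": (0.91, 0.92),
}


def relative_error_percent(estimate, truth):
    """Absolute relative error, expressed as a percentage of the ground truth."""
    return abs(estimate - truth) / abs(truth) * 100.0


errors = {
    name: relative_error_percent(estimate, truth)
    for name, (estimate, truth) in results.items()
}

for name, err in errors.items():
    print(f"{name}: {err:.2f} %")
```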

Supported systems

Category | Systems
--- | ---
Delfys benchmark | Pendulum, Torricelli drainage, Sliding block, LED decay, Free fall
Real-world platforms | Differential-drive rover (9 params), 6-DoF quadrotor (12 params)
Simulation charts | Lotka-Volterra, Chaotic Lorenz, F8 Crusader, HIV therapy, AID (Type-1 diabetes)

Installation

Tested with Python 3.10+ on macOS and Linux.

git clone https://github.com/ImpactLabASU/EMMA-CVPR2026.git
cd EMMA-CVPR2026
python3 -m venv .venv && source .venv/bin/activate   # optional but recommended
pip install -r requirements.txt

System tools

Repository layout

Folder | Purpose | Entry points
--- | --- | ---
Baseline/ | Physics-informed EMMA pipelines (Free Fall, LED, Pendulum, Sliding Block, Torricelli) plus ablation utilities. | FreeFall/free_fall.py, LED/led.py, Pendulum/run-*.py, Sliding block/sliding_block*.py, Torricelli/toricelli*.py, architecture_ablation.py, run_additional_ablations.py
Rover/ | Rover perception, parameter estimation, multimodal ablations, helper shell script. | run.py, rover-ablation.py, rover_multimodal_ablation.py, run_rover_ablation.sh
Drone/ | Drone pipeline orchestrator (vision + audio + EMMA optimization). | new_run.py
CGM/ | Continuous glucose monitor chart digitizer. | extract_cgm_data.py

Data

Usage

Baseline pipelines

Each baseline follows the same recipe:

  1. cd Baseline/<Experiment>/
  2. Edit the configuration block inside main():
    • video_path: path to the source video; leave empty to reuse existing data files.
    • weights_path: YOLO weights (yolo11m.pt by default).
    • pixel_to_meter (Free Fall, Torricelli, Sliding Block): set from your calibration grid.
    • output_folder: a unique run directory (e.g. run_01); the script creates output/ and data/ under it.
  3. Run python3 <script>.py.
  4. Optional: python3 <script>.py --simulation-only skips retraining and reuses the latest *_coefficients.csv and *_emma_final_model.pth (Free Fall, LED, Pendulum).
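
As an illustration of step 2, the sketch below collects the four settings named above and derives pixel_to_meter from a calibration grid. The field names come from the list; the dataclass wrapper, the video path, and the calibration numbers are hypothetical, and the actual scripts keep these values as plain variables inside main(), whose exact layout may differ.

```python
from dataclasses import dataclass


@dataclass
class RunConfig:
    """Illustrative container for the settings edited inside main()."""
    video_path: str = ""              # empty string: reuse existing data files
    weights_path: str = "yolo11m.pt"  # default YOLO weights
    pixel_to_meter: float = 1.0       # Free Fall, Torricelli, Sliding Block only
    output_folder: str = "run_01"     # unique directory per run


def pixel_to_meter_from_grid(known_length_m, measured_length_px):
    """Scale factor from a calibration grid: metres covered by one pixel."""
    if measured_length_px <= 0:
        raise ValueError("measured_length_px must be positive")
    return known_length_m / measured_length_px


# Hypothetical calibration: a 10 cm grid square spans 250 pixels in the frame.
config = RunConfig(
    video_path="videos/free_fall.mp4",  # hypothetical path
    pixel_to_meter=pixel_to_meter_from_grid(0.10, 250.0),
    output_folder="run_01",
)
print(config)
```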

Experiment | Script | Key outputs
--- | --- | ---
Free Fall | FreeFall/free_fall.py (free_fall-m.py for the medium set) | trajectory CSV, free_fall_coefficients.csv, trained model, annotated video
LED decay | LED/led.py | trajectory CSV, led_coefficients.csv, trained model, intensity figures
Pendulum | Pendulum/run-45.py, run-90.py, run-150.py | thetaData.txt, omegaData.txt, pendulum_coefficients.csv, trained model
Sliding block | Sliding block/sliding_block.py (-low, -med variants) | trajectory CSVs, sliding_block_coefficients.csv, trained model
Torricelli | Torricelli/toricelli.py (toricelli-m.py, torricelli-sm.py) | height trajectories, torricelli_coefficients.csv, trained model

PySINDy baselines. Each experiment folder has pysindy_results/pysindy.py; run it from that folder (after the main pipeline has written the EMMA-formatted CSVs) for sparse-regression baselines.

Ablations. From Baseline/: python3 architecture_ablation.py and python3 run_additional_ablations.py (require pendulum datasets under Baseline/Pendulum-EMMA/<angle>_v*/data/).

Rover

cd Rover
# set video_path and weights_path in run.py (see the CONFIGURATION SECTION)
python3 run.py

Outputs: rover_coefficients.csv, rover_EMMA_final_model.pth, plots, GIF. Ablations: python3 rover-ablation.py, python3 rover_multimodal_ablation.py, or bash run_rover_ablation.sh (edit variables first). If you already have processed data/*.txt, set video_path = "" to skip detection.

Drone

cd Drone
EMMA_RUN_ORCHESTRATOR=1 python3 new_run.py --video /path/to/DroneVideo.mp4 --weights /path/to/yolo11m.pt

Note: Full orchestration also needs an external Dronepipeline/ folder containing droneExtract.py, droneExtractAudio.py, and EMMA_drone_torch_ltc_optimized.py. These are not bundled here; without them, new_run.py falls back to the local vision-only pipeline.
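
Before a long run, it can help to check whether the three external files are present, since their absence triggers the vision-only fallback. The helper below is a small pre-flight sketch and is not part of the repository; the file names come from the note above, while the "Dronepipeline" location passed to it is a hypothetical path that should be adjusted to wherever the folder actually lives.

```python
from pathlib import Path

# File names listed in the note above.
REQUIRED_FILES = (
    "droneExtract.py",
    "droneExtractAudio.py",
    "EMMA_drone_torch_ltc_optimized.py",
)


def missing_drone_files(pipeline_dir):
    """Return the required file names that are not present in pipeline_dir."""
    root = Path(pipeline_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


# Hypothetical location of the external folder; adjust as needed.
missing = missing_drone_files("Dronepipeline")
if missing:
    print("Missing files (expect the vision-only fallback):", ", ".join(missing))
else:
    print("All external drone pipeline files found.")
```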

CGM chart digitizer

cd CGM
python3 extract_cgm_data.py   # reads CGMData.png, writes cgm_data.txt + a visualization
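
This README does not describe how extract_cgm_data.py works internally. As a generic illustration of one step any chart digitizer needs, the sketch below converts pixel coordinates of a traced curve into data values with a linear axis calibration. The calibration points, axis ranges, and curve pixels are all hypothetical.

```python
def make_axis_mapper(pixel_a, value_a, pixel_b, value_b):
    """Build a linear map from a pixel coordinate to a data value.

    Two reference points on the axis (pixel position, known value) fix the map.
    """
    if pixel_a == pixel_b:
        raise ValueError("calibration pixels must differ")
    scale = (value_b - value_a) / (pixel_b - pixel_a)

    def to_value(pixel):
        return value_a + (pixel - pixel_a) * scale

    return to_value


# Hypothetical calibration. Image rows grow downward, so the y map is inverted:
# row 400 is the 40 mg/dL gridline and row 0 is the 400 mg/dL gridline.
y_to_glucose = make_axis_mapper(400, 40.0, 0, 400.0)
# Columns 50..1490 span one day, expressed in minutes.
x_to_minutes = make_axis_mapper(50, 0.0, 1490, 1440.0)

# Hypothetical (column, row) pixels sampled along the traced curve.
curve_pixels = [(50, 300), (770, 200), (1490, 250)]
samples = [(x_to_minutes(px), y_to_glucose(py)) for px, py in curve_pixels]
for minutes, glucose in samples:
    print(f"t = {minutes:7.1f} min, glucose = {glucose:6.1f} mg/dL")
```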

Troubleshooting

Citation

@InProceedings{Shaikh_2026_CVPR,
    author    = {Shaikh, Farhat and Banerjee, Ayan and Gupta, Sandeep},
    title     = {EMMA: Extracting Multiple physical parameters from Multimodal Data},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {1716-1725}
}

Also on arXiv.