CVPR 2026
Farhat Shaikh, Ayan Banerjee, Sandeep K. S. Gupta
IMPACT Lab, School of Computing & Augmented Intelligence (SCAI), Arizona State University
EMMA is a physics-informed multimodal framework that recovers all identifiable dynamical parameters of a system directly from raw video, audio, and image-based time-series observations. Unlike prior video-only approaches that struggle with occluded states, hidden actuation inputs, and assumptions about known initial conditions, EMMA performs joint inference of explicit parameters, implicit dynamical components, and calibration invariants within a unified continuous-time model.
The user supplies the parametric structure of the governing ODE; EMMA solves the inverse problem of recovering its parameters, along with any latent forcing and invariants, from multimodal observations.
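To make the inverse-problem framing concrete, here is a minimal toy sketch, not EMMA's actual method. It assumes a damped pendulum of the form θ'' = -(g/L) sin θ - c θ', where the damping coefficient `c`, the known initial angle, and the brute-force grid search are all illustrative assumptions. EMMA itself works from raw multimodal observations and does not assume known initial conditions. The sketch only shows the idea of "structure given, parameters recovered".

```python
import math
import random

G = 9.81  # gravitational acceleration in m/s^2


def simulate(length, damping, theta0=0.3, omega0=0.0, dt=0.01, steps=500):
    """Integrate theta'' = -(G/length)*sin(theta) - damping*theta' with RK4."""
    def deriv(theta, omega):
        return omega, -(G / length) * math.sin(theta) - damping * omega

    theta, omega = theta0, omega0
    trajectory = [theta]
    for _ in range(steps):
        k1t, k1w = deriv(theta, omega)
        k2t, k2w = deriv(theta + 0.5 * dt * k1t, omega + 0.5 * dt * k1w)
        k3t, k3w = deriv(theta + 0.5 * dt * k2t, omega + 0.5 * dt * k2w)
        k4t, k4w = deriv(theta + dt * k3t, omega + dt * k3w)
        theta += dt * (k1t + 2 * k2t + 2 * k3t + k4t) / 6
        omega += dt * (k1w + 2 * k2w + 2 * k3w + k4w) / 6
        trajectory.append(theta)
    return trajectory


def fit(observed, lengths, dampings):
    """Grid search for the (length, damping) pair with the lowest squared error."""
    best = None
    for length in lengths:
        for damping in dampings:
            predicted = simulate(length, damping, steps=len(observed) - 1)
            loss = sum((p - o) ** 2 for p, o in zip(predicted, observed))
            if best is None or loss < best[0]:
                best = (loss, length, damping)
    return best[1], best[2]


# Synthetic "observations": a 0.90 m pendulum with small measurement noise.
random.seed(0)
observed = [t + random.gauss(0.0, 0.005) for t in simulate(0.90, 0.10)]

lengths = [0.50 + 0.05 * i for i in range(17)]   # candidate lengths, 0.50 to 1.30 m
dampings = [0.05 * i for i in range(7)]          # candidate damping, 0.00 to 0.30 1/s
length_hat, damping_hat = fit(observed, lengths, dampings)
print(f"recovered length = {length_hat:.2f} m, damping = {damping_hat:.2f} 1/s")
```

A grid search is used here only because it is easy to read; it does not scale to systems such as the rover or quadrotor, where many parameters and hidden forcing must be inferred jointly.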

EMMA follows a three-step pipeline: Sense · Learn · Verify.
EMMA delivers accurate multi-parameter recovery across diverse physical systems. Full tables and ablations are in the paper.
| System | Parameters recovered | EMMA result | Baseline comparison |
|---|---|---|---|
| Pendulum (90 cm) | Length L, damping τ | L = 0.86 ± 0.07 m (GT 0.90) | Delfys, PySINDy |
| Torricelli (med.) | Drainage k | 0.0132 ± 0.0008 (GT 0.0128) | matches Delfys |
| Sliding block (med.) | Angle α, friction μ | α = 24.72°, μ = 0.205 (GT 25°, 0.20) | Delfys, PySINDy |
| LED decay (med.) | γ | 0.91 ± 0.0 (GT 0.92) | matches Delfys |
| Rover | 9 params (5 with known ground truth) | 8.8 % ± 1.7 % mean error | first work under hidden forcing |
| Quadrotor | 12 params (7 with known ground truth) | 15.9 % ± 7.4 % mean error | first work under hidden forcing |
| Simulation charts | Lotka-Volterra, Lorenz, F8 Crusader, HIV, AID | >10× lower error than PySINDy on implicit dynamics | PySINDy |
EMMA is compared against PAIG, NIRPI, and Delfys on the video benchmarks, and against PySINDy on the chart-based simulations.
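For readers who want to put the benchmark rows on the same footing as the percentage errors quoted for the rover and quadrotor, the following sketch computes relative error from the point estimates and ground-truth values listed in the table. It uses only the central estimates and ignores the ± spread. Whether the paper's "mean error" uses exactly this definition is an assumption.

```python
# (estimate, ground truth) pairs copied from the results table.
RESULTS = {
    "Pendulum length L (m)": (0.86, 0.90),
    "Torricelli drainage k": (0.0132, 0.0128),
    "Sliding block angle alpha (deg)": (24.72, 25.0),
    "Sliding block friction mu": (0.205, 0.20),
    "LED decay gamma": (0.91, 0.92),
}


def relative_error_percent(estimate, truth):
    """Absolute deviation from ground truth, as a percentage of ground truth."""
    return abs(estimate - truth) / abs(truth) * 100.0


errors = {
    name: relative_error_percent(est, gt) for name, (est, gt) in RESULTS.items()
}
for name, err in errors.items():
    print(f"{name}: {err:.2f} % relative error")
```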
| Category | Systems |
|---|---|
| Delfys benchmark | Pendulum, Torricelli drainage, Sliding block, LED decay, Free fall |
| Real-world platforms | Differential-drive rover (9 params), 6-DoF quadrotor (12 params) |
| Simulation charts | Lotka-Volterra, Chaotic Lorenz, F8 Crusader, HIV therapy, AID (Type-1 diabetes) |
Tested with Python 3.10+ on macOS and Linux.
git clone https://github.com/ImpactLabASU/EMMA-CVPR2026.git
cd EMMA-CVPR2026
python3 -m venv .venv && source .venv/bin/activate # optional but recommended
pip install -r requirements.txt
System tools
- ffmpeg on your PATH (MoviePy uses it for audio extraction): brew install ffmpeg (macOS) or sudo apt install ffmpeg (Ubuntu).
- YOLO weights (yolo11m.pt): pip install ultralytics then yolo download model=yolo11m.pt, or download from the Ultralytics releases page.

| Folder | Purpose | Entry points |
|---|---|---|
| Baseline/ | Physics-informed EMMA pipelines (Free Fall, LED, Pendulum, Sliding Block, Torricelli) plus ablation utilities. | FreeFall/free_fall.py, LED/led.py, Pendulum/run-*.py, Sliding block/sliding_block*.py, Torricelli/toricelli*.py, architecture_ablation.py, run_additional_ablations.py |
| Rover/ | Rover perception, parameter estimation, multimodal ablations, helper shell script. | run.py, rover-ablation.py, rover_multimodal_ablation.py, run_rover_ablation.sh |
| Drone/ | Drone pipeline orchestrator (vision + audio + EMMA optimization). | new_run.py |
| CGM/ | Continuous glucose monitor chart digitizer. | extract_cgm_data.py |

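Before running any of the entry points, a quick preflight check can catch the most common setup gaps. The sketch below is not part of the repository; the function name `preflight` and the default weights location are assumptions. It checks only what this README states as requirements: Python 3.10+, ffmpeg on the PATH, and a YOLO weights file.

```python
import shutil
import sys
from pathlib import Path


def preflight(weights_path="yolo11m.pt"):
    """Return a list of human-readable setup problems (empty if none found)."""
    problems = []
    if sys.version_info < (3, 10):
        found = ".".join(str(v) for v in sys.version_info[:3])
        problems.append(f"Python 3.10+ expected, found {found}")
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    if not Path(weights_path).is_file():
        problems.append(f"YOLO weights not found at {weights_path}")
    return problems


for problem in preflight():
    print("setup issue:", problem)
```

Pass the same path you intend to use for weights_path so the check matches your actual configuration.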
Baseline/; the scripts discover the data automatically.
Rover/ and Drone/.

Each baseline follows the same recipe:

1. cd Baseline/<Experiment>/
2. Set the configuration variables in main():
   - video_path: path to the source video; leave empty to reuse existing data files.
   - weights_path: YOLO weights (yolo11m.pt by default).
   - pixel_to_meter (Free Fall, Torricelli, Sliding Block): set from your calibration grid.
   - output_folder: a unique run directory (e.g. run_01); the script creates output/ and data/ under it.
3. Run python3 <script>.py.
4. Optionally, python3 <script>.py --simulation-only skips retraining and reuses the latest *_coefficients.csv and *_emma_final_model.pth (Free Fall, LED, Pendulum).

| Experiment | Script | Key outputs |
|---|---|---|
| Free Fall | FreeFall/free_fall.py (free_fall-m.py for the medium set) | trajectory CSV, free_fall_coefficients.csv, trained model, annotated video |
| LED decay | LED/led.py | trajectory CSV, led_coefficients.csv, trained model, intensity figures |
| Pendulum | Pendulum/run-45.py, run-90.py, run-150.py | thetaData.txt, omegaData.txt, pendulum_coefficients.csv, trained model |
| Sliding block | Sliding block/sliding_block.py (-low, -med variants) | trajectory CSVs, sliding_block_coefficients.csv, trained model |
| Torricelli | Torricelli/toricelli.py (toricelli-m.py, torricelli-sm.py) | height trajectories, torricelli_coefficients.csv, trained model |

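The configuration step of the baseline recipe can be pictured with the sketch below. It is hypothetical: the dictionary layout, the helper names, and the calibration points are illustrative and are not taken from the scripts. It also assumes that pixel_to_meter means meters per pixel, derived from two image points a known real-world distance apart on the calibration grid.

```python
import math
import tempfile
from pathlib import Path


def pixel_to_meter_from_grid(point_a, point_b, known_distance_m):
    """Meters per pixel, from two image points a known distance apart."""
    pixel_distance = math.dist(point_a, point_b)
    if pixel_distance == 0:
        raise ValueError("calibration points must be distinct")
    return known_distance_m / pixel_distance


def make_run_dirs(output_folder):
    """Create output/ and data/ under the run directory, as the README describes."""
    run_dir = Path(output_folder)
    for sub in ("output", "data"):
        (run_dir / sub).mkdir(parents=True, exist_ok=True)
    return run_dir


# Hypothetical values: two grid corners 400 px apart that are 0.50 m apart in reality.
config = {
    "video_path": "",               # empty: reuse existing data files
    "weights_path": "yolo11m.pt",   # default weights name from the README
    "pixel_to_meter": pixel_to_meter_from_grid((120, 400), (520, 400), 0.50),
    "output_folder": str(Path(tempfile.mkdtemp()) / "run_01"),
}
run_dir = make_run_dirs(config["output_folder"])
print(config["pixel_to_meter"], sorted(p.name for p in run_dir.iterdir()))
```

A temporary directory is used here only so the sketch leaves no files in the repository; in practice output_folder would be a run directory of your choosing.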
PySINDy baselines. Each experiment folder has pysindy_results/pysindy.py; run it from that folder (after the main pipeline has written the EMMA-formatted CSVs) for sparse-regression baselines.
Ablations. From Baseline/: python3 architecture_ablation.py and python3 run_additional_ablations.py (require pendulum datasets under Baseline/Pendulum-EMMA/<angle>_v*/data/).
cd Rover
# set video_path and weights_path in run.py (see the CONFIGURATION SECTION)
python3 run.py
Outputs: rover_coefficients.csv, rover_EMMA_final_model.pth, plots, GIF. Ablations: python3 rover-ablation.py, python3 rover_multimodal_ablation.py, or bash run_rover_ablation.sh (edit variables first). If you already have processed data/*.txt, set video_path = "" to skip detection.
cd Drone
EMMA_RUN_ORCHESTRATOR=1 python3 new_run.py --video /path/to/DroneVideo.mp4 --weights /path/to/yolo11m.pt
Note: Full orchestration also needs an external Dronepipeline/ folder containing droneExtract.py, droneExtractAudio.py, and EMMA_drone_torch_ltc_optimized.py. These are not bundled here; without them, new_run.py falls back to the local vision-only pipeline.
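To tell in advance whether a run is likely to use full orchestration or the vision-only fallback, one can check for the three external files named in the note. This is a sketch, not repository code. Where new_run.py actually looks for Dronepipeline/ is not stated here, so the relative path used below is an assumption.

```python
from pathlib import Path

# File names taken from the note; the folder location is an assumption.
REQUIRED_FILES = (
    "droneExtract.py",
    "droneExtractAudio.py",
    "EMMA_drone_torch_ltc_optimized.py",
)


def missing_orchestration_files(pipeline_dir):
    """Return the required external files that are absent from pipeline_dir."""
    root = Path(pipeline_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


missing = missing_orchestration_files("Dronepipeline")
if missing:
    print("vision-only fallback expected; missing:", ", ".join(missing))
else:
    print("all external files found; full orchestration should be available")
```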
cd CGM
python3 extract_cgm_data.py # reads CGMData.png, writes cgm_data.txt + a visualization
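Chart digitization generally comes down to mapping pixel coordinates on the image back to data values using known axis ticks. The sketch below illustrates that mapping with linear interpolation. It is not the method used by extract_cgm_data.py, and the tick positions, units, and curve pixels are hypothetical.

```python
def make_axis_mapper(pixel_a, value_a, pixel_b, value_b):
    """Build a linear pixel-to-value mapping from two calibration ticks."""
    if pixel_a == pixel_b:
        raise ValueError("calibration ticks must be at different pixels")
    scale = (value_b - value_a) / (pixel_b - pixel_a)
    return lambda pixel: value_a + (pixel - pixel_a) * scale


# Hypothetical calibration: x ticks at 50 px (0 min) and 650 px (300 min);
# y ticks at 400 px (40 mg/dL) and 40 px (400 mg/dL). Image y grows downward,
# so the y mapping has a negative slope.
x_to_minutes = make_axis_mapper(50, 0.0, 650, 300.0)
y_to_glucose = make_axis_mapper(400, 40.0, 40, 400.0)

# Hypothetical pixel positions detected along the plotted curve.
curve_pixels = [(50, 310), (200, 250), (350, 180)]
samples = [(x_to_minutes(x), y_to_glucose(y)) for x, y in curve_pixels]
for minutes, glucose in samples:
    print(f"{minutes:.1f} min, {glucose:.1f} mg/dL")
```

A linear mapping only holds for charts with linear axes; a log-scaled axis would need the interpolation done in log space.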
- Run pip install -r requirements.txt in the active virtual environment. For torch/torchvision, use the PyTorch selector.
- Download yolo11m.pt and point weights_path to it.
- Install ffmpeg (brew install ffmpeg / sudo apt install ffmpeg).

@InProceedings{Shaikh_2026_CVPR,
author = {Shaikh, Farhat and Banerjee, Ayan and Gupta, Sandeep},
title = {EMMA: Extracting Multiple physical parameters from Multimodal Data},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2026},
pages = {1716-1725}
}
Also on arXiv.