Turn human actions into training data for physical AI
Praxis converts real-world behavior into structured, machine-readable representations usable for robotics and imitation learning.
{
  "task": "Pick the battery and place it down.",
  "duration_sec": 3.65,
  "segments": [
    {
      "action": "reach",
      "language": "Right hand reaches for the battery.",
      "object": "battery",
      "right": {"grasp_type": "precision", "contact_state": "no_contact"},
      "t_start": 0.0,
      "t_end": 0.3
    },
    {
      "action": "contact",
      "language": "Right hand touches the battery.",
      "object": "battery",
      "right": {"grasp_type": "precision", "contact_state": "making_contact"},
      "t_start": 0.3,
      "t_end": 0.4
    },
    {
      "action": "grasp",
      "language": "Right hand picks up the battery.",
      "object": "battery",
      "right": {"grasp_type": "pinch", "contact_state": "making_contact"},
      "t_start": 0.4,
      "t_end": 0.6
    },
    {
      "action": "translate",
      "language": "Right hand slowly carries the battery.",
      "object": "battery",
      "right": {"contact_state": "in_contact"},
      "t_start": 0.6,
      "t_end": 2.6
    },
    {
      "action": "release",
      "language": "Right hand slowly places the battery.",
      "object": "battery",
      "right": {"contact_state": "breaking_contact"},
      "t_start": 2.6,
      "t_end": 3.5
    }
  ]
}
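Because the output is plain JSON, an episode can be consumed with standard tooling. A minimal sketch in Python, using only fields from the sample above (this assumes no official Praxis SDK, just the schema shown):

```python
import json

# Two segments copied from the sample episode above, trimmed for brevity.
EPISODE = """
{"task": "Pick the battery and place it down.",
 "duration_sec": 3.65,
 "segments": [
   {"action": "reach", "language": "Right hand reaches for the battery.",
    "object": "battery",
    "right": {"grasp_type": "precision", "contact_state": "no_contact"},
    "t_start": 0.0, "t_end": 0.3},
   {"action": "grasp", "language": "Right hand picks up the battery.",
    "object": "battery",
    "right": {"grasp_type": "pinch", "contact_state": "making_contact"},
    "t_start": 0.4, "t_end": 0.6}]}
"""

episode = json.loads(EPISODE)

# Walk the time-aligned segments and print a compact per-hand timeline.
for seg in episode["segments"]:
    hand = seg["right"]
    print(f"{seg['t_start']:.1f}-{seg['t_end']:.1f}s "
          f"{seg['action']:<9} contact={hand['contact_state']}")
```

Each segment carries its own timing and per-hand state, so downstream code can filter, resample, or join against video frames without re-parsing raw footage.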
Every moment becomes structured action data.
Time-aligned sequences of actions and interactions.
Per-hand interaction signals, including grasp, contact, and timing.
The same pipeline runs across all episodes, producing consistent data ready for training.
Robots don’t fail because they can’t see.
They fail because they don’t understand how actions unfold over time.
Vision isn’t the bottleneck. Representation is.
Raw data gives you:
- frames
- objects
- outcomes

Manipulation needs:
- transitions
- contact
- adjustment
- failure and recovery
Praxis captures how actions unfold—not just what happens, but how it happens over time.
Without this, manipulation doesn’t transfer reliably.
Praxis turns human behavior into training-ready representations for physical AI systems.
Train manipulation policies from structured, per-hand action sequences
Use structured action sequences instead of raw video to train more stable, reliable policies.
Handle failure and recovery
Capture slips, retries, and adjustments, not just successful outcomes.
Learn from action sequences
Learn from how actions unfold over time, not just final states.
Transfer skills across environments
Generalize behaviors beyond the original recording setup.
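As a sketch of how time-aligned segments feed policy training: the active action can be sampled at a fixed rate to produce per-timestep labels, e.g. to align with video frames for behavior cloning. The segment boundaries below come from the sample episode; the sampling rate is illustrative:

```python
# Segments from the sample episode, reduced to action and timing.
SEGMENTS = [
    {"action": "reach",     "t_start": 0.0, "t_end": 0.3},
    {"action": "contact",   "t_start": 0.3, "t_end": 0.4},
    {"action": "grasp",     "t_start": 0.4, "t_end": 0.6},
    {"action": "translate", "t_start": 0.6, "t_end": 2.6},
    {"action": "release",   "t_start": 2.6, "t_end": 3.5},
]

def labels_at_rate(segments, hz=10.0):
    """Sample the active action at a fixed rate, yielding (time, action) pairs.

    Returns one label per tick from t=0 up to the episode end; ticks that fall
    in a gap between segments get None.
    """
    duration = max(s["t_end"] for s in segments)
    out = []
    for i in range(int(duration * hz)):
        t = i / hz
        active = next((s["action"] for s in segments
                       if s["t_start"] <= t < s["t_end"]), None)
        out.append((round(t, 2), active))
    return out

timeline = labels_at_rate(SEGMENTS, hz=10.0)
```

The resulting sequence is a dense supervision target: every timestep has an action label, including the transitions a final-state dataset would miss.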
Most datasets capture results. Praxis captures how actions unfold over time.
Per-hand structure, not just scenes
Each hand is modeled separately with role, grasp, and interaction context.
Failure is modeled, not filtered out
Captures slips, retries, and recovery with cause and outcome.
Contact is signal, not noise
Model how hands actually engage with objects: where contact happens, when it begins, and how it evolves over time.
Interaction, not just objects
Capture how objects are used: grasp points, contact patterns, and intent, not just object identity.
Fully traceable data
Every action is linked to its source—perception or human refinement.
Built for learning systems that need more than pixels and labels.
For robots that need to handle the real world.
We’re working with a small number of teams using structured action data to improve policy learning and real-world robustness.