intuition
Video Data Engineer
About the Role
You will own and advance the perception stack that turns raw egocentric capture (RGB, RGB-D, IMU, stereo, audio) into the highest-fidelity human motion data on the planet. Your output also feeds the SOP and workflow detection layer that turns raw video into structured, queryable task data.
What You'll Do
- Build and improve our body and hand tracking pipelines on egocentric, multi-view footage.
- Build the SOP and workflow detection layer on top of raw video — automatically segment tasks, detect steps, and extract structured workflow metadata.
- Push the state of the art on 3D human pose and shape under real-world capture conditions.
- Fuse visual signals with IMU to produce robust 3D pose under occlusion and motion blur.
- Benchmark our annotations against academic datasets and the requirements of top AI labs and frontier robotics labs.
- Take research from paper to production. Your code ships into the pipeline that feeds the customers training the next generation of embodied AI.
What We're Looking For
- Strong background in computer vision, in research or applied work: SLAM, 3D object tracking, or human and object pose tracking.
- Hands-on work with video generation models or world models.
- Comfortable reading a paper on Monday and having a working prototype by Friday.
- High agency — you don't wait to be told what to do.
- Fluent in English.
- Nice to have: direct experience with body or hand tracking, including parametric models such as SMPL, SMPL-X, or MANO.
- Nice to have: IMU-based motion capture or visual-inertial fusion.
- Nice to have: knowledge of the world models industry, infrastructure, and the labs pushing it forward.
- Nice to have: robotics experience. Imitation learning, teleop, manipulation, or anything that turns human data into robot policies.
Why Join Us
Your output IS the product the frontier robotics labs train on. The paper-to-production loop is shorter here than anywhere we have seen.