intuition
Video Data Engineer
About the Role
You will own and advance the perception stack that turns raw egocentric capture (RGB, RGB-D, IMU, stereo, audio) into the highest-fidelity human motion data on the planet. Your output also feeds the SOP and workflow detection layer that turns raw video into structured, queryable task data.
What You'll Do
- Build and improve our body and hand tracking pipelines on egocentric, multi-view footage.
- Build the SOP and workflow detection layer on top of raw video — automatically segment tasks, detect steps, and extract structured workflow metadata.
- Push the state of the art on 3D human pose and shape under real-world capture conditions.
- Fuse visual signals with IMU to produce robust 3D pose under occlusion and motion blur.
- Benchmark our annotations against academic datasets and the requirements of top AI labs and frontier robotics labs.
- Take research from paper to production. Your code ships into the pipeline that feeds the customers training the next generation of embodied AI.
What We're Looking For
- Strong background in computer vision, in research or applied work: SLAM, 3D object tracking, or human and object pose tracking.
- Hands-on work with video generation models or world models.
- Comfortable reading a paper on Monday and having a working prototype by Friday.
- High agency — you don't wait to be told what to do.
- Fluent in English.
- Nice to have: direct experience with body or hand tracking, including parametric models such as SMPL, SMPL-X, or MANO.
- Nice to have: IMU-based motion capture or visual-inertial fusion.
- Nice to have: knowledge of the world models industry, infrastructure, and the labs pushing it forward.
- Nice to have: robotics experience. Imitation learning, teleop, manipulation, or anything that turns human data into robot policies.
Why Join Us
Your output IS the product the frontier robotics labs train on. The paper-to-production loop is shorter here than anywhere we have seen.