Validity of Monocular Human Mesh Reconstruction for Estimating Lower-Extremity Joint Kinematics: A Comparison of SAM3D and CameraHMR

Ahmadreza Souri, Siddhartha Sikdar, Tiphanie E Raffegeau


Abstract

Marker-based motion capture is the gold standard for quantifying human movement, but it is limited by cost and laboratory requirements. This study evaluated two monocular human mesh reconstruction models, SAM3D Body and CameraHMR, for estimating lower-extremity joint kinematics against a marker-based motion capture reference, using the OpenCap validation dataset. Smartphone videos of nine adults performing walking and sit-to-stand tasks were used to reconstruct 3D meshes, generate virtual markers, and estimate joint angles with OpenSim. Mean absolute error (MAE) during walking was similar for both models (5.98° for SAM3D and 6.09° for CameraHMR), whereas CameraHMR performed better during sit-to-stand (5.92° vs. 7.95°). A linear mixed-effects model showed that CameraHMR reduced error by approximately 2° during sit-to-stand (p < 0.001), with minimal differences between models during walking. Errors were higher in the frontal and transverse planes, whereas knee flexion showed lower error. Overall, both methods achieved errors of approximately 6°, indicating that single-camera mesh reconstruction combined with biomechanical modeling can approximate laboratory motion capture, although its accuracy remains lower than that of multi-view systems.
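The MAE metric reported above can be illustrated with a minimal sketch. It assumes that the marker-based and mesh-derived joint-angle curves have already been time-synchronized and resampled to the same frames, and it averages absolute differences frame by frame. The function names and joint labels below are hypothetical and are not taken from the study's code.

```python
from typing import Dict, Sequence


def mean_absolute_error(reference_deg: Sequence[float], estimate_deg: Sequence[float]) -> float:
    """MAE (degrees) between two time-synchronized joint-angle curves.

    Assumes both curves are sampled at the same frames (hypothetical preprocessing).
    """
    if len(reference_deg) != len(estimate_deg):
        raise ValueError("curves must have the same number of frames")
    if len(reference_deg) == 0:
        raise ValueError("curves must be non-empty")
    # Average of per-frame absolute angle differences.
    total = sum(abs(r - e) for r, e in zip(reference_deg, estimate_deg))
    return total / len(reference_deg)


def mae_per_joint(
    reference: Dict[str, Sequence[float]],
    estimate: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Compute MAE separately for each joint angle present in the reference."""
    return {joint: mean_absolute_error(reference[joint], estimate[joint]) for joint in reference}


if __name__ == "__main__":
    # Toy curves for illustration only; these are not data from the study.
    reference = {"knee_flexion": [0, 10, 20, 30], "hip_adduction": [5, 5, 5]}
    estimate = {"knee_flexion": [2, 8, 25, 30], "hip_adduction": [11, -1, 5]}
    print(mae_per_joint(reference, estimate))  # {'knee_flexion': 2.25, 'hip_adduction': 4.0}
```

How per-joint errors were pooled across trials, participants, and tasks before reporting is not specified in the abstract. The sketch therefore stops at per-joint values, and any further aggregation (or the mixed-effects model) would be a separate step.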