Large-Scale 3D Pose Estimation of Professional Tennis Serves from Broadcast Video

Jason Wang, Stephen Baek, Robert Chen, Patrick Ho, Emmy Kim, Samuel Min, Jaden Shim, Vrishak Vemuri, Derek Wang, Natalie Kupperman

Abstract

Traditional biomechanical research in sports science relies on marker-based motion capture systems that, despite high precision, are constrained by laboratory settings and small sample sizes. We present an automated, large-scale framework for 3D pose estimation of professional tennis serves using publicly available broadcast video. The pipeline integrates RTMDet for player detection, RTMPose for 2D keypoint estimation, and MotionBERT for monocular 3D lifting, alongside a Dynamic Time Warping (DTW) action recognition module for serve segmentation. Critically, each extracted 3D pose sequence is linked to rich contextual metadata, including serve speed, placement, and match state, derived from automated scoreboard reading and official records, producing a unified biomechanical and performance dataset that does not exist in any current sports pose corpus. Applied to 5,966 serves from 109 professional players at the 2024 US Open, the framework reveals distinct biomechanical archetypes and gender-based divergences in kinetic strategies. A Random Forest classifier achieves 99.2% accuracy in player identification and 97.3% accuracy in gender classification from joint-angle trajectories alone, demonstrating that markerless motion capture from monocular broadcast footage captures individualized “kinematic fingerprints” at scale. These results establish a scalable foundation for vision-based biomechanical analysis that bridges the gap between laboratory precision and competitive authenticity. Our code and dataset are publicly available at https://github.com/jasnwag/tennis_serve_dataset.
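
To make the DTW-based serve segmentation step concrete, the following is a minimal sketch of template matching with Dynamic Time Warping. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the feature used, the window length, or the matching rule, so the 1-D per-frame signal (for example, a wrist height), the fixed sliding window, and the helper names `dtw_distance` and `find_best_window` are all hypothetical.

```python
import math


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW between two 1-D sequences.

    Uses absolute difference as the local cost (an assumption made
    for this sketch; the paper's actual cost is not stated).
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def find_best_window(signal, template, window, stride=1):
    """Slide a fixed-length window over `signal` and return the
    (start_index, distance) of the window closest to `template`
    under DTW. Hypothetical helper for illustration only."""
    best_start, best_dist = None, math.inf
    for start in range(0, len(signal) - window + 1, stride):
        d = dtw_distance(signal[start:start + window], template)
        if d < best_dist:
            best_start, best_dist = start, d
    return best_start, best_dist


# Toy example: a rise-and-fall "serve-like" template inside a flat signal.
template = [0, 1, 2, 1, 0]
signal = [0, 0, 0, 0, 1, 2, 1, 0, 0, 0]
start, dist = find_best_window(signal, template, window=5)
```

In this toy example the best-matching window starts at frame 3, where the signal reproduces the template exactly. A practical system would likely operate on multi-joint feature vectors and threshold the distance to reject non-serve motion, but those details are assumptions beyond what the abstract states.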