Accurate extraction of biomechanical quantities from multi-view video remains a challenging problem. Current markerless motion capture pipelines often rely on staged processing: 2D keypoints are extracted, triangulated, and then fitted to a biomechanical model. This process depends on Human Pose Estimation (HPE), may accumulate errors, ignores rich photometric surface information, and may be unable to recover distal rotations. In this paper, we propose a fully differentiable biomechanical-visual model that directly couples photometric appearance with the underlying osteoarticular structure. We model the human surface using 2D Gaussian Splatting (2DGS), driven by a standard osteoarticular model through a parametric blendshape formulation. Because the model is differentiable end to end, joint angles can be optimized directly with a photometric loss computed over multiple camera views. Preliminary results are encouraging for both the retrieval of biomechanical quantities and 3D reconstruction, paving the way for a new paradigm of markerless biomechanical analysis in the wild.