Schema-Align: A lightweight skeleton unifier with kinematic constraints for cross-dataset human action recognition
DOI:
https://doi.org/10.15276/ict.02.2025.40
Keywords:
Machine learning, deep learning, computer vision, action recognition, pose analysis, video surveillance, data unification, transfer learning
Abstract
Skeleton-based human action recognition (HAR) suffers from poor external validity because popular datasets adopt incompatible joint schemas (e.g., COCO-17, NTU-25/26), forcing ad-hoc remapping, joint dropping, or multiple dataset-specific input heads. We present Schema-Align, a lightweight, model-agnostic unifier that canonicalizes poses from arbitrary source schemas into a fixed 21-joint representation. It uses a row-sparse linear mapping regularized by kinematic feasibility (bone-length and joint-angle constraints) and a low-capacity temporal residual to interpolate truly missing joints. The unifier is pretrained without action labels on mixed pose streams via cycle-consistency, temporal-predictability, and confidence-weighted losses, then plugged in before any HAR backbone (GCN/MS-G3D/CTR-GCN/Transformer) with negligible latency overhead (<1%). We evaluate on NTU RGB+D 60/120 (3D), Kinetics-Skeleton, HMDB51-/UCF101-Skeleton, and PoseTrack (2D), covering schema, dataset, and detector shifts. Under in-domain protocols, canonicalization is effectively lossless, matching native performance across backbones. In cross-dataset transfer, Schema-Align consistently reduces the accuracy drop relative to intersect-and-pad and dense linear remaps, and outperforms dataset-specific heads, particularly when the source and target schemas diverge (e.g., COCO↔NTU). Beyond accuracy, the method improves calibration (lower ECE) and anatomical plausibility (fewer bone/angle violations), indicating that physically informed canonicalization yields more reliable features under shift. Ablations show that top-k row sparsity (k=1–2) prevents overfitting to schema idiosyncrasies; the residual interpolator aids occluded or detector-noisy frames at minimal parameter cost; and removing the kinematic losses degrades both realism and transfer. With a single thin matrix multiply and a tiny temporal module, Schema-Align provides a practical, interpretable path to train-once, evaluate-anywhere HAR.
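
To make the core idea concrete, the following is a minimal NumPy sketch of a top-k row-sparse linear remap from a source schema to a canonical one, together with a simple bone-length consistency penalty. It is an illustration under stated assumptions, not the authors' implementation: the function names (`topk_row_sparsify`, `canonicalize`, `bone_length_penalty`), the row renormalization to unit sum, and the use of temporal variance as the bone-length penalty are all hypothetical choices made here, since the abstract does not specify the exact parameterization or loss forms.

```python
import numpy as np


def topk_row_sparsify(W, k=2):
    """Keep the k largest-magnitude weights in each row and zero the rest.

    W has shape (J_canonical, J_source). Rows are then rescaled to sum to 1
    (an assumption of this sketch), so each canonical joint is an affine
    combination of at most k source joints.
    """
    W = np.asarray(W, dtype=float)
    top_idx = np.argsort(-np.abs(W), axis=1)[:, :k]      # (J_canonical, k)
    mask = np.zeros(W.shape, dtype=bool)
    np.put_along_axis(mask, top_idx, True, axis=1)
    W_sparse = np.where(mask, W, 0.0)
    row_sums = W_sparse.sum(axis=1, keepdims=True)
    # Guard against division by (near-)zero row sums.
    safe_sums = np.where(np.abs(row_sums) > 1e-12, row_sums, 1.0)
    return W_sparse / safe_sums


def canonicalize(poses, W):
    """Map poses of shape (T, J_source, D) to (T, J_canonical, D)."""
    return np.einsum("cj,tjd->tcd", W, poses)


def bone_length_penalty(poses, bones):
    """Mean temporal variance of bone lengths.

    bones is a list of (parent, child) joint-index pairs in the canonical
    schema. A rigid skeleton has constant bone lengths, so the penalty is
    zero; stretching or shrinking bones over time increases it.
    """
    parents = np.array([p for p, _ in bones])
    children = np.array([c for _, c in bones])
    lengths = np.linalg.norm(poses[:, children] - poses[:, parents], axis=-1)
    return float(lengths.var(axis=0).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical sizes: 17 source joints (COCO-like), 21 canonical joints.
    W = topk_row_sparsify(rng.random((21, 17)), k=2)
    clip = rng.random((30, 17, 3))                       # 30 frames of 3D poses
    canonical = canonicalize(clip, W)
    print(canonical.shape)
    print(bone_length_penalty(canonical, bones=[(0, 1), (1, 2), (2, 3)]))
```

Because each row of the sparsified matrix sums to 1 in this sketch, translating every source joint by the same offset translates the canonical pose by that offset, which keeps the remap independent of where the skeleton sits in the coordinate frame. In a trainable version, the penalty would be added to the unifier's pretraining objective rather than only evaluated after the fact.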