Schema-Align: A lightweight skeleton unifier with kinematic constraints for cross-dataset human action recognition

Authors

  • Roman V. Kovalevych Odesa Polytechnic National University. 1, Shevchenko Ave. Odesa, 65044, Ukraine Author
  • Mykhaylo V. Lobachev Odesa Polytechnic National University. 1, Shevchenko Ave. Odesa, 65044, Ukraine Author

DOI:

https://doi.org/10.15276/ict.02.2025.40

Keywords:

Machine learning, deep learning, computer vision, action recognition, pose analysis, video surveillance, data unification, transfer learning

Abstract

Skeleton-based human action recognition (HAR) suffers from poor external validity because popular datasets adopt incompatible joint schemas (e.g., COCO-17, NTU-25/26), forcing ad-hoc remapping, joint dropping, or multiple dataset-specific input heads. We present Schema-Align, a lightweight, model-agnostic unifier that canonicalizes poses from arbitrary source schemas into a fixed 21-joint representation. It uses a row-sparse linear mapping regularized by kinematic feasibility (bone-length and joint-angle constraints) and a low-capacity temporal residual to interpolate truly missing joints. The unifier is pretrained without action labels on mixed pose streams via cycle consistency, temporal predictability, and confidence-weighted losses, then plugged in before any HAR backbone (GCN/MS-G3D/CTR-GCN/Transformer) with negligible latency (<1%). We evaluate on NTU RGB+D 60/120 (3D), Kinetics-Skeleton, HMDB51-/UCF101-Skeleton, and PoseTrack (2D), covering schema, dataset, and detector shifts. Under in-domain protocols, canonicalization is effectively lossless, matching native performance across backbones. In cross-dataset transfer, Schema-Align consistently reduces the accuracy drop relative to intersect-and-pad and dense linear remaps, and outperforms dataset-specific heads, particularly when the source and target schemas diverge (e.g., COCO↔NTU). Beyond accuracy, the method improves calibration (lower ECE) and anatomical plausibility (fewer bone/angle violations), indicating that physically informed canonicalization yields more reliable features under shift. Ablations show that top-k row sparsity (k=1–2) prevents overfitting to schema idiosyncrasies; that the residual interpolator aids occluded or detector-noisy frames at minimal parameter cost; and that removing kinematic losses degrades both realism and transfer. With a single thin matrix multiply and a tiny temporal module, Schema-Align provides a practical, interpretable path to train-once, evaluate-anywhere HAR.
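
The core idea of a top-k row-sparse linear mapping with a bone-length regularizer can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`topk_row_sparse`, `canonicalize`, `bone_length_penalty`), the renormalization of each row to sum to 1, and the use of temporal variance as the bone-length penalty are all assumptions made for illustration. The abstract specifies only that the mapping is row-sparse (k=1–2), targets 21 joints, and is regularized by bone-length and joint-angle constraints.

```python
import numpy as np


def topk_row_sparse(W, k=2):
    """Keep the k largest-magnitude weights in each row and zero the rest.

    Rows are then rescaled to sum to 1 (an illustrative assumption), so each
    canonical joint is an affine combination of at most k source joints.
    """
    W = np.asarray(W, dtype=float)
    # Column indices of the k largest |w| in every row.
    idx = np.argsort(-np.abs(W), axis=1)[:, :k]
    mask = np.zeros(W.shape, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=1)
    sparse = np.where(mask, W, 0.0)
    row_sum = sparse.sum(axis=1, keepdims=True)
    # Guard against division by zero for rows whose kept weights cancel out.
    safe = np.where(np.abs(row_sum) > 1e-12, row_sum, 1.0)
    return sparse / safe


def canonicalize(poses, W):
    """Map poses of shape (T, J_src, D) to (T, J_can, D) with W of shape (J_can, J_src)."""
    return np.einsum("cs,tsd->tcd", W, poses)


def bone_length_penalty(poses, bones):
    """Mean temporal variance of bone lengths (assumed form of the constraint).

    `bones` is a list of (parent, child) joint index pairs. The penalty is
    zero when every bone keeps a constant length over time.
    """
    parents = np.array([p for p, _ in bones])
    children = np.array([c for _, c in bones])
    lengths = np.linalg.norm(poses[:, children] - poses[:, parents], axis=-1)
    return float(lengths.var(axis=0).mean())
```

For a hypothetical COCO-17 source, a learned dense matrix of shape (21, 17) would be sparsified with `topk_row_sparse(W, k=2)` and applied frame by frame via `canonicalize`. With row sums fixed at 1, translating the source skeleton translates the canonical skeleton by the same offset, which is one reason the normalization is a convenient assumption in this sketch.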

Author Biographies

  • Roman V. Kovalevych, Odesa Polytechnic National University. 1, Shevchenko Ave. Odesa, 65044, Ukraine

    Postgraduate Student of the Department of Artificial Intelligence and Data Analysis

  • Mykhaylo V. Lobachev, Odesa Polytechnic National University. 1, Shevchenko Ave. Odesa, 65044, Ukraine

    PhD, Professor, Head of the Institute of Artificial Intelligence and Robotics

Published

2025-11-05

How to Cite

Schema-Align: A lightweight skeleton unifier with kinematic constraints for cross-dataset human action recognition. (2025). Інформатика. Культура. Техніка, 2, 266–272. https://doi.org/10.15276/ict.02.2025.40