Schema-Align: A lightweight skeleton unifier with kinematic constraints for cross-dataset human action recognition

Roman V. Kovalevych; Mykhaylo V. Lobachev

doi:10.15276/ict.02.2025.40

Authors

Roman V. Kovalevych, Odesa Polytechnic National University. 1, Shevchenko Ave. Odesa, 65044, Ukraine (Author)
Mykhaylo V. Lobachev, Odesa Polytechnic National University. 1, Shevchenko Ave. Odesa, 65044, Ukraine (Author)

DOI:

https://doi.org/10.15276/ict.02.2025.40

Keywords:

Machine learning, deep learning, computer vision, action recognition, pose analysis, video surveillance, data unification, transfer learning

Abstract

Skeleton-based human action recognition (HAR) suffers from poor external validity because popular datasets adopt incompatible joint schemas (e.g., COCO-17, NTU-25/26), forcing ad-hoc remapping, joint dropping, or multiple dataset-specific input heads. We present Schema-Align, a lightweight, model-agnostic unifier that canonicalizes poses from arbitrary source schemas into a fixed 21-joint representation. It combines a row-sparse linear mapping, regularized by kinematic feasibility (bone-length and joint-angle constraints), with a low-capacity temporal residual that interpolates truly missing joints. The unifier is pretrained without action labels on mixed pose streams via cycle consistency, temporal predictability, and confidence-weighted losses, then plugged in before any HAR backbone (GCN/MS-G3D/CTR-GCN/Transformer) with negligible latency overhead (<1%). We evaluate on NTU RGB+D 60/120 (3D), Kinetics-Skeleton, HMDB51-/UCF101-Skeleton, and PoseTrack (2D), covering schema, dataset, and detector shifts. Under in-domain protocols, canonicalization is effectively lossless, matching native performance across backbones. In cross-dataset transfer, Schema-Align consistently reduces the accuracy drop relative to intersect-and-pad and dense linear remaps, and outperforms dataset-specific heads, particularly when the source and target schemas diverge (e.g., COCO↔NTU). Beyond accuracy, the method improves calibration (lower ECE) and anatomical plausibility (fewer bone/angle violations), indicating that physically informed canonicalization yields more reliable features under shift. Ablations show that top-k row sparsity (k=1–2) prevents overfitting to schema idiosyncrasies; that the residual interpolator helps on occluded or detector-noisy frames at minimal parameter cost; and that removing the kinematic losses degrades both realism and transfer. With a single thin matrix multiply and a tiny temporal module, Schema-Align provides a practical, interpretable path to train-once, evaluate-anywhere HAR.
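
The core idea of a top-k row-sparse linear mapping with a bone-length check can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the array layout (frames × joints × coordinates), the bone list, and the use of temporal variance as a bone-length penalty are all assumptions made for illustration, since the abstract does not specify the exact form of the mapping or the losses.

```python
import numpy as np

def topk_row_sparsify(W, k=2):
    """Keep the k largest-magnitude entries in each row of W; zero the rest.

    W has shape (J_can, J_src): each canonical joint is a combination of
    at most k source joints (hypothetical reading of "top-k row sparsity").
    """
    W = np.asarray(W, dtype=float)
    # Column indices of the k largest |W| values in every row.
    idx = np.argpartition(-np.abs(W), k - 1, axis=1)[:, :k]
    mask = np.zeros(W.shape, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=1)
    return np.where(mask, W, 0.0)

def canonicalize(poses, W):
    """Map poses (T, J_src, C) to canonical poses (T, J_can, C)."""
    return np.einsum("cj,tjd->tcd", W, poses)

def bone_length_penalty(poses, bones):
    """Mean temporal variance of bone lengths (assumed penalty form).

    bones is a list of (parent, child) joint-index pairs; the value is
    zero when every bone keeps a constant length across frames.
    """
    parents = np.array([p for p, _ in bones])
    children = np.array([c for _, c in bones])
    lengths = np.linalg.norm(poses[:, children] - poses[:, parents], axis=-1)
    return float(lengths.var(axis=0).mean())

# Illustrative use: 17 source joints (COCO-like) to 21 canonical joints.
rng = np.random.default_rng(0)
W = topk_row_sparsify(rng.normal(size=(21, 17)), k=2)
src = rng.normal(size=(8, 17, 3))        # 8 frames of random 3D "poses"
can = canonicalize(src, W)               # shape (8, 21, 3)
penalty = bone_length_penalty(can, [(0, 1), (1, 2)])  # hypothetical bones
```

In a trained unifier the weights would be learned rather than random, and the penalty would be one term among the cycle-consistency, temporal, and confidence-weighted losses the abstract mentions; none of those are modeled here.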


Author Biographies

Roman V. Kovalevych, Odesa Polytechnic National University. 1, Shevchenko Ave. Odesa, 65044, Ukraine

Postgraduate Student, Department of Artificial Intelligence and Data Analysis

Mykhaylo V. Lobachev, Odesa Polytechnic National University. 1, Shevchenko Ave. Odesa, 65044, Ukraine

PhD, Professor, Head of the Institute of Artificial Intelligence and Robotics
