Adaptive video streams fragment analysis for traffic accident classification using a sparse video transformer

Authors

  • Tetiana V. Normatova, Kharkiv National University of Radio Electronics, 14 Nauky Ave., Kharkiv, 61166, Ukraine
  • Sergii V. Mashtalir, Kharkiv National University of Radio Electronics, 14 Nauky Ave., Kharkiv, 61166, Ukraine

DOI:

https://doi.org/10.15276/ict.02.2025.20

Keywords:

Video classification, neural networks, convolutional neural networks, object classification, video stream analysis, data classification, image fragment processing

Abstract

This paper proposes an approach to classifying short video fragments into emergency and normal scenes that is both effective and simple to implement in software. Frames are sampled uniformly across the entire clip (6–8 frames) so that key events are not missed even in long videos. Next, adaptive motion-driven frame "fragmentation" is applied: quantile thresholds are computed over the Farnebäck optical flow map, and a fragment size (8/16/32 pixels) is selected for each cell of the base grid. Small fragments are chosen in areas with pronounced motion (higher detail) and larger ones in static areas (fewer computations). The selected fragments do not overlap; they are scaled to the base size and converted into feature vectors.

The architecture follows a two-stage principle. In the first stage, a spatial attention block operates within a single frame, only on the selected fragments, which significantly reduces the number of tokens. In the second stage, a temporal block processes the frame sequence through short per-frame summary representations (classification tokens, hereinafter CLS), aggregating dynamics over time. This "space → time" factorization reduces computational cost and memory usage without losing informativeness in moving regions.

To combat class imbalance, training uses a weighted loss function (a loss that emphasizes hard examples) and weighted random sampling. Optical flow maps and the lists of selected fragments are precomputed and stored on disk, which speeds up training epochs on a CPU without dedicated graphics hardware. Evaluation is performed on CCD1500 (1,500 emergency and 3,000 normal videos) with a standard 80/20 split that preserves class proportions. Accuracy reaches 0.864 and macro-F1 reaches 0.851; in a preliminary comparison, the proposed approach outperforms baseline uniform frame splitting and classical schemes with simple temporal sampling. The main value of the approach lies in combining motion-driven token reduction with two-stage processing, which makes the model suitable for realistic time and resource (CPU) constraints while maintaining high sensitivity to short, localized emergency events. The method scales easily and can be combined with pre-training based on masked video reconstruction. The fixed experimental conditions, open settings, and steps required for full reproducibility are also described.
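
As an illustration of the sampling and fragmentation steps, a minimal sketch in Python with OpenCV and NumPy follows. The 32-pixel base grid and the 0.5/0.8 quantile levels are assumptions made for this example; the abstract fixes only the candidate fragment sizes (8/16/32 pixels) and the use of quantile thresholds over the Farnebäck optical flow map.

```python
import cv2
import numpy as np

def sample_frames(video_path, num_frames=8):
    """Uniformly sample `num_frames` frames across the whole clip."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    idxs = np.linspace(0, total - 1, num_frames).astype(int)
    frames = []
    for i in idxs:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(i))
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
    cap.release()
    return frames

def adaptive_fragment_sizes(prev_frame, frame, base=32, q_lo=0.5, q_hi=0.8):
    """Pick a fragment size (8/16/32 px) for each base-grid cell
    from the Farneback optical flow magnitude."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2)           # per-pixel motion magnitude
    h, w = mag.shape
    gh, gw = h // base, w // base
    # mean motion per base-grid cell
    cell = mag[:gh * base, :gw * base].reshape(gh, base, gw, base).mean(axis=(1, 3))
    t_lo, t_hi = np.quantile(cell, [q_lo, q_hi])  # quantile thresholds
    sizes = np.full_like(cell, 32, dtype=int)     # static regions: coarse
    sizes[cell > t_lo] = 16                       # moderate motion: finer
    sizes[cell > t_hi] = 8                        # strong motion: finest
    return sizes
```

The resulting flow maps and size maps can be saved to disk (e.g., with np.save) so that, as the abstract notes, training epochs run faster on a CPU without dedicated graphics hardware.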
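
The two-stage "space → time" factorization could be sketched as below (PyTorch). Only the structure, spatial attention over the selected fragments of each frame followed by temporal attention over the per-frame CLS tokens, comes from the abstract; the depths, widths, head counts, and mean-pooled classification head are illustrative assumptions, and a fixed number of fragments per frame is assumed (a real implementation would pad and mask variable counts).

```python
import torch
import torch.nn as nn

class SparseVideoClassifier(nn.Module):
    """Space -> time factorization: per-frame attention over selected
    fragment tokens, then temporal attention over per-frame CLS tokens."""

    def __init__(self, dim=256, heads=4, depth=2, num_classes=2):
        super().__init__()
        spatial_layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        temporal_layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.spatial = nn.TransformerEncoder(spatial_layer, depth)
        self.temporal = nn.TransformerEncoder(temporal_layer, depth)
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))  # learnable CLS token
        self.head = nn.Linear(dim, num_classes)

    def forward(self, tokens):
        # tokens: (B, T, N, D) -- N fragment embeddings per frame
        B, T, N, D = tokens.shape
        x = tokens.reshape(B * T, N, D)
        cls = self.cls.expand(B * T, 1, D)
        x = self.spatial(torch.cat([cls, x], dim=1))   # stage 1: within-frame
        frame_cls = x[:, 0].reshape(B, T, D)           # one CLS summary per frame
        y = self.temporal(frame_cls)                   # stage 2: across frames
        return self.head(y.mean(dim=1))                # clip-level logits
```

Because the spatial stage sees only the selected fragments, the token count per frame stays small in static scenes, which is where the claimed savings in computation and memory come from.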
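
For the class-imbalance handling, a weighted loss combined with weighted random sampling might look like the following sketch (PyTorch). The inverse-frequency weighting and the tiny stand-in dataset are hypothetical; only the 1:2 emergency-to-normal ratio is taken from CCD1500.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Class counts from CCD1500: 3000 normal (label 0), 1500 emergency (label 1).
counts = torch.tensor([3000.0, 1500.0])
class_weights = counts.sum() / (2.0 * counts)   # inverse-frequency weights
criterion = torch.nn.CrossEntropyLoss(weight=class_weights)

# Tiny stand-in dataset; the real pipeline would load cached fragment tokens.
labels = torch.cat([torch.zeros(30, dtype=torch.long),
                    torch.ones(15, dtype=torch.long)])
features = torch.randn(45, 8, 16, 256)          # (clips, frames, tokens, dim)
train_dataset = TensorDataset(features, labels)

# Minority-class clips are drawn proportionally more often.
sample_weights = class_weights[labels]
sampler = WeightedRandomSampler(sample_weights, num_samples=len(labels),
                                replacement=True)
loader = DataLoader(train_dataset, batch_size=8, sampler=sampler)
```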

Author Biographies

  • Tetiana V. Normatova, Kharkiv National University of Radio Electronics, 14 Nauky Ave., Kharkiv, 61166, Ukraine

    PhD Student of the Department of Informatics

  • Sergii V. Mashtalir, Kharkiv National University of Radio Electronics, 14 Nauky Ave., Kharkiv, 61166, Ukraine

    Doctor of Engineering Sciences, Professor of the Department of Informatics

    Scopus Author ID: 36183980100

Published

2025-11-05

How to Cite

Adaptive video streams fragment analysis for traffic accident classification using a sparse video transformer. (2025). Інформатика. Культура. Техніка, 2, 142–148. https://doi.org/10.15276/ict.02.2025.20
