Multi-modal graph representations for robust anti-pattern detection in evolving codebases

Данило Дмитрович Курінько; Вікторія Ігорівна Кривда

doi:10.15276/ict.02.2025.45

Authors

Dmytro D. Kurinko, Odesa Polytechnic National University, 1 Shevchenko Ave., Odesa, 65044, Ukraine
Viktoriia I. Kryvda, Odesa Polytechnic National University, 1 Shevchenko Ave., Odesa, 65044, Ukraine

DOI:

https://doi.org/10.15276/ict.02.2025.45

Keywords:

Machine learning, software engineering, program analysis, graph models, static analysis, anti-pattern detection, software quality, open-set recognition

Abstract

This study examines whether multi-modal and multi-level representations enhance the reliability of code smell and anti-pattern detection in evolving polyglot software systems. A hybrid model is introduced that integrates four evidence channels – structural, semantic, metric, and evolutionary – within a unified Code Property Graph (CPG) combining AST, CFG, and PDG relations. Semantic information is obtained from pretrained code language models, while classical quality indicators (e.g., CK, McCabe/Halstead) are attached as node and edge attributes; version-control signals (e.g., churn, co-change, recency) are aggregated with time decay to emphasize recent activity. Learning proceeds hierarchically: a local encoder summarizes token-level idioms and induced graph slices; a component-level, relation-aware GNN captures cohesion/coupling and data/control-flow structure; and a project-level encoder propagates context on a component-interaction graph. Instance-wise channel gating is employed to weight modalities, thereby emphasizing source-specific and smell-specific evidence. To support deployment under open-world conditions, selective prediction is adopted using complementary uncertainty criteria (logit energy, predictive entropy, stochastic variance), with temperature calibration to improve probability reliability and enable abstention on unfamiliar or low-confidence cases. The empirical evaluation spans Java, Kotlin, and Scala repositories under cross-project and time-aware splits; open-set tests are formed by withholding one smell class during training. Relative to rule/metric baselines, AST-GNN, text-only, and AST+Text systems, the hybrid model yields consistent improvements without increasing FPR@95TPR. Averaged over repositories, Macro-AUPRC improves by approximately 6–7 percentage points and Macro-F1 by 3–4 points over the strongest single-view baseline, with the largest gains observed for God Class and Shotgun Surgery-like categories.
Incremental CPG updates and bounded project-level propagation maintain CI/CD-compatible latency, while hierarchical attention and channel-importance scores provide reviewer-aligned explanations. The findings indicate that smells are inherently multi-signal and context-dependent, and that hierarchical, calibrated, open-set detection offers a favorable balance between accuracy and operational safety.
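
The selective-prediction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a classifier that emits one logit per smell class, combines only two of the three named criteria (logit energy and predictive entropy; stochastic variance would require several forward passes and is omitted), and uses hypothetical function names and threshold values (`max_entropy`, `max_energy`) chosen purely for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; temperature > 1 softens the distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def predictive_entropy(probs):
    """Shannon entropy (in nats) of the predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def logit_energy(logits, temperature=1.0):
    """Energy score E = -T * logsumexp(z / T); higher means less familiar."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_sum_exp = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return -temperature * log_sum_exp


def selective_predict(logits, temperature=1.0, max_entropy=0.5, max_energy=-2.0):
    """Return the predicted class index, or None to abstain.

    The thresholds are hypothetical; in practice they would be tuned on
    held-out data, as would the calibration temperature.
    """
    probs = softmax(logits, temperature)
    if predictive_entropy(probs) > max_entropy:
        return None  # the class distribution is too flat
    if logit_energy(logits, temperature) > max_energy:
        return None  # the logits are too weak overall (possible unseen class)
    return max(range(len(probs)), key=probs.__getitem__)
```

The two criteria are complementary in this sketch: logits such as `[1.0, -5.0, -5.0]` give a sharply peaked softmax and pass the entropy check, yet are rejected by the energy check because their overall magnitude is low.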


Author Biographies

Dmytro D. Kurinko, Odesa Polytechnic National University, 1 Shevchenko Ave., Odesa, 65044, Ukraine

Postgraduate Student of the Department of Artificial Intelligence and Data Analysis

Viktoriia I. Kryvda, Odesa Polytechnic National University, 1 Shevchenko Ave., Odesa, 65044, Ukraine

PhD, Associate Professor of the Department of Electricity and Energy Management
