Analysis and selection of methods for keyword extraction from texts: a review of existing approaches and practical application
DOI:
https://doi.org/10.15276/ict.02.2025.48
Keywords:
Natural language processing, keywords, TF-IDF, RAKE, TextRank, BERT, KeyBERT, embeddings, spaCy, ConceptNet
Abstract
The article addresses the problem of automatic keyword extraction from texts, an important stage in natural language processing (NLP). The relevance of the topic is driven by the rapid growth of textual data, which requires systematic organization and analysis. The main approaches to keyword extraction are analyzed: classical statistical methods (TF-IDF, RAKE, TextRank), modern semantic algorithms (BERT, KeyBERT, embeddings with clustering), and third-party tools and APIs (ConceptNet, spaCy, HuggingFace Transformers). It is shown that statistical methods are simple to implement but less accurate than modern models because they do not account for context and semantics. Semantic approaches produce higher-quality results, although they are more resource-intensive. Particular attention is given to practical experiments with Ukrainian texts, which were pre-translated into English so that English-language models could be used. This approach yielded better results, since most libraries are optimized for English corpora; attempts at back-translation, however, revealed difficulties in preserving the original meaning. Experimental studies showed that KeyBERT was the most effective of the considered methods: it combines result relevance, speed, and ease of integration, making it suitable for both scientific research and applied information systems. In conclusion, the use of KeyBERT on English-language texts is justified as the optimal solution for keyword extraction tasks. Prospective directions for development are also outlined: support for multilingual corpora, adaptation to domain-specific texts, and optimization of models for large-scale data processing.
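As an illustration of the contrast the abstract draws between statistical and semantic methods, the following minimal Python sketch ranks keywords with a TF-IDF baseline and with KeyBERT. It is a sketch only, assuming the scikit-learn and keybert packages (and KeyBERT's default sentence-transformers model) are installed; the sample document is illustrative and is not taken from the article's corpus.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from keybert import KeyBERT

    doc = (
        "Natural language processing enables the systematic organization "
        "and analysis of rapidly growing collections of textual data."
    )

    # Statistical baseline: with a one-document corpus the IDF component is
    # uniform, so this reduces to a term-frequency ranking without stop words.
    vectorizer = TfidfVectorizer(stop_words="english")
    weights = vectorizer.fit_transform([doc]).toarray()[0]
    terms = vectorizer.get_feature_names_out()
    print("TF-IDF:", sorted(zip(terms, weights), key=lambda p: -p[1])[:5])

    # Semantic method: KeyBERT embeds candidate n-grams and the document with
    # a sentence-transformers model (all-MiniLM-L6-v2 by default) and ranks
    # candidates by cosine similarity to the document embedding.
    kw_model = KeyBERT()
    print("KeyBERT:", kw_model.extract_keywords(
        doc, keyphrase_ngram_range=(1, 2), stop_words="english", top_n=5
    ))

Because KeyBERT scores candidates against the document embedding rather than raw counts, it captures the contextual relevance that the article credits for its higher-quality results, at the cost of loading a transformer model.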