Method for implementing a complex modular system for analyzing information sources
DOI:
https://doi.org/10.15276/ict.02.2025.14
Keywords:
Timed Semantic Influence, cosine similarity, hostile speech, source clustering, information influence graph, automated data collection
Abstract
The article presents a method for building a modular information system that collects, processes and analyzes text messages from open sources, in particular Telegram channels. The system combines text cleaning and normalization with the calculation of cosine similarity and Timed Semantic Influence (TSI), hostile rhetoric classification, source clustering and the construction of a relationship graph. For stability, it uses an asynchronous processor with a message queue and the Circuit Breaker pattern, and the results are stored in MongoDB. TSI detects cross-channel influence even when lexical similarity is low, while the hostile speech module analyzes rhetoric at both the message and source levels. The system generates automatic conclusions for each channel and supports visualization of the results, which makes analysis faster and more convenient. The developed solution has applied value for OSINT, information security and research tasks.
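As an illustration of the stability mechanism named in the abstract, the following is a minimal sketch of an asynchronous queue worker guarded by a circuit breaker. It is not the authors' implementation: the class `CircuitBreaker`, the functions `worker` and `demo`, the handler `flaky_store` (a stand-in for a storage call such as a database write) and all thresholds are hypothetical names and values chosen for the example.

```python
import asyncio
import time


class CircuitBreaker:
    """Hypothetical breaker: opens after `max_failures` consecutive errors."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self):
        if self.opened_at is None:
            return True
        # Calls are permitted again once the cool-down has elapsed;
        # a further failure reopens the circuit.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()


async def worker(queue, handler, breaker, results, skipped):
    """Consume messages forever; skip them while the breaker is open."""
    while True:
        message = await queue.get()
        try:
            if not breaker.allow():
                skipped.append(message)
                continue
            try:
                results.append(await handler(message))
                breaker.record_success()
            except Exception:
                breaker.record_failure()
                skipped.append(message)
        finally:
            queue.task_done()


async def demo():
    # Assumed stand-in for a downstream dependency that sometimes fails.
    async def flaky_store(message):
        if message.startswith("bad"):
            raise ConnectionError("storage unavailable")
        return message.upper()

    queue = asyncio.Queue()
    for message in ["ok-1", "bad-1", "bad-2", "ok-2"]:
        queue.put_nowait(message)

    breaker = CircuitBreaker(max_failures=2, reset_after=60.0)
    results, skipped = [], []
    task = asyncio.create_task(
        worker(queue, flaky_store, breaker, results, skipped)
    )
    await queue.join()  # wait until every queued message is handled
    task.cancel()
    await asyncio.gather(task, return_exceptions=True)
    return results, skipped


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the fourth message is skipped even though it would succeed, because two consecutive failures have already opened the circuit. A production system would more likely requeue or persist skipped messages than drop them.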