Comparative analysis of the classification accuracy of ensemble and individual models using the example of fake speech data
DOI:
https://doi.org/10.15276/ict.02.2025.05
Keywords:
Deepfake, fake speech, ensemble classification, machine learning, neural networks, stacking, hard voting, soft voting
Abstract
Rapid progress in deep learning has made deepfake-based synthesized speech a significant risk to information security, enabling media manipulation and cybercrime. This study presents a comparative analysis of the accuracy of ensemble and individual models for detecting fake speech. The goal is to improve the effectiveness of deepfake detection by developing stacking-based ensemble classifiers with three prediction aggregation strategies: hard voting, soft voting, and soft voting with Gompertz fuzzy ranking. The Fake or Real dataset was used. K-nearest neighbors, support vector machines, random forest, extreme gradient boosting, logistic regression, multilayer perceptron, convolutional neural networks, and long short-term memory networks served as base models. Of the 657 ensembles, the best achieved an accuracy of 0.935 and an F1-score of 0.935, which is 3.9 % higher than the individual models. The results confirm the advantage of ensemble approaches for deepfake speech detection.
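As an illustration of how two of the aggregation strategies named in the abstract differ, the following is a minimal sketch of hard and soft voting. It is not the authors' implementation: the function names, the probability values, and the label convention (0 = real, 1 = fake) are assumptions made for the example, and the Gompertz fuzzy-ranking variant is not covered.

```python
import numpy as np

# Hypothetical class probabilities from 3 base models for 2 audio clips.
# Shape: (n_models, n_samples, n_classes); class 0 = real, class 1 = fake (assumed).
probs = np.array([
    [[0.55, 0.45], [0.10, 0.90]],
    [[0.60, 0.40], [0.20, 0.80]],
    [[0.05, 0.95], [0.30, 0.70]],
])

def hard_vote(probs):
    """Majority vote over each model's predicted label."""
    labels = probs.argmax(axis=2)                      # (n_models, n_samples)
    n_classes = probs.shape[2]
    # Count how many models voted for each class, per sample.
    votes = np.stack(
        [(labels == c).sum(axis=0) for c in range(n_classes)], axis=1
    )                                                  # (n_samples, n_classes)
    # On a tie, argmax returns the lowest class index.
    return votes.argmax(axis=1)

def soft_vote(probs, weights=None):
    """Average the class probabilities across models, then take the argmax."""
    avg = np.average(probs, axis=0, weights=weights)   # (n_samples, n_classes)
    return avg.argmax(axis=1)

print("hard:", hard_vote(probs))
print("soft:", soft_vote(probs))
```

With these made-up numbers the two strategies disagree on the first clip: two weakly confident models outvote the third under hard voting, while under soft voting the third model's high confidence dominates the average. This sensitivity to confidence is the main practical difference between the two schemes.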