Engineering Journal of Don

Synthetic Speech Recognition Algorithm Based on Audio Signal Entropy Calculation
- Kosmynin D.A.
- Grigoriev E.K.
Most modern approaches to synthetic speech recognition rely on analyzing the specific acoustic, spectral, or linguistic patterns that speech synthesis algorithms leave behind. An analysis of open sources shows that further development of methods and algorithms for synthetic speech recognition is crucial for protecting against emerging threats and maintaining trust in existing biometric systems.
This paper proposes an algorithm for synthetic speech detection based on calculating the entropy of the audio signal. The work is motivated by the growing number of cases of malicious use of synthetic speech, which is becoming almost indistinguishable from genuine human speech. Experiments were conducted on the CMU ARCTIC dataset using the XTTS v.2 model. The results show that the entropy of synthetic speech is significantly higher than that of genuine speech and that the algorithm is robust to data loss. The algorithm's advantages are its interpretability and low computational complexity: it makes it possible to decide whether synthetic speech is present without complex spectral analysis or machine learning methods.
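The abstract does not spell out how the entropy is computed or how the decision is made, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes the Shannon entropy H = -sum(p_i * log2(p_i)) is taken over a histogram of sample amplitudes, and that a recording is flagged as synthetic when H exceeds a threshold. The function names, the `n_bins` parameter, and the `threshold` value are all hypothetical; the paper's actual binning, framing, and threshold are not given here.

```python
import math
from collections import Counter
from typing import Sequence


def shannon_entropy(samples: Sequence[float], n_bins: int = 256) -> float:
    """Shannon entropy (in bits) of the amplitude histogram of a signal.

    Assumption: entropy is computed over amplitude bins; the paper may
    use a different representation (e.g. per-frame or spectral).
    """
    if not samples:
        raise ValueError("samples must not be empty")
    lo, hi = min(samples), max(samples)
    if hi == lo:
        return 0.0  # constant signal occupies a single bin
    width = (hi - lo) / n_bins
    # Map each sample to a bin index; clamp the maximum into the last bin.
    counts = Counter(min(int((s - lo) / width), n_bins - 1) for s in samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def looks_synthetic(
    samples: Sequence[float], threshold: float, n_bins: int = 256
) -> bool:
    """Hypothetical decision rule: flag the signal if entropy > threshold.

    The threshold is a placeholder and would have to be calibrated on
    labeled genuine and synthetic recordings.
    """
    return shannon_entropy(samples, n_bins) > threshold
```

A rule of this form is cheap to evaluate and easy to inspect, which matches the interpretability and low computational cost the abstract claims; whether a single global threshold is sufficient is a question for the full paper.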

Keywords: synthetic speech, spoofing, Shannon entropy, speech recognition