Применение современных языковых моделей для автоматической транскрибации и анализа аудиозаписей телефонных разговоров сотрудников отдела продаж с клиентами

Буткина А.А.; Фирсова С.А.; Шеметов И.А.

Application of modern language models for automatic transcription and analysis of audio recordings of telephone conversations between sales department employees and clients

Butkina A.A., Firsova S.A., Shemetov I.A.

Incoming article date: 07.06.2025

The article is devoted to the study of the possibilities of automatic transcription and analysis of audio recordings of telephone conversations of sales department employees with clients. The relevance of the study is associated with the growth of the volume of voice data and the need for their rapid processing in organizations whose activities are closely related to the sale of their products or services to clients. Automatic processing of audio recordings will allow checking the quality of work of call center employees, identifying violations in the scripts of conversations with clients. The proposed software solution is based on the use of the Whisper model for speech recognition, the pyannote.audio library for speaker diarization, and the RapidFuzz library for organizing fuzzy search when analyzing strings. In the course of an experimental study conducted on the basis of the developed software solution, it was confirmed that the use of modern language models and algorithms allows achieving a high degree of automation of audio recordings processing and can be used as a preliminary control tool without the participation of a specialist. The results confirm the practical applicability of the approach used by the authors for solving quality control problems in sales departments or call centers.

Keywords: call center, audio file, speech recognition, transcription, speaker diarization, replica classification, audio recording processing, Whisper, pyannote.audio, RapidFuzz