In modern continuous integration and delivery processes, it is critically important not only that automated tests exist, but also that they are effective, reliable, and economically justified. This paper systematizes the key metrics for evaluating the quality of automated testing, with a special focus on the problem of unstable tests. New indicators are introduced and justified: the level of unstable tests and the losses of the continuous integration pipeline, both of which directly reflect the cost of maintaining the test infrastructure. The limitations of traditional metrics, in particular code coverage, are analyzed in detail, and mutation testing is shown to be a more reliable indicator of a test suite's ability to detect defects. Key dependencies are identified using demonstration data from a real continuous integration pipeline: an increase in code coverage does not guarantee an improvement in mutation testing results and does not lead to an increase in the number of detected defects; a high proportion of unstable tests correlates with significant losses of machine time and a decrease in confidence in test results; and a reduction in the time to detect and eliminate defects is achieved not only by increasing coverage, but also by reducing the proportion of unstable tests, improving system observability, and strengthening defect management discipline.
Keywords: quality metrics for automated testing, mutation testing, unstable tests, code coverage, empirical metric analysis, comparative analysis of testing metrics, optimization of testing processes, cost-effectiveness of automation, software quality management