Comparative analysis of classical machine learning algorithms for phishing link detection
Abstract
Incoming article date: 07.01.2026
The article presents a comparative analysis of classical, interpretable machine learning algorithms for detecting phishing URLs. The introduction substantiates the relevance of the problem, notes the evolution of threats, and points to the lack of studies that evaluate not only accuracy but also practical criteria such as performance and model explainability. The literature review systematizes modern approaches (URL feature analysis, semantic text analysis, and traditional non-ML solutions) and highlights a gap in the comprehensive evaluation of algorithms. The methodology describes the stages of working with a public dataset: data preprocessing, including removal of constant features and scaling, and the selection of three algorithms for comparison: logistic regression, decision tree, and random forest. The results section presents comparative quality metrics (Accuracy, Precision, Recall, F1-Score), confusion matrix analysis, measurements of training and prediction time, and model interpretation through feature importance, where the key indicators of phishing are a short domain age and signs of obfuscation. The discussion compares the effectiveness of Random Forest with neural network approaches from other studies, confirms the high accuracy of ensemble methods, and formulates practical recommendations for choosing an algorithm depending on the use case (prototyping, industrial deployment). The conclusion emphasizes the practical value and interpretability of classical methods and outlines the limitations and the prospects for building hybrid systems.
Keywords: phishing, cybersecurity, information security, machine learning, Random Forest, detection of phishing attacks
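
The preprocessing and evaluation steps named in the abstract (removing constant features, scaling, and computing Accuracy, Precision, Recall, and F1-Score from confusion matrix counts) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy feature columns (URL length, domain age in days, a constant flag), and the labels below are hypothetical and chosen only for illustration, and z-score standardization is assumed as the scaling method because the abstract does not specify one.

```python
from statistics import mean, pstdev


def drop_constant_features(rows):
    """Remove columns whose value is identical in every row.

    Returns the filtered rows and the indices of the columns that were kept.
    """
    n_cols = len(rows[0])
    kept = [j for j in range(n_cols) if len({row[j] for row in rows}) > 1]
    return [[row[j] for j in kept] for row in rows], kept


def standardize(rows):
    """Scale each column to zero mean and unit population standard deviation.

    Assumes constant columns were already removed, so no deviation is zero.
    """
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sigmas)] for row in rows]


def classification_metrics(y_true, y_pred, positive=1):
    """Compute confusion matrix counts and the four metrics derived from them."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn, "accuracy": accuracy,
            "precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy data: [url_length, domain_age_days, constant_flag]
raw_rows = [[54, 12, 1], [23, 2100, 1], [87, 3, 1], [31, 1500, 1]]
filtered_rows, kept_columns = drop_constant_features(raw_rows)
scaled_rows = standardize(filtered_rows)

# Hypothetical labels (1 = phishing, 0 = legitimate) and model predictions
y_true = [1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1]
metrics = classification_metrics(y_true, y_pred)
```

In a phishing filter, Precision and Recall are usually read together: a false negative lets a malicious link through, while a false positive blocks a legitimate one, which is why the abstract reports both alongside Accuracy.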