A method for pre-selecting various data sequences based on relative deviation to form training samples in machine learning problems

Dudalova E.A., Soloveva O.V., Solovev S.A.

Article received: 18.11.2025

This study presents a method for preprocessing data sequences that identifies and groups distinct data files for subsequent use in training neural networks. A file-comparison algorithm based on the relative deviation of feature values is proposed; it handles the boundary cases of zero and near-zero values explicitly. The implementation uses parallel processing to improve performance and generates detailed reports. The method was tested on a dataset of 10,000 files containing the parameters of a chemical process in a laboratory reactor. The results show that the method is effective at identifying stationary regions and generating balanced training sets.

Keywords: data preprocessing, relative deviation, machine learning, parallel computing, file grouping, computational fluid dynamics, chemical reactor
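
The comparison step summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, the near-zero cutoff, or the grouping rule, so everything below is an assumption made for illustration: the symmetric deviation |a - b| / max(|a|, |b|), the constants `EPS` and `THRESHOLD`, and the hypothetical helpers `relative_deviation`, `is_different`, and `select_distinct`. Each file is represented here as a plain list of feature values, and parallel processing and report generation are omitted.

```python
from typing import List, Sequence

EPS = 1e-9         # assumed near-zero cutoff (not taken from the article)
THRESHOLD = 0.05   # assumed tolerance: 5 % relative deviation


def relative_deviation(a: float, b: float, eps: float = EPS) -> float:
    """Symmetric relative deviation of two feature values (assumed form)."""
    scale = max(abs(a), abs(b))
    if scale < eps:
        # Both values are zero or near zero: treat them as equal
        # instead of dividing by a vanishing denominator.
        return 0.0
    return abs(a - b) / scale


def is_different(x: Sequence[float], y: Sequence[float],
                 threshold: float = THRESHOLD) -> bool:
    """Two records differ if any feature deviates by more than the threshold."""
    return any(relative_deviation(a, b) > threshold for a, b in zip(x, y))


def select_distinct(records: Sequence[Sequence[float]],
                    threshold: float = THRESHOLD) -> List[int]:
    """Greedily keep the indices of records that differ from all kept ones."""
    kept: List[int] = []
    for i, rec in enumerate(records):
        if all(is_different(rec, records[j], threshold) for j in kept):
            kept.append(i)
    return kept
```

With these assumed settings, a record that repeats an earlier one to within 5 % on every feature is skipped, while a record that departs on at least one feature is kept as a new representative. A pair in which exactly one value is zero yields a deviation of 1 and therefore counts as different.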