This study presents a method for preprocessing data sequences that identifies and groups different data files for subsequent use in training neural networks. The proposed file-comparison algorithm is based on the relative deviation of feature values and handles boundary cases (zero and near-zero values) explicitly. The implementation uses parallel processing to improve performance and generates detailed reports. The method is tested on a dataset of 10,000 files containing parameters of a chemical process in a laboratory reactor. The results demonstrate the method's effectiveness in identifying stationary regions and generating balanced training sets.
Keywords: data preprocessing, relative deviation, machine learning, parallel computing, file grouping, computational fluid dynamics, chemical reactor
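
As a rough illustration of the comparison idea summarized in the abstract, the following minimal Python sketch compares files by the relative deviation of their feature values and groups files that agree within a tolerance. It is not the implementation used in the study. The function names (`relative_deviation`, `files_similar`, `group_files`), the tolerance `tol`, the near-zero threshold `eps`, the normalization by the larger magnitude, and the greedy grouping strategy are all assumptions made for illustration. Parallel processing and report generation are omitted.

```python
def relative_deviation(a, b, eps=1e-9):
    """Relative deviation of two scalars (hypothetical definition).

    The difference is normalized by the larger magnitude. If both values
    are below `eps` in magnitude, they are treated as equal, which avoids
    division by zero or by a near-zero number.
    """
    scale = max(abs(a), abs(b))
    if scale < eps:
        return 0.0
    return abs(a - b) / scale


def files_similar(features_a, features_b, tol=0.05, eps=1e-9):
    """Two files match if every feature pair deviates by at most `tol`."""
    if len(features_a) != len(features_b):
        raise ValueError("feature vectors must have the same length")
    return all(
        relative_deviation(x, y, eps) <= tol
        for x, y in zip(features_a, features_b)
    )


def group_files(files, tol=0.05, eps=1e-9):
    """Greedy grouping (an assumption, not the study's procedure).

    `files` maps a file name to its list of feature values. Each file
    joins the first group whose representative (first member) it matches,
    otherwise it starts a new group.
    """
    groups = []  # list of (representative_features, member_names)
    for name, feats in files.items():
        for rep, members in groups:
            if files_similar(rep, feats, tol, eps):
                members.append(name)
                break
        else:
            groups.append((feats, [name]))
    return [members for _, members in groups]


# Made-up feature values, for illustration only.
example = {
    "a": [1.0, 0.0, 100.0],
    "b": [1.02, 1e-12, 101.0],
    "c": [2.0, 0.0, 100.0],
}
result = group_files(example)
```

With these made-up values, files `a` and `b` fall into one group because every feature deviates by less than 5%, and the pair `0.0` and `1e-12` is treated as equal under the near-zero rule. File `c` starts its own group because its first feature deviates by 50%.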