A method for pre-selecting various data sequences based on relative deviation to form training samples in machine learning problems

Abstract

Dudalova E.A., Soloveva O.V., Solovev S.A.

Received: 18.11.2025

This study presents a method for preprocessing data sequences that identifies and groups distinct data files for subsequent use in training neural networks. A file-comparison algorithm based on the relative deviation of feature values is proposed; it accounts for boundary cases (zero and near-zero values). The implementation uses parallel processing to improve performance and generates detailed reports. The method is tested on a dataset of 10,000 files containing parameters of a chemical process in a laboratory reactor. The results demonstrate the method's effectiveness in identifying stationary regions and generating balanced training sets.
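The comparison idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the symmetric normalization by the larger magnitude, the 5% threshold, the `eps` cutoff for near-zero values, and the use of a thread pool are all assumptions made for illustration. Each "file" is represented here simply as a list of feature values.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def relative_deviation(a, b, eps=1e-12):
    """Relative deviation of two scalars (assumed form: |a - b| / max(|a|, |b|)).

    When both magnitudes are below eps, the values are treated as equal,
    which avoids division by zero for zero and near-zero features.
    """
    scale = max(abs(a), abs(b))
    if scale < eps:
        return 0.0
    return abs(a - b) / scale


def files_differ(x, y, threshold=0.05, eps=1e-12):
    """Two feature vectors differ if any feature deviates by more than threshold."""
    return any(relative_deviation(a, b, eps) > threshold for a, b in zip(x, y))


def select_distinct(records, threshold=0.05, eps=1e-12):
    """Greedy pre-selection: keep a record only if it differs from all kept ones."""
    kept = []
    for rec in records:
        if all(files_differ(rec, k, threshold, eps) for k in kept):
            kept.append(rec)
    return kept


def similar_pairs(records, threshold=0.05, eps=1e-12, workers=4):
    """Check all pairs concurrently and return index pairs judged similar."""
    pairs = list(combinations(range(len(records)), 2))

    def check(pair):
        i, j = pair
        return not files_differ(records[i], records[j], threshold, eps)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        flags = list(pool.map(check, pairs))
    return [p for p, is_similar in zip(pairs, flags) if is_similar]


# Hypothetical feature vectors standing in for three data files.
records = [
    [1.00, 0.0, 100.0],
    [1.01, 0.0, 100.5],  # within 5% of the first record on every feature
    [2.00, 0.0, 100.0],  # first feature deviates strongly from the others
]
distinct = select_distinct(records)
pairs = similar_pairs(records)
```

In this sketch the second record is dropped as a near-duplicate of the first, while the third is kept. For CPU-bound comparisons over thousands of real files, a process pool (`concurrent.futures.ProcessPoolExecutor`) would likely be the more appropriate choice; threads are used here only to keep the example self-contained.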

Keywords: data preprocessing, relative deviation, machine learning, parallel computing, file grouping, computational fluid dynamics, chemical reactor