Engineering Journal of Don

A method for pre-selecting various data sequences based on relative deviation to form training samples in machine learning problems
This study presents a method for preprocessing data sequences aimed at identifying and grouping different data files for subsequent use in training neural networks. An algorithm for file comparison based on the relative deviation of feature values is proposed, taking into account boundary cases (zero and near-zero values). The implementation includes parallel processing to improve performance and the generation of detailed reports. The method is tested on a dataset containing 10,000 files with parameters of a chemical process in a laboratory reactor. The results demonstrate the method's effectiveness in identifying stationary regions and generating balanced training sets.
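The comparison step described above can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything here is an assumption made for illustration: the symmetric relative deviation normalised by the larger magnitude, the `eps` cutoff for near-zero values, the 5% `threshold`, the greedy grouping against the first file of each group, and the function names themselves. Parallel processing and report generation are omitted.

```python
from typing import Dict, List, Sequence


def relative_deviation(a: float, b: float, eps: float = 1e-9) -> float:
    """Relative deviation between two feature values (hypothetical definition).

    The difference is normalised by the larger magnitude, so the result
    lies in [0, 1] for same-sign values. If both values are below `eps`
    in magnitude they are treated as equal, which avoids dividing by zero.
    """
    scale = max(abs(a), abs(b))
    if scale < eps:
        # Boundary case: both values are zero or near zero.
        return 0.0
    return abs(a - b) / scale


def files_differ(x: Sequence[float], y: Sequence[float],
                 threshold: float = 0.05, eps: float = 1e-9) -> bool:
    """Return True if any feature deviates by more than `threshold`."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    return any(relative_deviation(a, b, eps) > threshold for a, b in zip(x, y))


def group_files(files: Dict[str, Sequence[float]],
                threshold: float = 0.05) -> List[List[str]]:
    """Greedy grouping: a file joins the first group whose first member
    it does not differ from; otherwise it starts a new group."""
    groups: List[List[str]] = []
    for name, features in files.items():
        for group in groups:
            if not files_differ(files[group[0]], features, threshold):
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


if __name__ == "__main__":
    # Made-up feature vectors, not data from the study.
    sample = {
        "a": [1.0, 0.0, 100.0],
        "b": [1.01, 1e-12, 101.0],
        "c": [2.0, 0.0, 100.0],
    }
    print(group_files(sample))
```

Under these assumptions, taking one file from each group would give a set of mutually distinct samples, while a large group would indicate a region where the parameters barely change.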

Keywords: data preprocessing, relative deviation, machine learning, parallel computing, file grouping, computational fluid dynamics, chemical reactor