The article formulates the task of hierarchical text classification, describes approaches to it and the metrics used to evaluate them, and examines the local approach in detail, including its different variants. A series of experiments is conducted in which local hierarchical classifiers are trained with various vectorization methods, and the evaluation results of the trained classifiers are compared.
Keywords: classification, hierarchical classification, local classification, hierarchical precision, hierarchical recall, hierarchical F-measure, natural language processing, vectorization
The article presents existing methods of data dimensionality reduction for training machine learning models of natural language. The concepts of text vectorization and word-form embedding are introduced. The text classification task is formulated, and the stages of classifier training are defined. A classifying neural network is designed. A series of experiments is conducted to determine the effect of reducing the dimensionality of word-form embeddings on the quality of text classification. The evaluation results of the trained classifiers are compared.
Keywords: natural language processing, vectorization, word-form embedding, text classification, data dimensionality reduction, classifier
The article provides a brief description of existing methods for vectorizing natural language texts. An evaluation method based on determining word similarity is described. A comparative analysis of several vectorizer models is carried out. The process of selecting data for the evaluation is described. The results of evaluating the models' performance are compared.
Keywords: natural language processing, vectorization, word-form embedding, semantic similarity, correlation