Addressing the Problem of Labeled Industrial Data Scarcity through Synthetic Generation of Point Clouds for Training Deep Neural Networks for Semantic Segmentation
Abstract
Incoming article date: 03.01.2026
The paper presents a methodology for addressing the scarcity of labeled industrial data for training deep neural networks for semantic segmentation. A platform is proposed for synthetic generation of training point cloud datasets from a minimal number of real laser-scanning samples of mechanical, electrical, and plumbing networks. The algorithm detects the axes of cylindrical elements using the Random Sample Consensus (RANSAC) method, constructs joint planes perpendicular to those axes, and applies affine transformations to create assemblies of 2–7 elements. The training set grows from 8 real scans to more than 800 synthetic examples, which improves the segmentation accuracy of the PointNet++ deep hierarchical point cloud learning architecture from 72% to 89% in terms of the Intersection over Union (IoU) metric. The developed system enables automated creation of BIM models of engineering infrastructure with 90–95% accuracy with respect to design parameters.
Keywords: synthetic data generation, point clouds, semantic segmentation, laser scanning, Random Sample Consensus method, shortage of labeled data, BIM modeling, engineering networks, deep learning
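The three steps named in the abstract (axis detection, perpendicular joint planes, affine placement) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that unit surface normals are available for each scanned element, that the axis direction is estimated by a normal-based RANSAC variant (the cross product of two sampled normals is a candidate axis), and that elements are joined end to end by a rigid transform. All function names and the synthetic `make_cylinder` stand-in for a real scan are hypothetical.

```python
import numpy as np

def make_cylinder(radius, length, n=500, seed=0):
    """Sample points and outward unit normals on a cylinder along +z.
    Hypothetical stand-in for one labeled laser-scanned pipe segment."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 2.0 * np.pi, n)
    z = rng.uniform(0.0, length, n)
    normals = np.stack([np.cos(theta), np.sin(theta), np.zeros(n)], axis=1)
    points = radius * normals
    points[:, 2] = z
    return points, normals

def ransac_axis(normals, iters=200, tol=0.05, seed=0):
    """Estimate the axis direction of a cylinder from its unit normals.
    Assumption: every surface normal of a cylinder is perpendicular to the axis,
    so the cross product of two sampled normals is a candidate direction."""
    rng = np.random.default_rng(seed)
    best_axis, best_count = None, -1
    for _ in range(iters):
        i, j = rng.choice(len(normals), size=2, replace=False)
        axis = np.cross(normals[i], normals[j])
        norm = np.linalg.norm(axis)
        if norm < 1e-6:  # nearly parallel normals give a degenerate sample
            continue
        axis = axis / norm
        # Inliers are normals that are (almost) perpendicular to the candidate.
        count = int(np.sum(np.abs(normals @ axis) < tol))
        if count > best_count:
            best_axis, best_count = axis, count
    return best_axis  # the sign of the direction is arbitrary

def rotation_between(a, b):
    """Rotation matrix that turns unit vector a onto unit vector b."""
    v = np.cross(a, b)
    c = float(a @ b)
    if np.linalg.norm(v) < 1e-9:
        if c > 0:
            return np.eye(3)
        # Antiparallel case: rotate 180 degrees about a vector perpendicular to a.
        p = np.cross(a, np.eye(3)[np.argmin(np.abs(a))])
        p = p / np.linalg.norm(p)
        return 2.0 * np.outer(p, p) - np.eye(3)
    vx = np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])
    return np.eye(3) + vx + vx @ vx / (1.0 + c)

def join_elements(pts_a, axis_a, pts_b, axis_b):
    """Rigidly move element B so that its axis is parallel to A's and its
    start plane coincides with A's end plane (both planes are perpendicular
    to the axis and located by the extreme projections of the points)."""
    rotated = pts_b @ rotation_between(axis_b, axis_a).T
    end_a = (pts_a @ axis_a).max()
    start_b = (rotated @ axis_a).min()
    # Centre B on A laterally, then slide it along the axis to the joint plane.
    delta = pts_a.mean(axis=0) - rotated.mean(axis=0)
    lateral = delta - (delta @ axis_a) * axis_a
    return rotated + lateral + (end_a - start_b) * axis_a

# Build a two-element assembly from two synthetic "scans".
pts_a, nrm_a = make_cylinder(radius=0.1, length=2.0, seed=1)
pts_b, nrm_b = make_cylinder(radius=0.1, length=1.0, seed=2)
tilt = rotation_between(np.array([0.0, 0.0, 1.0]), np.array([1.0, 0.0, 0.0]))
pts_b, nrm_b = pts_b @ tilt.T + np.array([5.0, 3.0, 1.0]), nrm_b @ tilt.T

axis_a, axis_b = ransac_axis(nrm_a), ransac_axis(nrm_b)
assembly = np.vstack([pts_a, join_elements(pts_a, axis_a, pts_b, axis_b)])
```

Repeating `join_elements` with randomly chosen elements and orientations would yield assemblies of several parts, each point keeping the label of its source element. A real scan would additionally require normal estimation and noise-tolerant thresholds, which this sketch leaves out.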