2022 European Conference on Computing in Construction
Ixia, Rhodes, Greece
July 24-26, 2022

AUTOMATIC CREATION AND ENRICHMENT OF 3D MODELS FOR PIPE SYSTEMS BY CO-REGISTRATION OF LASER-SCANNED POINT CLOUDS AND PHOTOS

Yuandong Pan1,2, Florian Noichl1, Alexander Braun1, André Borrmann1,2, Ioannis Brilakis2,3
1Chair of Computational Modeling and Simulation, Technical University of Munich, Germany
2Institute for Advanced Study, Technical University of Munich, Germany
3Department of Engineering, University of Cambridge, the United Kingdom

Abstract

An information-rich digital model for pipe systems is valuable for facility management and maintenance. Pipe systems in existing facilities can be captured for example using laser scanning equipment or cameras, providing point clouds or images. While these two data sources can provide diverse information, it is not straightforward to register one with the other. In this paper, we propose a novel approach to automatically create and enrich geometric models for pipe systems by co-registering laser-scanned point clouds and photos. Data from two separate sources are collected to test our method. Subsequently, a photogrammetric point cloud is reconstructed to establish a mapping between all 2D images and the laser-scanned 3D point cloud. State-of-the-art computer vision methods are applied to enrich the raw 2D and 3D datasets. Finally, we use the mapping to merge the processed datasets into one combined, information-rich model.

Introduction

The research presented in this paper is about creating and enriching 3D models for pipe systems using laser-scanned point clouds and photos. By creating, we refer to the process of creating the geometric digital representations of pipes from captured data, including laser-scanned point cloud and photos. By enriching, we refer to adding useful information such as the fluid type and flow direction for pipes to the geometric model to get an information-rich 3D model.
Information-rich digital representations of physical assets receive growing attention in Architecture, Engineering, Construction (AEC), and Facilities Management (FM) sectors as they can provide substantial value to all stakeholders. Holistic digital methods such as Building Information Modeling (BIM) promise considerable improvements for efficiency and transparency, helping profitability and sustainability goals (Borrmann et al. 2018). This is especially true for the operating phase, where recently the term of the Digital Twin has been adopted (Brilakis et al. 2019), based on the concept previously applied in the manufacturing industry (Kritzinger et al. 2018). Initially slow adaptation of digital methods in the sectors of AEC and FM is picking up speed in the industry (Talebi 2014, Pärn et al. 2017). As most building stock is already existing, the creation of useful digital models of existing structures is essential for the successful implementation of digital methods (Volk et al. 2014). Depending on the use case, the geometric representation is an important but non-essential part of a digital model; however, in the built environment, it poses a significant contribution as planning and FM activities are heavily dependent on geometric information (Wetzel & Thabet 2015, Pärn et al. 2017). To initiate a suitable basis for the implementation of digital methods for existing structures, capturing the current as-is status of the building and transferring it into a suitable digital representation is a key requirement. Driven by leaps in the development of hardware and software solutions, research has seen a variety of new attempts to automate this process and inspired industry-ready software applications (Son et al. 2015). In academia, the field of Scan-to-BIM has become an extensive field of research (Son et al. 2015, Adán et al. 2018), recently also coined as Digital Twinning (Lu & Brilakis 2019). 
Most of these research efforts focus on the clear definition and technical improvement of single processing steps, with mostly one method or data source at their core. In this paper, we showcase a pipeline that covers the majority of steps necessary for an end-to-end solution, from raw industry-standard input data of two different types to a useful, semantically rich 3D representation. As the core component, we present a method to co-register separately recorded laser-scanned point clouds and photos. This allows us to merge complementary information that we detect in the datasets independently using state-of-the-art computer vision algorithms to leverage the full combined potential of the captured data.

Background

In order to create a sensible digital representation of existing structures in the built environment, current conditions need to be captured first; subsequently, models need to be reconstructed. Esfahani et al. (2019) present work to support the decision-making process with regard to the choice of capturing equipment and further processing options. On the basis of raw capture data, the manual reconstruction of useful digital models is possible but time-consuming and error-prone (Fumarola & Poelman 2011, Hullo et al. 2015). The processing steps towards a useful model can be divided into two categories: 1) point cloud processing and enrichment and 2) model reconstruction.

In the first step, it makes sense to distinguish individual objects in the point cloud or to distinguish between object classes. For domain-specific applications, this has been achieved using manually selected geometric features in the point cloud. Yokoyama et al. (2013) use principal component analysis (PCA) for detecting pole-like objects, Lu & Brilakis (2019) detect bridge cross sections after intelligent slicing, the authors of S3DIS (Armeni et al. 2016) use a ‘peak-gap-peak’ pattern for the separation of rooms, and Czerniawski et al. (2016) detect pipe spools in point clouds based on local curvature.
Data-driven methods such as the artificial neural network architectures PointNet (Qi et al. 2017) and KPConv (Thomas et al. 2019) are more domain-independent, given the availability of suitable training data. The latter has shown convincing results for indoor environments (S3DIS (Armeni et al. 2016)), urban scenes (Paris-Lille-3D (Roynard et al. 2018)) and railway tunnels (Soilán et al. 2020). Specifically for the AEC domain, with Scan2BIM-Net, Perez-Perez et al. (2021) introduce an approach based on a combination of network architectures for semantic segmentation, leading to robust results for the presented indoor dataset. Agapaki & Brilakis (2020) showcase a solution for industrial scenes that puts emphasis on minimizing the manual effort for training data annotation.

To bring the single steps together into a full toolchain that is suitable to solve the problem of Scan-to-BIM, recent works aim at combining previously established methods. For Scan-to-BIM for historic buildings, Andriasyan et al. (2020) introduce an end-to-end workflow from input point cloud to a BIM model that consists of precisely meshed objects. Croce et al. (2021) present a similar, semi-automatic approach that uses a random forest classifier to segment the point cloud into distinct structural element classes. In a method closely related to our approach, Wang et al. (2022) use a corresponding point cloud reconstructed from depth images (RGB-D) to enrich the laser scanning point cloud with semantics detected in 2D. Furthermore, in Wang et al. (2022) the enriched point cloud is processed to automatically remodel the mechanical, electrical and plumbing (MEP) structures from a set of regularly and irregularly shaped objects, using the method separately introduced in Wang et al. (2021).
Research methodology

Our proposed approach of creating and enriching a 3D digital model of pipes consists of two main steps:
• Geometric reconstruction
• Information enrichment

Throughout the approach, two different types of raw data, photos and point clouds, are processed by various algorithms to extract diverse information. Furthermore, the proposed co-registration method is applied to locate photos in the point cloud and thus makes it possible to map the information extracted from 2D images to 3D space. The whole process is illustrated in Figure 1 and introduced in detail in the following section.

Photos taken by the terrestrial laser scanning (TLS) equipment are combined with the independently gathered camera photos in a photogrammetric co-registration step to establish the spatial link between the datasets. Both the photos and the point cloud are processed individually using computer vision algorithms to enrich the raw data with specific information. The enriched laser-scanned point cloud is used as the input for the geometric reconstruction of the pipe model. Finally, the information parsed from the photos is reprojected to the reconstructed 3D model using the mapping established through the initial co-registration step. In the following subsections, the implemented steps of the proposed method are introduced in more detail.

Geometric reconstruction

The most precise source of geometric information in this workflow is the laser-scanned point cloud as captured by TLS. Hence we use it as the basis for our 3D model reconstruction. To narrow down the problem space and allow for detailed reconstruction, we first enrich the raw point cloud to be able to filter and split it.

3D enrichment by semantic segmentation

In this step, the input laser scanning point cloud is segmented by KPConv, more specifically the KP-FCNN architecture, a well-performing 3D deep learning architecture for large-scale point cloud segmentation. As shown in Table 1, KPConv (Thomas et al.
2019) is one of the best-performing neural networks for point cloud segmentation on the S3DIS dataset (Armeni et al. 2016), a widely used benchmark dataset for large-scale indoor environment point clouds. We trained our model on a manually labeled dataset of an industrial facility collected in a related study (Noichl et al. 2021) and ran inference on our collected dataset. The segmentation result inferred by KPConv is used as input to the following steps.

3D enrichment by instance segmentation

The result of semantic segmentation is the full point cloud with predicted class labels. All points belonging to one category have the same label, regardless of whether they belong to the same instance. In our case of creating a digital twin of pipes, segmentation at the semantic level alone is not sufficient for reconstructing pipe instances. Therefore, the semantic segmentation result needs to be segmented further to identify separate instances.

Figure 1: The proposed process of creating and enriching 3D models of pipes, through separate steps of co-registration, enrichment and reconstruction. [Diagram elements: TLS photos, point cloud and DSLR camera photos as inputs; photogrammetric co-registration; 2D enrichment and 3D enrichment; 3D reconstruction and 3D reprojection; hybridly enriched semantic 3D model as output; grouped into information enrichment and geometric reconstruction.]

In our approach, we assume that one pipe instance can be represented by one cylinder or by several cylinders connected with elbows, as long as fluid can flow through these parts. Based on the assumption that one pipe instance is continuous and does not intersect other pipes, different pipes can be segmented by clustering the point cloud. We use the region growing algorithm (Rabbani et al. 2006) in the Point Cloud Library (Rusu & Cousins 2011), which merges points that are close enough in terms of distance and local smoothness into a point cluster.
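The clustering idea can be illustrated with a deliberately simplified sketch. It is not the PCL implementation referenced above: it assumes that unit normals are already available for every point, uses a brute-force neighbour search, and omits the curvature-based seed ordering of the original algorithm. The function name and both thresholds are hypothetical.

```python
import math
from collections import deque


def region_growing(points, normals, max_dist=0.05, max_angle_deg=10.0):
    """Cluster points that are spatially close and locally smooth.

    points, normals: lists of (x, y, z) tuples; normals are assumed unit length.
    Returns a list of clusters, each a sorted list of point indices.
    """
    cos_thresh = math.cos(math.radians(max_angle_deg))
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = min(unvisited)  # simplification: no curvature-based seed choice
        unvisited.remove(seed)
        queue = deque([seed])
        cluster = [seed]
        while queue:
            i = queue.popleft()
            # Brute-force neighbour search; a k-d tree would be used in practice.
            for j in list(unvisited):
                close = math.dist(points[i], points[j]) <= max_dist
                # abs() makes the test independent of the normal orientation.
                dot = sum(a * b for a, b in zip(normals[i], normals[j]))
                if close and abs(dot) >= cos_thresh:
                    unvisited.remove(j)
                    queue.append(j)
                    cluster.append(j)
        clusters.append(sorted(cluster))
    return clusters
```

Each returned cluster would correspond to one candidate pipe instance; with real scan data the thresholds would have to be tuned to the point density and noise level.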
The output of this step is a set of point clusters, each of which represents one corresponding pipe instance.

3D reconstruction

In this step, based on the assumption that one pipe instance consists of one or multiple cylinders connected with elbows, we fit cylinders to the instance clusters by applying M-Estimator Sample Consensus (Torr & Zisserman 2000), a variant of Random Sample Consensus (RANSAC) (Fischler & Bolles 1981). This allows us to extract the parameters of the cylinders in each instance cluster, which here include the cylinder axis and the radius. We further use this radius as the nominal diameter of the pipe, $I$. The fitting process works directly if one pipe instance can be represented by a single cylinder. However, for pipe runs that contain elbows, the elbow parts cannot be represented by cylinders directly. First, the radius of the elbow connecting the straight pipes is calculated as $r = 1\tfrac{1}{2}\,I$ (Parisher & Rhea 2011). Then, the corresponding fillet start and end points are calculated in 3D. The resulting path is used to sweep a circle with the previously identified radius $I$ and create a 3D model of the pipe using the Python scripting functionality of the open source application FreeCAD (www.freecadweb.org, visited Dec 10 2021). Thus, we have created a geometric 3D model of pipes, which contains the fitted (cylindrical part) and estimated (elbow part) surfaces of the pipes, as well as the corresponding segmented point cloud instances the reconstruction is based on.

Table 1: Performance comparison among different 3D deep learning architectures on selected categories of the S3DIS (Armeni et al. 2016) dataset: *Qi et al. (2016), †Landrieu & Simonovsky (2018), ‡Huang et al. (2018), §Li et al. (2018), ¶Thomas et al. (2019), ‖Zhao et al. (2021)

model       mIoU   ceiling   floor   window   door
PointNet*   47.6   88.0      69.3    88.7     47.5
SPG†        62.1   89.9      76.4    95.1     55.3
RSNet‡      56.5   92.8      92.5    78.6     51.6
PointCNN§   65.4   94.8      75.8    97.3     58.4
KPConv¶     67.1   93.6      83.1    92.4     66.1
PointTr.‖   70.4   94.3      84.7    97.5     66.1

Information enrichment

In this step, we enrich the geometric reconstruction of the pipe system by adding semantic information. This information can be extracted from images, using the standardized labels on pipes that indicate the fluid type and flow direction. However, co-registering a laser scanning point cloud and RGB photos is not straightforward. We use our own method to bring the two data types together as follows.

Photogrammetric co-registration

The information enrichment starts with the reconstruction of the photogrammetric point cloud. Information such as labels on pipes cannot be recognised in point clouds, but it can be recognised in images. Accordingly, images are a valuable source for adding this type of semantic information to the geometric pipe twin. In order to map information extracted from 2D images to the 3D point cloud, we propose to create a photogrammetric point cloud based on images collected in the same area as the laser scanning point cloud. In the reconstruction process, the extrinsic and intrinsic camera parameter matrices are estimated. In our approach, we apply COLMAP (Schönberger et al. 2016, Schönberger & Frahm 2016), an open-source Structure-from-Motion (SfM) and Multi-View Stereo (MVS) software, to reconstruct photogrammetric point clouds. The terrestrial laser scanner Leica RTC360 was used to capture the laser scanning point cloud along with RGB images to colorize the points. The input of SfM is a set of overlapping images taken from different viewpoints by the laser scanner and the camera. SfM proceeds from feature detection through feature matching and then reconstructs the scene in 3D space, including the intrinsic and extrinsic camera parameters of all images.
The estimated camera poses, including the position and orientation of each acquired image in the reconstructed sparse photogrammetric point cloud, and the corresponding reconstructed dense point cloud are illustrated in Figure 2. The output of this step is the computed extrinsic and intrinsic camera parameters of the laser scanner cameras and of the digital single-lens reflex (DSLR) camera we used to capture the pipes.

Figure 2: Camera poses and point cloud reconstruction. (a) Camera poses in the sparse model; camera poses marked with a circle are images taken by the laser scanner, all others are taken by the DSLR camera. (b) Reconstructed dense point cloud; points on the front wall are removed for better visualisation.

Subsequently, we map the images taken by the DSLR camera to the laser scanning point cloud. We use $I_c$ to denote the DSLR camera image set and $I_l$ to denote the laser scanner image set, both of which are used to reconstruct the photogrammetric point cloud. For an image in the camera image set, $m_i \in I_c$, $M^i_{ext}$ and $M^i_{int}$ denote the corresponding camera extrinsic and intrinsic parameter matrices. These parameters are computed by SfM in the previous step and are given in the coordinate system of the photogrammetric point cloud. For an image in the laser scanning image set, $n_i \in I_l$, $N^i_{ext}$ and $N^i_{int}$ denote the corresponding camera extrinsic and intrinsic parameter matrices that are computed by SfM and referenced to the photogrammetric point cloud. In addition, an image $n_i \in I_l$ also has the extrinsic and intrinsic parameters of the laser scanner camera, referenced to the laser scanning coordinate system and denoted by $L^i_{ext}$ and $L^i_{int}$. Therefore, the images taken by the laser scanner work as a ‘bridge’ that connects the photogrammetric and the laser scanning point cloud. As shown in Figure 2, the marked camera poses are images taken by the laser scanner and the rest are images taken by the DSLR camera.
The Leica RTC360 laser scanner captures images that are internally stitched and exported in all 6 orthogonal directions at each scanning position, forming the so-called cube map. More details about the laser scanner and data capturing are discussed in section . For images $n_i \in I_l$, the camera positions are available in both the photogrammetric and the laser scanning point cloud. By moving the centroids of these positions in the photogrammetric and laser scanning coordinate systems to the origin and applying singular value decomposition (SVD) to the product of the two position matrices, the translation matrix and rotation matrix can be computed. In this paper, we use $M$ to denote the transformation matrix that transforms points from laser scanning point cloud coordinates to photogrammetric point cloud coordinates. Any point $p = [x_0, y_0, z_0]^T$ in the original laser scanning point cloud $S$ can be transformed to the coordinate system of the photogrammetric point cloud by

$$[x_1, y_1, z_1, d_1]^T = M^{-1} [x_0, y_0, z_0, 1]^T, \tag{1}$$

where $[x_0, y_0, z_0, 1]^T$ are the original homogeneous coordinates of the point $p$, $M^{-1}$ is the inverse matrix of $M$, and $[x_1, y_1, z_1, d_1]^T$ are the newly calculated homogeneous coordinates of the point in the coordinate system of the photogrammetric point cloud. Then normalization is applied by dividing each vector component by $d_1$,

$$[x_2, y_2, z_2, 1]^T = \frac{1}{d_1} [x_1, y_1, z_1, d_1]^T, \tag{2}$$

where $[x_2, y_2, z_2, 1]^T$ is the normalized homogeneous coordinate vector of the point $p$ in the coordinate system of the photogrammetric point cloud.

As a next step, we map the information detected in images to the 3D space of the laser scanning point cloud. The extrinsic parameter matrix of the image $m_i \in I_c$ can be defined as

$$M^i_{ext} = \begin{bmatrix} R^i & T^i \\ 0\;\;0\;\;0 & 1 \end{bmatrix},$$

where $R^i$ is the $3 \times 3$ rotation matrix

$$R^i = \begin{bmatrix} r^i_{11} & r^i_{12} & r^i_{13} \\ r^i_{21} & r^i_{22} & r^i_{23} \\ r^i_{31} & r^i_{32} & r^i_{33} \end{bmatrix},$$

and $T^i$ is the $3 \times 1$ translation matrix

$$T^i = \begin{bmatrix} t^i_1 \\ t^i_2 \\ t^i_3 \end{bmatrix}$$

of the image $m_i$.
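The SVD-based alignment of the shared camera positions and the homogeneous transformation of Equations (1) and (2) can be sketched as follows. This is a minimal illustration of the standard least-squares rigid alignment, not the authors' implementation: it assumes NumPy is available, that the two sets of camera positions are given in corresponding order, and that both coordinate systems share the same scale, which the text does not discuss. The function names are hypothetical, and the returned matrix maps the first point set onto the second, so it plays the role of the matrix applied in Equation (1).

```python
import numpy as np


def rigid_transform(src, dst):
    """Least-squares rotation R and translation t such that dst ~ R @ src + t.

    src, dst: (N, 3) arrays of corresponding camera positions, N >= 3.
    Returns a 4x4 homogeneous transformation matrix.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    # Product of the two centred position matrices (3x3 cross-covariance).
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection (determinant of -1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def apply_homogeneous(T, p):
    """Apply a 4x4 matrix to a 3D point and normalise by the last component."""
    x1, y1, z1, d1 = T @ np.append(np.asarray(p, dtype=float), 1.0)  # cf. Eq. (1)
    return np.array([x1, y1, z1]) / d1                               # cf. Eq. (2)
```

In this sketch, `src` would hold the scanner image positions in laser scanning coordinates and `dst` the positions of the same images estimated by SfM; every point of the laser-scanned cloud could then be passed through `apply_homogeneous`.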
The intrinsic parameter matrix can be represented by

$$M^i_{int} = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix},$$

where $f_x$ and $f_y$ are the focal lengths of the camera measured in units of image pixels in the horizontal and vertical directions, and $c_x$ and $c_y$ are the pixel coordinates of the principal point in the image plane. Additionally, $s$ denotes the skew coefficient of the camera. A point in the coordinate system of the photogrammetric point cloud computed from Equation (2) can then be transformed into the camera coordinate system of the image $m_i$ by

$$\begin{bmatrix} x_3 \\ y_3 \\ z_3 \\ 1 \end{bmatrix} = M^i_{ext} \begin{bmatrix} x_2 \\ y_2 \\ z_2 \\ 1 \end{bmatrix} = \begin{bmatrix} r^i_{11} & r^i_{12} & r^i_{13} & t^i_1 \\ r^i_{21} & r^i_{22} & r^i_{23} & t^i_2 \\ r^i_{31} & r^i_{32} & r^i_{33} & t^i_3 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_2 \\ y_2 \\ z_2 \\ 1 \end{bmatrix} \tag{3}$$

and subsequently projected to the image plane by applying

$$\begin{bmatrix} x_4 \\ y_4 \\ z_4 \end{bmatrix} = M^i_{int} \begin{bmatrix} x_3 \\ y_3 \\ z_3 \end{bmatrix} = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_3 \\ y_3 \\ z_3 \end{bmatrix}, \tag{4}$$

where $x_3, y_3, z_3$ are the coordinates in the camera coordinate system, and $x_4, y_4, z_4$ are the perspective-projected coordinates in the image coordinate system. The image coordinates of the projected point in the image plane are calculated by homogeneous coordinate normalisation,

$$[u, v, 1]^T = \frac{1}{z_4} [x_4, y_4, z_4]^T, \tag{5}$$

where $u$ and $v$ are the pixel coordinates in the horizontal and vertical direction in the image plane. After applying these equations, a point $(x_0, y_0, z_0)$ in the laser-scanned point cloud is transformed to the pixel coordinates $(u, v)$ in the image plane. Now we need to check whether this point is in the field of view of the camera by checking the conditions

$$0 \le u \le W \;\land\; 0 \le v \le H, \tag{6}$$

where $W$ denotes the width and $H$ denotes the height of the image. If a point satisfies this condition, it is visible in the corresponding image. If useful semantic information (like a detected bounding box) is extracted from an image, we need to check further which points in 3D space are projected to this area. The pixel coordinates in an image are checked by

$$(u, v) \in S_i, \tag{7}$$

where $(u, v)$ are the pixel coordinates in the image plane and $S_i$ denotes the $i$-th detected bounding box in this image.
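The projection chain of Equations (3) to (7) can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the authors' code: matrices are plain nested lists, the bounding box is assumed to be axis-aligned and given as `(u_min, v_min, u_max, v_max)`, and a check that the point lies in front of the camera is added although it does not appear in the equations. Occlusion is not handled. The function names are hypothetical.

```python
def project_point(p_photo, R, t, K, width, height):
    """Project a 3D point given in photogrammetric coordinates into an image.

    R: 3x3 rotation, t: length-3 translation, K: 3x3 intrinsic matrix.
    Returns (u, v) in pixels, or None if the point is not visible.
    """
    # Eq. (3): photogrammetric coordinates -> camera coordinates.
    cam = [sum(R[r][c] * p_photo[c] for c in range(3)) + t[r] for r in range(3)]
    if cam[2] <= 0:  # behind the camera; extra check, not part of Eqs. (3)-(7)
        return None
    # Eq. (4): apply the intrinsic matrix.
    x4, y4, z4 = (sum(K[r][c] * cam[c] for c in range(3)) for r in range(3))
    # Eq. (5): homogeneous normalisation.
    u, v = x4 / z4, y4 / z4
    # Eq. (6): field-of-view test.
    if 0 <= u <= width and 0 <= v <= height:
        return (u, v)
    return None


def in_box(uv, box):
    """Eq. (7) for an axis-aligned box (u_min, v_min, u_max, v_max)."""
    u, v = uv
    return box[0] <= u <= box[2] and box[1] <= v <= box[3]
```

Points for which `project_point` returns a pixel position inside a detected box are the candidates that would receive the information extracted from that box.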
Then we can attach the recognised texts inside the corresponding bounding box to those points that are projected to this bounding box.

2D information enrichment

In this step, standardized labels on pipes are recognised and the corresponding information is extracted. An example is shown in Figure 3. Standardized labels on pipes carry information on the contained fluid (such as liquid type and direction of flow), which is useful for obtaining a rich model of the facility as required by facility managers maintaining the piping systems. In our approach, we use the open-source tool MMOCR (Kuang et al. 2021) for text detection in images. In order to improve the performance of text recognition, the detected bounding boxes are first rotated to an angle where their longer sides are parallel to the horizontal axis. Then text recognition is applied to the rotated bounding box and we select the text with the highest prediction score as the recognised text. The recognition results before and after rotating are compared in Figure 3. The recognition score improves considerably with proper rotation, from 79.3% to 99.8%. The label texts are recognised as "Vorlauf Heizung" (flow heating), "Rucklauf Heizung" (return heating), and "Vorlauf Heizung", respectively, which is consistent with the true texts on the labels with the exception of the German umlauts ‘ä’ and ‘ü’, as the model used is pretrained on English.

Figure 3: Detected bounding box before and after rotating (text score before and after rotation: 0.793 and 0.998). (a) Original image and detected text boxes. (b) Rotated image and detected text boxes.

To detect the arrow direction shown on the label, our approach starts by enlarging the detected text bounding box. We then apply the Canny edge detector (Rong et al. 2014) and the Hough transform (Mukhopadhyay & Chaudhuri 2015) to detect lines and compute their intersections.
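The intersection step can be illustrated with a small sketch. It assumes that each detected line is given in the normal form $(\rho, \theta)$ with $x\cos\theta + y\sin\theta = \rho$, the parameterisation commonly returned by standard Hough transform implementations; the text does not state which implementation or line representation was used, and the function name is hypothetical.

```python
import math


def intersect_hough_lines(line_a, line_b, eps=1e-9):
    """Intersect two lines given as (rho, theta), with
    x * cos(theta) + y * sin(theta) = rho.

    Returns (x, y), or None if the lines are (nearly) parallel.
    """
    rho_a, theta_a = line_a
    rho_b, theta_b = line_b
    a1, b1 = math.cos(theta_a), math.sin(theta_a)
    a2, b2 = math.cos(theta_b), math.sin(theta_b)
    det = a1 * b2 - a2 * b1
    if abs(det) < eps:
        return None
    # Cramer's rule on the 2x2 linear system.
    x = (rho_a * b2 - rho_b * b1) / det
    y = (a1 * rho_b - a2 * rho_a) / det
    return (x, y)
```

Candidate intersections computed this way could then be compared against the center line of the label to pick out the arrow head.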
Considering that the head point of the arrow is close to the center line of the detected label and that the arrow body points lie on the detected lines, we can identify the label arrow direction unambiguously. This label shape, describing pipe content and direction of flow, is valid for all piping systems marked according to German code DIN 2403:2018-10. By following the computation in the previous section, the information contained in these labels can be mapped to the 3D space of the laser-scanned point cloud. Thus we are able to map all detected information to the 3D reconstruction of the pipe system, including both the text information and the recognised arrow indicating flow direction.

Figure 4: Steps for the recognition of label arrow direction. (a) Detected and (b) enlarged text bounding box (only the region of interest is masked). (c) Line detection and intersection; green: detected lines; orange, dashed: center line; red: arrow head; blue: arrow body points.

Result and discussion

Dataset

The dataset we used was captured in the basement of a building on the campus of the Technical University of Munich using a Leica RTC360 laser scanner and a Canon EOS 600D camera.

Results

In Figure 5, we show the qualitative intermediate results of our toolchain step by step. Figure 5(a) and Figure 5(b) show the input point cloud and the predicted pipe points in the system, respectively. Most true pipe points are recognised as such. Different pipe instances can be segmented from all pipe points by region growing, as shown in Figure 5(c), encoded with varying color. The centre lines of the cylinders that are extracted by RANSAC are illustrated in Figure 5(d) and the corresponding reconstructed pipes are shown in Figure 5(e). In Figure 5(f), the information recognised from the labels on the pipes is added to the reconstructed model, including fluid property and direction in our case.
In conclusion, all pipes in our test set could be reconstructed automatically and the corresponding information could be added to the model properly. For the quantitative evaluation, we list the diameters of our reconstructed pipes in Table 2. Here, the ground truth is not the pipe diameter measured in the real world, but rather our manual measurement in the point cloud. The automatically created model is compared against those values. The diameter deviation is small, with the largest absolute and relative deviations being 0.010 m and 6.3%, respectively.

Table 2: Quantitative precision evaluation of pipe reconstruction against ground truth measured in the point cloud

Segment No.   Ground truth (m)   Our model (m)   Deviation (abs.) (m)   Deviation (rel.) (%)
1             0.158              0.150           0.008                  5.1
2             0.163              0.157           0.006                  3.7
3             0.159              0.169           0.010                  6.3
4             0.158              0.165           0.007                  4.4
5             0.198              0.195           0.003                  1.5
6             0.215              0.216           0.001                  0.5

Contribution and limitations

We describe the contributions of our work as follows:
• We propose a method that can be used to automatically co-register photos taken by a camera and point clouds captured by modern laser scanning equipment. In addition, we show that the co-registration method provides convincing results in an automatic end-to-end process to create and enrich 3D models for pipe systems.
• Our method creates a comprehensive model which contains geometric information of pipes as well as semantic information such as content type and flow direction from standardized pipe labels by extracting information from two different data sources, point clouds and photos.

However, the following limitations remain:
• Images taken by the laser scanner are used as a ‘bridge’ to connect the laser-scanned point cloud and the camera images. For our method to work, laser scanners with RGB sensors are required to enable the fully automatic process.
• The direction recognition step is applicable as presented for piping systems that are labeled in compliance with the German code. For application in other countries, the assumptions need to be adapted (e.g. for the United States according to ASME A13.1 - 2020).

Conclusions

In this paper, we propose an automatic method to co-register photos taken by a camera and point clouds generated by a laser scanner. In addition, we show that the co-registration method works well as part of the presented end-to-end approach to create and enrich 3D models of pipes. As this method is fully automated and human intervention is limited to data capture, it makes it possible to generate and update the model frequently at a low cost. The method introduced in this paper to register 2D camera images to the laser-scanned point cloud also allows images taken by other sensors to be registered. In our future work, we aim to integrate thermal information in the process.

Acknowledgments

The work presented in this paper is funded by the Institute for Advanced Study (IAS) at the Technical University of Munich. It is conducted within the scope of a project funded by Audi AG, Ingolstadt. We thank the NVIDIA Applied Research Accelerator Program for their support in providing high-performance hardware for computation. Our thanks go to the TUM chair for Engineering Geodesy for the laser scanner equipment and support.

References

Adán, A., Quintana, B., Prieto, S. A. & Bosché, F. (2018), ‘Scan-to-BIM for ‘secondary’ building components’, Advanced Engineering Informatics 37, 119–138.
Agapaki, E. & Brilakis, I. (2020), ‘CLOI-NET: Class segmentation of industrial facilities’ point cloud datasets’, Advanced Engineering Informatics 45.
Andriasyan, M., Moyano, J., Nieto-Julián, J. E. & Antón, D. (2020), ‘From point cloud data to Building Information Modelling: An automatic parametric workflow for heritage’, Remote Sensing 12(7).
Armeni, I., Sener, O., Zamir, A. R., Jiang, H., Brilakis, I., Fischer, M.
& Savarese, S. (2016), 3D semantic parsing of large-scale indoor spaces, in ‘Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition’.
Borrmann, A., König, M., Koch, C. & Beetz, J. (2018), Building Information Modeling: Why? What? How?, in A. Borrmann, M. König, C. Koch & J. Beetz, eds, ‘Building Information Modeling’, Springer, Cham.
Brilakis, I., Pan, Y., Borrmann, A., Mayer, H.-G., Rhein, F., Vos, C., Pettinato, E. & Wagner, S. (2019), Built environment digital twinning, International Workshop on Built Environment Digital Twinning presented by TUM Institute for Advanced Study and Siemens AG.
Croce, V., Caroti, G., Luca, L. D., Jacquot, K., Piemonte, A. & Véron, P. (2021), ‘From the semantic point cloud to heritage-building information modeling: A semiautomatic approach exploiting machine learning’, Remote Sensing 13(3), 1–34.
Czerniawski, T., Nahangi, M., Haas, C. & Walbridge, S. (2016), ‘Pipe spool recognition in cluttered point clouds using a curvature-based shape descriptor’, Automation in Construction 71(Part 2), 346–358.
Esfahani, M. E., Eray, E., Chuo, S., Sharif, M. M. & Haas, C. (2019), ‘Using scan-to-BIM techniques to find optimal modeling effort; a methodology for adaptive reuse projects’, Proceedings of the 36th International Symposium on Automation and Robotics in Construction, ISARC 2019, 772–779.
Fischler, M. A. & Bolles, R. C. (1981), ‘Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography’, Commun. ACM 24(6), 381–395.
Fumarola, M. & Poelman, R. (2011), Generating virtual environments of real world facilities: Discussing four different approaches, in ‘Automation in Construction’, Vol. 20, pp. 263–269.
Huang, Q., Wang, W. & Neumann, U. (2018), Recurrent slice networks for 3D segmentation of point clouds, in ‘Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition’, pp. 2626–2635.
Hullo, J.-F., Thibault, G., Boucheny, C., Dory, F. & Mas, A. (2015), ‘Multi-Sensor As-Built Models of Complex Industrial Architectures’, Remote Sensing 7(12), 16339–16362.
Kritzinger, W., Karner, M., Traar, G., Henjes, J. & Sihn, W. (2018), ‘Digital Twin in manufacturing: A categorical literature review and classification’, IFAC-PapersOnLine 51(11), 1016–1022.
Kuang, Z., Sun, H., Li, Z., Yue, X., Lin, T. H., Chen, J., Wei, H., Zhu, Y., Gao, T., Zhang, W., Chen, K., Zhang, W. & Lin, D. (2021), ‘MMOCR: A comprehensive toolbox for text detection, recognition and understanding’, arXiv preprint arXiv:2108.06543.
Landrieu, L. & Simonovsky, M. (2018), Large-scale point cloud semantic segmentation with superpoint graphs, in ‘Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition’, pp. 4558–4567.
Li, Y., Bu, R., Sun, M., Wu, W., Di, X. & Chen, B. (2018), ‘PointCNN: Convolution on X-transformed points’, Advances in Neural Information Processing Systems 31, 820–830.
Lu, R. & Brilakis, I. (2019), ‘Digital twinning of existing reinforced concrete bridges from labelled point clusters’, Automation in Construction 105.
Mukhopadhyay, P. & Chaudhuri, B. B. (2015), ‘A survey of Hough transform’, Pattern Recognition 48(3), 993–1010.
Noichl, F., Braun, A. & Borrmann, A. (2021), ‘"BIM-to-Scan" for Scan-to-BIM: Generating Realistic Synthetic Ground Truth Point Clouds based on Industrial 3D Models’, Proceedings of the 2021 European Conference on Computing in Construction 2, 164–172.
Parisher, R. A. & Rhea, R. A. (2011), Pipe fittings, in ‘Pipe Drafting and Design’, 3 edn, Oxford, chapter 3, pp. 13–55.
Pärn, E. A., Edwards, D. J. & Sing, M. C. (2017), ‘The building information modelling trajectory in facilities management: A review’.
Perez-Perez, Y., Golparvar-Fard, M. & El-Rayes, K. (2021), ‘Scan2BIM-NET: Deep Learning Method for Segmentation of Point Clouds for Scan-to-BIM’, Journal of Construction Engineering and Management 147(9), 1–14.
Qi, C. R., Su, H., Mo, K. & Guibas, L. J. (2016), ‘PointNet: Deep learning on point sets for 3D classification and segmentation’, arXiv preprint arXiv:1612.00593.
Qi, C. R., Su, H., Mo, K. & Guibas, L. J. (2017), PointNet: Deep learning on point sets for 3D classification and segmentation, in ‘Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition’, pp. 652–660.
Rabbani, T., Van Den Heuvel, F. & Vosselman, G. (2006), ‘Segmentation of point clouds using smoothness constraint’, International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences 36(5), 248–253.
Rong, W., Li, Z., Zhang, W. & Sun, L. (2014), An improved Canny edge detection algorithm, in ‘2014 IEEE International Conference on Mechatronics and Automation’, IEEE, pp. 577–582.
Roynard, X., Deschaud, J. E. & Goulette, F. (2018), ‘Paris-Lille-3D: A large and high-quality ground-truth urban point cloud dataset for automatic segmentation and classification’, International Journal of Robotics Research 37(6), 545–557.
Rusu, R. B. & Cousins, S. (2011), 3D is here: Point Cloud Library (PCL), in ‘IEEE International Conference on Robotics and Automation (ICRA)’, IEEE, Shanghai, China.
Schönberger, J. L. & Frahm, J.-M. (2016), Structure-from-motion revisited, in ‘Conference on Computer Vision and Pattern Recognition (CVPR)’.
Schönberger, J. L., Zheng, E., Pollefeys, M. & Frahm, J.-M. (2016), Pixelwise view selection for unstructured multi-view stereo, in ‘European Conference on Computer Vision (ECCV)’.
Soilán, M., Nóvoa, A., Sánchez-Rodríguez, A., Riveiro, B. & Arias, P. (2020), Semantic Segmentation of Point Clouds with PointNet and KPConv Architectures Applied to Railway Tunnels, in ‘ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences’, Vol. 5, Copernicus GmbH, pp. 281–288.
Son, H., Kim, C. & Turkan, Y. (2015), ‘Scan-to-BIM - an overview of the current state of the art and a look ahead’, 32nd International Symposium on Automation and Robotics in Construction and Mining: Connected to the Future, Proceedings.
Talebi, S. (2014), ‘Exploring advantages and challenges of adaptation and implementation of BIM in project life cycle’, 2nd BIM International Conference on Challenges to Overcome.
Thomas, H., Qi, C. R., Deschaud, J.-E., Marcotegui, B., Goulette, F. & Guibas, L. J. (2019), KPConv: Flexible and deformable convolution for point clouds, in ‘Proceedings of the IEEE/CVF International Conference on Computer Vision’, pp. 6411–6420.
Torr, P. H. & Zisserman, A. (2000), ‘MLESAC: A new robust estimator with application to estimating image geometry’, Computer Vision and Image Understanding 78(1), 138–156.
Volk, R., Stengel, J. & Schultmann, F. (2014), ‘Building Information Modeling (BIM) for existing buildings - Literature review and future needs’, Automation in Construction 38, 109–127.
Wang, B., Wang, Q., Cheng, J. C., Song, C. & Yin, C. (2022), ‘Vision-assisted BIM reconstruction from 3D LiDAR point clouds for MEP scenes’, Automation in Construction 133, 103997.
Wang, B., Yin, C., Luo, H., Cheng, J. C. & Wang, Q. (2021), ‘Fully automated generation of parametric BIM for MEP scenes based on terrestrial laser scanning data’, Automation in Construction 125, 103615.
Wetzel, E. M. & Thabet, W. Y. (2015), ‘The use of a BIM-based framework to support safe facility management processes’, Automation in Construction 60, 12–24.
Yokoyama, H., Date, H., Kanai, S. & Takeda, H. (2013), ‘Detection and classification of pole-like objects from mobile laser scanning data of urban environments’, Int. J. CAD/CAM 13, 31–40.
Zhao, H., Jiang, L., Jia, J., Torr, P. H. & Koltun, V. (2021), Point transformer, in ‘Proceedings of the IEEE/CVF International Conference on Computer Vision’, pp. 16259–16268.
Figure 5: Overview of the process: (a) laser-scanned point cloud; (b) points predicted as class ‘pipe’; (c) pipe points clustered into separate instances; (d) RANSAC and projection results for the pipe axes; (e) 3D reconstruction with elbows using a sweep; (f) 3D model enriched with label information.