2022 European Conference on Computing in Construction
Ixia, Rhodes, Greece

July 24-26, 2022

AUTOMATIC CREATION AND ENRICHMENT OF 3D MODELS FOR PIPE SYSTEMS
BY CO-REGISTRATION OF LASER-SCANNED POINT CLOUDS AND PHOTOS

Yuandong Pan1,2, Florian Noichl1, Alexander Braun1, André Borrmann1,2, Ioannis Brilakis2,3
1Chair of Computational Modeling and Simulation, Technical University of Munich, Germany

2Institute for Advanced Study, Technical University of Munich, Germany
3Department of Engineering, University of Cambridge, the United Kingdom

Abstract
An information-rich digital model for pipe systems is valuable for facility management and maintenance. Pipe systems
in existing facilities can be captured for example using laser scanning equipment or cameras, providing point clouds or
images. While these two data sources can provide diverse information, it is not straightforward to register one with the
other. In this paper, we propose a novel approach to automatically create and enrich geometric models for pipe systems
by co-registering laser-scanned point clouds and photos. Data from two separate sources are collected to test our method.
Subsequently, a photogrammetric point cloud is reconstructed to establish a mapping between all 2D images and the
laser-scanned 3D point cloud. State-of-the-art computer vision methods are applied to enrich the raw 2D and 3D datasets.
Finally, we use the mapping to merge the processed datasets into one combined, information-rich model.

Introduction
The research presented in this paper addresses the creation and enrichment of 3D models for pipe systems using laser-scanned point
clouds and photos. By creating, we refer to the process of deriving geometric digital representations of pipes from
captured data, namely laser-scanned point clouds and photos. By enriching, we refer to adding useful information, such
as the fluid type and flow direction of pipes, to the geometric model to obtain an information-rich 3D model. Information-rich
digital representations of physical assets receive growing attention in Architecture, Engineering, Construction (AEC), and
Facilities Management (FM) sectors as they can provide substantial value to all stakeholders.
Holistic digital methods such as Building Information Modeling (BIM) promise considerable improvements for efficiency
and transparency, helping profitability and sustainability goals (Borrmann et al. 2018). This is especially true for the
operating phase, where the term Digital Twin has recently been adopted (Brilakis et al. 2019), based on the concept
previously applied in the manufacturing industry (Kritzinger et al. 2018). The initially slow adoption of digital methods in
the AEC and FM sectors is picking up speed (Talebi 2014, Pärn et al. 2017). As most of the building stock
already exists, the creation of useful digital models of existing structures is essential for the successful implementation
of digital methods (Volk et al. 2014).
Depending on the use case, the geometric representation is an important but non-essential part of a digital model; however,
in the built environment, it makes a significant contribution, as planning and FM activities are heavily dependent on
geometric information (Wetzel & Thabet 2015, Pärn et al. 2017). To establish a suitable basis for the implementation of
digital methods for existing structures, capturing the current as-is status of the building and transferring it into a suitable
digital representation is a key requirement.
Driven by leaps in the development of hardware and software solutions, research has seen a variety of new attempts
to automate this process, which have inspired industry-ready software applications (Son et al. 2015). In academia,
Scan-to-BIM has become an extensive field of research (Son et al. 2015, Adán et al. 2018), recently also referred to as Digital
Twinning (Lu & Brilakis 2019).
Most of these research efforts focus on the clear definition and technical improvement of single processing steps, with
mostly one method or data source at their core. In this paper, we showcase a pipeline that covers the majority of steps
necessary for an end-to-end solution, from raw industry-standard input data of two different types to a useful semantically
rich 3D representation. As the core component, we present a method to co-register separately recorded laser-scanned point
clouds and photos. This allows us to merge complementary information that we detect in the datasets independently using
state-of-the-art computer vision algorithms to leverage the full combined potential of the captured data.

Background
In order to create a sensible digital representation of existing structures in the built environment, current conditions need
to be captured first; subsequently, models need to be reconstructed. Esfahani et al. (2019) present work to support the
decision-making process with regard to the choice of capturing equipment and further processing options. On the basis
of raw capture data, the manual reconstruction of useful digital models is possible but time-consuming and error-prone
(Fumarola & Poelman 2011, Hullo et al. 2015).
The processing steps towards a useful model can be divided into two categories: 1) Point cloud processing and enrichment
and 2) model reconstruction. In the first step, it makes sense to distinguish individual objects in the point cloud or


distinguish between object classes. For domain-specific applications, this has been achieved using manually selected,
geometric features in the point cloud. Yokoyama et al. (2013) use principal component analysis (PCA) to detect
pole-like objects, Lu & Brilakis (2019) detect bridge cross sections after intelligent slicing, the authors of S3DIS (Armeni
et al. 2016) use a ‘peak-gap-peak’ pattern to separate rooms, and Czerniawski et al. (2016) detect pipe spools in point clouds
based on local curvature. Data-driven methods such as the artificial neural network architectures PointNet (Qi et al. 2017)
and KPConv (Thomas et al. 2019) are more domain-independent, given the availability of suitable training data. The
latter has shown convincing results for indoor environments (S3DIS (Armeni et al. 2016)), urban scenes (Paris-Lille-3D
(Roynard et al. 2018)) and railway tunnels (Soilán et al. 2020).
Specifically for the AEC domain, with Scan2BIM-Net, Perez-perez et al. (2021) introduce an approach that is based on a
combination of network architectures for semantic segmentation leading to robust results for the presented indoor dataset.
Agapaki & Brilakis (2020) showcase a solution for the use case of industrial scenes that puts emphasis on minimized
manual effort for training data annotation.
To bring the single steps together into a full toolchain that is suitable to solve the problem of Scan-to-BIM, recent works aim
at combining previously established methods. For Scan-to-BIM for historic buildings, Andriasyan et al. (2020) introduce
an end-to-end workflow from an input point cloud to a BIM model that consists of precisely meshed objects. Croce et al. (2021)
present a similar, semi-automatic approach that uses a random forest classifier to segment the point cloud into distinct
structural element classes. In a method closely related to our approach, Wang et al. (2022) use a corresponding point
cloud reconstructed from depth images (RGB-D) to enrich the laser scanning point cloud with semantics detected in 2D.
Furthermore, in Wang et al. (2022) the enriched point cloud is further processed to automatically remodel the mechanical,
electrical and plumbing structures (MEP) from a set of regular shaped and irregular shaped objects, with the method
separately introduced in Wang et al. (2021).

Research methodology
Our proposed approach of creating and enriching a 3D digital model of pipes consists of two main steps:

• Geometric reconstruction

• Information enrichment

Throughout the approach, two different types of raw data, photos and point clouds, are processed by various algorithms
to extract diverse information. Furthermore, the proposed co-registration method is applied to locate photos in the point
cloud and thus makes it possible to map the information extracted from 2D images to 3D space. The whole process is illustrated in
Figure 1 and introduced in detail in the following section.
Photos taken by the terrestrial laser scanning (TLS) equipment are combined with the independently gathered camera
photos in a photogrammetric co-registration step to establish the spatial link between the datasets. Both the photos and
point cloud are processed using computer vision algorithms to enrich the raw data with specific information individually.
The enriched laser-scanned point cloud is used as the input for the geometric reconstruction of the pipe model. Finally, the
information parsed from the photos is reprojected to the reconstructed 3D model using the mapping established through
the initial co-registration step.
In the following subsections, the implemented steps of the proposed method are introduced in more detail.

Geometric reconstruction
The most precise source of geometric information in this workflow is the laser-scanned point cloud as captured by TLS.
Hence we use it as the basis for our 3D model reconstruction. To narrow down the problem space and allow for detailed
reconstruction, we first enrich the raw point cloud to be able to filter and split it.

3D enrichment by semantic segmentation
In this step, the input laser scanning point cloud is segmented by KPConv, more specifically the KP-FCNN architecture,
a well-performing 3D deep learning architecture for large-scale point cloud segmentation. As shown in Table 1, KPConv
(Thomas et al. 2019) is one of the best-performing neural networks for point cloud segmentation on the S3DIS dataset
(Armeni et al. 2016), a widely used benchmark dataset for large-scale indoor environment point clouds. We trained our
model on a manually labeled dataset of an industrial facility collected in a related study (Noichl et al. 2021) and ran
inference on our collected dataset. The segmentation result inferred by KPConv is used as input to the following steps.
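
The mIoU metric used to compare the architectures in Table 1 can be illustrated with a minimal sketch. The function name `per_class_iou` and the toy labels are assumptions made for illustration only; they are not part of the KPConv code base or of our evaluation.

```python
import numpy as np

def per_class_iou(y_true, y_pred, num_classes):
    """Intersection over union for each class label 0..num_classes-1."""
    ious = []
    for c in range(num_classes):
        intersection = np.sum((y_true == c) & (y_pred == c))
        union = np.sum((y_true == c) | (y_pred == c))
        # A class that occurs in neither array has no defined IoU.
        ious.append(intersection / union if union else np.nan)
    return np.array(ious)

# Toy example: six points, three classes (hypothetical labels).
y_true = np.array([0, 0, 1, 1, 1, 2])
y_pred = np.array([0, 1, 1, 1, 2, 2])
ious = per_class_iou(y_true, y_pred, num_classes=3)
miou = float(np.nanmean(ious))  # mean IoU over all classes
```

Averaging with `np.nanmean` ignores classes that are absent from both the labels and the prediction, which is one common convention among several.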

3D enrichment by instance segmentation
The result of semantic segmentation is the full point cloud with predicted class labels. All points belonging to one category
have the same label, regardless of whether they belong to the same instance. In our case of creating a digital twin of
pipes, segmentation at the semantic level alone is not sufficient for the further steps necessary for reconstructing pipe instances.
Therefore, the semantic segmentation result needs to be segmented further to identify separate instances.


Figure 1: The proposed process of creating and enriching 3D models of pipes, through separate steps of co-registration, enrichment
and reconstruction. Diagram labels: TLS (photos, point cloud); DSLR camera (photos); photogrammetric co-registration; 2D enrichment;
3D enrichment; 3D reconstruction; 3D reprojection; hybridly enriched semantic 3D model. The steps are grouped into information
enrichment and geometric reconstruction.

In our approach, we assume that one pipe instance can be represented by one cylinder or several cylinders connected
with elbows, as long as fluid can flow through these parts. Based on the assumption that one pipe instance is continuous
and does not intersect other pipes, different pipes can be segmented by clustering the point cloud. We use the region
growing algorithm (Rabbani et al. 2006) in the Point Cloud Library (Rusu & Cousins 2011), which merges points that
are close enough in terms of distance and local smoothness into a point cluster. The output of this step is a set of point
clusters, each of which contains the points that belong to one corresponding pipe instance.
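
The idea of merging points by distance and local smoothness can be sketched as follows. This is a simplified illustration and not the Point Cloud Library implementation: it assumes precomputed unit normals, uses a brute-force neighbour search, and omits the curvature-based seed ordering of Rabbani et al. (2006). The function name and the thresholds are hypothetical.

```python
import numpy as np

def region_growing(points, normals, dist_thresh=0.05, angle_thresh_deg=10.0):
    """Cluster points that are close to each other and have similar normals."""
    n = len(points)
    labels = np.full(n, -1, dtype=int)          # -1 means "not yet assigned"
    cos_thresh = np.cos(np.deg2rad(angle_thresh_deg))
    current = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue
        labels[seed] = current
        stack = [seed]
        while stack:
            i = stack.pop()
            # Brute-force search; a k-d tree would be used for real scans.
            dist = np.linalg.norm(points - points[i], axis=1)
            smooth = np.abs(normals @ normals[i]) >= cos_thresh
            candidates = (dist <= dist_thresh) & smooth & (labels == -1)
            for j in np.flatnonzero(candidates):
                labels[j] = current
                stack.append(j)
        current += 1
    return labels

# Toy data: two parallel point rows, 1 m apart, sampled every 5 cm.
x = np.linspace(0.0, 1.0, 21)
pipe_a = np.stack([x, np.zeros_like(x), np.zeros_like(x)], axis=1)
pipe_b = pipe_a + np.array([0.0, 1.0, 0.0])
points = np.vstack([pipe_a, pipe_b])
normals = np.tile([0.0, 0.0, 1.0], (len(points), 1))
labels = region_growing(points, normals, dist_thresh=0.06)
```

In this toy example the two rows end up in separate clusters because no point of one row lies within the distance threshold of the other.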

3D reconstruction
In this step, based on the assumption that one pipe instance consists of one or multiple cylinders connected with elbows,
we fit cylinders to the instance clusters by applying M-Estimator Sample Consensus (Torr & Zisserman 2000), a variant
of Random Sample Consensus (RANSAC) (Fischler & Bolles 1981). This allows us to extract the parameters of cylinders
in each instance cluster, which here include the cylinder axis and the radius. We further use the radius as the nominal
diameter of the pipe, I. The fitting process works directly if one pipe instance can be represented by a single cylinder.
However, for pipe runs that contain elbows, the elbow parts cannot be represented by cylinders directly. First, the
radius of the elbow connecting the straight pipes is calculated as $r = 1\tfrac{1}{2} I$, that is, 1.5 times I (Parisher & Rhea 2011).
Then, the corresponding fillet start and end points are calculated in 3D. The resulting path is used to sweep a circle with the
previously identified radius I and create a 3D model of the pipe using the Python scripting functionality of the open-source
application FreeCAD (see footnote 1). Thus, we have created a geometric 3D model of pipes, which contains the fitted (cylindrical
part) and estimated (elbow part) surfaces of pipes, as well as the corresponding segmented point cloud instances on which the
reconstruction is based.
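
The fillet computation for an elbow can be sketched as follows, under the assumption that the two fitted cylinder axes intersect at a corner point. The function `elbow_fillet`, its arguments and the example values are hypothetical and only illustrate the geometry; the FreeCAD sweep itself is not shown.

```python
import numpy as np

def elbow_fillet(p_start, corner, p_end, nominal_size):
    """Fillet start, end and centre for an elbow with radius r = 1.5 * I."""
    r = 1.5 * nominal_size
    u0 = (p_start - corner) / np.linalg.norm(p_start - corner)
    u1 = (p_end - corner) / np.linalg.norm(p_end - corner)
    # Interior angle between the two straight legs at the corner.
    theta = np.arccos(np.clip(u0 @ u1, -1.0, 1.0))
    # Distance from the corner to the tangent points on each leg.
    t = r / np.tan(theta / 2.0)
    bisector = (u0 + u1) / np.linalg.norm(u0 + u1)
    centre = corner + (r / np.sin(theta / 2.0)) * bisector
    return r, corner + t * u0, corner + t * u1, centre

# Toy example: a 90 degree bend in the x-y plane, I = 0.1 m.
p_start = np.array([0.0, 0.0, 0.0])
corner = np.array([1.0, 0.0, 0.0])
p_end = np.array([1.0, 1.0, 0.0])
r, fillet_start, fillet_end, centre = elbow_fillet(p_start, corner, p_end, 0.1)
```

For a 90 degree bend the tangent distance equals the elbow radius, so both fillet points lie 0.15 m from the corner in this example. The sketch does not handle the degenerate case of two legs that fold back onto each other.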

1 www.freecadweb.org, visited Dec 10 2021


Table 1: Performance comparison among different 3D deep learning architectures on selected categories of the S3DIS (Armeni et al.
2016) dataset (IoU values in %): *Qi et al. (2016), †Landrieu & Simonovsky (2018), ‡Huang et al. (2018), §Li et al. (2018),
¶Thomas et al. (2019), ‖Zhao et al. (2021)

Model       mIoU   ceiling   floor   window   door
PointNet*   47.6   88.0      69.3    88.7     47.5
SPG†        62.1   89.9      76.4    95.1     55.3
RSNet‡      56.5   92.8      92.5    78.6     51.6
PointCNN§   65.4   94.8      75.8    97.3     58.4
KPConv¶     67.1   93.6      83.1    92.4     66.1
PointTr.‖   70.4   94.3      84.7    97.5     66.1

Information enrichment
In this step, we enrich the geometric reconstruction of the pipe system by adding semantic information. This information
can be extracted from images, using the standardized labels on pipes that are used to indicate the fluid type and flow
direction. However, co-registering laser scanning point cloud and RGB photos is not straightforward. We use our own
method to bring the two data types together as follows.

Photogrammetric co-registration
The information enrichment starts with the reconstruction of the photogrammetric point cloud. Information such as labels on
pipes cannot be recognised in point clouds, but it can be recognised in images. Accordingly, images are a valuable source for
adding this type of semantic information to the geometric pipe twin. In order to map information extracted from 2D images to the 3D point
cloud, we propose to create a photogrammetric point cloud based on the images collected in the same area as the laser
scanning point cloud.
In the reconstruction process, the extrinsic and intrinsic camera parameter matrices are estimated. In our approach, we
apply COLMAP (Schönberger et al. 2016, Schönberger & Frahm 2016), an open-source Structure-from-Motion (SfM)
and Multi-View Stereo (MVS) software to reconstruct photogrammetric point clouds. The terrestrial laser scanner Leica
RTC360 was used to capture the laser scanning point cloud along with RGB images to colorize the points. The input of
SfM is a set of overlapping images taken from different viewpoints by the laser scanner and the camera. SfM proceeds from feature
detection through feature matching and then reconstructs the scene in 3D space, including the intrinsic and
extrinsic camera parameters of all images. The estimated camera poses, that is, the position and orientation of each
acquired image in the reconstructed sparse photogrammetric point cloud, and the corresponding reconstructed dense point
cloud are illustrated in Figure 2. The output of this step is the computed extrinsic and intrinsic camera parameters of
the laser scanner cameras and the digital single-lens reflex (DSLR) camera we used to capture the pipes.

Figure 2: Camera poses and point cloud reconstruction. (a) Camera poses in the sparse model; camera poses marked with a circle
belong to images taken by the laser scanner, all others to images taken by the DSLR camera. (b) Reconstructed dense point cloud;
points on the front wall are removed for better visualisation.

Subsequently, we map the images taken by the DSLR camera to the laser scanning point cloud. We use $I_c$ to denote the
DSLR camera image set and $I_l$ to denote the whole laser scanner image set, both of which are used to reconstruct the
photogrammetric point cloud. For an image in the camera image set, $m_i \in I_c$, $M^i_{ext}$ and $M^i_{int}$ denote the
corresponding camera extrinsic and intrinsic parameter matrices. These parameters are computed by SfM in the previous step and
are given in the coordinate system of the photogrammetric point cloud. For an image in the laser scanning image set,
$n_i \in I_l$, $N^i_{ext}$ and $N^i_{int}$ denote the corresponding camera extrinsic and intrinsic parameter matrices that are
computed by SfM and referenced to the photogrammetric point cloud. Meanwhile, an image in the laser scanning image set
$n_i \in I_l$ also has the extrinsic and intrinsic parameters of the laser scanner camera, referenced to the laser scanning
coordinate system and denoted by $L^i_{ext}$ and $L^i_{int}$. Therefore, the images taken by the laser scanner work as a
‘bridge’ to connect the photogrammetric and laser scanning point clouds. As shown in Figure 2, the marked camera poses belong to
images taken by the laser scanner and the rest belong to images taken by the DSLR camera.
The Leica RTC360 laser scanner captures images that are internally stitched and exported in all 6 orthogonal directions
at each scanning position, forming the so-called cube map. More details about the laser scanner and data capturing are
discussed in the Dataset subsection.
For images $n_i \in I_l$, the camera positions are available in both the photogrammetric and the laser scanning point cloud. By
moving their centroids in the photogrammetric and laser scanning coordinate systems to the origin and applying singular value
decomposition (SVD) to the product of the two position matrices, the translation matrix and rotation matrix can be
computed. In this paper, we use $M$ to denote the transformation matrix that transforms points from laser scanning point
cloud coordinates to photogrammetric point cloud coordinates. Any point $p = [x_0, y_0, z_0]^T$ in the original laser scanning
point cloud $S$ can be transformed to the coordinate system of the photogrammetric point cloud by

\[ [x_1, y_1, z_1, d_1]^T = M^{-1} [x_0, y_0, z_0, 1]^T, \tag{1} \]

where $[x_0, y_0, z_0, 1]^T$ are the original homogeneous coordinates of the point $p$, $M^{-1}$ is the inverse matrix of $M$, and
$[x_1, y_1, z_1, d_1]^T$ are the newly calculated homogeneous coordinates of the point in the coordinate system of the
photogrammetric point cloud.
Then normalization is applied by dividing each vector component by $d_1$,

\[ [x_2, y_2, z_2, 1]^T = \frac{1}{d_1} [x_1, y_1, z_1, d_1]^T, \tag{2} \]

where $[x_2, y_2, z_2, 1]^T$ is the normalized homogeneous coordinate vector of point $p$ in the coordinate system of the
photogrammetric point cloud.
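
The SVD-based alignment of corresponding camera positions, followed by Equations (1) and (2), can be sketched as below. This is a minimal illustration, not our implementation: it assumes that both point sets share the same scale (a photogrammetric reconstruction may additionally require a scale factor), and the names `rigid_transform`, `to_photogrammetric` and `T_laser_to_photo` are hypothetical. The 4 × 4 matrix `T_laser_to_photo` plays the role of the matrix applied in Equation (1).

```python
import numpy as np

def rigid_transform(src, dst):
    """4x4 matrix T (rotation + translation) with dst ~ R @ src + t."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    # Cross-covariance of the two centred position sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection in the least-squares solution.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = dst_c - R @ src_c
    return T

def to_photogrammetric(T, p):
    """Apply Equation (1), then the normalisation of Equation (2)."""
    x1 = T @ np.append(p, 1.0)
    return x1[:3] / x1[3]

# Toy example: camera positions related by a 30 degree rotation about z.
angle = np.deg2rad(30.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([1.0, -2.0, 0.5])
laser_positions = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                            [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
photo_positions = laser_positions @ R_true.T + t_true
T_laser_to_photo = rigid_transform(laser_positions, photo_positions)
p_photo = to_photogrammetric(T_laser_to_photo, np.array([0.5, 0.5, 0.5]))
```

At least three non-collinear camera positions are needed for the rotation to be determined; the toy example uses four.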
As a next step, we map the information detected in images to the 3D space of the laser scanning point cloud. The extrinsic
parameter matrix of the image $m_i \in I_c$ can be defined as

\[ M^i_{ext} = \begin{bmatrix} R_i & T_i \\ 0 \;\; 0 \;\; 0 & 1 \end{bmatrix}, \]

where $R_i$ is the $3 \times 3$ rotation matrix

\[ R_i = \begin{bmatrix} r^i_{11} & r^i_{12} & r^i_{13} \\ r^i_{21} & r^i_{22} & r^i_{23} \\ r^i_{31} & r^i_{32} & r^i_{33} \end{bmatrix}, \]

and $T_i$ is the $3 \times 1$ translation matrix $T_i = [t^i_1, t^i_2, t^i_3]^T$ of the image $m_i$.

The intrinsic parameter matrix can be represented by

\[ M^i_{int} = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}, \]

where $f_x$ and $f_y$ are the focal lengths of the camera measured in units of image pixels in the horizontal and vertical
directions, and $c_x$ and $c_y$ are the pixel coordinates of the principal point in the image plane. Additionally, $s$ denotes
the skew coefficient of the camera.
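A point in the coordinate system of the photogrammetric point cloud computed from Equation (2) can then be transformed into the
camera coordinate system of the image $m_i$ by

\[ \begin{bmatrix} x_3 \\ y_3 \\ z_3 \\ 1 \end{bmatrix} = M^i_{ext} \begin{bmatrix} x_2 \\ y_2 \\ z_2 \\ 1 \end{bmatrix} = \begin{bmatrix} r^i_{11} & r^i_{12} & r^i_{13} & t^i_1 \\ r^i_{21} & r^i_{22} & r^i_{23} & t^i_2 \\ r^i_{31} & r^i_{32} & r^i_{33} & t^i_3 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_2 \\ y_2 \\ z_2 \\ 1 \end{bmatrix} \tag{3} \]

and subsequently projected to the image plane by applying

\[ \begin{bmatrix} x_4 \\ y_4 \\ z_4 \end{bmatrix} = M^i_{int} \begin{bmatrix} x_3 \\ y_3 \\ z_3 \end{bmatrix} = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_3 \\ y_3 \\ z_3 \end{bmatrix}, \tag{4} \]

where $x_3$, $y_3$, $z_3$ are the coordinates in the camera coordinate system, and $x_4$, $y_4$, $z_4$ are the perspective-projected
coordinates in the image coordinate system. The image coordinates of the projected point in the image plane are calculated by
homogeneous coordinate normalisation,

\[ [u, v, 1]^T = \frac{1}{z_4} [x_4, y_4, z_4]^T, \tag{5} \]

where $u$ and $v$ are the pixel coordinates in the horizontal and vertical direction in the image plane.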
After applying these equations, a point $(x_0, y_0, z_0)$ in the laser-scanned point cloud is transformed to the pixel coordinates
$(u, v)$ in the image plane. We then check whether this point is in the field of view of the camera using the condition

\[ 0 \le u \le W \;\wedge\; 0 \le v \le H, \tag{6} \]

where $W$ denotes the width and $H$ denotes the height of the image. If a point satisfies this condition, it is visible
in the corresponding image.
If useful semantic information (like a detected bounding box) is extracted from an image, we need to check further which
points in 3D space are projected to this area. The pixel coordinates in an image are checked by

\[ (u, v) \in S_i, \tag{7} \]

where $(u, v)$ are the pixel coordinates in the image plane and $S_i$ denotes the $i$th detected bounding box in this image. We can
then attach the recognised texts inside the corresponding bounding box to those points that are projected into this bounding box.
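
Equations (3) to (7) can be combined into a short sketch. The function names, the camera matrices and the bounding box are hypothetical example values. Two simplifications are assumptions of this sketch and not part of the equations above: the bounding box is axis-aligned and given as (u_min, v_min, u_max, v_max), and points behind the camera (non-positive depth) are rejected explicitly.

```python
import numpy as np

def project_points(points_photo, M_ext, M_int):
    """Project Nx3 points (photogrammetric coordinates) to pixel coordinates."""
    ones = np.ones((len(points_photo), 1))
    cam = (M_ext @ np.hstack([points_photo, ones]).T).T[:, :3]   # Equation (3)
    proj = (M_int @ cam.T).T                                     # Equation (4)
    uv = proj[:, :2] / proj[:, 2:3]                              # Equation (5)
    return uv, cam[:, 2]

def points_in_box(uv, depth, width, height, box):
    """Mask of points that are visible in the image and fall inside the box."""
    u, v = uv[:, 0], uv[:, 1]
    # Equation (6), plus the assumed depth check for points behind the camera.
    in_view = (depth > 0) & (u >= 0) & (u <= width) & (v >= 0) & (v <= height)
    u_min, v_min, u_max, v_max = box
    in_box = (u >= u_min) & (u <= u_max) & (v >= v_min) & (v <= v_max)  # Eq. (7)
    return in_view & in_box

# Toy camera at the origin looking along +z (hypothetical intrinsics).
M_ext = np.eye(4)
M_int = np.array([[1000.0, 0.0, 960.0],
                  [0.0, 1000.0, 540.0],
                  [0.0, 0.0, 1.0]])
points = np.array([[0.0, 0.0, 2.0],    # projects to the image centre
                   [0.1, 0.0, 2.0],    # visible, but outside the box
                   [0.0, 0.0, -2.0],   # behind the camera
                   [5.0, 0.0, 2.0]])   # outside the field of view
uv, depth = project_points(points, M_ext, M_int)
mask = points_in_box(uv, depth, width=1920, height=1080,
                     box=(900.0, 500.0, 1000.0, 600.0))
```

The recognised text of a bounding box would then be attached to the points selected by `mask`. Occlusion is not handled in this sketch, so points hidden behind other objects would also receive the label.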

2D information enrichment
In this step, standardized labels on pipes are recognised and the corresponding information is extracted. An example is
shown in Figure 3. Standardized labels on pipes carry information on the contained fluid (such as liquid type and direction
of flow), which is useful for obtaining a rich model of the facility as required by facility managers maintaining
the piping systems. In our approach, we use the open-source tool MMOCR (Kuang et al. 2021) for text detection
in images. In order to improve the performance of text recognition, the detected bounding boxes are first rotated to an
angle where their longer sides are parallel to the horizontal axis. Then text recognition is applied to the rotated bounding
box and we select the prediction with the highest score as the recognised text. The recognition results before and after rotating
are compared in Figure 3. The recognition score improves substantially with proper rotation, from 79.3% to 99.8%. The label text is
recognised as "Vorlauf Heizung" (flow heating), "Rucklauf Heizung" (return heating), and "Vorlauf Heizung" respectively,
which is consistent with the true texts on the labels with the exception of the German umlauts ‘ä’ and ‘ü’, as the model used is
pretrained on English text.

Figure 3: Detected bounding box before and after rotating (text score before and after rotation: 0.793 and 0.998). (a) Original
image and detected text boxes. (b) Rotated image and detected text boxes.
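
The rotation that makes the longer side of a detected text box horizontal can be derived from the box corners, as in the following sketch. The function `deskew_angle` and the example box are hypothetical and independent of MMOCR. The sketch uses a mathematical x/y frame; in image coordinates with a downward-pointing y axis, the sense of rotation is mirrored.

```python
import numpy as np

def deskew_angle(corners):
    """Rotation in degrees that makes the longer box side horizontal."""
    corners = np.asarray(corners, dtype=float)
    edges = np.roll(corners, -1, axis=0) - corners
    longest = edges[np.argmax(np.linalg.norm(edges, axis=1))]
    angle = np.degrees(np.arctan2(longest[1], longest[0]))
    # Fold into (-90, 90] so that the text is not turned upside down.
    if angle > 90.0:
        angle -= 180.0
    elif angle <= -90.0:
        angle += 180.0
    return -angle

def rotate(vectors, angle_deg):
    """Rotate 2D vectors counter-clockwise by angle_deg."""
    a = np.deg2rad(angle_deg)
    R = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
    return np.asarray(vectors, dtype=float) @ R.T

# Toy box: 4 x 1 units, tilted by 30 degrees.
c, s = np.cos(np.deg2rad(30.0)), np.sin(np.deg2rad(30.0))
box = [(0.0, 0.0), (4 * c, 4 * s), (4 * c - s, 4 * s + c), (-s, c)]
angle = deskew_angle(box)
long_edge = rotate([np.array(box[1]) - np.array(box[0])], angle)[0]
```

After rotating by the returned angle, the long edge of the toy box lies along the horizontal axis.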

To detect the arrow direction shown on the label, our approach starts by enlarging the detected text
bounding box. We then apply the Canny edge detector (Rong et al. 2014) and the Hough transform (Mukhopadhyay &
Chaudhuri 2015) to detect lines and compute their intersections. Considering that the head point of the arrow is
close to the center line of the detected label and that the arrow body points lie on the detected lines, we can identify the label
arrow direction unambiguously. This label shape, which describes pipe content and direction of flow, is valid for all piping
systems marked according to German code DIN 2403:2018-10.
By following the computation in the previous section, the information contained in these labels can be mapped to the 3D
space of the laser-scanned point cloud. Thus we are able to map all detected information to the 3D reconstruction of the
pipe system, including both the text information and the recognised arrow indicating flow direction.
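
The geometric core of this step, intersecting two detected lines and deriving a direction, can be sketched as follows. The sketch assumes that lines are given in the Hough normal form (rho, theta) with x cos(theta) + y sin(theta) = rho. The decision rule in `arrow_direction`, which points from the centroid of the arrow body points towards the arrow head, is a simplified assumption made for illustration; edge and line detection are not shown, and all names and values are hypothetical.

```python
import numpy as np

def hough_intersection(line_a, line_b):
    """Intersection of two lines in normal form; None if they are parallel."""
    (rho_a, theta_a), (rho_b, theta_b) = line_a, line_b
    A = np.array([[np.cos(theta_a), np.sin(theta_a)],
                  [np.cos(theta_b), np.sin(theta_b)]])
    if abs(np.linalg.det(A)) < 1e-9:
        return None
    return np.linalg.solve(A, np.array([rho_a, rho_b]))

def arrow_direction(head, body_points):
    """Unit vector from the centroid of the body points to the arrow head."""
    v = np.asarray(head, dtype=float) - np.mean(body_points, axis=0)
    return v / np.linalg.norm(v)

# Toy example: two arrow-head edges meeting at (10, 5).
line_a = (15.0 / np.sqrt(2.0), np.pi / 4.0)
line_b = (-5.0 / np.sqrt(2.0), 3.0 * np.pi / 4.0)
head = hough_intersection(line_a, line_b)
body_points = np.array([[2.0, 5.0], [4.0, 5.0], [6.0, 5.0]])
direction = arrow_direction(head, body_points)
```

In the toy example the body points lie to the left of the head, so the resulting direction is the positive x axis.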

Results and discussion
Dataset
The dataset we used was captured in the basement of a building on the campus of the Technical University of Munich
using a Leica RTC360 laser scanner and a Canon EOS 600D camera.


Figure 4: Steps for the recognition of label arrow direction. (a) Detected and (b) enlarged text bounding box (only the region of
interest is masked). (c) Line detection and intersection; green: detected lines; orange, dashed: center line; red: arrow head;
blue: arrow body points.

Results
In Figure 5, we show the qualitative intermediate results of our toolchain step by step. Figure 5(a) and Figure 5(b) show
the input point cloud and the predicted pipe points in the system, respectively. Most true pipe points
are recognised as such. Different pipe instances can be segmented from all pipe points by region growing, as shown
in Figure 5(c), encoded with varying colors. The centre lines of cylinders that are extracted by RANSAC are illustrated in
Figure 5(d) and the corresponding reconstructed pipes are shown in Figure 5(e). In Figure 5(f), the information recognised
from labels on the pipes is added to the reconstructed model, including fluid property and direction in our case.
Overall, all pipes in our test set could be reconstructed automatically and the corresponding information could be added to
the model properly.
Regarding the quantitative evaluation, we list the diameters of our reconstructed pipes in Table 2. Here, the ground truth
is not the pipe diameter measured in the real world, but our manual measurement in the point
cloud. The automatically created model is compared against those values. The diameter
deviation is small; the largest absolute and relative deviations are 0.01 m and 6.3%, respectively.

Table 2: Quantitative precision evaluation of pipe reconstruction against ground truth measured in the point cloud

Segment No.   Ground truth (m)   Our model (m)   Deviation (abs.) (m)   Deviation (rel.) (%)
1             0.158              0.150           0.008                  5.1
2             0.163              0.157           0.006                  3.7
3             0.159              0.169           0.010                  6.3
4             0.158              0.165           0.007                  4.4
5             0.198              0.195           0.003                  1.5
6             0.215              0.216           0.001                  0.5
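
The deviations in Table 2 follow directly from the two diameter columns, as the short sketch below shows. The variable names are illustrative; the values are copied from the table, and the relative deviation is taken with respect to the ground truth.

```python
# (segment number, ground truth in m, our model in m), copied from Table 2
segments = [
    (1, 0.158, 0.150),
    (2, 0.163, 0.157),
    (3, 0.159, 0.169),
    (4, 0.158, 0.165),
    (5, 0.198, 0.195),
    (6, 0.215, 0.216),
]

abs_dev = [round(abs(model - truth), 3) for _, truth, model in segments]
rel_dev = [round(100.0 * abs(model - truth) / truth, 1)
           for _, truth, model in segments]

max_abs = max(abs_dev)  # largest absolute deviation in m
max_rel = max(rel_dev)  # largest relative deviation in %
```

Both maxima occur for segment 3, which matches the values quoted in the text.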

Contribution and limitations
We describe the contributions of our work as follows:

• We propose a method that can be used to co-register photos taken by camera and point clouds taken by modern laser
scanner equipment automatically. In addition, we show the co-registration method provides convincing results in an
automatic end-to-end process to create and enrich 3D models for pipe systems.

• Our method creates a comprehensive model which contains geometric information of pipes as well as semantic
information such as content type and flow direction from standardized pipe labels by extracting information from
two different data sources, point clouds and photos.

However, the following limitations remain:

• Images taken by the laser scanner are used as a ‘bridge’ to connect the laser-scanned point cloud and camera images.
For our method to work, laser scanners with RGB sensors are required to enable the fully automatic process.


• The direction recognition step is applicable as presented for piping systems that are labeled in compliance with the German
code. For application in other countries, the assumptions need to be adapted (see footnote 2).

Conclusions
In this paper, we propose an automatic method to co-register photos taken by a camera and point clouds generated by a laser
scanner. In addition, we show that the co-registration method works well as part of the presented end-to-end approach to create
and enrich 3D models of pipes. As this method is fully automated and human intervention is limited to data capture, it
makes it possible to generate and update the model frequently at a low cost. The method introduced in this paper to
register 2D camera images to the laser-scanned point cloud also allows images taken by other sensors to be registered. In
our future work, we aim to integrate thermal information in the process.

2 e.g. for the United States according to ASME A13.1-2020


Acknowledgments
The work presented in this paper is funded by the Institute for Advanced Study (IAS) at the Technical University of
Munich. It is conducted within the scope of a project funded by Audi AG, Ingolstadt. We thank the NVIDIA Applied
Research Accelerator Program for their support in providing high-performance hardware for computation. Our thanks go
to the TUM chair for Engineering Geodesy for the laser scanner equipment and support.

References

Adán, A., Quintana, B., Prieto, S. A. & Bosché, F. (2018), ‘Scan-to-BIM for ‘secondary’ building components’, Advanced
Engineering Informatics 37(November 2017), 119–138.

Agapaki, E. & Brilakis, I. (2020), ‘CLOI-NET: Class segmentation of industrial facilities’ point cloud datasets’, Advanced
Engineering Informatics 45.

Andriasyan, M., Moyano, J., Nieto-Julián, J. E. & Antón, D. (2020), ‘From point cloud data to Building Information
Modelling: An automatic parametric workflow for heritage’, Remote Sensing 12(7).

Armeni, I., Sener, O., Zamir, A. R., Jiang, H., Brilakis, I., Fischer, M. & Savarese, S. (2016), 3d semantic parsing
of large-scale indoor spaces, in ‘Proceedings of the IEEE International Conference on Computer Vision and Pattern
Recognition’.

Borrmann, A., König, M., Koch, C. & Beetz, J. (2018), Building Information Modeling: Why? What? How?, in
A. Borrmann, M. König, C. Koch & J. Beetz, eds, ‘Building Information Modeling’, Springer, Cham.

Brilakis, I., Pan, Y., Borrmann, A., Mayer, H.-G., Rhein, F., Vos, C., Pettinato, E. & Wagner, S. (2019), Built
environment digital twinning, International Workshop on Built Environment Digital Twinning presented by TUM Institute for
Advanced Study and Siemens AG.

Croce, V., Caroti, G., Luca, L. D., Jacquot, K., Piemonte, A. & Véron, P. (2021), ‘From the semantic point cloud to
heritage-building information modeling: A semiautomatic approach exploiting machine learning’, Remote Sensing
13(3), 1–34.

Czerniawski, T., Nahangi, M., Haas, C. & Walbridge, S. (2016), ‘Pipe spool recognition in cluttered point clouds using a
curvature-based shape descriptor’, Automation in Construction 71(Part 2), 346–358.

Esfahani, M. E., Eray, E., Chuo, S., Sharif, M. M. & Haas, C. (2019), ‘Using scan-to-BIM techniques to find optimal
modeling effort; a methodology for adaptive reuse projects’, Proceedings of the 36th International Symposium on
Automation and Robotics in Construction, ISARC 2019 (Isarc), 772–779.

Fischler, M. A. & Bolles, R. C. (1981), ‘Random sample consensus: A paradigm for model fitting with applications to
image analysis and automated cartography’, Commun. ACM 24(6), 381–395.

Fumarola, M. & Poelman, R. (2011), Generating virtual environments of real world facilities: Discussing four different
approaches, in ‘Automation in Construction’, Vol. 20, pp. 263–269.

Huang, Q., Wang, W. & Neumann, U. (2018), Recurrent slice networks for 3d segmentation of point clouds, in ‘Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition’, pp. 2626–2635.

Hullo, J.-F., Thibault, G., Boucheny, C., Dory, F. & Mas, A. (2015), ‘Multi-Sensor As-Built Models of Complex Industrial
Architectures’, Remote Sensing 7(12), 16339–16362.

Kritzinger, W., Karner, M., Traar, G., Henjes, J. & Sihn, W. (2018), ‘Digital Twin in manufacturing: A categorical
literature review and classification’, IFAC-PapersOnLine 51(11), 1016–1022.

Kuang, Z., Sun, H., Li, Z., Yue, X., Lin, T. H., Chen, J., Wei, H., Zhu, Y., Gao, T., Zhang, W., Chen, K., Zhang, W. &
Lin, D. (2021), ‘Mmocr: A comprehensive toolbox for text detection, recognition and understanding’, arXiv preprint
arXiv:2108.06543 .


Landrieu, L. & Simonovsky, M. (2018), Large-scale point cloud semantic segmentation with superpoint graphs, in
‘Proceedings of the IEEE conference on computer vision and pattern recognition’, pp. 4558–4567.

Li, Y., Bu, R., Sun, M., Wu, W., Di, X. & Chen, B. (2018), ‘Pointcnn: Convolution on x-transformed points’, Advances in
neural information processing systems 31, 820–830.

Lu, R. & Brilakis, I. (2019), ‘Digital twinning of existing reinforced concrete bridges from labelled point clusters’,
Automation in Construction 105.

Mukhopadhyay, P. & Chaudhuri, B. B. (2015), ‘A survey of Hough transform’, Pattern Recognition 48(3), 993–1010.

Noichl, F., Braun, A. & Borrmann, A. (2021), ‘"BIM-to-Scan" for Scan-to-BIM: Generating Realistic Synthetic Ground
Truth Point Clouds based on Industrial 3D Models’, Proceedings of the 2021 European Conference on Computing in
Construction 2, 164–172.

Parisher, R. A. & Rhea, R. A. (2011), Pipe fittings, in ‘Pipe Drafting and Design’, 3 edn, Oxford, chapter 3, pp. 13–55.

Pärn, E. A., Edwards, D. J. & Sing, M. C. (2017), ‘The building information modelling trajectory in facilities management:
A review’.

Perez-Perez, Y., Golparvar-Fard, M. & El-Rayes, K. (2021), ‘Scan2BIM-NET: Deep Learning Method for Segmentation
of Point Clouds for Scan-to-BIM’, Journal of Construction Engineering and Management 147(9), 1–14.

Qi, C. R., Su, H., Mo, K. & Guibas, L. J. (2016), ‘PointNet: Deep learning on point sets for 3D classification and
segmentation’, arXiv preprint arXiv:1612.00593.

Qi, C. R., Su, H., Mo, K. & Guibas, L. J. (2017), PointNet: Deep learning on point sets for 3D classification and
segmentation, in ‘Proceedings of the IEEE conference on computer vision and pattern recognition’, pp. 652–660.

Rabbani, T., Van Den Heuvel, F. & Vosselman, G. (2006), ‘Segmentation of point clouds using smoothness constraint’,
International archives of photogrammetry, remote sensing and spatial information sciences 36(5), 248–253.

Rong, W., Li, Z., Zhang, W. & Sun, L. (2014), An improved Canny edge detection algorithm, in ‘2014 IEEE international
conference on mechatronics and automation’, IEEE, pp. 577–582.

Roynard, X., Deschaud, J. E. & Goulette, F. (2018), ‘Paris-Lille-3D: A large and high-quality ground-truth urban point
cloud dataset for automatic segmentation and classification’, International Journal of Robotics Research 37(6), 545–557.

Rusu, R. B. & Cousins, S. (2011), 3D is here: Point Cloud Library (PCL), in ‘IEEE International Conference on Robotics
and Automation (ICRA)’, IEEE, Shanghai, China.

Schönberger, J. L. & Frahm, J.-M. (2016), Structure-from-motion revisited, in ‘Conference on Computer Vision and
Pattern Recognition (CVPR)’.

Schönberger, J. L., Zheng, E., Pollefeys, M. & Frahm, J.-M. (2016), Pixelwise view selection for unstructured multi-view
stereo, in ‘European Conference on Computer Vision (ECCV)’.

Soilán, M., Nóvoa, A., Sánchez-Rodríguez, A., Riveiro, B. & Arias, P. (2020), Semantic Segmentation of Point Clouds
with PointNet and KPConv Architectures Applied to Railway Tunnels, in ‘ISPRS Annals of the Photogrammetry, Remote
Sensing and Spatial Information Sciences’, Vol. 5, Copernicus GmbH, pp. 281–288.

Son, H., Kim, C. & Turkan, Y. (2015), ‘Scan-to-BIM - an overview of the current state of the art and a look ahead’,
32nd International Symposium on Automation and Robotics in Construction and Mining: Connected to the Future,
Proceedings.

Talebi, S. (2014), ‘Exploring advantages and challenges of adaptation and implementation of BIM in project life cycle’,
2nd BIM International Conference on Challenges to Overcome.

Thomas, H., Qi, C. R., Deschaud, J.-E., Marcotegui, B., Goulette, F. & Guibas, L. J. (2019), KPConv: Flexible and
deformable convolution for point clouds, in ‘Proceedings of the IEEE/CVF International Conference on Computer
Vision’, pp. 6411–6420.

Torr, P. H. & Zisserman, A. (2000), ‘MLESAC: A new robust estimator with application to estimating image geometry’,
Computer vision and image understanding 78(1), 138–156.

Volk, R., Stengel, J. & Schultmann, F. (2014), ‘Building Information Modeling (BIM) for existing buildings - Literature
review and future needs’, Automation in Construction 38, 109–127.

Wang, B., Wang, Q., Cheng, J. C., Song, C. & Yin, C. (2022), ‘Vision-assisted BIM reconstruction from 3D LiDAR point
clouds for MEP scenes’, Automation in Construction 133, 103997.

Wang, B., Yin, C., Luo, H., Cheng, J. C. & Wang, Q. (2021), ‘Fully automated generation of parametric BIM for MEP
scenes based on terrestrial laser scanning data’, Automation in Construction 125, 103615.

Wetzel, E. M. & Thabet, W. Y. (2015), ‘The use of a BIM-based framework to support safe facility management processes’,
Automation in Construction 60, 12–24.

Yokoyama, H., Date, H., Kanai, S. & Takeda, H. (2013), ‘Detection and classification of pole-like objects from mobile
laser scanning data of urban environments’, Int. J. CAD/CAM 13, 31–40.

Zhao, H., Jiang, L., Jia, J., Torr, P. H. & Koltun, V. (2021), Point transformer, in ‘Proceedings of the IEEE/CVF International
Conference on Computer Vision’, pp. 16259–16268.


Figure 5: Overview of the process: (a) Laser scanning point cloud (b) points with predicted ‘pipe’ class (c) pipe points clustered to
separate instances (d) RANSAC and projection results for the pipe axes (e) 3D reconstruction with elbows using a sweep (f) 3D model
enriched with label information