All posts by Philipp Kast

Registration of 3D point clouds

An interesting document about 3D point cloud registration.

It is even explained by one of the authors in a video.

For our research, only the first part is of interest: it deals with how to find an affine transformation between two point clouds. It presents two methods: key point detection and plane-based registration.

Key point based method

Numerous point cloud coarse registration methods have been developed [8], yet coarse registration remains an open challenge with much room for improvement. In the Fast Point Feature Histogram [9, 10] (FPFH) algorithm, a histogram based descriptor is calculated for each point within the point cloud, over multiple scales. Salient persistent histograms over multiple scale calculations are labeled as keypoints, which are then matched to find the registration between the point clouds.
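Computing the FPFH descriptors themselves (local surface normals, angle features, multi-scale histograms) is too long for a short note. The step that follows, matching keypoint descriptors between the two clouds, can be sketched as a mutual nearest-neighbour search. The 4-bin histograms below are made-up toy values rather than real FPFH output, and `match_descriptors` is a hypothetical helper name:

```python
import math

def l2(a, b):
    """Euclidean distance between two descriptor histograms."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(query, candidates):
    """Index of the candidate descriptor closest to the query."""
    return min(range(len(candidates)), key=lambda j: l2(query, candidates[j]))

def match_descriptors(src, ref):
    """Return (i, j) pairs where src[i] and ref[j] are mutual nearest neighbours."""
    matches = []
    for i, desc in enumerate(src):
        j = nearest(desc, ref)
        # Cross-check: keep the pair only if it also holds from the reference side.
        if nearest(ref[j], src) == i:
            matches.append((i, j))
    return matches

# Hypothetical histograms for keypoints in the source and reference clouds.
src_desc = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.6, 0.2, 0.1], [0.25, 0.25, 0.25, 0.25]]
ref_desc = [[0.1, 0.1, 0.1, 0.7], [0.1, 0.65, 0.15, 0.1], [0.68, 0.12, 0.1, 0.1]]

print(match_descriptors(src_desc, ref_desc))  # → [(0, 2), (1, 1)]
```

The third source keypoint has a flat, uninformative histogram and is dropped by the cross-check. The surviving pairs would then feed a transform estimate, typically with an outlier-robust scheme such as RANSAC.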

Plane based registration

A different approach based on linear plane matching was developed for the coarse registration of airborne LIDAR point clouds [17]. By relying on the presence of linear structures, this approach is limited to specific dataset classes.
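To illustrate why plane correspondences constrain a transform: if three planes with linearly independent normals are matched between source and reference, rotation and translation follow in closed form. This is not the method of [17], just a sketch assuming the plane matches are already known and noise-free. Each plane is written as n · x = d with a unit normal n; the rotation comes from two normal pairs via the TRIAD construction, and the translation from solving n_ref · t = d_ref - d_src. All helper names are hypothetical:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]

def normalize(a):
    length = math.sqrt(dot(a, a))
    return [x / length for x in a]

def triad(u, v):
    """Orthonormal frame [e1, e2, e3] built from two non-parallel vectors."""
    e1 = normalize(u)
    e2 = normalize(cross(u, v))
    return [e1, e2, cross(e1, e2)]

def det3(m):
    # The scalar triple product of the rows equals the 3x3 determinant.
    return dot(m[0], cross(m[1], m[2]))

def solve3(a, b):
    """Solve the 3x3 linear system a x = b with Cramer's rule."""
    d = det3(a)
    result = []
    for col in range(3):
        m = [list(row) for row in a]
        for i in range(3):
            m[i][col] = b[i]
        result.append(det3(m) / d)
    return result

def register_planes(src_planes, ref_planes):
    """Planes are (unit_normal, d) with unit_normal · x = d, matched by index.

    Returns (R, t) such that a source point x maps to R x + t in the reference frame.
    """
    fs = triad(src_planes[0][0], src_planes[1][0])
    fr = triad(ref_planes[0][0], ref_planes[1][0])
    # R = F_ref * F_src^T, where the frame vectors are the columns of F.
    R = [[sum(fr[k][i] * fs[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
    # A source plane (n, d) becomes (R n, d + (R n) · t), so n_ref · t = d_ref - d_src.
    A = [list(n) for n, _ in ref_planes]
    b = [d_ref - d_src for (_, d_src), (_, d_ref) in zip(src_planes, ref_planes)]
    return R, solve3(A, b)

# Hypothetical floor and two walls, rotated 90° about z and shifted by (1, 2, 3).
src = [((0, 0, 1), 0.0), ((1, 0, 0), 0.0), ((0, 1, 0), 0.0)]
ref = [((0, 0, 1), 3.0), ((0, 1, 0), 2.0), ((-1, 0, 0), -1.0)]
R, t = register_planes(src, ref)
print(t)  # → [1.0, 2.0, 3.0]
```

With real, noisy planes one would rather fit R and t in a least-squares sense over all matched planes; the closed form only shows that three non-parallel planes are enough in principle.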

The problem of fine registration between point clouds has been intensively studied, and high quality solutions now exist for online applications such as SLAM [3, 4]. The solutions revolve around the Iterative Closest Point (ICP) [18] algorithm and its improvements [19]. A noteworthy fine registration method that is based on the correlation of Extended Gaussian Images in the Fourier domain [20] was proposed as an alternative to ICP, although its final stage again relied on iterations of ICP for fine-tuning. Fine registration is not the focus of this research, although to achieve end-to-end registration the standard ICP algorithm is utilized in its final stages.

An algorithm like ICP could be interesting to investigate, given that we could detect a couple of planes and then try to match them across two sessions. More precisely, given two point clouds, R (the reference) and S (the source), ICP tries to find the best rigid (or similarity) transform T so that T * S matches R as closely as possible.
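ICP alternates two steps: pair every source point with its closest reference point, then compute the rigid transform that best aligns those pairs. A minimal point-to-point sketch of that loop, kept in 2D so the best transform per iteration has a closed form (in 3D that step is usually an SVD, e.g. the Kabsch method). Brute-force neighbour search and a fixed iteration count are simplifications, and all names are illustrative:

```python
import math

def transform(theta, t, pts):
    """Rotate 2D points by theta (radians) about the origin, then translate by t."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + t[0], s * x + c * y + t[1]) for x, y in pts]

def best_rigid_2d(src, dst):
    """Closed-form least-squares rotation angle and translation mapping src[i] onto dst[i]."""
    n = len(src)
    csx, csy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    cdx, cdy = sum(q[0] for q in dst) / n, sum(q[1] for q in dst) / n
    s_dot = s_cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay, bx, by = px - csx, py - csy, qx - cdx, qy - cdy
        s_dot += ax * bx + ay * by      # proportional to cos(theta)
        s_cross += ax * by - ay * bx    # proportional to sin(theta)
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    # The translation moves the rotated source centroid onto the reference centroid.
    return theta, (cdx - (c * csx - s * csy), cdy - (s * csx + c * csy))

def icp_2d(S, R, iterations=30):
    """Estimate (theta, t) so that transform(theta, t, S) lies close to R."""
    theta, t = 0.0, (0.0, 0.0)
    for _ in range(iterations):
        moved = transform(theta, t, S)
        # 1. Correspondences: closest reference point for each moved source point.
        closest = [min(R, key=lambda r: (r[0] - p[0]) ** 2 + (r[1] - p[1]) ** 2) for p in moved]
        # 2. Best rigid transform from the original S onto those correspondences.
        theta, t = best_rigid_2d(S, closest)
    return theta, t

# Toy data: the reference is the source moved by a known transform (5°, (0.2, 0.1)).
S = [(0, 0), (5, 0), (0, 5), (5, 5), (2, 7)]
R = transform(math.radians(5), (0.2, 0.1), S)
theta, t = icp_2d(S, R)
print(round(math.degrees(theta), 3), [round(v, 3) for v in t])  # → 5.0 [0.2, 0.1]
```

The toy transform is deliberately small: ICP only converges to the right answer when the initial guess is already close, which is why it is used for fine registration after a coarse method.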

The following library could be interesting:

Or maybe just something like this:

Or even an existing C++ implementation:


Positioning strategies – first steps

Relocalization, which ensures that AR content stays in place between sessions and, most importantly, is easy to find again when you start a new session nearby, is a problem under active research.

Below are some interesting statements I found on this topic.

ARKit doesn’t have any features for tracking device position or placing content in “absolute” geospatial coordinates. Actually doing that (and doing it well) is sort of a hard problem… but there are a few things to help you on your way.

First, check out the ARSessionConfiguration worldAlignment setting. With the default gravity option, x and z directions are relative to the device’s original heading, as of when you started the session. Getting from there to geospatial coordinates is next to impossible.

But with the gravityAndHeading option, the axes are fixed to north/south and east/west — the position of the ARKit coordinate system’s origin is still relative to where the device is at the beginning of the session, but the directions are absolute. That gives you a basis for converting to/from geospatial coordinates.

But there’s still a question of precision. ARKit tracks features up to a few meters away, down to a precision of a couple millimeters. Core Location tracks the device to a precision of several meters. So, if you have a real-world feature and you want to put virtual content on top of it… you could convert a lat/long to a position in ARKit space, but then you’re likely to find that your content doesn’t really line up close enough.

It’s not an unsolvable problem, but not an easy one either. Good luck!
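To make the gravityAndHeading idea from the quoted answer concrete, here is a minimal sketch of converting a latitude/longitude into ARKit coordinates. It assumes the lat/long of the session origin is known (for example from Core Location at session start, whose several-metre error carries straight into the result). It relies on the axis convention Apple documents for gravityAndHeading (+x east, +y up, -z true north) and on an equirectangular (locally flat Earth) approximation, which is only reasonable over short distances. `geo_to_arkit` is a hypothetical helper, not an ARKit API:

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation

def geo_to_arkit(origin_lat, origin_lon, lat, lon, altitude_delta=0.0):
    """Approximate ARKit position (x, y, z) in metres of (lat, lon) relative to a
    gravityAndHeading session whose origin is at (origin_lat, origin_lon).

    Assumes +x = east, +y = up, -z = north, and short distances.
    """
    d_lat = math.radians(lat - origin_lat)
    d_lon = math.radians(lon - origin_lon)
    north = d_lat * EARTH_RADIUS_M
    # Meridians converge towards the poles, so east-west metres shrink with cos(latitude).
    east = d_lon * EARTH_RADIUS_M * math.cos(math.radians(origin_lat))
    return (east, altitude_delta, -north)

# Hypothetical coordinates: a target about 11 m north of the session origin.
print(geo_to_arkit(47.5, 8.7, 47.5001, 8.7))  # → x = 0.0, y = 0.0, z ≈ -11.12
```

The conversion itself is the easy part; as the quote says, a GPS origin that is off by several metres shifts every placed object by the same error.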

This seems to be an area of active research in the iOS developer community — I met several teams trying to figure it out at WWDC last week, and nobody had even begun to crack it yet. So I’m not sure there’s a “best way” yet, if even a feasible way at all.

Feature points are positioned relative to the session, and aren’t individually identified, so I’d imagine correlating them between multiple users would be tricky.

The session alignment mode gravityAndHeading might prove helpful: that fixes all the directions to a (presumed/estimated to be) absolute reference frame, but positions are still relative to where the device was when the session started. If you could find a way to relate that position to something absolute — a lat/long, or an iBeacon maybe — and do so reliably, with enough precision… Well, then you’d not only have a reference frame that could be shared by multiple users, you’d also have the main ingredients for location based AR. (You know, like a floating virtual arrow that says turn right there to get to Gate A113 at the airport, or whatever.)

Another avenue I’ve heard discussed is image analysis. If you could place some real markers — easily machine recognizable things like QR codes — in view of multiple users, you could maybe use some form of object recognition or tracking (a ML model, perhaps?) to precisely identify the markers’ positions and orientations relative to each user, and work back from there to calculate a shared frame of reference. Dunno how feasible that might be. (But if you go that route, or similar, note that ARKit exposes a pixel buffer for each captured camera frame.)

Good luck!
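The marker idea from the quote can be written down with homogeneous transforms. Assuming each device has already estimated the marker's pose as a 4x4 rigid transform in its own session's world frame (how it gets that pose is left open here), the transform from session B to session A is marker_in_a · inverse(marker_in_b). A pure-Python sketch with hypothetical helper names and values:

```python
def matmul(a, b):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def rigid_inverse(m):
    """Inverse of a rigid transform [R | p]: [R^T | -R^T p]."""
    rt = [[m[j][i] for j in range(3)] for i in range(3)]
    p = [m[i][3] for i in range(3)]
    q = [-sum(rt[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [rt[0] + [q[0]], rt[1] + [q[1]], rt[2] + [q[2]], [0, 0, 0, 1]]

def transform_point(m, p):
    """Apply a 4x4 rigid transform to a 3D point."""
    x, y, z = p
    return tuple(m[i][0] * x + m[i][1] * y + m[i][2] * z + m[i][3] for i in range(3))

# Marker pose in each session's world frame (hypothetical values):
# user A sees it rotated 90° about y and 2 m along x; user B sees it 1 m ahead (-z).
marker_in_a = [[0, 0, 1, 2], [0, 1, 0, 0], [-1, 0, 0, 0], [0, 0, 0, 1]]
marker_in_b = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, -1], [0, 0, 0, 1]]

# Session B world -> marker frame -> session A world.
b_to_a = matmul(marker_in_a, rigid_inverse(marker_in_b))
print(transform_point(b_to_a, (0, 0, -1)))  # marker origin as seen by B → (2, 0, 0)
```

If both sessions used gravityAndHeading and the heading estimates were exact, the rotation part would be the identity and a single shared reference point would be enough to fix the translation; any heading error is then what remains to be corrected.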

Ultimately, the goal is to build AR apps that permanently place augmented reality content in precise locations, indoors and outdoors. You can use GPS, markers or beacons for geolocation, or you can try to use computer vision systems to provide an additional layer of mapping and persistent visual positioning to mobile AR apps.

Plane detection is a basic feature that ARKit provides. Below is a good introduction to this topic.
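ARKit does not document how its plane detection works internally. As a generic illustration of finding a dominant plane in a noisy point cloud, here is a minimal RANSAC plane fit: repeatedly fit a plane through three random points and keep the one supported by the most points. Threshold, iteration count and the toy data are assumptions, not ARKit values:

```python
import math
import random

def plane_from_points(p1, p2, p3):
    """Plane (unit normal n, offset d) with n · x = d through three points, or None if collinear."""
    u = [b - a for a, b in zip(p1, p2)]
    v = [b - a for a, b in zip(p1, p3)]
    n = [u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0]]
    length = math.sqrt(sum(c * c for c in n))
    if length < 1e-9:
        return None
    n = [c / length for c in n]
    return n, sum(c * x for c, x in zip(n, p1))

def ransac_plane(points, threshold=0.02, iterations=200, seed=0):
    """Return (normal, d, inlier_indices) of the plane supported by the most points."""
    rng = random.Random(seed)
    best = (None, None, [])
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue
        n, d = plane
        # Inliers lie within `threshold` metres of the candidate plane.
        inliers = [i for i, p in enumerate(points)
                   if abs(sum(c * x for c, x in zip(n, p)) - d) < threshold]
        if len(inliers) > len(best[2]):
            best = (n, d, inliers)
    return best

# Toy cloud: a 5 x 4 grid on the floor (z = 0) plus five points above it.
floor = [(x, y, 0.0) for x in range(5) for y in range(4)]
clutter = [(0.5, 0.5, 1.0), (1.5, 2.5, 0.7), (3.0, 1.0, 1.5), (2.2, 0.3, 0.9), (0.1, 2.9, 1.2)]
normal, d, inliers = ransac_plane(floor + clutter)
print(len(inliers))  # → 20
```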

Object detection in general could be an approach to positioning. How well object detection works (and, immediately, where its limitations are) can be seen by trying out this project:

As a side note, another awesome page, this time just for ARKit:

There is another article about how to integrate the Placenote SDK (which was already covered in a previous post).

It also explains how the SDK works:

Placenote SDK uses a custom feature detection algorithm that looks for “smart” features in the environment that it thinks can be found persistently across AR sessions. This feature detection is independent of ARKit feature tracking and runs in a parallel thread as you move your phone around.

The collection of smart features forms a 3D feature map of the space that can be saved and uploaded to the Placenote Cloud.

It looks like the current approaches revolve around object detection.

Playing around with both object detection and Placenote shows that they currently have some limitations. One thing is for sure: you can have a pretty amusing time, especially with the object detection example program. It detects stethoscopes everywhere!

Placenote experiment

Placenote provides software and a service for the registration of 3D scenes in AR:

To put it simply, Placenote lets you build AR apps that permanently place augmented reality content in precise locations, indoors and outdoors. Placenote does not need GPS, markers or beacons for geolocation. Instead, it uses an advanced computer vision system to provide an additional layer of mapping and persistent visual positioning to mobile AR apps built in Unity or Native Scenekit.

I registered and built the example using their SDK.

The sample allows you to position objects in an AR session and save the map to their service. In a subsequent AR session, you can load the saved map again. The sample seems to work pretty well and successfully re-positions the objects. The accuracy is not very high, though: objects can end up slightly off their intended positions. If the starting position is very different (e.g. the other side of the room), the objects are not placed at all.

The source code of the sample is also available on GitHub.

Official Kick Off

Kick off

We met in Zürich yesterday for the official kick-off of the master thesis work.

We came up with the following goals for the next week:

  • Work out a coarse plan for the master thesis work (until February 2019)
  • In a first step, implement a manual re-registration in the ARchi VR app:
  • In a second step, investigate what can be automated in this process, e.g. edge detection, AR codes, text recognition.
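One possible shape for a first manual re-registration: the user taps the same two reference points (for example two room corners on the floor) in the old and in the new session. Because ARKit sessions are gravity-aligned with y pointing up, the two frames then differ only by a rotation about y and a translation, which two point pairs determine. This is a sketch under that assumption; the tap handling is left out and all function names are hypothetical, not ARchi VR code:

```python
import math

def rotate_y(phi, p):
    """Rotate p = (x, y, z) about the vertical y-axis by phi radians (standard R_y matrix)."""
    c, s = math.cos(phi), math.sin(phi)
    x, y, z = p
    return (c * x + s * z, y, -s * x + c * z)

def old_to_new(phi, t, p):
    """Map a point from old-session to new-session coordinates."""
    return tuple(r + o for r, o in zip(rotate_y(phi, p), t))

def yaw_translation_from_two_points(old_pts, new_pts):
    """Estimate (phi, t) with new ≈ rotate_y(phi, old) + t from two tapped point pairs.

    Assumes both sessions are gravity-aligned (y up); the two points must differ horizontally.
    """
    (a0, a1), (b0, b1) = old_pts, new_pts
    # rotate_y(phi) decreases the angle atan2(z, x) of a horizontal direction by phi.
    phi = math.atan2(a1[2] - a0[2], a1[0] - a0[0]) - math.atan2(b1[2] - b0[2], b1[0] - b0[0])
    phi = math.atan2(math.sin(phi), math.cos(phi))  # wrap into [-pi, pi]
    r0, r1 = rotate_y(phi, a0), rotate_y(phi, a1)
    # Average the translation implied by both pairs to spread small tapping errors.
    t = tuple((b0[i] - r0[i] + b1[i] - r1[i]) / 2 for i in range(3))
    return phi, t

# Hypothetical taps: two floor corners in the old session, simulated in the new one.
old_taps = [(0.0, 0.0, 0.0), (2.0, 0.0, 1.0)]
new_taps = [old_to_new(math.radians(30), (0.5, 0.0, -1.0), p) for p in old_taps]
phi, t = yaw_translation_from_two_points(old_taps, new_taps)
print(round(math.degrees(phi), 6), [round(v, 6) for v in t])  # → 30.0 [0.5, 0.0, -1.0]
```

Once (phi, t) is known, every stored object position can be moved into the new session with `old_to_new`. Automating the process would then mostly mean finding the corresponding points without user taps.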


Until now, the time was spent exploring the augmented reality (AR) landscape. I investigated the state-of-the-art technologies, from ARKit on iOS to ARCore on Android, as a basic deep dive into the technology.

I also got access to the source code of ARchi VR, which is located in a non-public repository on the ZHAW GitHub. I looked into the code, built it, and slightly adapted the sources to get a feeling for the ARKit technology in use and for the technology from Archilogic, Zurich.

I also looked at some basic business model considerations, such as uses in maintenance (VR and AR, e.g. remote maintenance) or in the area of the AR cloud, e.g.:


This is the blog for my master thesis. Here I will document the project and record the progress and the insights gained.