All posts by Philipp Kast

iPhone X’s TrueDepth front-facing camera

iPhone X’s TrueDepth front-facing camera might help to achieve better tracking results.

Creating Face-Based AR Experiences

Use the information provided by a face tracking AR session to place and animate 3D content.

And some random notes on floor plans. Here is an interesting post about exactly that:

There are products in this area as well:

General IoT cloud providers explained.

On a side note: Rolex:

iOS 11.3 is here – new opportunities

A bunch of interesting new capabilities have been introduced with iOS 11.3.

Recognizing Images in an AR Experience

Detect known 2D images in the user’s environment, and use their positions to place AR content.

Handling 3D Interaction and UI Controls in Augmented Reality

Follow best practices for visual feedback, gesture interactions, and realistic rendering in AR experiences.

Rough planning

February – March

Discovery phase: AR state of the art research.

April – May

Implement a manual re-positioning for ARchi VR.

  • Manual rotation and translation
  • Persist and select model
  • Select room in proximity
  • Select by floor
  • Investigate corner matching, e. g. specify which corner corresponds to which
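Because ARKit sessions are gravity-aligned, manual re-positioning only has to recover a rotation about the vertical axis plus a translation. The following is a minimal sketch of the corner-matching idea, assuming the user has matched two floor corners of the saved model to the same two corners in the current session. The function names and the (x, z) tuple layout are hypothetical, not part of ARchi VR.

```python
import math

def yaw_and_translation(src_a, src_b, dst_a, dst_b):
    """Return (angle, tx, tz) mapping floor points (x, z) of a saved model onto the live session.

    src_a, src_b: two corners of the saved model; dst_a, dst_b: the same corners
    as picked by the user in the current session. All points are (x, z) tuples.
    """
    # Direction of the corner-to-corner edge in each frame
    src_angle = math.atan2(src_b[1] - src_a[1], src_b[0] - src_a[0])
    dst_angle = math.atan2(dst_b[1] - dst_a[1], dst_b[0] - dst_a[0])
    # Angle in the (x, z) plane, counter-clockwise from +x towards +z.
    # Note: this 2D convention has the opposite sign of a right-handed yaw about +y.
    angle = dst_angle - src_angle
    # Rotate the first source corner, then shift it onto the first destination corner
    c, s = math.cos(angle), math.sin(angle)
    tx = dst_a[0] - (c * src_a[0] - s * src_a[1])
    tz = dst_a[1] - (s * src_a[0] + c * src_a[1])
    return angle, tx, tz

def apply(angle, tx, tz, p):
    """Rotate the (x, z) point p by angle in the floor plane, then translate it."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * p[0] - s * p[1] + tx, s * p[0] + c * p[1] + tz)
```

For example, `yaw_and_translation((0, 0), (1, 0), (2, 3), (2, 4))` yields a quarter turn and a shift of (2, 3). With more than two matched corners, a least-squares fit would average out the user's picking errors.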

June – July

Research a better way to automatically register an existing scene in AR, one that could compete with available solutions (e. g. Placenote).

  • AR code
  • Text recognition
  • Object recognition
  • Registration methods over point clouds

After the investigation, implement automatic registration in ARchi VR.

August – September

Implementation of a use case using automatic registration.

  • AR cloud
  • Maintenance

October – November

  • Documentation of the Master Thesis

December – January

  • Finalize Master Thesis
  • BFH requirement: A complete Master's thesis includes writing the book abstract, which is printed in the publication «Book» and also published online. The abstract also benefits your personal reputation: you can use your professionally designed abstract in your (job application) documents. The «Book», in print or online, is a popular publication among companies. Many companies recruit suitable specialists based on the thesis topics or the students' know-how.

Registration of 3D point clouds

An interesting document about 3D point cloud registration.

One of the authors even explains it in a video.

For our research, only the first part is of interest: how to find an affine transformation between two point clouds. It presents two methods: key point detection and plane based registration.

Key point based method

Numerous point cloud coarse registration methods have been developed [8], yet coarse registration remains an open challenge with much room for improvement. In the Fast Point Feature Histogram [9, 10] (FPFH) algorithm, a histogram based descriptor is calculated for each point within the point cloud, over multiple scales. Salient persistent histograms over multiple scale calculations are labeled as keypoints, which are then matched to find the registration between the point clouds.
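The matching step can be illustrated with a toy version: each keypoint carries a histogram descriptor, and a pair is only accepted if the two descriptors are each other's nearest neighbour. This sketches the mutual-nearest-neighbour matching step only, not FPFH itself; the short 3-bin histograms in the usage are made up for illustration (real FPFH descriptors have 33 bins), and the function names are hypothetical.

```python
import math

def l2(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(desc, candidates):
    """Index of the candidate descriptor closest to desc."""
    return min(range(len(candidates)), key=lambda j: l2(desc, candidates[j]))

def mutual_matches(src_desc, ref_desc):
    """Return (i, j) pairs where source keypoint i and reference keypoint j are mutual nearest neighbours."""
    matches = []
    for i, d in enumerate(src_desc):
        j = nearest(d, ref_desc)
        if nearest(ref_desc[j], src_desc) == i:  # reject one-sided matches
            matches.append((i, j))
    return matches

src = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
ref = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
print(mutual_matches(src, ref))  # [(0, 1), (1, 0)]; source 2 loses to source 0
```

The surviving pairs are the correspondences from which a rigid transform between the clouds is then estimated; in practice an outlier-robust step such as RANSAC usually follows, since descriptor matches are noisy.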

Plane based registration

A different approach based on linear plane matching was developed for the coarse registration of airborne LIDAR point clouds [17]. By relying on the presence of linear structures, this approach is limited to specific dataset classes.
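Plane matching becomes particularly simple in our indoor setting if, as an assumption, both sessions run with ARKit's gravityAndHeading world alignment: the axes of both sessions then agree, so only a translation t separates them. Writing each plane as n · x = d with unit normal n, a source point x lies at x + t in the reference session, so d_ref = d_src + n · t. Three matched, non-parallel planes (e.g. the floor and two walls) therefore determine t. This is a simplified sketch for illustration, not the method of the cited paper, and the function names are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def translation_from_planes(planes_src, planes_ref):
    """Solve n_i . t = d_ref_i - d_src_i for t using three matched planes.

    planes_src, planes_ref: three (normal, d) pairs each, in matching order.
    Assumes both sessions share orientation, so matched planes have the same normal.
    """
    normals = [list(n) for n, _ in planes_src]
    rhs = [d_ref - d_src for (_, d_src), (_, d_ref) in zip(planes_src, planes_ref)]
    det = det3(normals)
    if abs(det) < 1e-9:
        raise ValueError("planes are (nearly) parallel; translation is not determined")
    t = []
    for col in range(3):  # Cramer's rule: replace one column with the right-hand side
        m = [row[:] for row in normals]
        for r in range(3):
            m[r][col] = rhs[r]
        t.append(det3(m) / det)
    return tuple(t)
```

With only two walls at right angles plus the floor, the system is well conditioned; two nearly parallel walls would make the determinant small and the translation unreliable.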

The problem of fine registration between point clouds has been intensively studied, and high quality solutions now exist for online applications such as SLAM [3, 4]. The solutions revolve around the Iterative Closest Point (ICP) [18] algorithm and its improvements [19]. A noteworthy fine registration method that is based on the correlation of Extended Gaussian Images in the Fourier domain [20] was proposed as an alternative to ICP, although its final stage again relied on iterations of ICP for fine-tuning. Fine registration is not the focus of this research, although to achieve end-to-end registration the standard ICP algorithm is utilized in its final stages.

An algorithm like ICP could be worth investigating, provided we can detect a couple of planes and then match them across two sessions. More precisely: given two point clouds, R (the reference) and S (the source), ICP tries to find the best rigid (or similarity) transform T such that T * S ≈ R.
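To make the idea concrete, here is a minimal point-to-point ICP sketch. For brevity it works in 2D (think of wall points projected onto the gravity-aligned floor plane), uses brute-force nearest neighbours and a closed-form 2D rigid fit; the function names are hypothetical. A 3D version would typically replace the 2D fit with an SVD-based (Kabsch) fit and use a k-d tree for the neighbour search.

```python
import math

def best_rigid_2d(src, ref):
    """Closed-form least-squares rotation angle and translation mapping paired 2D points src -> ref."""
    n = len(src)
    sx, sy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    rx, ry = sum(q[0] for q in ref) / n, sum(q[1] for q in ref) / n
    dots = crosses = 0.0
    for (px, py), (qx, qy) in zip(src, ref):
        px, py, qx, qy = px - sx, py - sy, qx - rx, qy - ry  # centre both sets
        dots += px * qx + py * qy
        crosses += px * qy - py * qx
    theta = math.atan2(crosses, dots)  # angle that maximises the alignment
    c, s = math.cos(theta), math.sin(theta)
    return theta, rx - (c * sx - s * sy), ry - (s * sx + c * sy)

def transform(theta, tx, ty, p):
    """Rotate p by theta, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + tx, s * p[0] + c * p[1] + ty)

def icp_2d(source, reference, iterations=30, tol=1e-12):
    """Point-to-point ICP aligning S (source) to R (reference).

    Returns the aligned points, the final mean squared error and the
    accumulated transform T as (theta, (tx, ty)).
    """
    current = list(source)
    total_theta, total_t = 0.0, (0.0, 0.0)
    prev_err = float("inf")
    for _ in range(iterations):
        # 1. Correspondences: nearest reference point for each current point (brute force)
        pairs = [min(reference, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)
                 for p in current]
        # 2. Best rigid transform for these pairs, applied to the current points
        theta, tx, ty = best_rigid_2d(current, pairs)
        current = [transform(theta, tx, ty, p) for p in current]
        # 3. Accumulate T = T_new * T_total
        total_theta += theta
        total_t = transform(theta, tx, ty, total_t)
        err = sum((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
                  for p, q in zip(current, pairs)) / len(current)
        if prev_err - err < tol:  # stop once the error no longer improves
            break
        prev_err = err
    return current, err, (total_theta, total_t)
```

ICP only converges to the correct alignment from a reasonably close initial guess (in a test, a rotation of about 0.1 rad and a shift of a few decimetres on an L-shaped wall outline), which is why the coarse registration methods above matter.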

The following library could be interesting:

Or maybe just something like this:

Or even an existing C++ implementation:


Positioning strategies – first steps

Relocalization, which ensures that AR content stays in place between sessions and, most importantly, is easy to find again when you start a new session nearby, is a problem under active research.

Below are some interesting statements I found on this topic.

ARKit doesn’t have any features for tracking device position or placing content in “absolute” geospatial coordinates. Actually doing that (and doing it well) is sort of a hard problem… but there are a few things to help you on your way.

First, check out the ARSessionConfiguration worldAlignment setting. With the default gravity option, x and z directions are relative to the device’s original heading, as of when you started the session. Getting from there to geospatial coordinates is next to impossible.

But with the gravityAndHeading option, the axes are fixed to north/south and east/west — the position of the ARKit coordinate system’s origin is still relative to where the device is at the beginning of the session, but the directions are absolute. That gives you a basis for converting to/from geospatial coordinates.
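A minimal sketch of converting a latitude/longitude into ARKit space under gravityAndHeading, where, according to Apple's documentation, (0, 0, -1) points to true north and (-1, 0, 0) points west, so +x is east and -z is north. It assumes the geographic location of the session origin is known (e.g. from Core Location when the session starts) and uses a flat-earth (equirectangular) approximation that is only reasonable over short distances. The function name and parameters are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation

def geo_to_arkit(origin_lat, origin_lon, lat, lon, up=0.0):
    """Approximate ARKit position (x, y, z) in metres of (lat, lon), relative to the session origin.

    Assumes gravityAndHeading alignment: +x east, +y up, -z north.
    origin_lat/origin_lon: assumed geographic location of the session origin.
    """
    d_lat = math.radians(lat - origin_lat)
    d_lon = math.radians(lon - origin_lon)
    north = d_lat * EARTH_RADIUS_M
    east = d_lon * EARTH_RADIUS_M * math.cos(math.radians(origin_lat))
    return (east, up, -north)

print(geo_to_arkit(46.95, 7.44, 46.951, 7.44))  # roughly (0.0, 0.0, -111.2): 111 m north
```

Any error in origin_lat/origin_lon goes straight into the result, which is exactly the precision problem described next.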

But there’s still a question of precision. ARKit tracks features up to a few meters away, down to a precision of a couple millimeters. Core Location tracks the device to a precision of several meters. So, if you have a real-world feature and you want to put virtual content on top of it… you could convert a lat/long to a position in ARKit space, but then you’re likely to find that your content doesn’t really line up close enough.

It’s not an unsolvable problem, but not an easy one either. Good luck!

This seems to be an area of active research in the iOS developer community — I met several teams trying to figure it out at WWDC last week, and nobody had even begun to crack it yet. So I’m not sure there’s a “best way” yet, if even a feasible way at all.

Feature points are positioned relative to the session, and aren’t individually identified, so I’d imagine correlating them between multiple users would be tricky.

The session alignment mode gravityAndHeading might prove helpful: that fixes all the directions to a (presumed/estimated to be) absolute reference frame, but positions are still relative to where the device was when the session started. If you could find a way to relate that position to something absolute — a lat/long, or an iBeacon maybe — and do so reliably, with enough precision… Well, then you’d not only have a reference frame that could be shared by multiple users, you’d also have the main ingredients for location based AR. (You know, like a floating virtual arrow that says turn right there to get to Gate A113 at the airport, or whatever.)

Another avenue I’ve heard discussed is image analysis. If you could place some real markers — easily machine recognizable things like QR codes — in view of multiple users, you could maybe use some form of object recognition or tracking (a ML model, perhaps?) to precisely identify the markers’ positions and orientations relative to each user, and work back from there to calculate a shared frame of reference. Dunno how feasible that might be. (But if you go that route, or similar, note that ARKit exposes a pixel buffer for each captured camera frame.)
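Once a marker's pose (position and orientation) is known in both users' sessions, the shared frame follows from a single matrix product. If M_A and M_B map marker-local coordinates into session A and session B, then a point x_B in session B is M_A · M_B⁻¹ · x_B in session A. A minimal sketch with 4×4 row-major nested lists; how the marker poses are detected is left open, and the function names are hypothetical.

```python
def mat_mul(a, b):
    """Product of two 4x4 matrices (row-major nested lists)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def rigid_inverse(m):
    """Inverse of a rigid transform [R t; 0 1], computed as [R^T  -R^T t; 0 1]."""
    r_t = [[m[j][i] for j in range(3)] for i in range(3)]  # transpose of the rotation
    t = [m[i][3] for i in range(3)]
    inv_t = [-sum(r_t[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [inv_t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def b_to_a(marker_in_a, marker_in_b):
    """Transform taking session B coordinates into session A: M_A * inverse(M_B)."""
    return mat_mul(marker_in_a, rigid_inverse(marker_in_b))

def apply(m, p):
    """Apply a 4x4 transform to the 3D point p."""
    v = list(p) + [1.0]
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(3)]
```

With the image detection added in iOS 11.3, the transform of a detected reference image's anchor could serve as such a marker pose (note that ARKit stores its matrices column-major, unlike the row-major lists here).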

Good luck!

Obviously you want to build AR apps that permanently place augmented reality content in precise locations, indoors and outdoors. You can use GPS, markers or beacons for geolocation. Or you can try to use computer vision systems to provide an additional layer of mapping and persistent visual positioning to mobile AR apps.

Plane detection is one of the basic features ARKit provides. Below is a good introduction to this topic.
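ARKit does not document its own plane estimation, but the basic idea of finding a dominant plane in a noisy set of feature points can be illustrated with RANSAC. This is a minimal sketch of that generic technique, not ARKit's algorithm; the threshold (in metres) and iteration count are arbitrary assumptions, and the function names are hypothetical.

```python
import random

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def ransac_plane(points, threshold=0.01, iterations=200, seed=0):
    """Return ((normal, d), inliers) for the plane n . x = d supported by the most points."""
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        p0, p1, p2 = rng.sample(points, 3)  # minimal sample: three points define a plane
        n = cross(sub(p1, p0), sub(p2, p0))
        norm = dot(n, n) ** 0.5
        if norm < 1e-12:  # degenerate (collinear) sample
            continue
        n = (n[0] / norm, n[1] / norm, n[2] / norm)
        d = dot(n, p0)
        inliers = [p for p in points if abs(dot(n, p) - d) < threshold]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = (n, d), inliers
    return best_plane, best_inliers
```

ARKit additionally refines and extends detected planes over time as more feature points arrive; this sketch only finds a single plane in one point set.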

Object detection in general could be one approach to positioning. This project shows how well object detection works (and immediately reveals its limitations, too):

As a side note, here is another awesome page, this time just for ARKit:

There is another article about how to integrate the Placenote SDK (which has already been covered in a previous post).

It also explains how the SDK works.

Placenote SDK uses a custom feature detection algorithm that looks for “smart” features in the environment that it thinks can be found persistently across AR sessions. This feature detection is independent of ARKit feature tracking and runs in a parallel thread as you move your phone around.

The collection of smart features forms a 3D feature map of the space that can be saved and uploaded to the Placenote Cloud.

It looks like the current approaches revolve around object detection.

Playing around with both object detection and Placenote shows that both currently have some limitations. One thing is for sure: you can have a pretty amusing time, especially with the object detection example program! It detects stethoscopes everywhere!