iOS 11.3 is here – new opportunities

A bunch of interesting new capabilities have been introduced with iOS 11.3.

Recognizing Images in an AR Experience

Detect known 2D images in the user’s environment, and use their positions to place AR content.

https://developer.apple.com/documentation/arkit/recognizing_images_in_an_ar_experience
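
To get a first feeling for the new API, here is a minimal sketch of how image detection can be switched on in ARKit 1.5. The asset catalog group name "AR Resources" and the highlight geometry are assumptions for illustration, not taken from the Apple sample.

import ARKit

// Minimal sketch of image detection in ARKit 1.5 (iOS 11.3).
// The asset catalog group name "AR Resources" is an assumption.
class ImageDetectionViewController: UIViewController, ARSCNViewDelegate {
    let sceneView = ARSCNView()

    override func viewDidLoad() {
        super.viewDidLoad()
        sceneView.frame = view.bounds
        view.addSubview(sceneView)
        sceneView.delegate = self
    }

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)
        let configuration = ARWorldTrackingConfiguration()
        if let references = ARReferenceImage.referenceImages(inGroupNamed: "AR Resources",
                                                             bundle: nil) {
            configuration.detectionImages = references
        }
        sceneView.session.run(configuration)
    }

    // Called once ARKit recognizes one of the reference images;
    // AR content can be placed relative to the image's anchor.
    func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
        guard let imageAnchor = anchor as? ARImageAnchor else { return }
        let size = imageAnchor.referenceImage.physicalSize
        let plane = SCNPlane(width: size.width, height: size.height)
        plane.firstMaterial?.diffuse.contents = UIColor.cyan.withAlphaComponent(0.5)
        let highlight = SCNNode(geometry: plane)
        highlight.eulerAngles.x = -.pi / 2  // SCNPlane is vertical by default
        node.addChildNode(highlight)
    }
}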

Handling 3D Interaction and UI Controls in Augmented Reality

Follow best practices for visual feedback, gesture interactions, and realistic rendering in AR experiences.

https://developer.apple.com/documentation/arkit/handling_3d_interaction_and_ui_controls_in_augmented_reality

Rough planning

February – March

Discovery phase: AR state-of-the-art research.

April – May

Implement manual re-positioning for ARchi VR.

  • Manual rotation and translation (a gesture-based sketch follows after this list)
  • Persist and select model
  • Select room in proximity
  • Select by floor
  • Investigate corner matching, e.g. specifying which corner belongs where
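
A possible shape for the first bullet, manual rotation and translation, is sketched below. This is only an illustration: the names sceneView and modelNode and the points-to-meters scale factor are assumptions, not ARchi VR code.

import ARKit

// Sketch of manual re-positioning: a pan gesture translates the model
// in the horizontal plane, a rotation gesture spins it around the
// vertical (gravity) axis. sceneView and modelNode are assumed to be
// provided by the hosting app; the 0.001 points-to-meters factor is a
// tunable assumption.
class ManualRegistrationController: NSObject {
    let sceneView: ARSCNView
    let modelNode: SCNNode

    init(sceneView: ARSCNView, modelNode: SCNNode) {
        self.sceneView = sceneView
        self.modelNode = modelNode
        super.init()
        sceneView.addGestureRecognizer(
            UIPanGestureRecognizer(target: self, action: #selector(handlePan(_:))))
        sceneView.addGestureRecognizer(
            UIRotationGestureRecognizer(target: self, action: #selector(handleRotation(_:))))
    }

    @objc func handlePan(_ gesture: UIPanGestureRecognizer) {
        let translation = gesture.translation(in: sceneView)
        modelNode.position.x += Float(translation.x) * 0.001
        modelNode.position.z += Float(translation.y) * 0.001
        gesture.setTranslation(.zero, in: sceneView)
    }

    @objc func handleRotation(_ gesture: UIRotationGestureRecognizer) {
        modelNode.eulerAngles.y -= Float(gesture.rotation)
        gesture.rotation = 0
    }
}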

June – July

Investigate and research ways to achieve a better automatic registration of an existing scene in AR – something that could compete with available solutions (e.g. Placenote).

  • AR code
  • Text recognition
  • Object recognition
  • Registration methods over point clouds

After the investigation, implement automatic registration in ARchi VR.

August – September

Implementation of a use case using automatic registration.

  • AR cloud
  • Maintenance

October – November

  • Documentation of the Master Thesis

December – January

  • Finalize Master Thesis
  • BFH requirement: A complete Master's Thesis includes a book abstract, which is printed in the «Book» publication and published at book.bfh.ch. The abstract also serves your personal reputation – you can use your professionally designed abstract in your (application) documents. The «Book», in print or online, is a popular publication among companies. Many firms recruit suitable specialists via the topics of the final theses and the know-how of the students.

Registration of 3D point clouds

An interesting document about 3D point cloud registration:

https://www.researchgate.net/publication/316455393_3D_Point_Cloud_Registration_for_Localization_Using_a_Deep_Neural_Network_Auto-Encoder

It is even explained by one of the authors in a video.

For our research only the first part is of interest, which revolves around how to find an affine transformation between two point clouds. The first method is key-point detection and the second is plane-based registration.

Key point based method

Numerous point cloud coarse registration methods have been developed [8], yet coarse registration remains an open challenge with much room for improvement. In the Fast Point Feature Histogram [9, 10] (FPFH) algorithm, a histogram based descriptor is calculated for each point within the point cloud, over multiple scales. Salient persistent histograms over multiple scale calculations are labeled as keypoints, which are then matched to find the registration between the point clouds.

Plane based registration

A different approach based on linear plane matching was developed for the coarse registration of airborne LIDAR point clouds [17]. By relying on the presence of linear structures, this approach is limited to specific dataset classes.

The problem of fine registration between point clouds has been intensively studied, and high quality solutions now exist for online applications such as SLAM [3, 4]. The solutions revolve around the Iterative Closest Point (ICP) [18] algorithm and its improvements [19]. A noteworthy fine registration method that is based on the correlation of Extended Gaussian Images in the Fourier domain [20] was proposed as an alternative to ICP, although its final stage again relied on iterations of ICP for fine-tuning. Fine registration is not the focus of this research, although to achieve end-to-end registration the standard ICP algorithm is utilized in its final stages.

An algorithm like ICP could be interesting to investigate, given that we could detect a couple of planes and then try to match them across two sessions. More precisely: given two point clouds, R (the reference) and S (the source), ICP tries to find the best rigid (or similarity) transform T such that T * S ≈ R.

The following library could be interesting:

https://github.com/ethz-asl/libpointmatcher

Or maybe just something like this:

http://ieeexplore.ieee.org/document/4767965/

Or there is already a C++ implementation:

https://github.com/oleg-alexandrov/projects/blob/master/eigen/Kabsch.cpp
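
To make the idea concrete, here is a Swift sketch of the same Kabsch computation that the linked C++ file implements. It assumes the point correspondences are already known, and instead of a full SVD it extracts the rotation with a Newton iteration for the polar decomposition – a simplification that only holds for well-conditioned, reflection-free cases.

import simd

// Kabsch sketch: given matched point pairs from a source cloud S and a
// reference cloud R, find the rigid transform (rotation + translation)
// with rot * s + t ≈ r in the least-squares sense.
func rigidTransform(source: [SIMD3<Double>],
                    reference: [SIMD3<Double>])
    -> (rotation: double3x3, translation: SIMD3<Double>)? {
    guard source.count == reference.count, source.count >= 3 else { return nil }
    let n = Double(source.count)

    // Centroids of both clouds.
    let cs = source.reduce(SIMD3<Double>(), +) / n
    let cr = reference.reduce(SIMD3<Double>(), +) / n

    // Cross-covariance H = sum of (r_i - cr)(s_i - cs)^T.
    var h = double3x3()  // zero matrix
    for (s, r) in zip(source, reference) {
        let a = r - cr
        let b = s - cs
        h = h + double3x3(columns: (a * b.x, a * b.y, a * b.z))
    }

    // Newton iteration X <- (X + X^-T) / 2 converges to the orthogonal
    // polar factor of H, which is the Kabsch rotation (assumes H is
    // non-singular, i.e. the point sets are not degenerate).
    var rot = h
    for _ in 0..<25 {
        rot = 0.5 * (rot + rot.inverse.transpose)
    }

    let t = cr - rot * cs
    return (rot, t)
}

The hard part in our setting is exactly what this sketch assumes away: finding the correspondences, which is what key-point descriptors like FPFH or ICP's closest-point search provide.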

 

Positioning strategies – first steps

Relocalization – ensuring that AR content stays in place between sessions and, most importantly, is easy to find again when you start a new session nearby – is a problem under active research.

Below are some interesting statements I found around this area.


ARKit doesn’t have any features for tracking device position or placing content in “absolute” geospatial coordinates. Actually doing that (and doing it well) is sort of a hard problem… but there are a few things to help you on your way.

First, check out the ARSessionConfiguration worldAlignment setting. With the default gravity option, x and z directions are relative to the device’s original heading, as of when you started the session. Getting from there to geospatial coordinates is next to impossible.

But with the gravityAndHeading option, the axes are fixed to north/south and east/west — the position of the ARKit coordinate system’s origin is still relative to where the device is at the beginning of the session, but the directions are absolute. That gives you a basis for converting to/from geospatial coordinates.

But there’s still a question of precision. ARKit tracks features up to a few meters away, down to a precision of a couple millimeters. Core Location tracks the device to a precision of several meters. So, if you have a real-world feature and you want to put virtual content on top of it… you could convert a lat/long to a position in ARKit space, but then you’re likely to find that your content doesn’t really line up close enough.

It’s not an unsolvable problem, but not an easy one either. Good luck!

https://stackoverflow.com/questions/44876262/getting-the-absolute-position-of-a-point-in-arkit
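
For reference, switching a session to the compass-aligned mode discussed above takes only a few lines (a sketch; the ARSCNView is assumed to exist elsewhere):

import ARKit

// With .gravityAndHeading the y-axis is aligned with gravity, x points
// east and -z points true north; only the origin stays relative to the
// device's position at session start.
func runHeadingAlignedSession(on sceneView: ARSCNView) {
    let configuration = ARWorldTrackingConfiguration()
    configuration.worldAlignment = .gravityAndHeading
    sceneView.session.run(configuration, options: [.resetTracking])
}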


This seems to be an area of active research in the iOS developer community — I met several teams trying to figure it out at WWDC last week, and nobody had even begun to crack it yet. So I’m not sure there’s a “best way” yet, if even a feasible way at all.

Feature points are positioned relative to the session, and aren’t individually identified, so I’d imagine correlating them between multiple users would be tricky.

The session alignment mode gravityAndHeading might prove helpful: that fixes all the directions to a (presumed/estimated to be) absolute reference frame, but positions are still relative to where the device was when the session started. If you could find a way to relate that position to something absolute — a lat/long, or an iBeacon maybe — and do so reliably, with enough precision… Well, then you’d not only have a reference frame that could be shared by multiple users, you’d also have the main ingredients for location based AR. (You know, like a floating virtual arrow that says turn right there to get to Gate A113 at the airport, or whatever.)

Another avenue I’ve heard discussed is image analysis. If you could place some real markers — easily machine recognizable things like QR codes — in view of multiple users, you could maybe use some form of object recognition or tracking (a ML model, perhaps?) to precisely identify the markers’ positions and orientations relative to each user, and work back from there to calculate a shared frame of reference. Dunno how feasible that might be. (But if you go that route, or similar, note that ARKit exposes a pixel buffer for each captured camera frame.)

Good luck!

https://stackoverflow.com/questions/44529350/arkit-with-multiple-users?rq=1


Obviously you want to build AR apps that permanently place augmented reality content in precise locations, indoors and outdoors. You can use GPS, markers or beacons for geolocation. Or you can try to use computer vision systems to provide an additional layer of mapping and persistent visual positioning to mobile AR apps.

Plane detection is the basic feature that ARKit provides. Below is a good introduction to this topic.

https://blog.markdaws.net/arkit-by-example-part-2-plane-detection-visualization-10f05876d53
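
In the spirit of that tutorial, a minimal sketch of horizontal plane detection and visualization could look like this (the delegate wiring and the translucent material are illustrative choices):

import ARKit

// Detect horizontal planes and visualize each one as a translucent
// rectangle that grows as ARKit refines its estimate.
class PlaneDetectionDelegate: NSObject, ARSCNViewDelegate {
    func startPlaneDetection(on sceneView: ARSCNView) {
        let configuration = ARWorldTrackingConfiguration()
        configuration.planeDetection = .horizontal
        sceneView.delegate = self
        sceneView.session.run(configuration)
    }

    // ARKit adds an ARPlaneAnchor (and a node) per detected plane.
    func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
        guard let planeAnchor = anchor as? ARPlaneAnchor else { return }
        let plane = SCNPlane(width: CGFloat(planeAnchor.extent.x),
                             height: CGFloat(planeAnchor.extent.z))
        plane.firstMaterial?.diffuse.contents = UIColor.blue.withAlphaComponent(0.3)
        let planeNode = SCNNode(geometry: plane)
        planeNode.simdPosition = planeAnchor.center
        planeNode.eulerAngles.x = -.pi / 2  // lay the plane flat
        node.addChildNode(planeNode)
    }

    // Plane estimates are refined over time; keep the visualization in sync.
    func renderer(_ renderer: SCNSceneRenderer, didUpdate node: SCNNode, for anchor: ARAnchor) {
        guard let planeAnchor = anchor as? ARPlaneAnchor,
              let planeNode = node.childNodes.first,
              let plane = planeNode.geometry as? SCNPlane else { return }
        plane.width = CGFloat(planeAnchor.extent.x)
        plane.height = CGFloat(planeAnchor.extent.z)
        planeNode.simdPosition = planeAnchor.center
    }
}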

Object detection in general could be an approach for positioning. How well object detection works can be seen by using this project (which also immediately shows its limitations):

https://github.com/hanleyweng/CoreML-in-ARKit

As a side note, here is another awesome list, this time just for ARKit:

https://github.com/olucurious/Awesome-ARKit


There is another article about how to integrate the Placenote SDK (which has already been covered in a previous post).

https://virtualrealitypop.com/building-an-ar-house-manual-for-your-airbnb-with-arkit-placenote-sdk-99422fa6029f

It also discloses how this works:

Placenote SDK uses a custom feature detection algorithm that looks for “smart” features in the environment that it thinks can be found persistently across AR sessions. This feature detection is independent of ARKit feature tracking and runs in a parallel thread as you move your phone around.

The collection of smart features forms a 3D feature map of the space that can be saved and uploaded to the Placenote Cloud.

It looks like the current approaches revolve around object detection.

Playing around with both object detection and Placenote shows that they currently have some limitations. One thing is for sure: you can have a pretty amusing time, especially with the object detection example program – it detects stethoscopes all around!

Placenote experiment

https://placenote.com/ provides software and a service for the registration of 3D scenes in AR; it is made by https://vertical.ai/.

To put it simply, Placenote lets you build AR apps that permanently place augmented reality content in precise locations, indoors and outdoors. Placenote does not need GPS, markers or beacons for geolocation. Instead, it uses an advanced computer vision system to provide an additional layer of mapping and persistent visual positioning to mobile AR apps built in Unity or Native Scenekit.

I have registered and built the example using their kit.

https://placenote.com/documentation/

The sample allows you to position objects in an AR session and save the map to their service. In a subsequent AR session you can then load the saved map again. The sample seems to work pretty well and successfully re-positions the objects. The accuracy is not very high, though: objects can be a bit off the intended position. And if the starting position is very different (e.g. the other side of the room), it doesn’t place the objects at all.

The source code of the sample is also available on GitHub: https://github.com/Placenote.

Official Kick Off

Kick off

We met in Zürich yesterday for the official kick-off of the master thesis work.

We came up with the following goals for the next week:

  • Work out a coarse plan for the master thesis work (until February 2019)
  • In a first step, implement a manual re-registration in the ARchi VR app: https://itunes.apple.com/il/app/archi-vr/id1317896781?mt=8
  • In a second step, investigate what can be automated in this process, e.g. edge detection, AR codes, text recognition (a first sketch follows below).
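
As a first idea for the AR code bullet, here is a sketch that feeds ARKit's captured camera frames into Vision's barcode detector to find QR-code-like markers. A real app would throttle the requests and run them off the session's delegate thread; both are omitted here.

import ARKit
import Vision

// Sketch: detect QR-code markers in the ARKit camera feed. The payload
// and bounding box of each observation could then be hit-tested against
// the scene to anchor content at the marker's position.
class MarkerDetector: NSObject, ARSessionDelegate {
    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        let request = VNDetectBarcodesRequest { request, _ in
            guard let barcodes = request.results as? [VNBarcodeObservation] else { return }
            for barcode in barcodes {
                // boundingBox is in normalized image coordinates.
                print("Marker \(barcode.payloadStringValue ?? "?") at \(barcode.boundingBox)")
            }
        }
        let handler = VNImageRequestHandler(cvPixelBuffer: frame.capturedImage, options: [:])
        try? handler.perform([request])
    }
}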

Discovery

Until now the time was spent on discovery in the augmented reality (AR) environment. An investigation into the state-of-the-art technologies has been conducted: from ARKit on iOS to ARCore on Android – a basic deep dive into the technology.

I have also got access to the source code of ARchi VR, which is located in a non-public repository on the ZHAW GitHub: https://github.engineering.zhaw.ch/acke/ARchiVR. I looked into the code, built it, and slightly adapted the sources to get a feeling for the ARKit technology in use, as well as for https://github.com/archilogic-com/3dio-js from https://3d.io/ by Archilogic, Zurich.

Some basic business model considerations have also been looked at, such as uses in the maintenance environment (VR and AR, e.g. remote maintenance) or the area of the AR cloud, e.g. https://medium.com/super-ventures-blog/arkit-and-arcore-will-not-usher-massive-adoption-of-mobile-ar-da3d87f7e5ad.