All posts by Philipp Kast

Planning, updated

Due to a recent increase in workload at my job, I have to adapt the rough planning for the work. Below I have updated the timing of the work packages; where the timing changed, the original timing is shown first, followed by the updated timing.

February – March

Discovery phase: AR state of the art research.

April – May → April – July

Implement manual re-positioning for ARchi VR.

  • Manual rotation and translation
  • Persist and select model
  • Select room in proximity
  • Select by floor
  • Investigate corner matching, e.g. specifying which corner belongs where

June – July → August – September

Investigate and research ways to achieve better automatic registration of an existing scene in AR, something that could compete with available solutions (e.g. Placenote).

  • AR code
  • Text recognition
  • Object recognition
  • Registration methods over point clouds

After the investigation, implement automatic registration in ARchi VR.

August – September → October

Implementation of a use case using automatic registration.

  • AR cloud
  • Maintenance

October – November → November

  • Documentation of the Master Thesis

December – January

  • Finalize Master Thesis
  • BFH requirement: A complete Master's thesis includes the creation of a book abstract, which is printed in the «Book» publication and published at book.bfh.ch. The abstract also serves your personal reputation: you can use your professionally designed abstract in your (application) documents. The «Book», in its print or online version, is a popular publication among companies. Many companies recruit suitable specialists based on the topics of the final theses and the know-how of the students.

Cloud Storage for Firebase

Get Started on iOS  |  Firebase

Cloud Storage for Firebase lets you upload and share user generated content, such as images and video, which allows you to build rich media content into your apps. Your data is stored in a Google Cloud Storage bucket, an exabyte scale object storage solution with high availability and global redundancy. Cloud Storage lets you securely upload these files directly from mobile devices and web browsers, handling spotty networks with ease.

Source: firebase.google.com/docs/storage/ios/start
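
For a quick orientation, here is a minimal sketch of what an upload of user-generated content looks like with the FirebaseStorage iOS SDK described in the linked guide. Storage.storage(), reference(), child(_:) and putData(_:metadata:completion:) are real Firebase APIs; the bucket path and the image handling are placeholders, not part of ARchi VR.

```swift
import FirebaseStorage
import UIKit

// Sketch: upload a JPEG to the project's default Cloud Storage bucket.
func upload(image: UIImage) {
    guard let data = image.jpegData(compressionQuality: 0.8) else { return }

    // Reference into the default Cloud Storage bucket of the Firebase project.
    let storageRef = Storage.storage().reference()
    // Hypothetical path for an uploaded room scan image.
    let imageRef = storageRef.child("rooms/scan-\(UUID().uuidString).jpg")

    imageRef.putData(data, metadata: nil) { metadata, error in
        if let error = error {
            print("Upload failed: \(error)")
        } else {
            print("Uploaded to \(metadata?.path ?? "unknown path")")
        }
    }
}
```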

Strategy for “Resume Session”

Each time an ARKit app is opened, ARKit sets an arbitrary origin point (0, 0, 0) at the device’s current position and places all content in the scene relative to that origin. This is because, although ARKit detects visual features in the environment to track the device’s motion, it has no ability to “remember” these features.

We start the AR session with worldAlignment set to .gravityAndHeading. This option fixes the directions of the three coordinate axes to real-world directions, but the location of the coordinate system’s origin is still relative to the device, matching the device’s position when the session configuration is first run.
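
For reference, a minimal sketch of this session setup. ARWorldTrackingConfiguration and the .gravityAndHeading option are standard ARKit API; the sceneView property is assumed to be an ARSCNView owned by the hosting view controller.

```swift
import ARKit

// Minimal session setup as described above.
let configuration = ARWorldTrackingConfiguration()
// Fix the coordinate axes to real-world directions:
// y is up (gravity), -z points to true north, x points east.
configuration.worldAlignment = .gravityAndHeading

// `sceneView` is assumed to be the ARSCNView hosting the AR scene.
sceneView.session.run(configuration)
```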

When a room is scanned, measured and saved in the ARchi App, this happens relative to the arbitrary world origin of the AR session, which depends solely on the position of the device at the time the session was started. Thus, the values measured in a previous session will never match those of a later session, since the later session will always have a different world origin: the device position will never match exactly.

Apps like Pokémon GO work around this by using GPS to geolocate Pokémon near specific real-world landmarks like shops and parks. This kind of geolocation is sufficient for rough positioning, but since GPS is so inaccurate, the positions of content can be 5 to 20 meters off target every time you open the app. Furthermore, GPS doesn’t work indoors, which is where most “interior” design happens.

Other solutions on the market take a different approach to solve this problem. Rather than GPS, they use the camera to visually scan a physical space and turn it into a map for future retrieval. This also gives the camera the ability to position itself in that map by comparing its current image with the pre-created map. This is simultaneous localisation and mapping (SLAM) technology, which builds a model of the real environment in a background process based on panoramic 3D reconstruction. Some refer to this as a Visual Positioning System (VPS) that overcomes the limitations of GPS for AR applications.

Back to ARKit: To create a correspondence between real and virtual spaces, ARKit uses a technique called visual-inertial odometry. This process combines information from the iOS device’s motion sensing hardware with computer vision analysis of the scene visible to the device’s camera. ARKit recognizes notable features in the scene image, tracks differences in the positions of those features across video frames, and compares that information with motion sensing data. The result is a high-precision model of the device’s position and motion.

World tracking is an inexact science. This process can often produce impressive accuracy, leading to realistic AR experiences. However, it relies on details of the device’s physical environment that are not always consistent or are difficult to measure in real time without some degree of error.

If we want to resume a session, we either have to move the world origin of the AR session to the location it had during the scanning of the room, or we move the measured points instead.

We will allow manual movement relative to one of the corner points of the walls, in which case we move the world origin by the displacement vector of that corner point. We also have to allow a rotation: even though we start the session with .gravityAndHeading, the heading is not accurate enough for our purposes, and experiments show that it is always necessary to rotate as well. Given that this has to happen every time a session is resumed, it is questionable whether manual positioning is reasonable at all. Therefore, to make it at least somewhat usable, it has to be fast and very easy from a UX perspective.
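
A hedged sketch of what this manual re-alignment could look like in code. ARSession.setWorldOrigin(relativeTransform:) is a real ARKit API (iOS 11.3+); the corner parameters, the yaw value and the composition of the transform are assumptions made purely for illustration.

```swift
import ARKit
import simd

/// Sketch: move the session's world origin so that a corner point measured in a
/// previous session lines up with the matching corner picked in the running
/// session, applying an additional rotation around the gravity (y) axis.
func realign(session: ARSession,
             measuredCorner: simd_float3,   // corner stored in a previous session
             currentCorner: simd_float3,    // matching corner in the running session
             yawCorrection: Float) {        // manually found heading error, in radians
    // Rotation around the y axis to correct the heading error.
    let rotation = simd_float4x4(simd_quatf(angle: yawCorrection,
                                            axis: simd_float3(0, 1, 0)))

    // Translation by the displacement between the two corner points.
    let delta = currentCorner - measuredCorner
    var translation = matrix_identity_float4x4
    translation.columns.3 = simd_float4(delta.x, delta.y, delta.z, 1)

    // Shift the world origin; the composition order is an assumption here.
    session.setWorldOrigin(relativeTransform: translation * rotation)
}
```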

The bottom line is: we really need to strive for an automatic, or at least semi-automatic, process in order to make this user friendly and user lovable.

ARKit is able to detect horizontal and vertical planes; this is one of its core competencies. Each time a plane is detected, we should be able to use this information for automatic positioning of walls. If ARKit detects a horizontal plane, we can assume that it is the floor and move the bottom of the walls to this extended plane. If ARKit detects a vertical plane, we can try to rotate one of the nearby walls to match it, assuming the vertical plane is a wall or part of a wall. In both cases we probably have to weight planes by size: the bigger the plane, the more relevant it is, since the likelihood of it being part of a floor or a wall increases.
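
To make the idea a bit more concrete, here is a rough sketch of how detected plane anchors could be filtered by size and routed to the two cases above. ARPlaneAnchor with its alignment and extent is standard ARKit API; the ViewController type, the area threshold and the snapWallBottoms/alignNearestWall helpers are hypothetical placeholders.

```swift
import ARKit

extension ViewController: ARSCNViewDelegate {
    func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
        guard let plane = anchor as? ARPlaneAnchor else { return }

        // Weight the plane by its area: larger planes are more likely to be
        // a real floor or wall and therefore more relevant.
        let area = plane.extent.x * plane.extent.z
        guard area > 0.5 else { return } // threshold in m², chosen arbitrarily here

        switch plane.alignment {
        case .horizontal:
            // Assume the plane is the floor; snap the walls' bottom edge to its height.
            snapWallBottoms(toFloorAt: plane.transform.columns.3.y)
        case .vertical:
            // Assume the plane is (part of) a wall; rotate the closest measured
            // wall to match its orientation.
            alignNearestWall(to: plane)
        @unknown default:
            break
        }
    }
}
```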

That is all good, but it does not yet consider the existing degree of error in the world tracking. To take that into account, we probably need not only a translation and rotation of the world origin (or of the whole room), but also to move points individually, since the scene itself most probably needs some local stretching. If we match a vertical or horizontal plane that ARKit delivers with high relevance, we could bend the measured values to fit the tracked scene, to fulfil the illusion that our virtual content is part of the real world.