Cloud Storage for Firebase

Get Started on iOS  |  Firebase

Cloud Storage for Firebase lets you upload and share user-generated content, such as images and video, so you can build rich media content into your apps. Your data is stored in a Google Cloud Storage bucket, an exabyte-scale object storage solution with high availability and global redundancy. Cloud Storage lets you securely upload these files directly from mobile devices and web browsers, handling spotty networks with ease.


Strategy for “Resume Session”

Each time an ARKit session starts, ARKit sets an arbitrary origin point (0, 0, 0) at the device’s current position and places all content in the scene relative to that origin. This is because, although ARKit detects visual features in the environment to track the device’s motion, it does not “remember” these features from one session to the next.

We do start the AR session with worldAlignment set to .gravityAndHeading. This option fixes the directions of the three coordinate axes to real-world directions, but the location of the coordinate system’s origin is still relative to the device: it matches the device’s position at the time the session configuration is first run.

When a room is scanned, measured and saved in the ARchi App, everything is recorded relative to this arbitrary world origin, which depends solely on the position of the device when the AR session was started. Consequently, values measured in a previous session will not line up in a later one: the new session almost always has a different world origin, because the device never starts from exactly the same position.
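To make the mismatch concrete, assume a fixed physical frame W with y pointing up, and assume that .gravityAndHeading leaves each session's axes off from W only by a residual heading error ψ (a rotation R_y about the vertical axis), with the session origin o at the device's start position. A physical point p then has session coordinates p^A = R_y(-ψ_A)(p - o_A) and p^B = R_y(-ψ_B)(p - o_B). Eliminating p gives the first line below; the second line is a worked example with illustrative values (d = 5 m from the rotation pivot, Δψ = 3°), not a measurement:

```latex
\begin{aligned}
p^B &= R_y(\psi_A - \psi_B)\,p^A + R_y(-\psi_B)\,(o_A - o_B),\\
\lVert \Delta p \rVert &= 2\,d\,\sin\!\bigl(\tfrac{\Delta\psi}{2}\bigr)
  = 2 \cdot 5\,\mathrm{m} \cdot \sin(1.5^\circ) \approx 0.26\,\mathrm{m}.
\end{aligned}
```

Even with a perfect compass (ψ_A = ψ_B) the translation term o_A - o_B is the difference between the two start positions, which is practically never zero. Any heading difference adds a rotation whose effect grows with the distance from the pivot.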

Apps like Pokemon GO approached this by using GPS to geolocate Pokemon near specific real-world landmarks such as shops and parks. This kind of geolocation is sufficient for rough positioning, but because GPS is so inaccurate, content can end up 5 to 20 meters off target every time you open the app. Furthermore, GPS does not work reliably indoors, which is where most interior design happens.

Other solutions on the market take a different approach. Rather than GPS, they use a camera to visually scan a physical space and turn it into a map for later retrieval. The camera can then position itself in that map by comparing its current image with the pre-built map. This is simultaneous localisation and mapping (SLAM) technology, which can build a model of the real environment in a background process based on panoramic 3D reconstruction. Some refer to this as a Visual Positioning System (VPS), which overcomes the limitations of GPS for AR applications.

Back to ARKit: To create a correspondence between real and virtual spaces, ARKit uses a technique called visual-inertial odometry. This process combines information from the iOS device’s motion sensing hardware with computer vision analysis of the scene visible to the device’s camera. ARKit recognizes notable features in the scene image, tracks differences in the positions of those features across video frames, and compares that information with motion sensing data. The result is a high-precision model of the device’s position and motion.

World tracking is an inexact science. This process can often produce impressive accuracy, leading to realistic AR experiences. However, it relies on details of the device’s physical environment that are not always consistent or are difficult to measure in real time without some degree of error.

To resume a session, we must either move the world origin of the AR session back to where it was while the room was being scanned, or move the measured points instead.
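The two options are the same correction applied from opposite sides. As a small worked statement, assume the correction T consists of a yaw Δψ about the vertical axis plus a translation t, and maps coordinates stored in the previous session (p^A) to coordinates in the current session (p^B):

```latex
\begin{aligned}
T(p) &= R_y(\Delta\psi)\,p + t, &\qquad T^{-1}(p) &= R_y(-\Delta\psi)\,\bigl(p - t\bigr),\\
\text{move points:}\quad p^A &\mapsto T(p^A) = p^B, &\qquad
\text{move origin:}\quad p^B &\mapsto T^{-1}(p^B) = p^A .
\end{aligned}
```

Moving the points rewrites every stored value; moving the origin re-expresses the live session so that the stored values become valid unchanged. In ARKit the second option would presumably correspond to ARSession.setWorldOrigin(relativeTransform:) (iOS 11.3 and later), assuming T is passed as the equivalent 4×4 matrix.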

We will allow the user to manually move the room relative to one of the wall corner points; the world origin is then moved by the vector by which that corner point was moved. We also have to allow a rotation: even though we start the session with .gravityAndHeading, the heading is still not accurate enough for our purposes, and experiments show that a rotation is always necessary as well. Since this has to happen every time a session is resumed, it is questionable whether manual positioning is reasonable at all. To make it at least somewhat usable, it has to be fast and very easy from a UX perspective.
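A minimal Swift sketch of the manual correction, assuming the room is stored as an array of corner points in meters with y up, and that pitch and roll need no correction because .gravityAndHeading aligns y with gravity. YawTranslation and manualAlignment are hypothetical names, not ARKit API; the block uses only the standard library and Foundation so it can run outside an AR session.

```swift
import Foundation

/// Hypothetical rigid transform limited to a yaw about the vertical (y) axis
/// plus a translation; pitch and roll are assumed to be fixed by gravity.
struct YawTranslation {
    var yaw: Double                 // radians; positive yaw turns +z toward +x
    var translation: SIMD3<Double>  // meters

    /// Rotates p about the y axis through the origin, then translates it.
    func apply(_ p: SIMD3<Double>) -> SIMD3<Double> {
        let c = cos(yaw), s = sin(yaw)
        let rotated = SIMD3<Double>(c * p.x + s * p.z, p.y, -s * p.x + c * p.z)
        return rotated + translation
    }
}

/// Hypothetical helper for the manual step: the user drags the saved corner
/// onto its real position, then turns the room about that corner by `yawCorrection`.
/// Result: p' = R(yawCorrection) * (p - savedCorner) + draggedCorner.
func manualAlignment(savedCorner: SIMD3<Double>,
                     draggedCorner: SIMD3<Double>,
                     yawCorrection: Double) -> YawTranslation {
    let rotateOnly = YawTranslation(yaw: yawCorrection, translation: .zero)
    return YawTranslation(yaw: yawCorrection,
                          translation: draggedCorner - rotateOnly.apply(savedCorner))
}

// Usage: a 4 m x 3 m room saved in the previous session (illustrative values).
let savedCorners: [SIMD3<Double>] = [[0, 0, 0], [4, 0, 0], [4, 0, 3], [0, 0, 3]]
let alignment = manualAlignment(savedCorner: savedCorners[0],
                                draggedCorner: [0.5, -1.2, 0.3],
                                yawCorrection: 3 * Double.pi / 180)
let resumedCorners = savedCorners.map(alignment.apply)
```

Rotating about the dragged corner rather than the world origin keeps that corner exactly where the user placed it while the rotation is being adjusted, which keeps the two gestures independent in the UI.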

Bottom line: we really need to strive for an automatic, or at least semi-automatic, process in order to make this user-friendly and something users love.

Detecting horizontal and vertical planes is one of ARKit’s core competencies. Each time a plane is detected, we should be able to use this information to position the walls automatically. If ARKit detects a horizontal plane, we can assume that it is the floor and move the bottom of the walls onto this (extended) plane. If ARKit detects a vertical plane, we can assume that it is a wall, or part of a wall, and try to rotate one of the nearby walls to match it. In both cases we probably have to take the size of the plane into account: the bigger the plane, the more relevant it is, because the likelihood that it is part of the floor or a wall increases.
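A minimal Swift sketch of this heuristic, under stated assumptions: plane data arrives as a simplified, hypothetical DetectedPlane record (not ARKit's ARPlaneAnchor), the room is a closed floor polygon of corner points with y up, the largest relevant horizontal plane is taken as the floor, and the largest relevant vertical plane is matched to the wall whose midpoint is closest to it. "Relevance" is reduced to "largest plane above a hypothetical minArea cutoff", and the wall step only rotates the room about that wall's midpoint; it does not also slide the wall onto the plane.

```swift
import Foundation

/// Hypothetical, simplified plane record (not ARKit's ARPlaneAnchor).
struct DetectedPlane {
    enum Kind { case horizontal, vertical }
    var kind: Kind
    var center: SIMD3<Double>  // world coordinates, meters, y up
    var normal: SIMD3<Double>  // unit normal; horizontal (y == 0) for vertical planes
    var area: Double           // square meters
}

/// Heading of a horizontal direction; positive yaw turns +z toward +x.
func heading(_ v: SIMD3<Double>) -> Double { atan2(v.x, v.z) }

/// Smallest yaw turning line direction `from` onto `to`; a wall has no
/// "forward", so headings that differ by pi count as the same line.
func lineYawDelta(from: Double, to: Double) -> Double {
    var d = (to - from).truncatingRemainder(dividingBy: Double.pi)
    if d > Double.pi / 2 { d -= Double.pi }
    if d < -Double.pi / 2 { d += Double.pi }
    return d
}

/// Rotates p about the vertical axis through `pivot`.
func rotateAboutY(_ p: SIMD3<Double>, by yaw: Double, pivot: SIMD3<Double>) -> SIMD3<Double> {
    let q = p - pivot, c = cos(yaw), s = sin(yaw)
    return pivot + SIMD3<Double>(c * q.x + s * q.z, q.y, -s * q.x + c * q.z)
}

/// Snaps the floor to the largest horizontal plane, then turns the room about the
/// wall closest to the largest vertical plane until that wall is parallel to it.
/// `corners` is the closed floor polygon; `minArea` is a hypothetical relevance cutoff.
func autoAlign(corners: [SIMD3<Double>], planes: [DetectedPlane],
               minArea: Double = 0.5) -> [SIMD3<Double>] {
    let relevant = planes.filter { $0.area >= minArea }
    var room = corners

    // Floor: assume the largest relevant horizontal plane is the floor.
    if let floorPlane = relevant.filter({ $0.kind == .horizontal }).max(by: { $0.area < $1.area }),
       let lowestY = room.map({ $0.y }).min() {
        let dy = floorPlane.center.y - lowestY
        room = room.map { $0 + SIMD3<Double>(0, dy, 0) }
    }

    // Walls: assume the largest relevant vertical plane is (part of) a wall.
    guard room.count >= 2,
          let wall = relevant.filter({ $0.kind == .vertical }).max(by: { $0.area < $1.area })
    else { return room }

    // Pick the wall segment whose midpoint is closest to the plane (ignoring height).
    var best = (index: 0, distance: Double.infinity)
    for i in room.indices {
        let mid = (room[i] + room[(i + 1) % room.count]) / 2
        let dx = mid.x - wall.center.x, dz = mid.z - wall.center.z
        let distance = (dx * dx + dz * dz).squareRoot()
        if distance < best.distance { best = (index: i, distance: distance) }
    }
    let a = room[best.index], b = room[(best.index + 1) % room.count]
    let planeAlong = SIMD3<Double>(wall.normal.z, 0, -wall.normal.x)  // horizontal direction within the plane
    let delta = lineYawDelta(from: heading(b - a), to: heading(planeAlong))
    return room.map { rotateAboutY($0, by: delta, pivot: (a + b) / 2) }
}
```

Using only the single largest plane of each kind keeps the sketch simple; a horizontal plane could just as well be a table top, so a fuller version would likely need extra checks before treating a plane as the floor.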

All of this, however, does not yet account for the error inherent in world tracking. To take it into account, translating and rotating the world origin (or the whole room) is probably not enough; we also need to move individual points, since the scene itself most likely needs some local stretching. If we match a vertical or horizontal plane that ARKit reports with high relevance, we could bend the measured values to fit the tracked scene and so fulfill the illusion that our virtual content is part of the real world.
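One way to sketch this per-point correction in Swift, with hypothetical names (TrackedPlane, relevance, bendToPlanes) and assumed tuning values (maxDistance = 0.3 m, halfArea = 2 m²): each measured point within reach of a plane is pulled toward its projection onto that plane, and the pull is weighted by the plane's area so that large planes dominate. This is a simple weighted projection, not a full non-rigid registration method.

```swift
import Foundation

/// Hypothetical plane record: a point on the plane, its unit normal, its area.
struct TrackedPlane {
    var point: SIMD3<Double>
    var normal: SIMD3<Double>  // assumed unit length
    var area: Double           // square meters
}

/// Relevance in [0, 1): larger planes pull harder. `halfArea` (hypothetical)
/// is the area at which a plane reaches weight 0.5.
func relevance(of plane: TrackedPlane, halfArea: Double) -> Double {
    plane.area / (plane.area + halfArea)
}

/// Instead of one rigid transform for the whole room, nudges each measured point
/// toward the closest plane within `maxDistance`, scaled by that plane's relevance.
func bendToPlanes(_ points: [SIMD3<Double>], planes: [TrackedPlane],
                  maxDistance: Double = 0.3, halfArea: Double = 2.0) -> [SIMD3<Double>] {
    return points.map { (p: SIMD3<Double>) -> SIMD3<Double> in
        // Signed distance of p from every plane, measured along the plane normal.
        let candidates = planes.map { plane in
            (plane: plane, distance: ((p - plane.point) * plane.normal).sum())
        }
        guard let nearest = candidates.min(by: { abs($0.distance) < abs($1.distance) }),
              abs(nearest.distance) <= maxDistance else { return p }
        // Full projection onto the plane would subtract distance * normal;
        // a less relevant plane only pulls the point part of the way.
        let w = relevance(of: nearest.plane, halfArea: halfArea)
        return p - (w * nearest.distance) * nearest.plane.normal
    }
}
```

The area weighting keeps small, possibly spurious planes from distorting the room, and the maxDistance cutoff stops a plane from pulling points that most likely belong to a different wall.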


Interesting article: Real-Time Location-Based Rendering of Urban Underground Pipelines, with quote:

The inevitable direction of future development is simultaneous localization and mapping (SLAM) [20] technology, which can build a model simulating the real environment through the background process based on panoramic 3D reconstruction. The model is then rendered to a display device after scene-graph fusion of the virtual world and the real world so as to achieve an interaction of the two worlds. Localization and mapping are the key functions of SLAM, which can be divided into two separate threads [21–23]. Motion tracking based on MonoSLAM [24] has problems such as extensive calculation, long work time, existing scale ambiguity [21], and difficulty detecting feature points when the device is moving fast. The integration of inertial measurement unit IMU [25] to get six degrees of freedom (6DOF) [26] of the device plays a complementary role in improving its refresh rate and accuracy. For example, Google Tango [27] and the updated ARKit in iOS 11, released in June 2017, utilize the Visual-Inertial Odometry (VIO) [28] method, which combines vision and inertia to gain 6DOF of the mobile device. The difference is that Google Tango achieves 3D reconstruction with hardware, such as a depth camera and a fisheye camera, while optimization based on the ARKit’s algorithms allows most iOS devices to have AR capabilities. Without depth perception, ARKit currently can only detect planes, which means it cannot achieve reconstruction of the environment, like Tango, or complete a complex environment interaction.

Side note:

On 15 December 2017, Google announced that it would end support for Tango on March 1, 2018, in favor of ARCore.[7]