Building blocks for augmented vision

Kore · Published in UX Planet · 3 min read · Sep 3, 2018

Most early and popular ideas around AR focus on augmenting human vision, and technology was developed to support these ideas. The camera plays the central role in this type of Augmented Reality (AR). A camera paired with a computer (such as a smartphone) uses computer vision (CV) to scan its surroundings, and content is superimposed on the camera view. Many modern AR applications readily use the smartphone’s camera to show 3D objects in real space without special markers, a method sometimes called marker-less AR. However, this was not always the standard. A number of techniques have been used to augment content on the camera view.

Fiducial markers and images

Fiducial markers are black-and-white patterns, usually printed on a flat surface. A computer vision algorithm scans the camera image for these markers and uses them to position and scale the 3D object in the camera view accordingly. Earlier AR solutions regularly relied on fiducial markers, and images can serve the same purpose in their place. Fiducial markers are the most accurate mechanism for anchoring AR content and are regularly used for motion capture (MOCAP) in the film industry.
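As a concrete illustration, here is a minimal sketch of marker detection using OpenCV’s ArUco module in Python (the classic pre-4.7 API; the image path and dictionary choice are placeholders for this example):

```python
import cv2

# Load a camera frame and convert to grayscale for detection
# ("frame.jpg" is a placeholder path for this sketch).
frame = cv2.imread("frame.jpg")
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Use one of OpenCV's predefined fiducial marker dictionaries.
aruco_dict = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
params = cv2.aruco.DetectorParameters_create()

# Detect marker corners and IDs; the corner positions tell the
# system where the marker sits in the image and how large it
# appears, which is what AR engines use to place and scale content.
corners, ids, rejected = cv2.aruco.detectMarkers(gray, aruco_dict, parameters=params)

if ids is not None:
    print(f"Found markers: {ids.ravel().tolist()}")
```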

Source: https://woktossrobot.com/

3D depth-sensing

With ‘You are the controller’ as its tagline, Microsoft’s Kinect was a revolutionary device for augmented reality research. It is a 3D depth-sensing camera that recognizes and maps spatial data. 3D depth sensing was available well before the Kinect; however, the Kinect made the technology far more accessible. It changed the way regular computers see and augment natural environments. Depth-sensing cameras analyze and map spatial environments to place 3D objects in the camera view. A more mainstream depth-sensing camera in recent times would be the iPhone X’s front (TrueDepth) camera.
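Under the hood, a depth camera pairs each pixel with a distance. Given the camera’s intrinsics (focal length and principal point), a depth reading can be back-projected into a 3D point via the standard pinhole model. A minimal sketch, with made-up intrinsics for illustration:

```python
# Back-project a single depth pixel into a 3D point using the
# pinhole camera model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
# The intrinsics below are illustrative values, not a real calibration.
fx, fy = 525.0, 525.0   # focal lengths in pixels
cx, cy = 319.5, 239.5   # principal point (roughly the image center)

def back_project(u, v, depth_m):
    """Convert pixel (u, v) with depth in meters to camera-space XYZ."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)

# A pixel near the image center, 1.2 m from the camera.
print(back_project(320, 240, 1.2))
```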

Simultaneous localization and mapping (SLAM)

For a robot or a computer to move through or augment an environment, it needs to map that environment and understand its own location within it. Simultaneous localization and mapping (SLAM) is a technology that enables just that. It was originally developed for robots navigating complex terrain and is even used by Google’s self-driving cars. As the name suggests, SLAM maps the environment in real time, generating a 3D map with the help of a camera and a few sensors. The computer can then use this 3D map to place multimedia content in the environment.
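The core loop alternates between localization (estimating the camera’s pose from landmarks already in the map) and mapping (adding newly observed landmarks using that pose estimate). Here is a deliberately tiny, toy 2D sketch of that chicken-and-egg structure; all positions and the sensor model are synthetic, not a real SLAM implementation:

```python
import numpy as np

# Toy 2D SLAM sketch: estimate the robot's position from landmarks
# already in its map, then use that estimate to add new landmarks.
true_landmarks = {          # ground truth, unknown to the robot
    "a": np.array([2.0, 1.0]),
    "b": np.array([4.0, 3.0]),
    "c": np.array([6.0, 1.5]),
}
slam_map = {"a": np.array([2.0, 1.0])}   # robot starts knowing one landmark

def observe(robot_pos, visible):
    """Sensor model: landmark positions relative to the robot."""
    return {k: true_landmarks[k] - robot_pos for k in visible}

robot_true = np.array([0.0, 0.0])
for step, visible in enumerate([("a",), ("a", "b"), ("b", "c")]):
    robot_true = robot_true + np.array([1.5, 0.5])   # robot actually moves
    obs = observe(robot_true, visible)

    # Localization: infer pose from landmarks that are already mapped.
    known = [slam_map[k] - obs[k] for k in obs if k in slam_map]
    pose_est = np.mean(known, axis=0)

    # Mapping: place newly observed landmarks using the pose estimate.
    for k, rel in obs.items():
        if k not in slam_map:
            slam_map[k] = pose_est + rel

    print(f"step {step}: pose estimate {pose_est}, map size {len(slam_map)}")
```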

Point cloud

3D depth-sensing cameras like Microsoft’s Kinect and Intel’s RealSense, as well as SLAM, generate a set of data points in space known as a point cloud. The computer references the point cloud to place content in 3D environments. Once mapped to an environment, point clouds enable the system to remember where a 3D object was placed, or even tie it to a particular GPS location.
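Extending the pinhole back-projection above to every pixel of a depth image yields a point cloud. A minimal NumPy sketch, using a synthetic depth map (a flat wall two meters away) and the same illustrative intrinsics:

```python
import numpy as np

# Convert a whole depth image into a point cloud by back-projecting
# every pixel at once. Depth map and intrinsics are synthetic.
H, W = 480, 640
fx, fy, cx, cy = 525.0, 525.0, 319.5, 239.5
depth = np.full((H, W), 2.0)          # depth in meters per pixel

u, v = np.meshgrid(np.arange(W), np.arange(H))
x = (u - cx) * depth / fx
y = (v - cy) * depth / fy
cloud = np.stack([x, y, depth], axis=-1).reshape(-1, 3)   # (H*W, 3) points

print(cloud.shape)   # (307200, 3)
print(cloud[0])      # the top-left pixel's 3D position
```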

Source: https://woktossrobot.com/

Machine learning + normal camera

Earlier AR methods relied on a multitude of sensors in addition to the camera. Software libraries like OpenCV, Vuforia, ARCore, ARKit, and MRKit have enabled AR on small computing devices like the smartphone with surprising accuracy. These libraries use machine learning algorithms to place 3D objects in the environment and require only a digital camera for input. The frugality of these algorithms in terms of sensor requirements has largely been responsible for the ensuing excitement around AR in recent times.
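For a taste of what camera-only tracking builds on, here is a small OpenCV sketch that detects ORB feature keypoints in a single frame. Markerless AR pipelines track points like these across frames to infer camera motion and surfaces; ORB is a classical (non-learned) detector, shown here as the simplest camera-only building block, and the image path is a placeholder:

```python
import cv2

# Detect trackable feature points in a single camera frame.
# Markerless AR systems follow points like these from frame to
# frame to estimate camera pose without any fiducial markers.
frame = cv2.imread("frame.jpg")          # placeholder path
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

orb = cv2.ORB_create(nfeatures=500)      # fast, patent-free detector
keypoints, descriptors = orb.detectAndCompute(gray, None)

print(f"{len(keypoints)} trackable points found")
```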
