KHARMA Framework

Browser Overview

An Open Platform for Delivering Mobile Augmented Reality Experiences

The KHARMA architecture is a new open platform for augmented reality that lets users create content using HTML and JavaScript web development tools already in widespread use today. In contrast to the proprietary AR browser solutions currently available, this approach allows almost any web-based technology to be deployed in any combination into the surrounding scene, resulting in far richer AR experiences.

This open platform also lets content developers create and host content on standard HTTP servers using an extended version of the Google Earth markup language (KML). The platform addresses several practical problems in mobile AR development and delivery. It provides an architecture for aggregating content developed and delivered by multiple sources and for interoperability between those sources. It also addresses the unique role of existing infrastructure, which is integral to the authoring pipeline both offline and at runtime. And, perhaps most importantly, the platform compensates for the uncertainties of current tracking technologies through the use of static background scenes, and prepares for the future widespread use of vision-based tracking through the delivery of pre-processed tracking keys. The platform is built upon several unique contributions to mobile AR experience delivery: channel servers, infrastructure servers, GEOSpot servers and an open source, standards-based mobile client.

Channel Servers

Typical mobile AR browsers (Layar, etc.) display a single channel of information to the user at any one time. This reflects the lack of formalized APIs for sharing information or content between channels. Yet such exchanges between different sources are commonplace on the Internet as we know it (e.g. between Facebook, Google and Twitter). Argon, our mobile AR client, allows the user to open multiple channels of AR content, each adding visual content to the scene and potentially interacting with the others using standard web protocols already in use today. Each channel is a URL through which content is delivered in KARML, an extension of KML, using standard web server protocols and technology. The KARML markup contains a mixture of 3D COLLADA models and HTML content, along with the position and orientation at which to place each piece of content.

Any mobile AR client browser using this platform can be thought of as a Google Earth browser for the phone that displays neither buildings nor landscape but does display individual COLLADA models and rich HTML content within the scene. While the Google Earth application does allow HTML content within popup balloons, it supports only a limited set of HTML functionality, offers limited control over balloon styling and prevents the user from accurately controlling the placement of those balloons within the scene. The content development pipeline for AR channels uses a combination of Google Earth, Google SketchUp and a modern web browser. Features such as fixed Placemarks in the world can be created in Google Earth and then exported as a KML file. Almost any content developed and tested in a modern web browser can then be added within the feature descriptions in this KML file. The KARML extensions can then be used to exercise even greater control over how the HTML content is displayed within the scene. As with the desktop version of Google Earth, 3D models in the COLLADA format, whether constructed in SketchUp or downloaded from other sources, can be positioned in the scene and exported as part of the KML file (the Argon client does not support this feature yet).
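As a rough illustration of the first steps of this pipeline, the sketch below reads a Placemark exported as standard KML and extracts its name, its HTML description and its coordinates. It uses only standard KML 2.2 elements; no KARML extension elements are shown, and the sample document and the function name read_placemarks are assumptions made for illustration, not part of the platform.

```python
import xml.etree.ElementTree as ET

# Standard KML 2.2 namespace; KARML extension elements are not shown here.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Hypothetical sample of a Placemark as it might be exported from Google Earth,
# with HTML content placed in the feature description.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Cafe sign</name>
      <description><![CDATA[<div class="note"><b>Open</b> until 9pm</div>]]></description>
      <Point>
        <coordinates>-84.3963,33.7756,12</coordinates>
      </Point>
    </Placemark>
  </Document>
</kml>
"""


def read_placemarks(kml_text):
    """Return a list of dicts with name, HTML description and position."""
    root = ET.fromstring(kml_text)
    placemarks = []
    for pm in root.findall(".//kml:Placemark", KML_NS):
        coords = pm.find("kml:Point/kml:coordinates", KML_NS)
        if coords is None or not coords.text:
            continue  # skip features without a point location
        # KML stores coordinates as longitude,latitude[,altitude].
        parts = [float(v) for v in coords.text.strip().split(",")]
        placemarks.append({
            "name": pm.findtext("kml:name", default="", namespaces=KML_NS),
            "html": pm.findtext("kml:description", default="", namespaces=KML_NS),
            "lon": parts[0],
            "lat": parts[1],
            "alt": parts[2] if len(parts) > 2 else 0.0,
        })
    return placemarks


placemarks = read_placemarks(SAMPLE_KML)
```

A client would then hand the html string to its web view and use the position to place the content in the scene.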

Current mobile AR content development requires the content developer to conform to the limited set of content types and interactions offered by the provider of the targeted AR browser. A single AR browser provider then delivers the resulting AR content to the client in a proprietary format. In the proposed architecture, individual HTTP channel servers deliver either static KARML files (using the kml file extension) or a stream of KARML content in response to repeated client polling that includes the GPS location of the mobile browser. As with the current KML standard, new KARML features can be added to the channel as the user's location changes, and KARML updates can be delivered to the browser to change the properties of existing features (e.g. locations of vehicles, people, etc.).
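The polling exchange can be pictured with a small sketch. The query parameter names lat and lon, the dictionary layout of an update and both function names are assumptions made for illustration; the actual request format used by the client and the KARML update syntax are not specified here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_poll_url(channel_url, lat, lon):
    """Append the device position to a channel URL (parameter names are hypothetical)."""
    parts = urlsplit(channel_url)
    query = dict(parse_qsl(parts.query))
    query.update({"lat": f"{lat:.6f}", "lon": f"{lon:.6f}"})
    return urlunsplit(parts._replace(query=urlencode(query)))


def apply_update(features, update):
    """Merge one polled update into the client's feature table.

    features: dict mapping feature id -> dict of properties
    update:   dict with optional "create" and "change" entries, each mapping
              feature id -> properties (an illustrative layout, not KARML)
    """
    merged = {fid: dict(props) for fid, props in features.items()}
    for fid, props in update.get("create", {}).items():
        merged[fid] = dict(props)  # new feature entering the channel
    for fid, changes in update.get("change", {}).items():
        if fid in merged:
            merged[fid].update(changes)  # e.g. a vehicle that has moved
    return merged


url = build_poll_url("http://example.com/channel.kml", 33.7756, -84.3963)
features = {"bus42": {"lat": 33.7750, "lon": -84.3960}}
features = apply_update(features, {
    "create": {"stop7": {"lat": 33.7760, "lon": -84.3970}},
    "change": {"bus42": {"lat": 33.7752}},
})
```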

Infrastructure Servers

High quality AR applications make use of knowledge about the surrounding structures to accurately occlude content that should be hidden by them. For example, one may not wish to see content associated with the Staples store superimposed onto the Old Navy store in front of it. The infrastructure surrounding the user can also be valuable for user generated content and interaction. Determining the correct placement of an annotation or image on a building at runtime becomes much easier when the AR browser knows the shape and location of the building in view. One problem with current AR development pipelines is that any structures in the environment must be authored as part of the application at development time and delivered at runtime. This approach does not scale well with content from multiple sources, because the physical structures in the environment are the one thing that all AR channels are certain to have in common. Another problem is that, even when nearby structures are authored into the application, the physical scene needs to be surveyed to determine where in GPS coordinates AR content should be positioned. When structures are not easily accessible (e.g. upper stories of buildings, billboards) or are repeated in multiple locations (e.g. over the front door of every Staples, on every stop sign, on every bus), AR authors need a way to determine those coordinates without having to survey the site. Moreover, if buildings are remodeled, structures move or more accurate surveys become available, AR scenes need a way to compensate for those changes automatically.

Our solution to these problems is to use 3D models from infrastructure servers both for content placement at development time and for runtime user interactions and occlusions within the browser. Each infrastructure source has a unique URI, and structures, along with the substructures within them, can be accessed using standard XML tools such as XPath. At development time, authors can reference specific 3D buildings from a chosen source and position their content relative to the structure or the substructures within it (e.g. front door, windows, stairs, roof, etc.). Moreover, places of business can adopt a common schema to publish the preferred location of AR content (e.g. user whiteboard, advertisements, comments, etc.). The KARML extension adds the ability to specify feature coordinates (or offsets in meters) relative to other features (e.g. buildings). As a result, AR authors have far more flexibility in how they generate augmentations, since the location of content relative to source models can be determined at runtime. This strategy also allows for the development of AR content that is not tied to specific coordinates, as is already the case for content that uses vision-based markers (e.g. ARToolKit, etc.).
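To make the idea of relative placement concrete, the following sketch resolves an offset in meters from an anchor feature into absolute coordinates at runtime. It is a flat-earth approximation that is adequate over building-scale distances; the function name, the argument layout and the sample anchor are assumptions for illustration and do not reflect the KARML syntax or the infrastructure server schema.

```python
import math

EARTH_RADIUS_M = 6_378_137.0  # WGS84 equatorial radius


def offset_position(lat_deg, lon_deg, east_m, north_m):
    """Return (lat, lon) of a point offset from an anchor by meters east and north.

    Uses a local tangent-plane approximation, which is reasonable for the
    short distances involved when placing content on or near a building.
    """
    dlat = math.degrees(north_m / EARTH_RADIUS_M)
    dlon = math.degrees(
        east_m / (EARTH_RADIUS_M * math.cos(math.radians(lat_deg)))
    )
    return lat_deg + dlat, lon_deg + dlon


# Hypothetical anchor: the surveyed position of a building's front door,
# as it might be looked up from an infrastructure source at runtime.
front_door = (33.7756, -84.3963)

# Place a sign 3 m east and 1.5 m north of the door.
sign_lat, sign_lon = offset_position(front_door[0], front_door[1], 3.0, 1.5)
```

Because the anchor is looked up when the content is displayed, a corrected survey of the building moves the sign with it and the author does not need to republish the channel.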

For more information on our Infrastructure services look here

GEOSpot Servers

One of the main drawbacks of current mobile augmented reality browsers is the inaccuracy of handheld GPS and orientation sensors. At best, this imprecision leads to “Swim AR”, where content floats around on the screen over the objects it is supposed to be tied to. At worst, content actually behind the user may appear in front of the user because the GPS location reported by the handset is off by many meters. Currently, mobile AR content is developed for GPS and phone orientation sensors whose accuracy can vary widely depending on the number of visible satellites and nearby magnetic interference. Although all phone platforms report the current GPS accuracy, many AR browsers (Layar, Wikitude, etc.) only allow channel authors to alter the content at the time of content delivery or at infrequent intervals (e.g. every 5 minutes).

Our solution to this problem is twofold. First, we allow the user to view a map of known geographic locations nearby (GEOSpots). These GEOSpots are surveyed points with accurate GPS coordinates and details about how to find them (e.g. text and/or a photo). A user can find a nearby GEOSpot and indicate that they are standing on it. This greatly increases the location accuracy of the ensuing augmentation and creates an opportunity for the channel author to tailor the information being displayed to the changing accuracy of the device. The accuracy of the device is made available to content authors at runtime through the public JavaScript client API. Although all available content may be delivered, authors may choose to display only those features that lie outside the current GPS accuracy range. When a user indicates that they are at a GEOSpot, the set of nearby visible items can be enlarged with confidence that those items will be rendered with a known accuracy. This real-time responsiveness to sensor accuracy lets the channel author customize exactly how their content is presented to the user.
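The accuracy-dependent filtering described above might look like the sketch below. It is written in Python for illustration only; the real client exposes accuracy through its JavaScript API, whose function names are not reproduced here, and the feature layout and the accuracy values are assumptions.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 positions."""
    r = 6_371_000.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def visible_features(user_lat, user_lon, accuracy_m, features):
    """Keep only features farther from the user than the reported error radius.

    A feature closer than the error radius could be drawn on the wrong side
    of the user, so this hypothetical policy hides it.
    """
    return [
        f for f in features
        if haversine_m(user_lat, user_lon, f["lat"], f["lon"]) > accuracy_m
    ]


user = (33.7756, -84.3963)
features = [
    {"name": "door sign", "lat": 33.7757, "lon": -84.3963},   # roughly 11 m away
    {"name": "tower", "lat": 33.7766, "lon": -84.3963},       # roughly 111 m away
]

# Ordinary GPS fix with an assumed 30 m error radius: only the tower is shown.
coarse = visible_features(user[0], user[1], 30.0, features)

# User standing on a GEOSpot, assumed here to give about 2 m accuracy: both are shown.
precise = visible_features(user[0], user[1], 2.0, features)
```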

Our second strategy for dealing with sensor accuracy is to provide a synthesized backdrop at the GEOSpot location in the form of a photo or panoramic image. Currently, most augmentations are associated with relatively static buildings and other physical structures rather than with the moving people, variable vegetation and dynamic weather that give live video its character. When standing at a GEOSpot, the user can temporarily switch on a panoramic backdrop taken at the same location and view augmentations against it instead of live video. As the user changes the orientation of the device, the subsection of the backdrop displayed changes in rough accordance with what the video would show at that orientation. This approach allows the browser to register HTML and 3D content against the background image with significantly greater accuracy. This increased accuracy is then reported back to the content author. Whereas an icon might have been appropriate before, a significant increase in accuracy may allow the author to create 3D augmentations that are only appropriate when highly accurate registration with the surrounding scene is available.
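One way to picture how the displayed subsection follows the device orientation is the sketch below, which maps a compass heading to a range of pixel columns in a full 360 degree panorama. It assumes an equirectangular image whose columns are spaced evenly in heading and whose north-facing column is known; the function and parameter names are hypothetical and the client's actual rendering may differ.

```python
def backdrop_columns(heading_deg, fov_deg, pano_width_px, pano_north_px=0):
    """Return (left, right) pixel columns of the panorama slice to display.

    heading_deg:    compass heading of the device, in degrees
    fov_deg:        horizontal field of view to show, in degrees
    pano_width_px:  width of a panorama covering the full 360 degrees
    pano_north_px:  column of the panorama that faces north (assumed known)

    If left > right, the slice wraps around the seam of the panorama.
    """
    px_per_deg = pano_width_px / 360.0
    center = (pano_north_px + heading_deg * px_per_deg) % pano_width_px
    half = fov_deg * px_per_deg / 2.0
    left = int(round(center - half)) % pano_width_px
    right = int(round(center + half)) % pano_width_px
    return left, right


# Facing east with a 60 degree field of view on a 3600 px wide panorama.
east_slice = backdrop_columns(90.0, 60.0, 3600)

# Facing north: the slice straddles the seam, so left is greater than right.
north_slice = backdrop_columns(0.0, 60.0, 3600)
```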

For more information on our GeoSpot services look here

Open Source Clients

Perhaps the most significant drawback of current mobile AR browsers is that there is no open standard for authors to develop for. The current frameworks give the content developer only a restricted template to develop against, and the exact protocol delivered to the mobile browsers these frameworks support remains proprietary. The result is that content developers must patiently ask for any new features or innovations they want to add to their augmentations. Our position is that, as with desktop web browsers, providing an open source client for mobile AR will spur the development of enhancements to the current browser and/or alternate browsers with competing feature sets. To this end, we have created a reference implementation of the KHARMA framework for the iPhone called Argon. The Argon client is available for the iPhone only at this time, but we are carefully avoiding the use of libraries and conventions that cannot be easily ported to other platforms.

There are many issues to be solved related to the presentation and management of AR content. One issue involves the visual overlapping of content within the same channel. Different AR channels may want to handle such collisions in different ways. Current AR browsers give developers very limited tools (e.g. a range slider) for customizing the user interface and the overall functionality of the browser. Our approach to this problem favors moving the majority of the functionality related to the content presentation layer into JavaScript functions that can be overridden by users. This way, even without changing the underlying source of the browser, alternate strategies for managing the presentation of content can be explored with minimal effort.
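The override pattern can be sketched as follows. In the client this would be done with JavaScript functions; the Python analogue below only illustrates the design, and every name in it (PresentationLayer, resolve_overlap, nearest_only) is hypothetical rather than part of the Argon API.

```python
def default_resolve_overlap(labels):
    """Default policy: show every label, in the order received."""
    return list(labels)


def nearest_only(labels):
    """Alternate policy a channel might install: keep only the closest label."""
    if not labels:
        return []
    return [min(labels, key=lambda label: label["distance_m"])]


class PresentationLayer:
    """Holds presentation functions that channel authors may replace."""

    def __init__(self):
        self._hooks = {"resolve_overlap": default_resolve_overlap}

    def override(self, name, fn):
        # Only known hooks can be replaced, so typos fail loudly.
        if name not in self._hooks:
            raise KeyError(name)
        self._hooks[name] = fn

    def present(self, labels):
        return self._hooks["resolve_overlap"](labels)


overlapping = [
    {"text": "Staples", "distance_m": 80.0},
    {"text": "Old Navy", "distance_m": 35.0},
]

layer = PresentationLayer()
shown_by_default = layer.present(overlapping)

layer.override("resolve_overlap", nearest_only)
shown_after_override = layer.present(overlapping)
```

The browser core is unchanged in both cases; only the function bound to the hook differs, which is what allows alternate collision strategies to be tried without rebuilding the client.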

For more information on our Argon client look here