Spatial Tags: Low-Cost Spatial Indexing using Machine Vision

Introduction

Tags in the form of 2D barcodes have become commonplace in recent years. They are often used to encode a URL; for example, a movie poster might include a tag encoding the movie's web site address.

A lesser-known but potentially very useful feature of these tags is that, given an image of a tag, a user's device can work out the tag's position and orientation in 3D space relative to the point from which the image was taken. This is possible because a tag provides enough corner points to satisfy the positioning equations.

Our current research investigates how tags might be useful in "smart environments", where computing is pervasive. A major enabler of such environments is the modern mobile phone, which is equipped with both considerable computing power and a variety of sensors (microphone, camera, Bluetooth, GPS, etc.).

Spatial tags

A powerful feature of smart environments is the association of particular behaviour with certain regions of space. We have developed and investigated a tag design that represents a zone in 3D space, or spatial zone, around the tag. When this zone is physically entered or left, an action may be triggered. We call these tags spatial tags.

The system consists of up to three kinds of entity:

  • Spatial tags are tags which: (i) represent a spatial zone; (ii) encode context about the action associated with the zone; and (iii) allow a user to locate themselves relative to the tag.

  • Clients are devices equipped with an outward-facing camera, either mobile (carried by a user) or fixed. They capture images, locate and decode spatial tags within those images, and perform actions based on whether they are inside or outside the located tags' zones.

  • Service providers are machines from which clients, as part of performing their actions, may request services. For some applications, the clients may be their own service providers.
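
As a rough illustration of how a client might turn a stream of inside/outside results into enter and leave actions, the following minimal sketch tracks the last known state for each tag and fires a callback only when that state changes. The class name `ZoneTracker`, the callback signatures and the tag identifier "monitor-1" are all hypothetical, and the sketch assumes that a tag which has not yet been observed counts as "outside".

```python
from typing import Callable, Dict, List, Tuple


class ZoneTracker:
    """Fires callbacks when a client's inside/outside state for a tag changes."""

    def __init__(
        self,
        on_enter: Callable[[str], None],
        on_leave: Callable[[str], None],
    ) -> None:
        self._on_enter = on_enter
        self._on_leave = on_leave
        # Last known state per tag; unseen tags are assumed to be "outside".
        self._inside: Dict[str, bool] = {}

    def update(self, tag_id: str, inside_now: bool) -> None:
        """Record the latest observation and trigger an action on a transition."""
        was_inside = self._inside.get(tag_id, False)
        self._inside[tag_id] = inside_now
        if inside_now and not was_inside:
            self._on_enter(tag_id)
        elif was_inside and not inside_now:
            self._on_leave(tag_id)


# Usage: four consecutive observations of one (hypothetical) tag.
events: List[Tuple[str, str]] = []
tracker = ZoneTracker(
    on_enter=lambda tag: events.append(("enter", tag)),
    on_leave=lambda tag: events.append(("leave", tag)),
)
for observation in [False, True, True, False]:
    tracker.update("monitor-1", observation)
print(events)
```

Repeated observations with the same result trigger nothing, so an action runs once per ingress or egress rather than once per captured image.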

A spatial tag is composed of two parts: spatial and symbolic.

  • The spatial component represents a zone using a 2D shape surrounded by a square border. We generate the corresponding 3D zone by extruding the 2D shape vertically, relative to the coordinate system of the tag. Because this mapping between the 2D representation and the 3D zone is so simple, users can easily edit this component of the tag.

  • The symbolic component comprises a grid of bits encoding context about the action clients should perform when entering or leaving the zone. We use the Cantag machine vision framework developed at the Computer Laboratory to decode this component of the tag.

Once the user's device has located a spatial tag, a point in the image is sampled to determine whether the user is in the zone, as shown in the figure to the right. This sampling point is determined by first projecting the camera's location into the plane of the tag, and then scaling by a pre-defined factor in order to relate real-world units to units of tag width. If the sampling point lies within the spatial component and falls on a black pixel, the user is within the zone; otherwise, the user is outside it.
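
The sampling test described above can be sketched roughly as follows. This is an illustration under several assumptions rather than the system's actual implementation: the camera position is assumed to be already expressed in the tag's coordinate frame with the origin at the tag's centre, the projection is assumed to be orthogonal (the out-of-plane coordinate is simply dropped), and the spatial component is assumed to be available as a square grid of booleans in which True means a black pixel. The function names and the parameter `world_units_per_tag_width` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def sampling_point(
    camera_pos: Tuple[float, float, float],
    world_units_per_tag_width: float,
) -> Tuple[float, float]:
    """Project the camera into the tag plane and convert to tag-width units."""
    x, y, _out_of_plane = camera_pos  # assumed orthogonal projection
    return (x / world_units_per_tag_width, y / world_units_per_tag_width)


def in_zone(
    camera_pos: Tuple[float, float, float],
    shape: Sequence[Sequence[bool]],
    world_units_per_tag_width: float,
) -> bool:
    """Return True if the sampling point falls on a black pixel of the shape.

    `shape` is assumed to cover tag coordinates from -0.5 to 0.5 on both
    axes, with row 0 at the top.
    """
    u, v = sampling_point(camera_pos, world_units_per_tag_width)
    n = len(shape)
    col = math.floor((u + 0.5) * n)
    row = math.floor((0.5 - v) * n)
    if not (0 <= row < n and 0 <= col < n):
        return False  # sampling point lies outside the spatial component
    return shape[row][col]


# Usage: a 4x4 shape whose central 2x2 block is black, with one tag width
# standing for 4 real-world units.
T, F = True, False
shape = [
    [F, F, F, F],
    [F, T, T, F],
    [F, T, T, F],
    [F, F, F, F],
]
print(in_zone((0.5, 0.5, 2.0), shape, 4.0))  # central block
print(in_zone((1.5, 0.0, 2.0), shape, 4.0))  # white border cell
print(in_zone((5.0, 0.0, 1.0), shape, 4.0))  # beyond the spatial component
```

Under these assumptions the test costs one division per axis and a single pixel lookup, whatever the shape of the zone.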

Applications

We are using spatial tags to provide a "follow-me desktop" service, wherein a user's desktop is automatically teleported to the computer nearest to them. A spatial tag is attached to every participating computer's monitor: the tag's spatial component represents a zone in front of the monitor, and the symbolic component encodes the IP address of the PC to which the monitor is attached.

When notified over the network, a process on each PC (the service provider) logs in remotely to any other machine via the VNC desktop teleportation protocol, using the client's security credentials or prompting for authentication if required.
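
To make the idea of encoding an IP address in the symbolic component concrete, the sketch below packs an IPv4 address into 32 bits and recovers it again. It is illustrative only: the page does not describe the actual payload layout, error coding or grid dimensions used with Cantag, so the most-significant-bit-first ordering and the function names `ip_to_bits` and `bits_to_ip` are assumptions.

```python
import ipaddress
from typing import List


def ip_to_bits(address: str) -> List[int]:
    """Return the 32 bits of an IPv4 address, most significant bit first."""
    value = int(ipaddress.IPv4Address(address))
    return [(value >> shift) & 1 for shift in range(31, -1, -1)]


def bits_to_ip(bits: List[int]) -> str:
    """Rebuild the dotted-quad address from 32 bits (inverse of ip_to_bits)."""
    if len(bits) != 32:
        raise ValueError("expected exactly 32 bits")
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return str(ipaddress.IPv4Address(value))


# Usage: round-trip a private address chosen purely as an example.
bits = ip_to_bits("192.168.0.1")
print(bits[:8])
print(bits_to_ip(bits))
```

A real tag would typically need more cells than this for error detection or correction, which the sketch leaves out.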

Publications

Low-Cost Spatial Indexing Using Machine Vision

Tom Craig, Joseph Newman and Andrew Rice
To appear in IEEE Pervasive 2009

Abstract: Location data is vital for many pervasive computing systems. Printed visual tags are commonly used to enable a machine vision system to determine the position and orientation of an observer. We have extended these tags to represent a general spatial zone which can be used to trigger an action upon ingress or egress. Our system is decentralised and computationally efficient, and it is trivial to create and modify zones. We use these tags in a “follow-me desktop” application.
