A scene description used by a robot may include much information that is not relevant to the task currently being performed. This is especially likely when the robot has acquired the scene information by sensory means, whether visual, tactile, or in the form of range data. The irrelevant information is usually superficial detail, but even overall shape may be irrelevant to the task.
One of the most important functions of robot sensing systems is to filter raw sensory data, providing a description of the sensed object or scene at a higher level of abstraction (for example, taking an array of pixel brightness values and producing from it a description of object edges).
Some form of filtering is always necessary, but the requirements of the resulting filtered description can vary widely. A system which sorts nuts from bolts could abstract its sensory information to the degree that it describes any object as either nut, bolt, or not-nut-or-bolt. A system that joined nuts and bolts together, however, might need to know what head type the bolt had, what diameter it was, and what thread standard it conformed to. A system that manufactured nuts and bolts would need to know the depth of the thread, the thread spacing, the groove angle, and so on. A quality inspection system might need to know the exact position of every point along the thread.
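These task-dependent levels of abstraction can be illustrated with a small sketch. The attribute names and task labels below are purely hypothetical, not part of any system described here; the point is only that the same sensed object yields very different descriptions depending on what the task requires.

```python
# Hypothetical sketch: the same sensed object, described at different
# levels of abstraction depending on the task at hand.

def describe(obj, task):
    """Return only the attributes a given task needs (names are illustrative)."""
    if task == "sort":             # nut/bolt sorter: class alone suffices
        return {"class": obj["class"]}
    if task == "assemble":         # joining nuts and bolts
        return {k: obj[k] for k in ("class", "head_type", "diameter", "thread_standard")}
    if task == "manufacture":      # making nuts and bolts
        return {k: obj[k] for k in ("class", "thread_depth", "thread_spacing", "groove_angle")}
    return dict(obj)               # inspection: full detail

bolt = {
    "class": "bolt", "head_type": "hex", "diameter": 6.0,
    "thread_standard": "M6", "thread_depth": 0.6,
    "thread_spacing": 1.0, "groove_angle": 60.0,
}

assert describe(bolt, "sort") == {"class": "bolt"}
assert "thread_depth" not in describe(bolt, "assemble")
```

A sorter that received the full manufacturing-level description would simply be carrying detail it can never use.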
A general purpose reasoning system, if it were to imitate human capabilities, should be able to separate these levels of detail, so that they are all available to it when necessary, but so that suitable abstractions can be used without considering what lies beneath them. This approach can provide more reasoning power than any single, arbitrarily chosen level of abstraction, because it is not always possible for a sensing system to select an abstraction without an understanding of the task - particular shape details may or may not be important in the context of a given task.
As an example, the irregularities in the surface of a rough cast fireplace grate are completely irrelevant to the function of the grate, but the general roughness of the casting surface may be quite important to a robot which is assembling fireplaces, because the roughness of the grate prevents it from sliding over other surfaces. In this case any given irregularity on the grate is still of no interest to the robot - it is texture that is important.
A contrasting situation is that of a large mechanical part which is to be oriented in an assembly with the aid of a locating pin. The shape of the part as a whole might be irrelevant to the orientation task, while the shape of the small locating pin is very important. The contrast between this example and that of the rough casting makes it clear that the significance of a given detail in an object's shape depends on the functionality of that shape detail, and on what use a robot is trying to make of the object.
A shape description could reflect this dependence on the functionality of shape elements by including in the representation only those elements of the shape which are functionally important. This is the de facto methodology of current robot programming, in which a programmer must decide whether or not a given aspect of the object's shape will ever be available to the robot program. These decisions are encoded in the program when the object description is being constructed by the programmer (I refer here to task-level programming languages such as RAPT, where objects are explicitly described, rather than robot-level languages, in which the shape of the object is only implicit).
A more intelligent robot, acting on the basis of representations that have been acquired from sensory data, will not necessarily have the information available which is needed to make decisions about the function of parts at the time that the representation of an object is constructed. Such a robot cannot ignore small details of object shape, because they may become important during operations on the object. On the other hand, the inclusion of every detail in a complex shape description may hinder the construction of useful abstractions for the robot reasoning system to work with.
As an example of the need to judiciously ignore detail, consider a sixpack of beer bottles. A detailed description of its shape appears to be very complex, and could obscure properties such as the fact that sixpacks can be regarded as rectangular blocks for the purposes of stacking them. A person wondering whether sixpacks could be stacked would not stop to analyse the shape of each bottle top. They would notice the overall box-like shape of each sixpack, and experiment with stacking them on the basis of this coarse description. A robot which is reasoning about its workspace in a qualitative fashion should be able to perform such a task with a similar economy of detailed analysis.
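The stacking judgement can be sketched as code. This is a minimal illustration, not a proposed algorithm: a detailed shape (here just a set of points) is collapsed to a bounding box, and the qualitative stackability test is made against the box alone, never against the bottle tops.

```python
# Illustrative sketch: stacking reasoning from a coarse bounding-box
# abstraction, ignoring all fine shape detail.

def bounding_box(points):
    """Collapse a detailed shape (a list of (x, y, z) points) to box extents."""
    xs, ys, zs = zip(*points)
    return (max(xs) - min(xs), max(ys) - min(ys), max(zs) - min(zs))

def stackable(lower, upper):
    """Coarse qualitative test: upper box fits if its footprint is no larger."""
    return upper[0] <= lower[0] and upper[1] <= lower[1]

# Two corner points standing in for a sixpack's detailed shape (metres).
sixpack = bounding_box([(0.0, 0.0, 0.0), (0.25, 0.17, 0.23)])
assert stackable(sixpack, sixpack)   # identical sixpacks stack
```

The detailed bottle geometry never enters the test; it remains available at a lower level should some later operation (pouring, opening) require it.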
A way of achieving both simple overall shape description and retention of details which may become functionally important is to represent objects at multiple levels of detail. This approach has been used in computer systems that must perform high level analysis from possibly noisy input data, in domains such as reading handwritten script [SB87], or recognising speech [dM87]. The explicit representation of multiple levels in these systems allows them to continue referring to low level sensory information even after high level analysis has commenced.
Most machine vision problems are amenable to exactly this approach - they have a large amount of input data (typically the brightness of every pixel in the vision field), and they filter it to provide a higher level abstraction. A multilevel shape representation might involve making links from high level description to the low-level data. People, however, do not store large amounts of information that is later filtered - they store a coarse description as a ``first impression'', and collect more detail if necessary by focussing their attention [Pen86a]. Some attempts have been made to provide machine vision systems with similar controlled focus facilities, as described in [Fun80] [Pen87].
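The two storage styles can be contrasted in a short sketch. Both classes below are hypothetical, intended only to show the structural difference: one retains the low-level data and links it beneath the coarse description, the other stores only a first impression and invokes the sensor again (a stand-in for refocusing attention) when detail is demanded.

```python
# Hypothetical sketch of the two options: eager retention of low-level
# data versus a lazy "first impression" refined on demand.

class EagerShape:
    """Coarse description with the raw sensory data linked beneath it."""
    def __init__(self, coarse, raw_data):
        self.coarse = coarse        # e.g. "rectangular block"
        self.raw = raw_data         # low-level data kept for later reference

class LazyShape:
    """Coarse description only; detail fetched by refocusing the sensor."""
    def __init__(self, coarse, sensor):
        self.coarse = coarse
        self._sensor = sensor       # callable standing in for focused sensing
        self._detail = None

    def detail(self):
        if self._detail is None:    # focus attention only when asked
            self._detail = self._sensor()
        return self._detail

shape = LazyShape("rectangular block", sensor=lambda: "bottle-top contours")
assert shape._detail is None                 # nothing beyond the first impression
assert shape.detail() == "bottle-top contours"
```

Either structure supports the reasoning described above; they differ only in when the low-level data is obtained and how much must be stored.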
There are two options, therefore, in providing a vision system with the ability to create scene descriptions containing multiple levels of detail. Whichever is used, the reasoning processes described above can be supported - the difference lies in the visual control mechanism, and in the data storage techniques used. This section has argued that it is both useful and plausible for robots to represent shape data at multiple levels of detail. The shape representations discussed later in this chapter support qualitative methods which can make use of this type of sensory data, and perform this type of reasoning.