
From Letters and Numbers to Sound and Vision

Throughout the 1960s, 1970s, 1980s and 1990s, computers were largely restricted to dealing with two main types of data - words and numbers - handled as text and arithmetic processing through word processors, spreadsheets and so on. Codes for numbers (binary, BCD, fixed point, IEEE floating point) are fairly well standardized. Codes for text (ASCII, EBCDIC, but also fonts, Kanji, ppt and so on) are also reasonably well understood. Higher level ``codes'' - links, indexes, references, and so on - are the subject of standards such as the ubiquitous Hyper-Text Markup Language, HTML.

Now computers, disks and networks are fast enough to process, store and transmit audio, video and computer-generated visualization material, as well as text, graphics and data: hence the multimedia revolution.

One thing about multimedia cannot be overstated: it is big. Like space in The Hitchhiker's Guide to the Galaxy, it is much bigger than you can imagine. Of course, we are not talking about the hype here; we are talking about the storage, transmission and processing requirements!

To paraphrase Morris Zapp, from David Lodge's novel Small World: ``Every encoding is a decoding''. The idea behind this glib quote is that each time we invent a new way of representing and transmitting information, we also have to teach ourselves to receive and comprehend that new type of representation. In the rest of this section, we take a look at some aspects of representation that need to be understood in regard to multimedia.

Numbers and letters have standard encodings: ASCII and IEEE floating point are the most widespread now (at least for common English-language text processing and for numeric programming); in the past there was a plethora of other encodings, even for simple Roman-alphabet text. As multi-lingual support has become common, we have seen a brief increase in the number of encodings, and then, as the problems have become better understood, a standard set of character sets has emerged. Digital multimedia encodings for audio and video are still at a very early stage in terms of standards, and there are many of them, partly because of the wide range of processing, storage and transmission capacities available on computers and networks, where some systems are right at the limits of their ability to do any useful work at all!
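
To make this concrete, here is a minimal Python sketch (our illustration, not part of the original text) showing the two standard encodings just mentioned: the ASCII byte values of a short string, and the IEEE 754 single-precision bit pattern of a number.

    import struct

    # ASCII: each character maps to a single byte value in the range 0-127.
    text = "Hi!"
    print([ord(c) for c in text])   # [72, 105, 33]

    # IEEE 754 single precision: pack a float into its 4-byte big-endian
    # representation and display the raw bit pattern.
    value = 0.15625                 # chosen so the encoding is exact
    raw = struct.pack(">f", value)
    print("".join(f"{b:08b}" for b in raw))
    # 00111110001000000000000000000000 (sign 0, exponent 124, fraction .25)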

Each new medium needs to be coded, and we need common representations for objects in the medium; there are many choices. For example, speech can be coded as a sequence of samples, as a sequence of phonemes, or as a string of text with voice-synthesizer settings, and so on, requiring more or less intelligence and processing at the sender and receiver, and capturing more or less structural information (and, as a result, typically allowing more compression). Similarly, video can be coded as a sequence of bitmaps, or can be broken down into some description of scenes, objects within scenes, the motion of objects, and so on.
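
As a rough sketch of this trade-off (the representations and sizes below are purely illustrative, not taken from any real coder), the same short utterance can be held as raw samples, as a phoneme string, or as text for a synthesizer; each step up adds structure and shrinks the data, but demands more intelligence at one end or the other.

    # Three illustrative representations of the same one-second utterance,
    # from least to most structured.  Figures are indicative only.
    samples = bytes(8000)                 # 8000 samples at 8 kHz, 1 byte each
    phonemes = ["HH", "AH", "L", "OW"]    # a phoneme sequence: a few symbols
    text = "hello"                        # text plus synthesizer settings

    for name, rep in (("samples", samples), ("phonemes", phonemes), ("text", text)):
        print(f"{name:10s} {len(rep):5d} units")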

The codings now involve possible relationships with time, and between different media. When we read a block of text, it is usually up to the reader to choose how quickly to read it. Hypertext breaks this rule to some extent, at least by relating text non-linearly to other text. When we listen to speech, or hold a conversation with another autonomous being, we do not control the rate of arrival of information so obviously. When we combine media - sound and vision, for example - we typically expect the combined media on a recording (or seen remotely) to maintain the temporal relationship that they had at source. This is what really defines data as being multimedia. Hypermedia is multimedia arranged with non-linear relations between sub-sequences.
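
One minimal way to preserve that temporal relationship (a sketch of the general idea, not any particular standard) is to stamp every media unit with its time at source and order playout by those stamps:

    import heapq

    # Each unit carries the timestamp it had at source; playout is ordered
    # by timestamp, so sound and vision keep their original relationship.
    units = [
        (0.00, "video", "frame 0"),
        (0.00, "audio", "samples 0-799"),
        (0.04, "video", "frame 1"),
        (0.10, "audio", "samples 800-1599"),
    ]
    heapq.heapify(units)
    while units:
        t, medium, payload = heapq.heappop(units)
        print(f"t={t:.2f}s  play {medium}: {payload}")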

Compression and hierarchical encoding are also needed. Multimedia data is typically much bulkier than text or numeric data. A typical simple-minded sampled audio sequence might take 8K bytes per second. This compares badly with 8K bytes of text: assuming 10 characters per word, that is about 800 words, which might take a quick speaker several minutes to read aloud. In other words, the speech requires at least two orders of magnitude more bytes than the text. Video is far worse still, although comparisons are harder there, since the value of the typical information content is quite different.
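
The arithmetic can be checked directly; a back-of-envelope sketch, assuming a reading speed of about 160 words per minute (our assumption, not a figure from the text):

    # Back-of-envelope comparison of text against sampled speech, using the
    # figures above: 8K bytes of text, 10 characters per word, 8K bytes per
    # second of audio, and an assumed reading speed of ~160 words per minute.
    text_bytes = 8 * 1024
    words = text_bytes / 10                  # about 800 words
    seconds = words / 160 * 60               # about 300 s - several minutes
    speech_bytes = seconds * 8 * 1024
    print(f"text:   {text_bytes} bytes")
    print(f"speech: {speech_bytes:.0f} bytes, {speech_bytes / text_bytes:.0f}x the text")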

All of this means that we need to consider compression techniques, to save storage and transmission capacity. Luckily, much audio and video is redundant (it effectively contains repeated or less useful data) and is often far more amenable to compression than text.
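
As a toy illustration of exploiting redundancy, run-length coding (the simplest compression scheme, chosen here purely as an example) collapses each run of identical values into one pair, so a blank scanline shrinks dramatically while ordinary prose barely shrinks at all:

    from itertools import groupby

    def rle(data):
        # Run-length encode: one (value, run-length) pair per run.
        return [(v, len(list(g))) for v, g in groupby(data)]

    flat_row = b"\x00" * 640                   # one blank scanline: highly redundant
    prose = b"every decoding is another encoding"

    print(len(rle(flat_row)))                  # 1 pair stands for 640 bytes
    print(len(rle(prose)), "pairs for", len(prose), "bytes")  # almost no gain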

Meta-languages (codes for codings) are also required. While we are still evolving a wide range of codings and compression techniques, we need protocols for the exchange of media between different systems. We also need protocols to relate the different media to one another (for synchronisation and for hypermedia).
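
At its simplest, such a meta-language is just a self-describing header carried ahead of the media data; the field names below are hypothetical, but the idea is the one used by real container and session-description formats:

    import json

    # A hypothetical "code for the coding": a header telling the receiver
    # how to interpret each stream, and how the streams relate in time.
    header = {
        "streams": [
            {"id": 1, "medium": "audio", "coding": "pcm", "rate_hz": 8000},
            {"id": 2, "medium": "video", "coding": "bitmap", "fps": 25},
        ],
        "sync": {"clock": "shared", "origin_ms": 0},
    }
    print(json.dumps(header, indent=2))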

Next, let's look at some audio and video input forms and digital encodings.

