
## How Big Is a Single Frame of Video?

First we compare the spatial size of analogue video with the common formats used by digital video standards. A PAL television displays video as 625 lines and an NTSC television displays 525 lines. Current televisions have an aspect ratio of 4:3, giving PAL a spatial resolution of 833 x 625 and NTSC a resolution of 700 x 525, not all of which is visible. Most common digital video formats are related to the visible area of one of these television standards. Video coded with the international standard H.261 [#!h261!#] is 352 x 288 for the Common Intermediate Format (CIF), 176 x 144 for Quarter CIF (QCIF), and 704 x 576 for Super CIF (SCIF), where a CIF image is a quarter the size of the visible area of a PAL image. For NTSC-derived formats, 640 x 480, 320 x 240, and 160 x 120 are common. Figure 4.12 shows the spatial size of these common resolutions with respect to a PAL TV image.

It can be seen that these digital video formats are all smaller than current television sizes. Moreover, television images are significantly smaller than current workstation screens, which are commonly of the order of 1200 x 1000 pixels, so digital video utilizes even less of a workstation screen.

Due to this significant size difference, some observers have commented that digital video on modern workstations often looks like "moving postage stamps".

For digital video, as with analogue video, a new frame is required every 1/25th of a second for PAL and every 1/30th of a second for NTSC. If we assume 24 bits per pixel and 30 frames per second, the amount of disc space required for such a stream of full-motion video is shown in Table 4.2 (where Mb, Gb and Tb denote megabytes, gigabytes and terabytes). The table gives the data volume for a given playing time and a given spatial size in pixels.

Table 4.2: The amount of data for full-motion digital video

| Time:Size | 640x480 | 320x240 | 160x120 |
|---|---|---|---|
| 1sec | 27Mb | 6.75Mb | 1.68Mb |
| 1min | 1.6Gb | 400Mb | 100Mb |
| 1hour | 97Gb | 24Gb | 6Gb |
| 1000hours | 97Tb | 24Tb | 6Tb |
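
The arithmetic behind these figures can be sketched as follows. This is a minimal illustration under the stated assumptions of 24 bits per pixel and 30 frames per second; the function name `raw_video_bytes` is hypothetical, and the exact results differ slightly from the table, whose entries appear to be rounded (for example 27 Mb per second rather than 27.6).

```python
def raw_video_bytes(width, height, seconds, bits_per_pixel=24, fps=30):
    """Uncompressed size in bytes of a clip of the given duration."""
    bytes_per_frame = width * height * bits_per_pixel // 8
    return bytes_per_frame * fps * seconds


# Print the exact byte counts for the three NTSC-derived sizes.
for width, height in [(640, 480), (320, 240), (160, 120)]:
    for label, seconds in [("1 sec", 1), ("1 min", 60), ("1 hour", 3600)]:
        size = raw_video_bytes(width, height, seconds)
        print(f"{width}x{height} {label}: {size / 1e6:,.2f} million bytes")
```

Halving both dimensions divides the data volume by four, which is why each column of the table is a quarter of the one to its left.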

We can see that 1 hour of video at a resolution of 640 x 480 would consume 97 Gb of disc space, which is significantly more than most storage devices can hold. An equivalent amount of analogue video (i.e. a 1 hour video), which has a higher resolution and also contains audio, would take only half of a 120 minute video cassette or a quarter of a 240 minute cassette. Although there are devices that can store this amount of data, there are currently no digital storage devices which could store 97 Gb in half the space of a video cassette. The data shown in the tables was collated by Larry Rowe of the Computer Science Division - EECS, University of California at Berkeley, for his work on The Continuous Media Player [#!rowe!#].

In order to reduce the amount of data used for digital video, it is common to use compression techniques, such as the international standards H.261, MPEG [#!mpegrtp!#], or to use proprietary techniques such as nv encoding [#!frederick!#] or CellB [#!cellb!#]. Rowe has also estimated the amount of space used when compression techniques are used. Table 4.3 shows the space needed when compressing video of size 640 x 480 pixels, and table 4.4 shows the space used when compressing video of size 320 x 240 pixels. Both tables present data for a given scale factor of compression and for the time the video is shown. The 97 Gb used for the 1 hour of 640 x 480 video can be reduced to approximately 1 Gb when compression is done at a scale factor of 100:1.

Table 4.3: The amount of data for compressed video of size 640x480

| Time v. Scale | None | 3:1 | 25:1 (JPEG) | 100:1 (MPEG) |
|---|---|---|---|---|
| 1 sec | 27 Mb | 9 Mb | 1.1 Mb | 270 Kb |
| 1 min | 1.6 Gb | 540 Mb | 65 Mb | 16 Mb |
| 1 hour | 97 Gb | 32 Gb | 3.9 Gb | 970 Mb |

Table 4.4: The amount of data for compressed video of size 320x240

| Time v. Scale | None | 3:1 | 25:1 (JPEG) | 100:1 (MPEG) |
|---|---|---|---|---|
| 1 sec | 6.75 Mb | 2.25 Mb | 270 Kb | 68 Kb |
| 1 min | 400 Mb | 133 Mb | 16 Mb | 4 Mb |
| 1 hour | 24 Gb | 8 Gb | 1 Gb | 240 Mb |
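
The entries in these tables follow from dividing the uncompressed volume by the scale factor. A minimal sketch, assuming a constant compression ratio and taking the rounded uncompressed rate of 27 Mb per second from the table as the starting point (the function name `compressed_size` is hypothetical):

```python
def compressed_size(raw_bytes_per_second, seconds, ratio):
    """Bytes needed for a clip when compressed at a constant ratio."""
    return raw_bytes_per_second * seconds / ratio


RAW_640x480 = 27e6  # rounded uncompressed rate for 640 x 480, from the table

for ratio in (3, 25, 100):
    per_second = compressed_size(RAW_640x480, 1, ratio)
    per_hour = compressed_size(RAW_640x480, 3600, ratio)
    print(f"{ratio}:1 -> {per_second / 1e6:.2f} Mb per second, "
          f"{per_hour / 1e9:.2f} Gb per hour")
```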

Although the tables show compression factors for MPEG, the H.261 standard uses a Discrete Cosine Transform encoding function similar to that used in MPEG, so we can expect its compression ratios to be of a similar order of magnitude. In reality, when encoding real video the compression factor is not constant but variable, because the amount of data produced by the encoder is a function of motion. However, these figures do give a reasonable estimate of what can be achieved.

Significantly, with digital video it is possible to reduce the amount of data generated even further by reducing the perceived frame rate of the video from 30 frames a second down to 15 or even 2 frames a second. This can be achieved by explicitly limiting the number of frames or through a bandwidth limitation mechanism. In many multicast conferences the bandwidth used is between 15 and 64 Kbps. Although reduced frame rate video loses the quality of full-motion video, it is perfectly adequate for many situations, particularly in multimedia conferencing.
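
To get a rough feel for how a bandwidth limit translates into frame rate, the following sketch estimates the frames per second that fit into a given budget. The assumptions are illustrative and not from the measurements above: a QCIF frame (176 x 144) at 24 bits per pixel, a constant 100:1 compression ratio, and 1 Kbps taken as 1000 bits per second. The function name `achievable_fps` is hypothetical.

```python
def achievable_fps(bandwidth_bps, width, height, bits_per_pixel=24, ratio=100):
    """Frames per second that fit in the bandwidth at a constant compression ratio."""
    compressed_bits_per_frame = width * height * bits_per_pixel / ratio
    return bandwidth_bps / compressed_bits_per_frame


for kbps in (15, 64):
    fps = achievable_fps(kbps * 1000, 176, 144)
    print(f"{kbps} Kbps -> about {fps:.1f} frames per second")
```

Under these assumptions the two ends of the quoted bandwidth range correspond to roughly 2 and roughly 10 frames per second, although a real encoder's output varies with the amount of motion.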

There are a large number of still image formats and compression schemes in use in the network today. Common schemes include:

TIFF and GIF
These both use compression schemes based on the Lempel-Ziv type of algorithms described earlier.
JPEG
This is from the Joint Photographic Experts Group in the International Organisation for Standardization (ISO).

The first two of these still image schemes are discussed elsewhere in great detail. JPEG is interesting because the same baseline technology is also used in part by several popular moving image compression schemes. The goal of the JPEG standard has been to develop a method for continuous-tone image compression for both colour and greyscale images. The standard defines four modes:

• Sequential In this mode each image is encoded in a single left-to-right, top-to-bottom scan. This is the simplest mode and the one most widely implemented in both hardware and software.
• Progressive In this mode the image is encoded in multiple scans. This is helpful for applications in which the transmission time is long and the viewer prefers to watch the image build up in multiple coarse-to-clear passes.
• Lossless Here the image is encoded so as to guarantee exact recovery of every source image sample value. This is important for applications where even a small loss of image data is significant; some medical applications need this mode.
• Hierarchical Here the image is encoded at multiple resolutions, so that low-resolution versions may be decoded without having to decode the higher-resolution versions. This mode is beneficial for transmission over packet switched networks. Only the data significant for a certain resolution, as determined by the application, need be transmitted, thus allowing more applications to share the same network resources. In real-time transmission cases (e.g. an image pulled out of an information server and synchronized with a real-time video clip), a congested network can start dropping packets containing the highest-resolution data, resulting in a degraded image rather than a delayed one.
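
The idea behind the hierarchical mode can be sketched as a resolution pyramid. The code below is only an illustration of the multi-resolution principle, not the JPEG coding procedure itself: it builds successively halved versions of an image by 2 x 2 averaging, which is an assumed downsampling filter, and the function names are hypothetical. In the standard, each higher-resolution stage is coded as a difference from an upsampled version of the stage below it.

```python
def halve(image):
    """Downsample a 2-D list by averaging each 2x2 block (even dimensions assumed)."""
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 4
            for c in range(0, len(image[0]), 2)
        ]
        for r in range(0, len(image), 2)
    ]


def build_pyramid(image, levels):
    """Return the image at `levels` resolutions, coarsest first."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(halve(pyramid[-1]))
    return pyramid[::-1]


# A synthetic 8 x 8 greyscale image used purely as test data.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
pyramid = build_pyramid(image, 3)
for level in pyramid:
    print(f"{len(level[0])} x {len(level)}")
```

A receiver that only needs the coarsest level can stop after the first entry, which is what makes dropping the highest-resolution data under congestion tolerable.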

JPEG uses the Discrete Cosine Transform to compress spatial redundancy within an image in all of its modes apart from the lossless one, where a predictive method is used instead.
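
To illustrate why a cosine transform helps with spatial redundancy, the following sketch applies a two-dimensional DCT to a smooth 8 x 8 block and counts how many coefficients are significant. It shows only the transform stage, written directly from the orthonormal DCT-II definition for clarity rather than speed; the level shift, quantisation and entropy coding that a JPEG coder also performs are omitted, and the test block and threshold are arbitrary assumptions.

```python
import math

N = 8  # block size


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of lists."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


# A smooth horizontal ramp: every row is identical, so the block is highly redundant.
block = [[16 * y for y in range(N)] for _ in range(N)]
coeffs = dct_2d(block)
significant = sum(1 for row in coeffs for value in row if abs(value) > 1.0)
print(f"{significant} of {N * N} coefficients exceed 1.0 in magnitude")
```

Because the rows of this block are identical, all the energy ends up in the first row of coefficients, so most of the 64 values can be coded very cheaply.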

As JPEG was essentially designed for the compression of still images, it makes no use of temporal redundancy, which is a very important element in most video compression schemes. Thus, despite the availability of real-time JPEG video compression hardware, its use will be quite limited due to its poor video quality.

Jon CROWCROFT
1998-12-03