The fundamental standard upon which all videoconferencing applications are based is G.711, which defines Pulse Code Modulation (PCM). In PCM, a sample representing the instantaneous amplitude of the input waveform is taken regularly, the recommended rate being 8000 samples/s (±50 ppm). At this sampling rate, frequencies up to the 3400-4000 Hz range are encodable. Empirically, this has been demonstrated to be adequate for voice communication, and, indeed, even seems to provide a music quality acceptable in the noisy environment around computers (or perhaps my hearing is failing). The samples taken are assigned one of 2^12 values, a range necessary to maintain an acceptable signal-to-noise ratio (SNR) at low volumes. These samples are then stored in 8 bits using a logarithmic encoding according to either of two laws (A-law and μ-law). In telecommunications, A-law encoding tends to be more widely used in Europe, whilst μ-law predominates in the US. However, since most workstations originate outside Europe, the sound chips within them tend to obey μ-law. In either case, the reason that a logarithmic compression technique is preferred to a linear one is that it more readily represents the way humans perceive audio. We are more sensitive to small changes at low volume than to the same changes at high volume; consequently, lower volumes are represented with greater accuracy than high volumes.
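The companding idea can be sketched in a few lines of Python. This is only an illustration using the continuous μ-law formula; real G.711 codecs use a segmented, piecewise-linear approximation of the same curve, and the function names here are invented for the example.

```python
import math

MU = 255.0  # mu-law parameter used in North American telephony

def mulaw_encode(sample: float) -> int:
    """Compress a linear sample in [-1.0, 1.0] to an 8-bit mu-law code.

    Illustrative continuous formula; actual G.711 hardware uses a
    piecewise-linear (segmented) approximation of this curve.
    """
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), 1.0)
    compressed = math.log(1.0 + MU * magnitude) / math.log(1.0 + MU)
    return sign | int(compressed * 127)

def mulaw_decode(code: int) -> float:
    """Expand an 8-bit mu-law code back to a linear sample."""
    sign = -1.0 if code & 0x80 else 1.0
    compressed = (code & 0x7F) / 127.0
    return sign * ((1.0 + MU) ** compressed - 1.0) / MU
```

Comparing reconstruction steps shows the perceptual motivation: the quantization step between codes 0 and 1 is far smaller than the step between codes 126 and 127, so quiet passages are represented much more finely than loud ones.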
Adaptive Differential Pulse Code Modulation (ADPCM, G.721) allows for the compression of PCM-encoded input whose power varies with time. A reconstructed version of the input signal is fed back and subtracted from the actual input signal, and the difference is then quantised to give a 4-bit output value. This compression gives a 32 kbit/s output rate. This standard was recently extended in G.726, which replaces both G.721 and G.723, to allow conversion between 64 kbit/s PCM and 40, 32, 24, or 16 kbit/s channels. G.727 is an extension of G.726 and is used for embedded ADPCM on 40, 32, 24, or 16 kbit/s channels, with the specific intention of being used in packetised speech systems utilizing the Packetized Voice Protocol (PVP), defined in G.764.
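A toy version of this feedback loop can make the idea concrete. The sketch below codes the prediction error rather than the sample itself and crudely adapts the quantizer step size; the predictor here is simply the previous reconstructed sample, whereas real G.721 uses an adaptive pole/zero predictor and a far more sophisticated quantizer, so treat the names and constants as invented for the example.

```python
def _adapt(step, code):
    """Crude step-size adaptation shared by encoder and decoder."""
    if abs(code) >= 7:                 # quantizer saturated: open up
        return min(0.5, step * 1.5)
    if abs(code) <= 2:                 # signal is quiet: zoom in
        return max(0.001, step * 0.9)
    return step

def adpcm_encode(samples):
    """Toy 4-bit differential coder in the spirit of G.721 ADPCM.

    Samples are floats in [-1.0, 1.0]; each is coded as the quantised
    difference from the previous *reconstructed* sample, so the
    decoder can track the encoder exactly.
    """
    predicted, step, codes = 0.0, 0.02, []
    for s in samples:
        diff = s - predicted                       # code the prediction error
        code = max(-8, min(7, int(diff / step)))   # 4-bit signed value
        codes.append(code)
        predicted += code * step                   # feedback of reconstruction
        step = _adapt(step, code)
    return codes

def adpcm_decode(codes):
    """Reconstruct samples by mirroring the encoder's feedback loop."""
    predicted, step, out = 0.0, 0.02, []
    for code in codes:
        predicted += code * step
        out.append(predicted)
        step = _adapt(step, code)
    return out
```

Because the decoder runs the same predictor and step adaptation as the encoder, only the 4-bit codes need to be transmitted, which is where the halving from 8-bit PCM to 32 kbit/s comes from.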
The encoding of higher-quality speech (50 Hz-7 kHz) is covered in G.722 and G.725, and is achieved by applying sub-band ADPCM coding to two frequency sub-bands; the output rate is 64 kbit/s.
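The essence of sub-band coding is splitting the signal into half-rate frequency bands that are then coded independently. G.722 does this with quadrature mirror filters (QMF); the sum/difference pair below is the simplest possible stand-in (a Haar-style split, not the actual G.722 filters) that still shows the structure.

```python
def subband_split(samples):
    """Split a signal into low- and high-frequency bands at half rate.

    Toy stand-in for the QMF analysis filters of G.722: the running
    average carries the low band, the running difference the high band.
    Each band would then be ADPCM-coded on its own.
    """
    low = [(samples[i] + samples[i + 1]) / 2 for i in range(0, len(samples) - 1, 2)]
    high = [(samples[i] - samples[i + 1]) / 2 for i in range(0, len(samples) - 1, 2)]
    return low, high

def subband_merge(low, high):
    """Invert the split exactly (perfect reconstruction)."""
    out = []
    for l, h in zip(low, high):
        out.extend([l + h, l - h])
    return out
```

The point of the split is that the two bands can be given different bit budgets: in G.722 the perceptually important low band gets most of the 64 kbit/s.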
LPC (Linear Predictive Coding) is used to compress audio at 16 kbit/s and below. In this method the encoder fits speech to a simple, analytic model of the vocal tract. Only the parameters describing the best-fit model are transmitted to the decoder. An LPC decoder uses those parameters to generate synthetic speech that is usually very similar to the original. The result is intelligible but machine-like sounding speech.
CELP (Code Excited Linear Prediction) is quite similar to LPC. A CELP encoder does the same LPC modelling but then computes the error between the original speech and the synthetic model, and transmits both the model parameters and a very compressed representation of that error. The compressed representation is an index into an excitation vector table (which can be thought of as a ``code book'' shared between encoders and decoders). The result is much higher quality speech at a low data rate.
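The code-book search can be illustrated as follows. A real CELP coder evaluates each candidate through the LPC synthesis filter and uses structured codebooks to make the search tractable; this brute-force least-squares match over raw residuals, with invented names, just shows why only an index and a gain need to be sent.

```python
def best_codebook_entry(residual, codebook):
    """Pick the excitation vector whose gain-scaled shape best matches
    the prediction residual; only (index, gain) would be transmitted.

    Toy analysis-by-synthesis search: for each entry the optimal gain
    is the least-squares projection of the residual onto that vector.
    """
    best_index, best_gain, best_err = 0, 0.0, float("inf")
    for index, vec in enumerate(codebook):
        energy = sum(v * v for v in vec)
        if energy == 0:
            continue
        gain = sum(r * v for r, v in zip(residual, vec)) / energy
        err = sum((r - gain * v) ** 2 for r, v in zip(residual, vec))
        if err < best_err:
            best_index, best_gain, best_err = index, gain, err
    return best_index, best_gain
```

Because both ends hold the same code book, the decoder can regenerate the chosen excitation exactly from the transmitted index and gain, which is what restores the naturalness that plain LPC loses.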
High-quality audio compression is supported by MPEG. MPEG I defines sample rates of 48 kHz, 44.1 kHz and 32 kHz. MPEG II adds three lower rates: 16 kHz, 22.05 kHz and 24 kHz. MPEG I allows for two audio channels, whereas MPEG II allows five audio channels plus an additional low-frequency enhancement channel.
MPEG defines three compression layers: Audio Layers I, II and III. Layer I is the simplest, a sub-band coder with a psycho-acoustic model. Layer II adds more advanced bit allocation techniques and greater accuracy. Layer III adds a hybrid filterbank and non-uniform quantization. Layers I, II and III give increasing quality/compression ratios with increasing complexity and demands on processing power.
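The interplay between the psycho-acoustic model and bit allocation can be sketched crudely. In the toy allocator below, the model's output is taken to be a signal-to-mask ratio (SMR) per sub-band, and bits go greedily to the band that is currently most audible above its masking threshold; the 6 dB-per-bit rule of thumb and all inputs here are illustrative, not the actual MPEG allocation tables.

```python
def allocate_bits(smr_db, total_bits, max_bits_per_band=15):
    """Greedy bit allocation in the spirit of MPEG audio layers.

    smr_db: per-sub-band signal-to-mask ratios in dB, as a
    psycho-acoustic model might produce (hypothetical inputs).
    Each bit given to a band buys roughly 6 dB of quantization SNR,
    so we keep feeding the band that still sticks out the most.
    """
    need = list(smr_db)
    bits = [0] * len(smr_db)
    for _ in range(total_bits):
        band = max(range(len(need)), key=lambda i: need[i])
        if need[band] <= 0 or bits[band] >= max_bits_per_band:
            break                 # everything is already below masking
        bits[band] += 1
        need[band] -= 6.0         # ~6 dB of SNR gained per quantizer bit
    return bits
```

Bands whose signal lies entirely under the masking threshold receive no bits at all, which is where the large compression gains of perceptual coding come from.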