For a variety of reasons, samples may arrive with different spacing from that with which they were sent. This variation can be tolerated so long as the mean rate of delivery of samples is maintained and the variation is a second-order (less important) effect compared with the delivery delay. A playout buffer can accommodate a fixed variation in arrival. If the arrival rate itself varies, it is possible to use an adaptive playout buffer, whose size is continually recalculated.
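The fixed playout buffer can be sketched in a few lines. This is only an illustration under stated assumptions: the `Packet` record, the `schedule_playout` function and the millisecond units are hypothetical names chosen for the example, and the receiver is assumed to anchor its playout clock on the first packet it sees, since sender and receiver clocks are not synchronised.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    seq: int           # sequence number (hypothetical field)
    sent_ms: float     # sender's media timestamp, in milliseconds
    arrived_ms: float  # receiver's local arrival time, in milliseconds


def schedule_playout(packets, playout_delay_ms):
    """Assign a playout time to each packet using a fixed delay.

    Returns (playable, late), where playable is a list of
    (playout_time_ms, seq) pairs and late lists the sequence numbers
    of packets that arrived after their playout time.
    """
    if not packets:
        return [], []
    first = packets[0]
    # Offset that maps sender timestamps onto the receiver's clock,
    # with the fixed playout delay added on top (an assumption: the
    # first packet is treated as having typical delay).
    offset = first.arrived_ms + playout_delay_ms - first.sent_ms
    playable, late = [], []
    for p in packets:
        play_at = p.sent_ms + offset
        if p.arrived_ms <= play_at:
            playable.append((play_at, p.seq))
        else:
            late.append(p.seq)  # arrived too late to be played
    return playable, late
```

A larger `playout_delay_ms` absorbs more variation at the cost of more end-to-end delay; a packet delayed by more than the buffer allows is effectively lost.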
As explained above, the Internet currently provides no guarantees in general. The throughput and delay along a path can vary quite drastically as other traffic comes and goes. When the network is overloaded, packets are lost, leaving gaps in the information flow at a receiver. This is illustrated below. Two basic techniques have emerged to deal with these two problems:
The way these two techniques work is quite ingenious but, once seen, relatively simple [Jacobson, 94]. All sources that generate information with time structure use a protocol called RTP, the Real-time Transport Protocol, which places a media timestamp in each packet that is sent. All receivers use this timestamp for two purposes:
The inter-arrival time distribution is monitored. If the delay on the path varies, it will probably vary fairly smoothly, with some sort of reasonable probability distribution. By monitoring the mean difference between inter-arrival times, and adding this to the playout buffer that delays data between the receiving application and the output device (video window, audio, whiteboard, etc.), the receiver can be assured with high probability that it won't be starved of data (run out of steam). See figures 5.1 and 5.2.
Figure 5.1 shows the components of a playout buffer. These include the mixing of streams from multiple sources, which can also be used to synchronise media. In figure 5.2, we can see the reason for this requirement graphically displayed: interference from other traffic causes jitter within the network.
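One way to turn the monitored inter-arrival variation into a playout delay is a running (exponentially weighted) jitter estimate, in the style of the estimator defined in the RTP specification. The sketch below is an assumption about how this might be coded, not the method used by any particular Mbone tool: the class and function names, the safety factor of 4 and the 20 ms floor are all illustrative choices.

```python
class JitterEstimator:
    """Running estimate of inter-arrival jitter from media timestamps."""

    def __init__(self, gain=1 / 16):
        self.gain = gain          # smoothing gain (1/16 as in the RTP spec)
        self.jitter = 0.0         # current jitter estimate, in ms
        self._prev_transit = None

    def update(self, sent_ms, arrived_ms):
        # Relative transit time; the unknown clock offset between sender
        # and receiver cancels when two transit times are subtracted.
        transit = arrived_ms - sent_ms
        if self._prev_transit is not None:
            d = abs(transit - self._prev_transit)
            self.jitter += self.gain * (d - self.jitter)
        self._prev_transit = transit
        return self.jitter


def playout_delay(jitter_ms, safety_factor=4.0, floor_ms=20.0):
    """Playout delay as a multiple of the jitter estimate.

    safety_factor and floor_ms are hypothetical tuning values.
    """
    return max(floor_ms, safety_factor * jitter_ms)
```

The receiver would call `update` for each arriving packet and recompute `playout_delay` at a convenient point, for example at the start of a talkspurt in audio, so that the adjustment is not audible.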
Receivers monitor gaps in the inter-arrival times (those that correspond to missing data, as opposed to, say, silence in an audio stream). Periodically, Mbone applications report statistics about particular sources by multicasting a report to the same group. A sender can use these reports to calculate whether the network appears congested. The scheme used to adjust the sending rate is basically that used in TCP, but it must be implemented by degrading the quality of the input media for audio and video; many video compression schemes are easily altered to permit this. The total amount of traffic generated by these quality reports is constrained to be a constant percentage of the traffic of any conference session (typically around 5%). One consequence is that, as a conference grows, the number of samples from different parts of the net gets better and better, so the quality of the information in fact improves even though the quantity from any given receiver decreases. Receivers use the reports they hear from other receivers to estimate the number of receivers, and hence reduce the frequency with which they send their own reports, in a fully distributed way.
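Both halves of this mechanism can be illustrated with a little arithmetic. The sketch below makes several assumptions that are not in the text: the function names, the 800-bit report size, the 5 second minimum interval, and the loss threshold and step sizes in `adjust_rate` are hypothetical values chosen for the example. The rate rule is a simple additive-increase, multiplicative-decrease scheme in the spirit of TCP, not the algorithm of any specific application.

```python
def report_interval_s(n_receivers, session_bw_bps, report_bits=800,
                      share=0.05, min_interval_s=5.0):
    """Seconds between one receiver's reports.

    All receivers together may use only `share` of the session
    bandwidth, so each must report less often as the group grows.
    """
    control_bw_bps = share * session_bw_bps
    interval = n_receivers * report_bits / control_bw_bps
    return max(min_interval_s, interval)


def adjust_rate(rate_bps, loss_fraction, loss_threshold=0.05,
                increase_bps=10_000, decrease_factor=0.5,
                min_bps=16_000, max_bps=1_000_000):
    """Pick a new sending rate from the loss reported by receivers.

    Above the threshold the rate is cut multiplicatively; otherwise it
    is raised additively. The result is clamped to [min_bps, max_bps].
    """
    if loss_fraction > loss_threshold:
        new_rate = rate_bps * decrease_factor
    else:
        new_rate = rate_bps + increase_bps
    return min(max_bps, max(min_bps, new_rate))
```

With a 128 kbit/s session, ten receivers are limited only by the minimum interval, whereas a thousand receivers would each report roughly every two minutes. The sender would then map the rate chosen by `adjust_rate` onto a codec setting such as frame rate or quantiser step.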