We must distinguish between raw data and information, but such a distinction is a subtle business. ``Information'' refers to that part of a signal that is actually useful to some user.
Thus, depending on the user, some part of a signal may be regarded as less useful; this means that there may be redundancy in the data. In some cases the redundancy is unambiguous - e.g. the easy case of simple repetition, where data is coded in some grossly inefficient manner.
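The simple-repetition case can be illustrated with run-length encoding, the most basic scheme for squeezing out this kind of redundancy. The function names below are illustrative, not drawn from any particular codec:

```python
def rle_encode(data):
    """Encode a sequence as (symbol, run_length) pairs.

    Runs of repeated symbols - grossly inefficient coding -
    collapse to a single pair each.
    """
    if not data:
        return []
    runs = []
    current, count = data[0], 1
    for symbol in data[1:]:
        if symbol == current:
            count += 1
        else:
            runs.append((current, count))
            current, count = symbol, 1
    runs.append((current, count))
    return runs

def rle_decode(runs):
    """Invert rle_encode, recovering the original sequence."""
    return [s for s, n in runs for _ in range(n)]
```

Note that this only wins when runs are common; on data with no repetition the ``compressed'' form is larger than the original, which is exactly why redundancy removal must be matched to the statistics of the source.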
Depending on the source, and on the form of the signal, we may know something about the statistics of the contents in advance, or we may have to do some sort of online analysis if we are to remove redundancy. The performance of online analysis depends on the range, and the accuracy, over which the signal repeats itself - in other words, on the blocksize.
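One simple form such online analysis can take is estimating the per-symbol entropy within each block: a low entropy indicates redundancy that a coder could remove. This is a sketch under the assumption of a memoryless symbol model, not a description of any specific compressor:

```python
import math
from collections import Counter

def block_entropies(data, blocksize):
    """Estimate per-symbol entropy (in bits) of each block of the signal.

    A larger blocksize yields better statistics - and hence better
    redundancy detection - at the cost of more delay and CPU load.
    """
    entropies = []
    for i in range(0, len(data), blocksize):
        block = data[i:i + blocksize]
        n = len(block)
        # Empirical symbol probabilities within this block only.
        h = -sum((c / n) * math.log2(c / n)
                 for c in Counter(block).values())
        entropies.append(h)
    return entropies
```

A block of pure repetition scores 0 bits per symbol (fully redundant), while a block where every symbol is equally likely scores the maximum, leaving nothing for a statistical coder to exploit.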
How much data a compression algorithm that does online analysis can buffer is limited by how much delay we are allowed to incur (over and above the delay ``budget'' for transmission and reception), and by the CPU load incurred in processing larger chunks of the signal.
Finally, redundancy is in the eye of the beholder: we are rarely obliged to keep the original signal with 100% integrity, since human frailty means that even without an Internet path between the light or sound source and a person, the receiver would be likely to miss some parts of the signal in any case. This latter point is extremely task dependent.