Problems Inherent in Monitoring Computing Systems

There is a tradeoff between load and timeliness/accuracy that, to borrow an analogy from quantum mechanics, is Heisenbergian in nature. The more you monitor, the more it costs to gather the data, the more load you place on the network, I/O systems and storage, and the more processing you have to do to make a choice. Monitoring also interferes with the systems being observed: extracting load information itself adds load, so the more up to date the information is, the less accurately it reflects what the system would be doing unobserved. The less you monitor, the less timely your information. The usual way around this is to collect statistical metrics rather than continuous details of system activity. Typically, run queue information and average I/O over various sample periods are sufficient for distinguishing trends and basic differences.
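
As a rough illustration of the statistical approach, the sketch below takes a cheap reading at a coarse interval and reports averages over several sample periods instead of logging every event. The names WindowedMetric, read_run_queue and sample are hypothetical, the periods are counted in samples rather than seconds, and the 1-minute load average from os.getloadavg (Unix only) is assumed to be an acceptable stand-in for run queue length.

```python
import os
import time
from collections import deque
from statistics import fmean


class WindowedMetric:
    """Keeps only the most recent samples and averages them over several periods.

    Hypothetical helper for illustration; periods are numbers of samples.
    """

    def __init__(self, periods=(5, 30, 60)):
        self.periods = tuple(sorted(periods))
        # Bounded buffer: storage cost stays fixed however long we monitor.
        self.samples = deque(maxlen=self.periods[-1])

    def record(self, value):
        self.samples.append(float(value))

    def averages(self):
        """Return {period: mean of the last `period` samples}; empty if no data."""
        recent = list(self.samples)
        if not recent:
            return {}
        return {p: fmean(recent[-p:]) for p in self.periods}


def read_run_queue():
    # Assumption: the 1-minute load average approximates run queue length.
    # os.getloadavg() is available on Unix-like systems only.
    return os.getloadavg()[0]


def sample(metric, reader=read_run_queue, interval_s=10.0, count=6,
           sleep=time.sleep):
    """Take `count` readings, `interval_s` seconds apart, and summarise them.

    A longer interval means less monitoring load but staler information.
    """
    for i in range(count):
        metric.record(reader())
        if i < count - 1:
            sleep(interval_s)
    return metric.averages()
```

Calling sample(WindowedMetric(periods=(3, 6)), interval_s=10.0, count=6) would take six readings over roughly fifty seconds and return one average for the last three readings and one for all six. The interval and the periods are the knobs that trade monitoring load against timeliness.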