Implementing POSIX clocks under Linux

Markus Kuhn

Note: This is an old and early draft document from 1998. It evolved into a proposed modernised <time.h> for ISO C 9X, which implements similar ideas and tries to bring better leap second and timezone handling into the ISO C standard. Later I came to the conclusion that attempting to present inserted leap seconds as an out-of-scale timestamp of the form 23:59:60 to applications at an operating-system-API or network-protocol level is not very useful for most applications, and that a standardized smoothed form of UTC, such as my UTC-SLS proposal (2006), is far more practical in almost all applications, except for those concerned with precisely tracking the motion of physical masses, where TAI is a more useful timebase.

The POSIX clock interface ignores the existence of leap seconds in the commonly used UTC time scale and does not provide a sufficiently powerful interface for adjusting clocks to an external time reference. However, the POSIX clock interface was designed in an extensible way. We discuss here how the clock functionality of POSIX can be improved. Eventually we might submit the result as a new POSIX standard proposal (far future) and implement it in Linux (near future).

Participants in this discussion so far have been: Markus Kuhn, Joe Gwinn, Colin Plumb, Andrew Derrick Balsa, Paul Eggert, and probably others I forgot. If you are interested, please join the tz mailing list.

Before you read this document, you may find it useful to familiarize yourself with the following references:

POSIX.1 specifies its representation of time in the time_t type as follows: seconds since the Epoch: A value to be interpreted as the number of seconds between a specified time and the Epoch.

A Coordinated Universal Time name (specified in terms of seconds (tm_sec), minutes (tm_min), hours (tm_hour), days since January 1 of the year (tm_yday), and calendar year minus 1900 (tm_year)) is related to a time represented as seconds since the Epoch, according to the expression below.

If the year < 1970 or the value is negative, the relationship is undefined. If the year >= 1970 and the value is non-negative, the value is related to a Coordinated Universal Time name according to the expression:

tm_sec + tm_min*60 + tm_hour*3600 + tm_yday*86 400 + (tm_year-70)*31 536 000 + ((tm_year-69)/4)*86 400
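As a rough illustration, the expression can be written as a small C function. This is only a sketch: the function name posix_seconds_since_epoch is made up for this example and is not part of POSIX.1, and the integer divisions are assumed to truncate as they do in C.

```c
#include <assert.h>
#include <time.h>

/* Seconds since the Epoch, computed with the POSIX.1 expression quoted above.
 * Only tm_sec, tm_min, tm_hour, tm_yday and tm_year are read.
 * The function name is hypothetical, chosen for this sketch. */
static long long posix_seconds_since_epoch(const struct tm *t)
{
    /* POSIX.1 leaves the relationship undefined for years before 1970. */
    assert(t->tm_year >= 70);

    return t->tm_sec
         + t->tm_min  * 60LL
         + t->tm_hour * 3600LL
         + t->tm_yday * 86400LL
         + (t->tm_year - 70) * 31536000LL          /* 365-day years since 1970 */
         + ((t->tm_year - 69) / 4) * 86400LL;      /* one leap day every 4th year */
}
```

With tm_year = 98 and all other fields zero (1998-01-01 00:00:00) the function yields 883612800. Feeding it the leap second 1998-12-31 23:59:60 (tm_yday = 364, tm_sec = 60) yields 915148800, exactly the value obtained for 1999-01-01 00:00:00, so the two UTC names cannot be told apart in this encoding.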

There are two problems with this encoding of UTC second names:

Proposed modifications to POSIX.1

Seconds since the Epoch definition

Replace the last paragraph in section with the following paragraph:

If the year < 1970 or the value is negative, the relationship is undefined. If the year >= 1970 and the value is non-negative, the value is related to a Coordinated Universal Time name according to the expression:

tm_sec + tm_min * 60 + tm_hour * 3600 +
tm_yday * 86 400 + (tm_year-70) * 31 536 000 +
((tm_year-69)/4 - (tm_year-1)/100 + (tm_year+299)/400) * 86 400

The seconds since the Epoch should be represented as a numeric type that covers a value range sufficiently large to handle times from 1970 until the end of the year 9999.
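The proposed expression differs from the current one only in its leap-day term, which now follows the Gregorian rule for century years. A minimal C sketch, assuming a 64-bit long long and using the hypothetical name proposed_seconds_since_epoch:

```c
#include <assert.h>
#include <time.h>

/* Seconds since the Epoch using the proposed expression with the
 * Gregorian century correction. The function name is hypothetical. */
static long long proposed_seconds_since_epoch(const struct tm *t)
{
    /* The relationship is undefined for years before 1970. */
    assert(t->tm_year >= 70);

    /* Leap days between 1970 and the start of the given year:
     * every 4th year, minus century years, plus every 400th year. */
    long long leap_days = (t->tm_year - 69) / 4
                        - (t->tm_year - 1) / 100
                        + (t->tm_year + 299) / 400;

    return t->tm_sec
         + t->tm_min  * 60LL
         + t->tm_hour * 3600LL
         + t->tm_yday * 86400LL
         + (t->tm_year - 70) * 31536000LL
         + leap_days * 86400LL;
}
```

For 2101-01-01 00:00:00 (tm_year = 201) the leap-day term evaluates to 33 - 2 + 1 = 32 and the result is 4133980800, whereas the unmodified expression counts 33 leap days and is therefore one day (86 400 s) ahead, because 2100 is not a leap year. For 9999-12-31 23:59:59 (tm_year = 8099, tm_yday = 364) the result is 253402300799, which does not fit into a 32-bit integer.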

Additional clocks

Change the third paragraph in section 14.1.1 into:

The tv_nsec member is only valid if greater than or equal to zero, and less than the number of nanoseconds in a second (1000 million), unless a time within a leap second is represented by a clock type that does indicate leap seconds. The time interval described by this structure is (tv_sec × 10^9 + tv_nsec) nanoseconds. Clocks that represent leap seconds do so by keeping tv_sec at the value for the preceding second while adding the value 1000 000 000 to tv_nsec; that is, a leap second is represented by a tv_nsec value in the range 1000 000 000 to 1999 999 999.
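The following C sketch shows how code consuming such timestamps might check and decode them. All helper names (ts_in_leap_second, ts_is_valid, ts_to_ns, ts_utc_seconds_field) are invented for this illustration and do not appear in any standard; a non-negative tv_sec is assumed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* True if the timestamp lies inside an inserted leap second,
 * i.e. tv_nsec is in the range 1000 000 000 .. 1999 999 999. */
static bool ts_in_leap_second(const struct timespec *ts)
{
    return ts->tv_nsec >= NSEC_PER_SEC && ts->tv_nsec <= 1999999999L;
}

/* Validity rule of the proposed paragraph: the extended tv_nsec range is
 * accepted only for clocks that indicate leap seconds. */
static bool ts_is_valid(const struct timespec *ts, bool clock_indicates_leap_seconds)
{
    if (ts->tv_nsec < 0)
        return false;
    if (ts->tv_nsec < NSEC_PER_SEC)
        return true;
    return clock_indicates_leap_seconds && ts_in_leap_second(ts);
}

/* Interval described by the structure: tv_sec * 10^9 + tv_nsec nanoseconds. */
static long long ts_to_ns(const struct timespec *ts)
{
    return (long long)ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;
}

/* Seconds field (0..60) of the UTC name. During a leap second tv_sec is
 * held at the value of 23:59:59, so one is added to obtain 60. */
static int ts_utc_seconds_field(const struct timespec *ts)
{
    assert(ts->tv_sec >= 0);
    int sec = (int)(ts->tv_sec % 60);
    return ts_in_leap_second(ts) ? sec + 1 : sec;
}
```

For example, a timestamp with tv_sec = 915148799 (1998-12-31 23:59:59) and tv_nsec = 1500000000 denotes the middle of the leap second 23:59:60; it is valid only for a leap-second-indicating clock, and ts_utc_seconds_field() returns 60 for it.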

Section 14.1.4 should in the end contain descriptions of the following clocks:

CLOCK_REALTIME: This clock provides a best-effort estimate of UTC in a way that is backwards compatible with existing practice. Very little is guaranteed for this clock. It will never show leap seconds. When CLOCK_UTC becomes available, CLOCK_REALTIME should be adjusted to match CLOCK_UTC. For small phase adjustments of the clock (up to 10 minutes difference), the frequency (rate) of this clock will be increased or decreased by up to 1% until both clocks show identical times. For larger adjustments (which are only to be expected when the system is first installed), CLOCK_REALTIME jumps directly to CLOCK_UTC and the system administrator should be warned about this unusual event. After CLOCK_UTC has had a leap second, CLOCK_REALTIME will need at least 100 s until both clocks are phase synchronous again, because CLOCK_REALTIME has to follow the 1 s leap second phase shift by temporarily changing its frequency by at most 1%. In BSD compatible systems, gettimeofday() shows the same time as CLOCK_REALTIME (truncated to microsecond resolution). CLOCK_REALTIME has a resolution of at least 20 ms (typically much better) and unspecified accuracy, frequency stability, and monotonicity.
CLOCK_UTC: This clock is only available when the system knows Coordinated Universal Time (UTC) with high assurance and with an estimated accuracy of at least 1 s (typically much better). Whether UTC is known with high assurance usually depends on whether the system clock driver (e.g., Mills' kernel PLL) has recently received UTC reference signals from an external source (GPS, NTP, DCF77, WWV, etc.). Clock drivers are required to calculate an estimate of the accuracy of the current clock value, for instance using a Kalman filter that observes both the external and the internal reference oscillators and makes a time estimate by modelling the errors of both sources. The estimated accuracy decreases when the external clock signal becomes unavailable for a longer time, and CLOCK_UTC must be made unavailable when the estimated accuracy has become worse than some documented limit that is not higher than 1 s. CLOCK_UTC also becomes unavailable after a system disruption that could have affected the continuity of the internal clock (e.g., a laptop recovering from a power saving mode with reduced clock frequency) until an accuracy estimate has been established again. During inserted leap seconds, the tv_nsec field will be in the range 1000000000 to 1999999999 in order to represent the leap second 23:59:60Z, for which the POSIX time_t does not provide any legal value. CLOCK_UTC is the only clock described here that indicates leap seconds.
CLOCK_TAI: This clock is only available when the system knows International Atomic Time (TAI) with an accuracy of at least 1 s. The only difference between TAI and UTC is that TAI is never corrected by leap seconds; therefore TAI is a few whole seconds ahead of UTC (one second more after every inserted UTC leap second). Some time broadcasting services such as GPS provide both TAI and UTC (e.g., by publishing a scale linked to TAI plus the difference to UTC). TAI is needed for instance to control processes (e.g., astronomical observations, navigation, etc.) where leap seconds are undesirable. CLOCK_TAI is handled very similarly to CLOCK_UTC in that it becomes unavailable when the clock filter algorithm estimates the accuracy of its output to be worse than 1 s.
CLOCK_MONOTONIC: This clock never jumps, is guaranteed to be available all the time right after system startup, and its frequency never varies by more than 500 ppm. It is intended for systems that might not know UTC or TAI at boot time, but where a monotonically increasing constant rate clock is needed right from boot time for highly reliable time interval measurements. This clock's frequency might be adjusted in a PLL control loop once an external reference (NTP, GPS, etc.) has been available long enough to measure the ±500 ppm frequency error and instability of typical motherboard oscillators. No attempt is made to adjust the phase of CLOCK_MONOTONIC. Its timestamps are guaranteed to be unique and monotonically increasing during the uptime of the operating system (but not necessarily across several reboots). CLOCK_MONOTONIC does of course not have leap seconds. CLOCK_MONOTONIC can be identical to CLOCK_TAI at boot time if TAI is available, but this is not guaranteed. CLOCK_MONOTONIC can start its epoch at system startup, but preferably it starts with the best TAI (or UTC) estimate that is available.
This clock starts its Epoch when the current thread was created and runs only when the current thread is running on the CPU. This is execution time, which always progresses more slowly than the wall clock times represented by the previous clocks.
This clock starts its Epoch when the current process was created and runs only when a thread of the current process is running on the CPU. This is execution time, which always progresses more slowly than the wall clock times.
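Two of the rules above lend themselves to a short C sketch: the step-or-slew decision described for CLOCK_REALTIME, and the way an application might cope with a clock that can become unavailable. The names offset_is_slewed, min_slew_duration_s and read_clock_with_fallback are hypothetical. Since CLOCK_UTC does not exist on current systems, the fallback helper takes arbitrary clock ids and is assumed to detect unavailability through a failing clock_gettime() call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Limits taken from the CLOCK_REALTIME description above. */
#define MAX_SLEW_OFFSET_S 600.0   /* "up to 10 minutes difference" */
#define MAX_SLEW_RATE     0.01    /* frequency changed "by up to 1%" */

/* True if an offset (in seconds) is small enough to be removed by a
 * frequency change rather than by a jump. */
static bool offset_is_slewed(double offset_s)
{
    double magnitude = offset_s < 0 ? -offset_s : offset_s;
    return magnitude <= MAX_SLEW_OFFSET_S;
}

/* Shortest time (in seconds) needed to remove an offset when slewing
 * at the maximum permitted rate. */
static double min_slew_duration_s(double offset_s)
{
    double magnitude = offset_s < 0 ? -offset_s : offset_s;
    return magnitude / MAX_SLEW_RATE;
}

/* Read `preferred`; if that clock is unavailable, read `fallback` instead.
 * Returns 0 if the preferred clock was used, 1 if the fallback was used,
 * and -1 if neither could be read. */
static int read_clock_with_fallback(clockid_t preferred, clockid_t fallback,
                                    struct timespec *ts)
{
    assert(ts != NULL);
    if (clock_gettime(preferred, ts) == 0)
        return 0;
    if (clock_gettime(fallback, ts) == 0)
        return 1;
    return -1;
}
```

A 1 s offset, as left behind by a leap second, gives min_slew_duration_s(1.0) of about 100 s, matching the figure quoted for CLOCK_REALTIME, while a one-hour offset is outside the slewing range and would be applied as a jump. An application written against this proposal would presumably call read_clock_with_fallback() with CLOCK_UTC as the preferred clock and CLOCK_REALTIME as the fallback.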

Clock control system call

A new clock_control() function should be introduced into POSIX.1 to provide a standardized interface for programs such as xntpd that read external reference clock signals and want to pass them on to the clock driver, as well as for programs that want to get more information than CLOCK_UTC can provide, for instance accuracy estimates and leap second warnings. The functionality could be roughly along the lines of ntp_gettime() and ntp_adjtime() by Mills, but somewhat more generalized and less xntpd implementation specific.
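To give an idea of the kind of information meant here, the sketch below queries the existing Linux interface. It assumes Linux with a C library that provides adjtimex() in <sys/timex.h>; adjtimex() is the Linux system call that glibc also exposes as ntp_adjtime(). It is not the proposed clock_control(), whose interface is not yet defined, and the helper names clock_state_name and query_clock_status are invented for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/timex.h>

/* Human-readable name for the clock state returned by adjtimex(). */
static const char *clock_state_name(int state)
{
    switch (state) {
    case TIME_OK:    return "synchronised, no leap second pending";
    case TIME_INS:   return "leap second will be inserted at end of UTC day";
    case TIME_DEL:   return "leap second will be deleted at end of UTC day";
    case TIME_OOP:   return "leap second insertion in progress";
    case TIME_WAIT:  return "leap second has occurred";
    case TIME_ERROR: return "clock not synchronised";
    default:         return "unknown";
    }
}

/* Read-only query: returns the clock state (TIME_OK .. TIME_ERROR) and
 * stores the kernel's maximum and estimated error in microseconds.
 * Returns -1 if the call fails. */
static int query_clock_status(long *max_error_us, long *est_error_us)
{
    struct timex tx;

    assert(max_error_us != NULL && est_error_us != NULL);
    memset(&tx, 0, sizeof tx);        /* modes == 0: change nothing */

    int state = adjtimex(&tx);
    if (state == -1)
        return -1;

    *max_error_us = tx.maxerror;
    *est_error_us = tx.esterror;
    return state;
}
```

A caller could print clock_state_name(query_clock_status(&max_us, &est_us)) together with the two error values. TIME_INS is the sort of leap second warning, and maxerror and esterror the sort of accuracy estimate, that a more general clock_control() would be expected to deliver.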

... work in progress ...

created 1998-06-07 -- last modified 1998-09-09