Date: Fri, 12 Jun 1998 15:38:32 -0700
Message-Id: <199806122238.PAA15166@shade.twinsun.com>
From: Paul Eggert <eggert@twinsun.com>
To: tz@elsie.nci.nih.gov
Subject: comments on draft ISO C9x changes to <time.h>

The ISO committee in charge of the C language has issued a draft for
C9x, the next major revision to C.  A copy of this (large) document is
available in:

http://osiris.dkuug.dk/JTC1/SC22/open/2620/n2620/

Section 7.16 of this draft C standard proposes a major overhaul of the
functions and datatypes defined in <time.h>.  It adds a new data type
`struct tmx' that is struct tm extended with the following members:

	int tm_version;	// version number
	int tm_zone;	// time zone offset in minutes from UTC [-1439,+1439]
	int tm_leapsecs;// number of leap seconds applied
	void *tm_ext;	// extension block
	size_t tm_extlen; // size of extension block

Also, a struct tmx's tm_isdst is the positive number of minutes of
offset if DST is in effect.  New functions mkxtime, strfxtime use
struct tmx instead of struct tm; a new function

       struct tmx *zonetime (const time_t *timer, int zone);

is the rough analog of localtime and gmtime for struct tm.

I've submitted the following comments to the ISO committee for their
review.  A copy of these comments (along with all other US public
comments on Committee Draft 1) can be found in:
http://osiris.dkuug.dk/JTC1/SC22/WG14/www/docs/n834.htm


Category: Feature that should be removed
Committee Draft subsection: 7.16
Title: changes to <time.h> need a lot of work and should be withdrawn for now
Detailed description:   

  Background and comments

    Draft C9X introduced a new time struct tmx, new macros
    _NO_LEAP_SECONDS and _LOCALTIME, and new functions mkxtime,
    zonetime, and strfxtime.

    These new functions seem to be an invention of the committee;
    they are not based on existing practice, and in some cases
    even ignore longstanding existing practice.  The new functions
    do not address many of the common problems observed with the
    C89 primitives, notably with mktime.  Nor do they add much
    functionality.

    For example, a common extension to C, now required by POSIX.1, is
    the set of reentrant versions of localtime, gmtime, etc.  These fill
    a genuine need, but they are not addressed by draft C9X.

    There are also other genuine needs that are not addressed; just
    look at, say, the harsh words about mktime expressed by the author
    of the tide-calculation program XTide in its source code
    <http://www.universe.digex.net/~dave/files/xtide-1.6.2.tar.gz>.
    Draft C9X addresses few of the needs expressed by this author.

    Here are some more detailed comments on technical shortcomings
    in this area.

      Section 7.16.1 paragraph 3.

	The tm_zone member is an integer number of minutes.  However,
	common practice (e.g. SunOS 4.x, BSD/OS, Linux) is to have a
	member named tm_gmtoff that is a long number of seconds.  This
	is required for proper support of POSIX.1, which lets the user
	specify UTC offset to the second; it is also required for
	proper support of historical applications.  For example, the
	UTC offset of Liberia was 44 minutes and 30 seconds until May
	1972, and any program running on, say, Linux with the TZ
	environment variable set to "Africa/Monrovia" cannot operate
	correctly if the UTC offset is required to be a multiple
	of 60 seconds.
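
A minimal sketch of the Monrovia case, assuming the historical offset of
44 minutes 30 seconds behind UTC is written as -2670 seconds.  The helper
names are hypothetical; they only show what a minutes-only field such as
the draft's tm_zone would have to discard.

```c
/* Hypothetical helper: the value a minutes-only field could hold for
   a UTC offset given in seconds.  C9X division truncates toward zero. */
static int offset_as_minutes(long gmtoff_seconds)
{
    return (int)(gmtoff_seconds / 60);
}

/* Seconds of the offset that the minutes-only field cannot represent. */
static long lost_seconds(long gmtoff_seconds)
{
    return gmtoff_seconds - 60L * offset_as_minutes(gmtoff_seconds);
}
```

For -2670 the minutes field holds -44 and 30 seconds of offset are lost,
whereas a tm_gmtoff counted in seconds stores the value exactly.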

	The tm_ext and tm_extlen members are an unprecedented kludge
	in the standard library spec.  This is not C++!  If the
	specification for struct tmx is incomplete, this suggests that
	the editorial work is not done and this type should be
	withdrawn from the standard.

      Section 7.16.2.3 paragraph 4.

        Here, draft C9X added the following new specification for mktime:

	   If the call is successful, a second call to the mktime
	   function  with  the  resulting  struct tm value shall always
	   leave it unchanged and return the same value  as  the  first
	   call.  (*)

	This specification is reasonable for mkxtime, but for mktime
	it requires changes to existing practice in a way that breaks
	existing software.  Existing software often assumes that
	tm_isdst is either negative, 0, or 1; C89 does not guarantee
	this, but it is common existing practice, so software that
	makes this assumption is portable in practice.

	Unfortunately, specification (*) cannot be satisfied without
	either adding hidden members to struct tm (which breaks binary
	compatibility) or by stuffing more information into tm_isdst
	(which breaks the programs described above).

	Granted, programs shouldn't assume that a positive tm_isdst
	is 1, but it's very common in POSIX.1 programs to see
	expressions like `tzname[tm->tm_isdst]', and these expressions
	won't work if tm_isdst contains large values.
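
As an illustration of why large tm_isdst values break that idiom, here is
a small sketch of a defensive variant.  The function zone_abbr and its
names parameter are hypothetical; names stands in for a tzname-style pair
so that the fragment does not depend on POSIX.

```c
#include <time.h>

/* names[0] is the standard-time abbreviation and names[1] the
   daylight-saving one, in the style of the POSIX tzname array. */
static const char *zone_abbr(const char *const names[2], const struct tm *tm)
{
    /* (tm_isdst > 0) collapses any positive value to index 1; a
       negative ("unknown") value falls back to index 0.  Indexing
       with tm_isdst itself would read out of bounds for, say, 60. */
    return names[tm->tm_isdst > 0];
}
```

Existing programs that index with tm_isdst directly have no such guard,
which is the compatibility problem described above.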

      Section 7.16.2.4 paragraph 3.

	If tm_zone was _LOCALTIME, and if tm_isdst is preposterous
	(e.g. negative, or INT_MAX), this specification is unclear
	about what to do.  The comments in 7.16.2.6 don't help much.

      Section 7.16.2.6 paragraph 1.

	The specification for tm_isdst does not allow for negative
	daylight-saving time.  I don't know of any historical practice
	for this, but POSIX.1 allows it, and implementations that
	support POSIX.1 have to allow for it.

      Section 7.16.2.6 paragraph 2.

 	The limits on ranges for struct tmx members are unreasonable.
	Common existing practice, for example, is to invoke mktime
	with a large value for tm_sec to compute a time stamp at some
	distance from the POSIX.1 epoch.  If int and long are the same
	size, this runs afoul of the new restriction in this section,
	which limits tm_sec to one-eighth of the potential range.
	With this limitation I cannot even use mktime to compute
	today's date on my Unix host from today's time_t value!

	The other limits are also unnecessary.  A well-written mktime
	should work in the presence of arbitrary values in struct
	tm members; similarly for mkxtime.
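
The practice described above can be sketched as follows.  The function
name and the choice of starting point (local noon on 1998-06-12) are
assumptions made for the example; the only library call is the standard
mktime, which normalizes out-of-range members.

```c
#include <time.h>

/* Return the broken-down local time n_seconds after local noon on
   1998-06-12, letting mktime normalize the oversized tm_sec.
   If mktime fails, tm_year is set to -1 (a convention of this sketch). */
static struct tm noon_plus_seconds(int n_seconds)
{
    struct tm tm = {0};
    tm.tm_year = 1998 - 1900;
    tm.tm_mon = 5;            /* June; tm_mon counts from 0 */
    tm.tm_mday = 12;
    tm.tm_hour = 12;
    tm.tm_sec = n_seconds;    /* deliberately far outside [0, 60] */
    tm.tm_isdst = -1;         /* let the library decide about DST */
    if (mktime(&tm) == (time_t)-1)
        tm.tm_year = -1;
    return tm;
}
```

With n_seconds equal to ten days' worth of seconds (864000), the result
falls on 1998-06-22; a limit on tm_sec would forbid this kind of call.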

      Section 7.16.2.6 paragraph 3.

	There are so many errors in this section that it is hard to
	determine what is intended.  But from what I can tell, the
	intent is wrong.  For example, it seems to be saying that if
	the implementation supports leap seconds, and if local time is
	UTC, and if I have a struct tmx that corresponds to 1997-06-30
	00:00:00, and then add 1 to tm_mday and invoke mkxtime, I
	should get 1997-06-30 23:59:60 due to the intervening leap
	second.  This is not what I, the programmer, want or expect!

	The first sentence in this paragraph reads ``Values S and D
	shall be determined as follows''.  But the rules that follow
	do not _determine_ S and D; they merely place _constraints_
	on S and D.  This is because the implementation has some leeway
	in choosing X1 and X2.

	It's not clear in this paragraph whether we're looking at C
	code or mathematics.  Are we supposed to be using all the C
	rules for promotion, conversion, and overflow, or are the
	calculations to be done using mathematical integer arithmetic?

	The last sentence in the comment about X1 and X2 is
	incoherent; I really can't make out what it means.

	For the implementation to determine X1 and X2, it needs to
	know what D and S are.  But D and S are computed from X1 and
	X2!  More explanation is needed before I can really figure out
	what's intended here.

	The definition of D is completely unmotivated, and does not
	obey the rules of the Gregorian calendar.  Among other things,
	it uses / and % in places where it should use QUOT and REM.
	(And it can't possibly be right without a `100' in it
	somewhere.  :-) The definition should be rewritten to be
	something like the following.  (Sorry, I haven't tested this,
	as it's less than 30 minutes before the deadline for
	submitting comments in the US as this sentence is being
        written.)

	  D = // day offset since 0000-03-01

	      // contribution from year
	      Z*365 // number of non-leap days since 0000-03-01
	      + QUOT(Z, 4) // Every 4 years ends in a leap year.
	      - QUOT(Z, 100) // Every 100 years ends in a nonleap year.
	      + QUOT(Z, 400) // Every 400 years ends in a leap year.
	
	      // contribution from month; note we start from 03-01
	      + ((int []){ ...yday offsets, starting in March ...})
			[REM(M - 2, 12)]
	
	      // contribution from day of month
	      + tm_mday - 1

	      // contribution from time of day
	      + QUOT(SS, 86400)
	      
	except of course that the expression QUOT(SS, 86400) mishandles
	leap seconds as described above.
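
The untested formula above can be tried out with a sketch like the one
below.  Several things here are assumptions rather than part of the
comment: the function name, the restriction of mon to [0, 11], the
omission of the time-of-day term, and in particular the table of
March-based day offsets, which the formula leaves elided.  QUOT and REM
use the definitions suggested further on in this comment.

```c
/* Floor division and non-negative remainder, valid for b > 0. */
#define QUOT(a, b) ((a) / (b) - ((a) % (b) < 0))
#define REM(a, b)  ((a) % (b) + (b) * ((a) % (b) < 0))

/* Days from 0000-03-01 (proleptic Gregorian) to the given date.
   year is the full year, mon is 0-based as in tm_mon and must be in
   [0, 11]; mday is the day of the month. */
static long days_since_0000_03_01(long year, int mon, int mday)
{
    /* Assumed table: days before each month, counting from March. */
    static const int yday_from_march[12] = {
        0, 31, 61, 92, 122, 153, 184, 214, 245, 275, 306, 337
    };
    long z = year - (mon < 2);  /* Jan and Feb count with the prior year */

    return z * 365 + QUOT(z, 4) - QUOT(z, 100) + QUOT(z, 400)
           + yday_from_march[REM(mon - 2, 12)]
           + mday - 1;
}
```

Starting the year in March puts the leap day at the end of the counting
year, which is why the month table needs no leap-year correction.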

      Section 7.16.3.5

	This new function zonetime is of only marginal use; it seems to
	be present mostly as a way of defining how mkxtime works.

	The definition of leap seconds is incorrect.  Leap seconds are
	not a UTC-UT1 offset.  The absolute value of the difference
	between UTC and UT1 is at most 0.9 seconds, by definition.

    The changes to 7.16 seem to be hastily edited: there are a number
    of what seem to be typographical errors.  The changed text is not
    explained, and the typos make it hard to understand what was
    intended.  Here are some of the typos that I spotted despite these
    problems:

      Section 7.16.1 paragraph 2.  _LOCALTIME ``must be outside the
      range [-14400, +14400].''  Presumably this should be [-1440,
      +1440], i.e. one day's worth not ten.

      Section 7.16.2.6 paragraph 3.

	The definition for QUOT yields numerically incorrect results
	if (b)-(a) or (b)-(a)-1 overflows.  I suggest replacing it
	with the following definition, which is clearer and free of
	problems with overflow.  This definition relies on C9X's new
	guarantees about integer division.

	  #define QUOT(a,b) ((a)/(b)  -  ((a)%(b) < 0))

	Similarly, REM can overflow if (b)*QUOT(a,b) overflows.  Here
	is a better version.

	  #define REM(a,b) ((a)%(b)  +  (b) * ((a)%(b) < 0))

	The definition of Z can be written more compactly as:

	  Z = Y - (M < 2);
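
A short check of the two suggested macros, assuming b > 0 and a division
that truncates toward zero (which C9X guarantees).  The helper
floor_identity_holds is hypothetical; it verifies the defining property
of floor division in long long so that the check itself cannot overflow
for int arguments.

```c
#define QUOT(a, b) ((a) / (b) - ((a) % (b) < 0))
#define REM(a, b)  ((a) % (b) + (b) * ((a) % (b) < 0))

/* Nonzero if a == b*QUOT(a,b) + REM(a,b) with 0 <= REM(a,b) < b. */
static int floor_identity_holds(int a, int b)
{
    long long q = QUOT(a, b), r = REM(a, b);
    return 0 <= r && r < b && (long long)b * q + r == a;
}
```

For example, QUOT(-1, 4) is -1 and REM(-1, 4) is 3, whereas plain / and %
give 0 and -1.  Neither macro forms a product or difference larger in
magnitude than its operands, so values close to the ends of the int range
are handled without overflow.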

      Section 7.16.3.6 paragraph 5.

	``If this value is outside the normal range, the characters stored
	are unspecified.''  What is the ``normal range''?  The range as
	output by localtime, the range of the Gregorian calendar, or
	the limits as specified in 7.16.2.6?
	

  Suggestion

    Drop all changes to the <time.h> section for this revision of
    the C Standard.

    Bring in experts in this area for the next revision of the
    C Standard.  I suggest working together with the members of the
    Time Zone Mailing list <tz@elsie.nci.nih.gov>.

    Build on existing practice rather than relying on committee
    inventions, which have been error-prone in this area.

    If these suggestions are not followed, a lot of changes are
    needed to this section, as suggested by the above discussion;
    please contact me if you need more details.


Date: Mon, 15 Jun 1998 09:46:39 +0200
From: Antoine Leca <Antoine.Leca@renault.fr>
To: tz@elsie.nci.nih.gov
Subject: Re: comments on draft ISO C9x changes to <time.h>

Paul Eggert wrote:
> 
> The ISO committee in charge of the C language has issued a draft for
> C9x, the next major revision to C.  
> Section 7.16 of this draft C standard proposes a major overhaul of the
> functions and datatypes defined in <time.h>. 
> 
> I've submitted the following comments to the ISO committee for their
> review. 

I have submitted the following memo to the committee, in order to
handle Paul's comment.  I would appreciate any comments on it from
the people on this list, since it appears to me that they
are among the best-informed people on these matters.


I noticed in the archives the PC-US0011, from Paul Eggert,
and particularly the point #14 about the extensions
introduced by C9X to the time functions.

I intend to propose a change to the draft to solve these
issues.  Of course, I do not comment on points of
detail regarding the wording, but try to stick to the
most basic problems.

I believe that if we want to go beyond the present (C90)
state of time functions, the following needs are to be
covered (in this order):

1) remove the dependency on the internal static buffers

2) doing 1) in a way compatible with POSIX.1 (*_r functions)

3) having a way to specify the timezone (other than the local
and UTC ones) [when timezones are considered as UTC offsets]

4) doing 3), including when passing a DST shift, or a change
of rules, i.e. when timezones are considered as a portion of
the world

5) handling leap seconds explicitly


Notes:
There are three kinds of internal static buffers:
 - the buffers specified in the Standard which hold the return
values of asctime, ctime, gmtime and localtime
 - the buffer to hold information about the timezone
 - the buffer to hold the locale information for strftime

As we all know, the third kind is bound to the locale model,
so I shall not go further in this area.  Unfortunately, the
POSIX.1 *_r functions remove the dependency on only the 1st
kind.  So the removal of the dependency on the 2nd kind
will require inventing new stuff.


Then, I believe POSIX's ctime_r and asctime_r can be
written using the present standard library, with code like

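char * asctime_r (const struct tm *timeptr, char *p )
{
char *cur, *old_loc;

  GET_MUTEX_LOCK(locale_lock);  /* if the locale is shared */
  /* setlocale(LC_TIME, "C") returns the *new* locale name, so query
     and copy the current name before switching */
  cur = setlocale(LC_TIME, NULL);
  old_loc = cur ? malloc(strlen(cur) + 1) : NULL;
  if (old_loc) strcpy(old_loc, cur);
  setlocale(LC_TIME, "C");
  strftime(p, 26, "%c\n", timeptr);
  if (old_loc) setlocale(LC_TIME, old_loc);
  free(old_loc);
  RELEASE_MUTEX_LOCK(locale_lock);
  return p;
}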

(in part because the behavior of strftime is now better specified
in the "C" locale).  And if I am wrong, then I believe the wording
should be improved so that this is correct (I know that setlocale
must *not* be called from inside the library, but the implementor
can solve this by using an internal alias).


So, to solve localtime_r/gmtime_r needs and point #3,
I then propose two new functions (to be compared with
zonetime and mkxtime from C9X draft) and a new
structure [and perhaps another type].
Remarks in brackets are for you to comment on!

The structure is named struct tzinfo, and its first
field, named tz_gmtoff, is a long containing the offset
in seconds from UTC to a timezone, with positive values
meaning ahead of UTC.  The structure might contain other
[unspecified ? implementation-dependent ?] fields, for
example to specify DST rules, but they should be designed
so that, when initialized to zero, the designated timezone
holds a constant offset from UTC.

The functions are:

struct tm *  zonetime (
  const time_t * timer,
  const struct tzinfo * tz,        /* if NULL, use local time */
  struct tm *    timeptr);         /* if NULL, use internal */

time_t timezone (
  struct tm *    timeptr,     
  const struct tzinfo * tz);       /* if NULL, use local time */


The meaning of these functions should be obvious when
I say that the usual functions can be expressed with them:

time_t mktime( struct tm* timeptr )
{  return timezone(timeptr, NULL); }


time_t timegm( struct tm* timeptr )
{
/* Here, I used the fact that tz_gmtoff is the first field
   of the structure, and that a partly-initialized
   structure is filled with zeroes */

  return timezone(timeptr, &(struct tzinfo){0} ); }

struct tm * localtime( const time_t * timer )
{  return zonetime(timer, NULL, NULL); }

struct tm * gmtime( const time_t * timer )
{  return zonetime(timer, &(struct tzinfo){0}, NULL); }

struct tm * localtime_r( const time_t * timer, struct tm* p )
{  return zonetime(timer, NULL, p); }

struct tm * gmtime_r( const time_t * timer, struct tm* p )
{  return zonetime(timer, &(struct tzinfo){0}, p); }

struct tm * CD1_zonetime( const time_t * timer, int value )
{  /* CD1 counts the offset in minutes, tz_gmtoff in seconds */
   return zonetime(timer, &(struct tzinfo){value * 60L}, NULL); }
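
To make the intended behavior concrete, here is a minimal fixed-offset
sketch of the proposed zonetime.  Everything in it is an assumption for
illustration: struct tzinfo is reduced to the one member the proposal
names, the function is called zonetime_sketch to avoid suggesting a real
library entry point, time_t is assumed to count seconds (as POSIX
requires but ISO C does not), and DST rules are ignored.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the proposed type. */
struct tzinfo { long tz_gmtoff; };   /* seconds ahead of UTC */

/* A null tz selects local time; a null timeptr selects the library's
   internal buffer.  Note that this sketch still goes through the
   static buffer of gmtime/localtime, so it is not reentrant. */
static struct tm *zonetime_sketch(const time_t *timer,
                                  const struct tzinfo *tz,
                                  struct tm *timeptr)
{
    struct tm *r;

    if (tz == NULL) {
        r = localtime(timer);
    } else {
        time_t shifted = *timer + (time_t)tz->tz_gmtoff;
        r = gmtime(&shifted);
    }
    if (r == NULL)
        return NULL;
    if (timeptr == NULL)
        return r;
    *timeptr = *r;
    return timeptr;
}
```

Called with a time_t of 0 and an offset of -2670 seconds (the Monrovia
example from Paul's comment), this yields 1969-12-31 23:15:30.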


Another useful call is when you parse a date with an
explicit timezone indication, as in an e-mail client,
and want to deal with it: just transform "+hhmm" into a
count of seconds sec, and call
  t = timezone(&tm, &(struct tzinfo){sec});
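
The "+hhmm" transformation might look like the sketch below.  The
function name and its error convention are assumptions; the accepted
syntax is deliberately narrow (a sign followed by exactly four digits,
with mm below 60).

```c
#include <ctype.h>

/* Parse a numeric zone such as "+0530" or "-0800" into seconds ahead
   of UTC.  Returns 0 and stores the offset in *sec on success;
   returns -1 and leaves *sec untouched on malformed input. */
static int parse_hhmm_offset(const char *s, long *sec)
{
    int i, hh, mm;

    if (s[0] != '+' && s[0] != '-')
        return -1;
    for (i = 1; i <= 4; i++)          /* stops at the first non-digit */
        if (!isdigit((unsigned char)s[i]))
            return -1;
    if (s[5] != '\0')
        return -1;
    hh = (s[1] - '0') * 10 + (s[2] - '0');
    mm = (s[3] - '0') * 10 + (s[4] - '0');
    if (mm > 59)
        return -1;
    *sec = (hh * 3600L + mm * 60L) * (s[0] == '-' ? -1 : 1);
    return 0;
}
```

The resulting count of seconds is what would be stored in tz_gmtoff
before calling the proposed timezone function.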


Of course, zonetime is allowed to return a NULL pointer
if the given inputs are not valid (as localtime must also
be allowed to, as pointed out in PC-US0011#13), or
if the offset between timer and UTC cannot be determined
(as is currently the case with gmtime); the same
holds for timezone.
(BTW: all names are just indications, they are open
to discussion; timezone is probably a bad choice, since
it was used in some versions of UNIX).


Then, the type of tz_gmtoff need not be `long'; it needs
only to be a numeric type capable of holding all the
integers in the range [-89999, 89999] -- which would be
enough to satisfy POSIX.1, plus another value meaning
_LOCALTIME like in CD1.  It might be worth adding a
type gmtoff_t (or utcoff_t) for this.
This is important if we think about the ways to retrieve
the correct offset of the timezone from UTC *after* the
call of the function, in particular in the case of time
zones with DST changes, as local time often is.

So an alternative model might be:
struct tm *  zonetime (
  const time_t * timer,
  const struct tzinfo * tz,        /* if NULL, use local time */
  struct tm *    timeptr,          /* if NULL, use internal */
  gmtoff_t *     offsetptr);       /* if NULL, do not store */

time_t timezone (
  struct tm *    timeptr,     
  const struct tzinfo * tz,        /* if NULL, use local time */
  gmtoff_t *     offsetptr);       /* if NULL, do not store */

Another possibility is to add another function, that returns
the offset of the time zone from a struct tm; like

  gmtoff_t  gmtoff ( const struct tm *timeptr );

or perhaps

  gmtoff_t  gmtoff (
         const struct tm *timeptr,
         const struct tzinfo *tz );

(The Committee might consider turning gmt to utc; but that is
another story).


Then, to extend this to handle "real" time zones, instead
of just their offsets, we need to go a little further.
The C standard chose *not* to describe this behavior in detail,
and I feel it can be considered too heavy for some
implementations (which would need updating, for example). Also,
specifying point #4 might be very tricky in the context
of an International Standard (consider the Israeli or Saudi rules).

So I believe point #4 should be left as a QoI issue
(but should be available as natural extension, of course).

OTOH, in the realm of POSIX, there is a quite natural
extension to this mechanism which fits the need:

struct tzinfo *tzalloc (
  const char *tzspec);             /* if NULL, use local time */

where tzspec has the same format as the contents of the
POSIX.1 TZ environment variable (with the usual extensions,
such as Olson's : prefix, allowed).

This function yields a null pointer if given an invalid
specification; the storage allocated by this function can
be freed with `free'. 
 
Having a mechanism to dynamically allocate storage is required,
because the most powerful implementations will store in
struct tzinfo historical information, which grows with time,
and thus requires this struct to be implemented as a VLA.


This design fits pretty well in Olson's code, where it
only adds small things.  It is also pretty light (when
compared to the actual <time.h> stuff), and eases
upgrading current implementations that are more limited in
functionality (like "only USA rules are used" found
in numerous std libc coming from the USA ;-), or even
"I only know localtime" flavors).
Vendors who have a different point of view, please
write to me.

It also has the property of *not* breaking current
binary compatibility, as it does not extend the meaning
of anything in struct tm (but merely stores the necessary
information in a different place).
But please keep reading.

However, it has a major flaw: it does not permit an
explicit handling of leap seconds.
And I do not know how to solve this: I do not want to
add new parameters (too expensive for almost no gain
in day-to-day use), and I do not want to introduce
a new kind of structure (too complex I believe).
(Another problem is that I do not know how to express
the rules about leap seconds in the context of the
library, while allowing an implementation to
optionally support it... see the point of Paul Eggert
in his PC about mkxtime on +1 day on June 30th, 1997,
midnight, which in the current draft doesn't yield
July 1st).

For leap seconds, there is first a very delicate
interoperability problem, since POSIX.1 requires time_t values to
*not* record any leap second information at all.
So I have no practical solution here, outside the one
proposed in CD1, i.e. to extend struct tm to have a new
field storing this information.

This (and another open problem, how to print the actual
time zone name) leads to another question:  why did the
Committee introduce a new type for the time functions,
instead of extending struct tm?

In the absence of an answer to this question, I am left with
an open alternative on how to end this proposal:
  a) extending struct tm explicitly (adding fields), which
     perhaps might require using the tm_isdst field as a flag
     to request/signal C9X behavior (tm_isdst is superseded
     by the information in the struct tzinfo)
  b) requesting indirectly implementations to insert the new
     fields to support the whole spec, but keeping them invisible
     to the "conforming" user (as do tz package or BSD right now)
  c) not extending struct tm and adding some more arguments to the
     new functions (or additional functions) to collect other
     information
and of course
  d) CD1's solution, creating a structure for extending (still
     a correct possibility), replacing the tm_ext stuff with
     a pointer to a tzinfo struct


Waiting for your welcome and certainly useful comments,

Antoine



Date: Mon, 15 Jun 1998 10:29:58 +0100 (BST)
From: "Joseph S. Myers" <jsm28@cam.ac.uk>
To: Antoine Leca <Antoine.Leca@renault.fr>
Cc: tz@elsie.nci.nih.gov
Subject: Re: comments on draft ISO C9x changes to <time.h>

On Mon, 15 Jun 1998, Antoine Leca wrote:

> OTOH, in the realm of POSIX, there is a quite natural
> extension to this mechanism which fits the need:
> 
> struct tzinfo *tzalloc (
>   const char *tzspec);             /* if NULL, use local time */
> 
> where tzspec has the same format as the contents of the
> POSIX.1 TZ environment variable (with the usual extensions,
> such as Olson's : prefix, allowed).
> 
> This function yields a null pointer if given an invalid
> specification; the storage allocated by this function can
> be freed with `free'. 

It might make sense to use the same sort of interface as for the POSIX.2
regular expression functions, e.g.

int tzcomp(timezone_t *zone, const char *tzspec);
size_t tzerror(int errcode, const timezone_t *zone, char *errbuf,
               size_t *errbuf_size);
void tzfree(timezone_t *zone);

tzcomp would return zero on success, or an error code otherwise
(e.g. TZ_BADSPEC for an invalid tzspec string, TZ_NOMEM for allocation
failure or implementation defined values for other errors, e.g. in a
timezone file); tzspec would be a POSIX.1 TZ value or NULL for an
implementation defined local timezone.  This allows extension through the
use of the implementation defined values beginning with ':'.  tzerror
would convert an error code to a string (modeled on regerror); tzfree
would free any allocated parts of the timezone_t structure.  The structure
could have some specified elements (e.g. UTC offset) if these are useful
and can be sensibly defined (i.e., a timezone_t can represent a complete
timezone history covering the past and future - what date's offset should
be given?).
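
For reference, the POSIX regular expression calls that this interface is
modeled on are used as in the sketch below.  The wrapper try_compile is a
hypothetical helper; regcomp, regerror, and regfree are the real POSIX
functions from <regex.h>.  A tzcomp/tzerror/tzfree trio would presumably
follow the same compile, report, free pattern.

```c
#include <regex.h>
#include <stddef.h>

/* Compile an extended regular expression.  Returns 0 on success;
   otherwise copies a human-readable message into errbuf and returns
   the error code from regcomp. */
static int try_compile(const char *pattern, char *errbuf, size_t errbuf_size)
{
    regex_t re;
    int err = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);

    if (err != 0) {
        regerror(err, &re, errbuf, errbuf_size);
        return err;          /* nothing to free after a failed regcomp */
    }
    regfree(&re);            /* releases storage owned by the regex_t */
    return 0;
}
```

The caller owns the regex_t object itself while the library owns whatever
it allocates inside it, which is the division of labor suggested above
for timezone_t.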

> For the leapseconds, there is first a very delicate inter-
> operability problem, since POSIX.1 request time_t value to
> *not* record any leap second information at all.
> So I have no practical solution here, outside the one
> proposed in CD1, i.e. to extend struct tm to have a new
> field storing this information.

The most compatible solution would be to use Markus Kuhn's CLOCK_UTC with
nanosecond values up to 1999999999 during leap seconds (and if the system
clock ticks TAI the library handles the conversion using a leap second
table).  This would however require new interfaces for time conversion
that take times with nanoseconds.
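
One reading of that representation can be sketched as follows.  The
struct utc_stamp and the function utc_stamp_to_tm are hypothetical names
for this illustration, not part of any existing interface; the sketch
assumes that sec counts non-leap seconds as in POSIX and that a
nanosecond value of 1000000000 or more marks an inserted leap second
following the second named by sec.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical time stamp: nsec in [1000000000, 1999999999] means
   "inside the leap second that follows second sec". */
struct utc_stamp { time_t sec; long nsec; };

/* Break a stamp down into UTC, showing a leap second as tm_sec == 60.
   Returns 0 on success, -1 on failure. */
static int utc_stamp_to_tm(const struct utc_stamp *ts,
                           struct tm *out, long *nsec_out)
{
    struct tm *r;

    if (ts->nsec < 0 || ts->nsec > 1999999999L)
        return -1;
    r = gmtime(&ts->sec);
    if (r == NULL)
        return -1;
    *out = *r;
    *nsec_out = ts->nsec;
    if (ts->nsec >= 1000000000L) {
        out->tm_sec += 1;             /* 59 becomes 60 */
        *nsec_out -= 1000000000L;
    }
    return 0;
}
```

With sec naming 1997-06-30 23:59:59 UTC and nsec equal to 1500000000, the
result is 23:59:60 plus half a second, while the time_t value itself
never has to represent the leap second.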

What is the `correct' time display to give for a leapsecond in a zone with
an offset from UTC that is not an integral number of minutes?  Have there
actually been any such zones since the start of the leapsecond system?

-- 
Joseph S. Myers
jsm28@cam.ac.uk