bug-gnulib@gnu.org mirror (unofficial)
* [PATCH] ISO 8601 basic format and decimal hours, minutes
@ 2019-02-19  4:00 Alex Eulenberg
  2019-04-07  3:26 ` Alex Eulenberg
  2019-04-07  7:20 ` Assaf Gordon
  0 siblings, 2 replies; 5+ messages in thread
From: Alex Eulenberg @ 2019-02-19  4:00 UTC (permalink / raw)
  To: bug-gnulib; +Cc: Alex Eulenberg

1. Accept dates in the ISO 8601 basic date and time format. This
is a format allowing the expression of a point in time to the resolution
of a second with only alphanumeric characters -- no spaces, no
punctuation -- suitable for use in fields that cannot contain spaces,
hyphens, or colons.

2. Accept decimal fractions of time units other than seconds in
accordance with ISO 8601. Currently only decimal fractions of a
second are accepted. With the patch, the decimal separator may be
used for the expression of fractions of a minute (in both basic and
extended formats) as well as fractions of an hour.

The ISO 8601 basic date and time format consists of digits representing
the date, followed by the letter "T", followed by a string of digits
(optionally including a dot or comma as decimal separator) representing
the time of day, optionally followed by a zone offset or "Z" for
zero offset (UTC).

The patch will allow parse-datetime to accept strings of the following forms:

YYYYMMDDThhmmss[Z]: 20190102T030405, 20180102T030405Z, 20180102T110405-0800
YYYYMMDDThhmm[Z]: 20190121T0304, 20180102T0304Z, 20180102T1104-0800
YYYYMMDDThh[Z]:  20190121T03, 20180102T03Z, 20180102T11-0800

Note that a variant of this format, where the separator is " " instead of
"T", is already accepted by parse-datetime, but only when the time component
is either "hh" or "hhmm" (no seconds). With the patch, these forms are still
accepted, and a time component of form "hhmmss" is accepted as well.

YYYYMMDD hhmmss[Z]: 20190102 030405, 20190102 030405Z, 20190102 030405 Z
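
The mapping from the time digits to fields depends only on how many
digits are present. As a minimal standalone sketch (split_time_digits
and struct hms are hypothetical names used for illustration, not
functions from the patch, and the range check is this sketch's own
choice):

```c
#include <stdbool.h>

/* Hypothetical container for the result; not a type from the patch.  */
struct hms { int hour, minute, second; };

/* Split an unsigned run of time digits into hour, minute, and second.
   DIGITS is the number of characters in the run, so leading zeros
   count: "0004" is VALUE 4 with DIGITS 4.  Returns false for lengths
   other than 2 (hh), 4 (hhmm), or 6 (hhmmss), or for out-of-range
   fields.  */
static bool
split_time_digits (long value, int digits, struct hms *out)
{
  if (value < 0 || (digits != 2 && digits != 4 && digits != 6))
    return false;

  out->hour = out->minute = out->second = 0;
  if (digits == 6)
    {
      out->second = value % 100;
      value /= 100;
    }
  if (digits >= 4)
    {
      out->minute = value % 100;
      value /= 100;
    }
  out->hour = (int) value;

  /* Range check chosen for this sketch only.  */
  return out->hour <= 23 && out->minute <= 59 && out->second <= 59;
}
```

With this sketch, the digits 030405 (length 6) give 03:04:05, and 0004
(length 4) gives 00:04 rather than 4 hours.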

The optional decimal part is interpreted as a fraction of a second,
a minute, or an hour, according to what the two digits immediately
before the decimal separator represent:

20180102T210729.068302473             = 2018-01-02T21:07:29,068302473
19800615T1230.25                      = 1980-06-15T12:30:15
1980-06-15T12:30,25                   =          "
19800615T12.25                        = 1980-06-15T12:15:00
1980-06-15T12:00.875                  = 1980-06-15T12:00:52,5
19660620T06,666666666                 = 1966-06-20T06:40:00
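
The arithmetic behind these equivalences can be sketched as follows.
This is an illustration under stated assumptions, not the patch's
decimal_to_time: scale_fraction and struct frac_time are hypothetical
names, and the sketch uses exact integer arithmetic, whereas the patch
goes through a double rounded to nine significant digits. As a
consequence, 06,666666666 hours comes out here as 06:39:59,9999976
rather than the rounded 06:40:00 shown above.

```c
#include <stdbool.h>

enum { NS_PER_SEC = 1000000000 };

/* Hypothetical result type: whole seconds plus nanoseconds.  */
struct frac_time { long long seconds; long nanoseconds; };

/* Convert a decimal fraction into seconds and nanoseconds.
   INT_DIGITS is the number of digits before the decimal separator:
   6 (hhmmss) means a fraction of a second, 4 (hhmm) a fraction of a
   minute, and 2 (hh) a fraction of an hour.  FRAC_NS is the fraction
   scaled by 10^9, e.g. 0.25 is 250000000.  */
static bool
scale_fraction (int int_digits, long frac_ns, struct frac_time *out)
{
  long long unit_seconds;

  if (int_digits == 6)
    unit_seconds = 1;
  else if (int_digits == 4)
    unit_seconds = 60;
  else if (int_digits == 2)
    unit_seconds = 3600;
  else
    return false;

  if (frac_ns < 0 || frac_ns >= NS_PER_SEC)
    return false;

  /* At most 3600 * 999999999, which fits easily in long long.  */
  long long total_ns = unit_seconds * frac_ns;
  out->seconds = total_ns / NS_PER_SEC;
  out->nanoseconds = (long) (total_ns % NS_PER_SEC);
  return true;
}
```

For instance, 1230.25 has four integer digits, so the fraction 0.25 is
a quarter of a minute, i.e. 15 seconds; 12.25 has two, so 0.25 is a
quarter of an hour, i.e. 900 seconds.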

Dates with two-digit year specifiers (YYMMDD) are also accepted, following
existing parse-datetime rules (when YY is 69-99, the year is 1969-1999;
when YY is 00-68, the year is 2000-2068), so:

180102T0304   = 2018-01-02T03:04
991231T2300   = 1999-12-31T23:00
190120T030405 = 2019-01-20T03:04:05
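
The two-digit year rule stated above can be written out as a short
sketch. The function name expand_two_digit_year is hypothetical and is
not taken from parse-datetime.y; only the 69/68 pivot comes from the
rule itself.

```c
/* Expand a two-digit year using the pivot described above:
   69-99 map to 1969-1999 and 00-68 map to 2000-2068.
   Returns -1 if YY is not in the range 0-99.  */
static int
expand_two_digit_year (int yy)
{
  if (yy < 0 || yy > 99)
    return -1;
  return yy >= 69 ? 1900 + yy : 2000 + yy;
}
```

So 18 expands to 2018 and 99 expands to 1999, matching the examples
above.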

Note that the use of two-digit year specifiers in both basic and extended
formats was allowed in the second edition of ISO 8601 (published in 2000),
but was removed in the third (2004) edition.

Forms no longer accepted as a result of the patch:

1. The current version of parse-datetime already accepts input in ISO
8601 basic format with the " " separator to a limited extent.  It works
with time strings of the form "hh" and "hhmm" but not "hhmmss".
Furthermore, when a zone offset is added (e.g. -05:00 or +03, but not a zone
name such as Z, UTC, or EST), the input is sometimes accepted, but the
result is often incorrect. In particular, the time digits are always
interpreted as a number of hours, so four digits would be interpreted as
"hhhh" not "hhmm".

With the patch, basic format with date-time separator " " can be used
with times of form "hhmmss" (as well as "hh", "hhmm", and "hhmmss.n+") but
the use of zone offset here is uniformly rejected as invalid:

input                  current output              output with patch
"20180102 00-03"       2018-01-02T03:00+00 +       (rejected)
"20180102 04-03"       2018-01-02T07:00+00 +       (rejected)
"20180102 0004-03"     2018-01-02T07:00+00 *       (rejected)
"20180102 0004+00"     2018-01-02T04:00+00 *       (rejected)
"20180102 0004Z"       2018-01-02T00:04+00 ++      (rejected)
"20180102 0040-05"     (rejected) **               (rejected)
"20180102 0040 EST"    2018-01-02T05:40+00 +++     (rejected)

+ correct with two-digit time (hh) and zone offset.
* incorrect with zone offset: 0004 interpreted as 4 h (leading zeroes ignored)
++ correct with zone name: 0004 interpreted as 00:04
** rejected with zone offset: 0040 interpreted as 40 h (invalid hour of day)
+++ correct with zone name: 0040 interpreted as 00:40

With the patch, all corresponding expressions with the "T"
separator are accepted and correctly interpreted:

input                  current output              output with patch
"20180102T00-03"       (rejected)                  2018-01-02T03:00+00
"20180102T04-03"       (rejected)                  2018-01-02T07:00+00
"20180102T0004-03"     (rejected)                  2018-01-02T03:04+00
"20180102T0004+00"     (rejected)                  2018-01-02T00:04+00
"20180102T0004Z"       (rejected)                  2018-01-02T00:04+00
"20180102T0040-05"     (rejected)                  2018-01-02T05:40+00
"20180102T0040 EST"    (rejected)                  2018-01-02T05:40+00

2. For inputs of form "(YY)YYMMDD'T'hhmm" the current version of
parse-datetime interprets the "T" as the single-letter military time zone
identifier equivalent to zone offset +0700. This is probably never the
intention.

input (TZ=UTC)         current output              output with patch
"19691231T0700"        1969-12-31T00:00+00 *       1969-12-31T07:00+00
"19691231T0700Z"       (rejected) **               1969-12-31T07:00+00
* "T" interpreted as +0700
** rejected because two zone identifiers were recognized ("T" and "Z")

3. With the patch, there is at least one case where an input with "T"
surrounded by digits is rejected (because it is not a valid ISO 8601
basic format date), while the same input with another zone name in
place of "T" is accepted. The current parse-datetime accepts both
forms equally. Again, such input is unlikely to be intentional.

input                  current output              output with patch
"0700 T 31 DEC 1969"   1969-12-31T00:00+00         (rejected)
"0800 U 31 DEC 1969"   1969-12-31T00:00+00         1969-12-31T00:00+00
"07:00 T 31 DEC 1969"  1969-12-31T00:00+00         1969-12-31T00:00+00
---
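
The UTC values in the tables above are the local time of day minus the
zone offset. A minimal sketch of that arithmetic follows; the name
utc_minute_of_day is hypothetical, offsets are taken as minutes east of
UTC, and date rollover is deliberately ignored. The military zone "T"
is treated as +0700, as described in item 2.

```c
enum { MINUTES_PER_DAY = 24 * 60 };

/* Convert a local time of day and a zone offset (minutes east of UTC)
   to the UTC minute of the day, wrapping around midnight.  The date
   itself is not adjusted in this sketch.  */
static int
utc_minute_of_day (int hour, int minute, int offset_minutes)
{
  int m = (hour * 60 + minute - offset_minutes) % MINUTES_PER_DAY;
  return m < 0 ? m + MINUTES_PER_DAY : m;
}
```

For example, 00:04 at offset -03 is minute 184 of the UTC day (03:04),
and 07:00 at +0700 is minute 0 (00:00).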
 doc/parse-datetime.texi     |  17 ++--
 lib/parse-datetime.y        | 193 ++++++++++++++++++++++++++++++------
 tests/test-parse-datetime.c | 115 +++++++++++++++++++++
 3 files changed, 291 insertions(+), 34 deletions(-)

diff --git a/doc/parse-datetime.texi b/doc/parse-datetime.texi
index 193575edc..1dc98e21a 100644
--- a/doc/parse-datetime.texi
+++ b/doc/parse-datetime.texi
@@ -188,7 +188,8 @@ Here are the rules.
 @cindex ISO 8601 date format
 @cindex date format, ISO 8601
 For numeric months, the ISO 8601 format
-@samp{@var{year}-@var{month}-@var{day}} is allowed, where @var{year} is
+@samp{@var{year}-@var{month}-@var{day}} is allowed, with ("extended"
+format) or without ("basic" format) dashes, where @var{year} is
 any positive number, @var{month} is a number between 01 and 12, and
 @var{day} is a number between 01 and 31.  A leading zero must be present
 if a number is less than ten.  If @var{year} is 68 or smaller, then 2000
@@ -322,14 +323,14 @@ timestamps are interpreted using the rules of the default time zone
 @cindex ISO 8601 date and time of day format
 @cindex date and time of day format, ISO 8601
 
-The ISO 8601 date and time of day extended format consists of an ISO
-8601 date, a @samp{T} character separator, and an ISO 8601 time of
-day.  This format is also recognized if the @samp{T} is replaced by a
-space.
+The ISO 8601 date and time of day format consists of an ISO 8601 date
+(with or without hyphens), a @samp{T} character separator, and
+an ISO 8601 time of day (with or without colons).  This format is
+also recognized if the @samp{T} is replaced by a space.
 
 In this format, the time of day should use 24-hour notation.
 Fractional seconds are allowed, with either comma or period preceding
-the fraction.  ISO 8601 fractional minutes and hours are not
+the fraction.  ISO 8601 fractional minutes and hours are also
 supported.  Typically, hosts support nanosecond timestamp resolution;
 excess precision is silently discarded.
 
@@ -339,6 +340,10 @@ Here are some examples:
 2012-09-24T20:02:00.052-05:00
 2012-12-31T23:59:59,999999999+11:00
 1970-01-01 00:00Z
+19700101T0000Z
+20380118 03:14:07,5
+2038-01-18T03.125
+20380118T03.235416667
 @end example
 
 @node Day of week items
diff --git a/lib/parse-datetime.y b/lib/parse-datetime.y
index b264bb7fb..234c8b0d6 100644
--- a/lib/parse-datetime.y
+++ b/lib/parse-datetime.y
@@ -155,6 +155,14 @@ typedef struct
   ptrdiff_t digits;
 } textint;
 
+/* A decimal value, and the number of digits in the integer part of
+   its textual representation.  */
+typedef struct
+{
+  struct timespec timespec;
+  ptrdiff_t digits;
+} hhmmss_decimal;
+
 /* An entry in the lexical lookup table.  */
 typedef struct
 {
@@ -256,6 +264,62 @@ static int yylex (union YYSTYPE *, parser_control *);
 static int yyerror (parser_control const *, char const *);
 static bool time_zone_hhmm (parser_control *, textint, intmax_t);
 
+static void
+digits_to_time (parser_control *pc, textint text_int)
+{
+  intmax_t balance = text_int.value;
+
+  pc->hour = pc->minutes = pc->seconds.tv_sec =  pc->seconds.tv_nsec = 0;
+
+  if ( text_int.digits >= 6 )
+    {
+      pc->seconds.tv_sec = balance % 100;
+      pc->seconds.tv_nsec = 0;
+      balance = balance / 100;
+    }
+  if ( text_int.digits >= 4 )
+    {
+      pc->minutes = balance % 100;
+      balance = balance / 100;
+    }
+
+  pc->hour = balance;
+  pc->meridian = MER24;
+}
+
+static void
+digits_to_date (parser_control *pc, textint text_int)
+{
+  if (text_int.digits > 4)
+    {
+      pc->day = text_int.value % 100;
+      pc->month = (text_int.value / 100) % 100;
+      pc->year.value = text_int.value / 10000;
+      pc->year.digits = text_int.digits - 4;
+    }
+  else
+    {
+      /* ISO 8601 says:
+           4 digit date string = year and
+           2 digit date string = century (multiply by 100 years).
+         Here we accommodate this as well as accept 3 digits
+         as year and 1 digit as century.
+
+         Note this section is provided for theoretical completeness and
+         future use, as every current caller of this function ensures
+         text_int.digits > 4. */
+      pc->day = 1;
+      pc->month = 1;
+      pc->year = text_int;
+      if (text_int.digits <= 2)
+        {
+          pc->year.value = text_int.value * 100;
+          pc->year.digits = text_int.digits + 2;
+        }
+    }
+}
+
+
 /* Extract into *PC any date and time info from a string of digits
    of the form e.g., YYYYMMDD, YYMMDD, HHMM, HH (and sometimes YYY,
    YYYY, ...).  */
@@ -270,30 +334,15 @@ digits_to_date_time (parser_control *pc, textint text_int)
     }
   else
     {
-      if (4 < text_int.digits)
+      if (!pc->dates_seen && 4 < text_int.digits)
         {
           pc->dates_seen++;
-          pc->day = text_int.value % 100;
-          pc->month = (text_int.value / 100) % 100;
-          pc->year.value = text_int.value / 10000;
-          pc->year.digits = text_int.digits - 4;
+          digits_to_date (pc, text_int);
         }
       else
         {
           pc->times_seen++;
-          if (text_int.digits <= 2)
-            {
-              pc->hour = text_int.value;
-              pc->minutes = 0;
-            }
-          else
-            {
-              pc->hour = text_int.value / 100;
-              pc->minutes = text_int.value % 100;
-            }
-          pc->seconds.tv_sec = 0;
-          pc->seconds.tv_nsec = 0;
-          pc->meridian = MER24;
+          digits_to_time (pc, text_int);
         }
     }
 }
@@ -323,6 +372,41 @@ apply_relative_time (parser_control *pc, relative_time rel, int factor)
   return true;
 }
 
+static void
+decimal_to_time (parser_control *pc, hhmmss_decimal hd)
+{
+    textint int_part;
+    int_part.digits = hd.digits;
+    int_part.value = hd.timespec.tv_sec;
+    int_part.negative = false;
+    digits_to_time (pc, int_part);
+    double decimal_part = (double) hd.timespec.tv_nsec / (double) BILLION;
+    int seconds_multiplier;
+    relative_time decimal_part_rel = RELATIVE_TIME_0;
+
+    if (int_part.digits > 5)
+      {
+        pc->seconds.tv_nsec = hd.timespec.tv_nsec;
+      }
+    else
+      {
+        char rounded_str[sizeof("9999.999999999")];
+
+        if (int_part.digits > 3)
+            seconds_multiplier = 60;
+        else
+            seconds_multiplier = 3600;
+
+        const int precision = 9;
+        double seconds_with_fraction = seconds_multiplier * decimal_part;
+        sprintf(rounded_str, "%.*g", precision, seconds_with_fraction);
+        seconds_with_fraction = strtod(rounded_str, NULL);
+        decimal_part_rel.seconds = (intmax_t) (seconds_with_fraction);
+        decimal_part_rel.ns = (int) ((seconds_with_fraction - decimal_part_rel.seconds) * BILLION);
+        apply_relative_time (pc, decimal_part_rel, 1);
+      }
+}
+
 /* Set PC-> hour, minutes, seconds and nanoseconds members from arguments.  */
 static void
 set_hhmmss (parser_control *pc, intmax_t hour, intmax_t minutes,
@@ -577,6 +661,7 @@ debug_print_relative_time (char const *item, parser_control const *pc)
   intmax_t intval;
   textint textintval;
   struct timespec timespec;
+  hhmmss_decimal hhmmss_decimal;
   relative_time rel;
 }
 
@@ -590,7 +675,7 @@ debug_print_relative_time (char const *item, parser_control const *pc)
 %token <intval> tMONTH tORDINAL tZONE
 
 %token <textintval> tSNUMBER tUNUMBER
-%token <timespec> tSDECIMAL_NUMBER tUDECIMAL_NUMBER
+%token <hhmmss_decimal> tSDECIMAL_NUMBER tUDECIMAL_NUMBER
 
 %type <intval> o_colon_minutes
 %type <timespec> seconds signed_seconds unsigned_seconds
@@ -657,6 +742,13 @@ item:
       {
         debug_print_current_time (_("number"), pc);
       }
+  | number_T
+      {
+        pc->time_zone = HOUR (7);
+        pc->zones_seen++;
+        if (! pc->times_seen) pc->dates_seen++;
+        debug_print_current_time (_("number_T"), pc);
+      }
   | hybrid
       {
         debug_print_relative_time (_("hybrid"), pc);
@@ -668,7 +760,10 @@ datetime:
   ;
 
 iso_8601_datetime:
-    iso_8601_date 'T' iso_8601_time
+    iso_8601_date_T iso_8601_time
+  | iso_8601_date_T time_number o_zone_offset
+  | number_T time_number o_zone_offset
+  | number_T iso_8601_time
   ;
 
 time:
@@ -691,14 +786,18 @@ time:
   ;
 
 iso_8601_time:
-    tUNUMBER zone_offset
+    tUNUMBER ':' tUNUMBER o_zone_offset
       {
-        set_hhmmss (pc, $1.value, 0, 0, 0);
+        set_hhmmss (pc, $1.value, $3.value, 0, 0);
         pc->meridian = MER24;
       }
-  | tUNUMBER ':' tUNUMBER o_zone_offset
+  | tUNUMBER ':' tUDECIMAL_NUMBER o_zone_offset
       {
-        set_hhmmss (pc, $1.value, $3.value, 0, 0);
+        hhmmss_decimal hhmm;
+        hhmm.timespec.tv_sec = $1.value * 100 + $3.timespec.tv_sec;
+        hhmm.timespec.tv_nsec = $3.timespec.tv_nsec;
+        hhmm.digits = $1.digits + $3.digits;
+        decimal_to_time (pc, hhmm);
         pc->meridian = MER24;
       }
   | tUNUMBER ':' tUNUMBER ':' unsigned_seconds o_zone_offset
@@ -889,6 +988,28 @@ iso_8601_date:
       }
   ;
 
+iso_8601_date_T:
+    iso_8601_date 'T'
+
+number_T:
+    tUNUMBER 'T'
+      {
+        if ($1.digits <= 4 || (pc->dates_seen))
+        /* Number is a time.  Here 'T' must be a military time zone  */
+          {
+            digits_to_time (pc, $1);
+            pc->times_seen++;
+          }
+        /* Number is a date.  Here 'T' could be either military time zone
+           or a date-time separator  */
+        else
+          {
+            digits_to_date (pc, $1);
+          }
+      }
+  ;
+
+
 rel:
     relunit tAGO
       { if (! apply_relative_time (pc, $1, $2)) YYABORT; }
@@ -936,9 +1057,9 @@ relunit:
   | tUNUMBER tSEC_UNIT
       { $$ = RELATIVE_TIME_0; $$.seconds = $1.value; }
   | tSDECIMAL_NUMBER tSEC_UNIT
-      { $$ = RELATIVE_TIME_0; $$.seconds = $1.tv_sec; $$.ns = $1.tv_nsec; }
+      { $$ = RELATIVE_TIME_0; $$.seconds = $1.timespec.tv_sec; $$.ns = $1.timespec.tv_nsec; }
   | tUDECIMAL_NUMBER tSEC_UNIT
-      { $$ = RELATIVE_TIME_0; $$.seconds = $1.tv_sec; $$.ns = $1.tv_nsec; }
+      { $$ = RELATIVE_TIME_0; $$.seconds = $1.timespec.tv_sec; $$.ns = $1.timespec.tv_nsec; }
   | tSEC_UNIT
       { $$ = RELATIVE_TIME_0; $$.seconds = 1; }
   | relunit_snumber
@@ -969,6 +1090,7 @@ seconds: signed_seconds | unsigned_seconds;
 
 signed_seconds:
     tSDECIMAL_NUMBER
+      { $$ = $1.timespec; }
   | tSNUMBER
       { if (time_overflow ($1.value)) YYABORT;
         $$.tv_sec = $1.value; $$.tv_nsec = 0; }
@@ -976,14 +1098,28 @@ signed_seconds:
 
 unsigned_seconds:
     tUDECIMAL_NUMBER
+      { $$ = $1.timespec; }
   | tUNUMBER
       { if (time_overflow ($1.value)) YYABORT;
         $$.tv_sec = $1.value; $$.tv_nsec = 0; }
   ;
 
+time_number:
+    tUNUMBER
+      { digits_to_time (pc, $1); }
+  | tUDECIMAL_NUMBER
+      { decimal_to_time (pc, $1); }
+  ;
+
 number:
     tUNUMBER
       { digits_to_date_time (pc, $1); }
+  | tUDECIMAL_NUMBER
+      {
+        if (!pc->dates_seen) YYABORT;
+        pc->times_seen++;
+        decimal_to_time (pc, $1);
+      }
   ;
 
 hybrid:
@@ -1447,6 +1583,7 @@ yylex (union YYSTYPE *lvalp, parser_control *pc)
 
           if ((c == '.' || c == ',') && c_isdigit (p[1]))
             {
+              lvalp->hhmmss_decimal.digits = p - pc->input;
               time_t s;
               int ns;
               int digits;
@@ -1487,8 +1624,8 @@ yylex (union YYSTYPE *lvalp, parser_control *pc)
                   ns = BILLION - ns;
                 }
 
-              lvalp->timespec.tv_sec = s;
-              lvalp->timespec.tv_nsec = ns;
+              lvalp->hhmmss_decimal.timespec.tv_sec = s;
+              lvalp->hhmmss_decimal.timespec.tv_nsec = ns;
               pc->input = p;
               return sign ? tSDECIMAL_NUMBER : tUDECIMAL_NUMBER;
             }
diff --git a/tests/test-parse-datetime.c b/tests/test-parse-datetime.c
index f80f71baa..4978d2742 100644
--- a/tests/test-parse-datetime.c
+++ b/tests/test-parse-datetime.c
@@ -129,6 +129,26 @@ main (int argc _GL_UNUSED, char **argv)
   gmtoff = gmt_offset (ref_time);
 
 
+  /* ISO 8601 basic date and time of day representation,
+     'T' separator, local time zone */
+  p = "20110501T115518";
+  expected.tv_sec = ref_time - gmtoff;
+  expected.tv_nsec = 0;
+  ASSERT (parse_datetime (&result, p, 0));
+  LOG (p, expected, result);
+  ASSERT (expected.tv_sec == result.tv_sec
+          && expected.tv_nsec == result.tv_nsec);
+
+  /* ISO 8601 basic date and time of day representation,
+     ' ' separator, local time zone */
+  p = "20110501 115518";
+  expected.tv_sec = ref_time - gmtoff;
+  expected.tv_nsec = 0;
+  ASSERT (parse_datetime (&result, p, 0));
+  LOG (p, expected, result);
+  ASSERT (expected.tv_sec == result.tv_sec
+          && expected.tv_nsec == result.tv_nsec);
+
   /* ISO 8601 extended date and time of day representation,
      'T' separator, local time zone */
   p = "2011-05-01T11:55:18";
@@ -150,6 +170,26 @@ main (int argc _GL_UNUSED, char **argv)
           && expected.tv_nsec == result.tv_nsec);
 
 
+  /* ISO 8601, basic date and time of day representation,
+     'T' separator, UTC */
+  p = "20110501T115518Z";
+  expected.tv_sec = ref_time;
+  expected.tv_nsec = 0;
+  ASSERT (parse_datetime (&result, p, 0));
+  LOG (p, expected, result);
+  ASSERT (expected.tv_sec == result.tv_sec
+          && expected.tv_nsec == result.tv_nsec);
+
+  /* ISO 8601, basic date and time of day representation,
+     ' ' separator, UTC */
+  p = "20110501 115518Z";
+  expected.tv_sec = ref_time;
+  expected.tv_nsec = 0;
+  ASSERT (parse_datetime (&result, p, 0));
+  LOG (p, expected, result);
+  ASSERT (expected.tv_sec == result.tv_sec
+          && expected.tv_nsec == result.tv_nsec);
+
   /* ISO 8601, extended date and time of day representation,
      'T' separator, UTC */
   p = "2011-05-01T11:55:18Z";
@@ -171,6 +211,19 @@ main (int argc _GL_UNUSED, char **argv)
           && expected.tv_nsec == result.tv_nsec);
 
 
+  /* ISO 8601 basic date and time of day representation,
+     'T' separator, w/UTC offset */
+  p = "20110501T115518-0700";
+  expected.tv_sec = 1304276118;
+  expected.tv_nsec = 0;
+  ASSERT (parse_datetime (&result, p, 0));
+  LOG (p, expected, result);
+  ASSERT (expected.tv_sec == result.tv_sec
+          && expected.tv_nsec == result.tv_nsec);
+
+  /* ISO 8601 basic date and time of day representation,
+     ' ' separator, w/UTC offset NOT SUPPORTED */
+
   /* ISO 8601 extended date and time of day representation,
      'T' separator, w/UTC offset */
   p = "2011-05-01T11:55:18-07:00";
@@ -192,6 +245,19 @@ main (int argc _GL_UNUSED, char **argv)
           && expected.tv_nsec == result.tv_nsec);
 
 
+  /* ISO 8601 basic date and time of day representation,
+     'T' separator, w/hour only UTC offset */
+  p = "20110501T115518-07";
+  expected.tv_sec = 1304276118;
+  expected.tv_nsec = 0;
+  ASSERT (parse_datetime (&result, p, 0));
+  LOG (p, expected, result);
+  ASSERT (expected.tv_sec == result.tv_sec
+          && expected.tv_nsec == result.tv_nsec);
+
+  /* ISO 8601 basic date and time of day representation,
+     ' ' separator, w/hour only UTC offset NOT SUPPORTED */
+
   /* ISO 8601 extended date and time of day representation,
      'T' separator, w/hour only UTC offset */
   p = "2011-05-01T11:55:18-07";
@@ -213,6 +279,55 @@ main (int argc _GL_UNUSED, char **argv)
           && expected.tv_nsec == result.tv_nsec);
 
 
+  /* decimal seconds */
+  p = "2038-01-19 03:14:07,44 UTC";
+  expected.tv_sec = 2147483647;
+  expected.tv_nsec = 440000000;
+  ASSERT (parse_datetime (&result, p, 0));
+  LOG (p, expected, result);
+  ASSERT (expected.tv_sec == result.tv_sec
+          && expected.tv_nsec == result.tv_nsec);
+
+  /* decimal minutes */
+  p = "2038-01-19 03:14.124 UTC";
+  expected.tv_sec = 2147483647;
+  expected.tv_nsec = 440000000;
+  ASSERT (parse_datetime (&result, p, 0));
+  LOG (p, expected, result);
+  ASSERT (expected.tv_sec == result.tv_sec
+          && expected.tv_nsec == result.tv_nsec);
+
+  /* decimal hours */
+  p = "2038-01-19 03.2354 UTC";
+  expected.tv_sec = 2147483647;
+  expected.tv_nsec = 440000000;
+  ASSERT (parse_datetime (&result, p, 0));
+  LOG (p, expected, result);
+  ASSERT (expected.tv_sec == result.tv_sec
+          && expected.tv_nsec == result.tv_nsec);
+
+
+  /* Leading zeroes are significant.  */
+
+  /* first century and decimal seconds */
+  p = "00010101T001010.1Z"; /* = 0001-01-01T00:10:10,1+00:00 */
+  expected.tv_sec = -62135596190;
+  expected.tv_nsec = 100000000;
+  ASSERT (parse_datetime (&result, p, 0));
+  LOG (p, expected, result);
+  ASSERT (expected.tv_sec == result.tv_sec
+          && expected.tv_nsec == result.tv_nsec);
+
+  /* twenty-first century and decimal minutes */
+  p = "010101T1010.1Z"; /* = 2001-01-01T10:10:06+00:00  */
+  expected.tv_sec = 978343806;
+  expected.tv_nsec = 0;
+  ASSERT (parse_datetime (&result, p, 0));
+  LOG (p, expected, result);
+  ASSERT (expected.tv_sec == result.tv_sec
+          && expected.tv_nsec == result.tv_nsec);
+
+
   now.tv_sec = 4711;
   now.tv_nsec = 1267;
   p = "now";
-- 
2.17.1




* Re: [PATCH] ISO 8601 basic format and decimal hours, minutes
  2019-02-19  4:00 [PATCH] ISO 8601 basic format and decimal hours, minutes Alex Eulenberg
@ 2019-04-07  3:26 ` Alex Eulenberg
  2019-04-07  7:20 ` Assaf Gordon
  1 sibling, 0 replies; 5+ messages in thread
From: Alex Eulenberg @ 2019-04-07  3:26 UTC (permalink / raw)
  To: bug-gnulib

On 2019-02-18 20:00 PST, Alex Eulenberg wrote:
> 1. Accept dates in the ISO 8601 basic date and time format. This
> is a format allowing the expression of a point in time to the resolution
> of a second with only alphanumeric characters -- no spaces, no
> punctuation -- suitable for use in fields that cannot contain spaces,
> hyphens, or colons.
> 
> 2. Accept decimal fractions of time units other than seconds in
> accordance with ISO 8601. Currently only decimal fractions of a
> second are accepted. With the patch, the decimal separator may be
> used for the expression of fractions of a minute (in both basic and
> extended formats) as well as fractions of an hour.

Can one of the GNULib maintainers please comment on this submission?

--Alex



* Re: [PATCH] ISO 8601 basic format and decimal hours, minutes
  2019-02-19  4:00 [PATCH] ISO 8601 basic format and decimal hours, minutes Alex Eulenberg
  2019-04-07  3:26 ` Alex Eulenberg
@ 2019-04-07  7:20 ` Assaf Gordon
  2019-04-08  5:05   ` Paul Eggert
  2020-03-23  0:07   ` Alex Eulenberg
  1 sibling, 2 replies; 5+ messages in thread
From: Assaf Gordon @ 2019-04-07  7:20 UTC (permalink / raw)
  To: Alex Eulenberg, bug-gnulib

Hello Alex,

Thank you for taking the time to write the patch
and describe it in such detail.

In general:

On 2019-02-18 9:00 p.m., Alex Eulenberg wrote:
> 1. Accept dates in the ISO 8601 basic date and time format.
[...]
> 2. Accept decimal fractions of time units other than seconds in
> accordance with ISO 8601.
[...]
> Forms no longer accepted as a result of the patch:
[...]
> input                  current output              output with patch
> "20180102 00-03"       2018-01-02T03:00+00 +       (rejected)
> "20180102 04-03"       2018-01-02T07:00+00 +       (rejected)
[...]
> 
> input                  current output              output with patch
> "20180102T00-03"       (rejected)                  2018-01-02T03:00+00
> "20180102T04-03"       (rejected)                  2018-01-02T07:00+00

Adding new accepted formats is good,
but rejecting (or changing the meaning of) currently-accepted formats
is a much more problematic decision due to breaking existing programs.

I think that before continuing, it should be discussed and decided
whether this is acceptable or not (Paul, Jim?).

If we do go forward with this breaking change, we should make sure
to announce it and communicate it properly to users of coreutils' date(1).

As a compromise, note that the 'parse_datetime2' function accepts a
'flags' parameter (added in recent years). Perhaps a flag could
specify whether 'T' is always the ISO 8601 separator or always a
military time zone (and then an option could be added to date(1)).


> 2. For inputs of form "(YY)YYMMDD'T'hhmm" the current version of
> parse-datetime interprets the "T" as the single-letter military time zone
> identifier equivalent to zone offset +0700. This is probably never the
> intention.

Not so, 'T' has special handling code, and in some of the cases
is explicitly translated to HOUR(7):
https://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/parse-datetime.y#n751


>   doc/parse-datetime.texi     |  17 ++--
>   lib/parse-datetime.y        | 193 ++++++++++++++++++++++++++++++------
>   tests/test-parse-datetime.c | 115 +++++++++++++++++++++
>   3 files changed, 291 insertions(+), 34 deletions(-)

As your contribution is significant (more than 10 lines),
a copyright assignment is required before we can review the patch (see 
https://www.gnu.org/licenses/why-assign.html ).

Please complete the following short form and send it to assign@gnu.org:

https://git.savannah.gnu.org/cgit/gnulib.git/tree/doc/Copyright/request-assign.future

regards,
  - assaf









* Re: [PATCH] ISO 8601 basic format and decimal hours, minutes
  2019-04-07  7:20 ` Assaf Gordon
@ 2019-04-08  5:05   ` Paul Eggert
  2020-03-23  0:07   ` Alex Eulenberg
  1 sibling, 0 replies; 5+ messages in thread
From: Paul Eggert @ 2019-04-08  5:05 UTC (permalink / raw)
  To: Assaf Gordon, Alex Eulenberg, bug-gnulib

Assaf Gordon wrote:
> Adding new accepted formats is good,
> but rejecting (or changing the meaning of) currently-accepted formats
> is a much more problematic decision due to breaking existing programs.
> 
> I think that before continuing, it should be discussed and decided
> whether this is acceptable or not (Paul, Jim?).

I'd rather not reject existing valid input unless there's a good reason. Is 
there one? I didn't see one in the original proposal (admittedly I didn't read 
it word for word).



* Re: [PATCH] ISO 8601 basic format and decimal hours, minutes
  2019-04-07  7:20 ` Assaf Gordon
  2019-04-08  5:05   ` Paul Eggert
@ 2020-03-23  0:07   ` Alex Eulenberg
  1 sibling, 0 replies; 5+ messages in thread
From: Alex Eulenberg @ 2020-03-23  0:07 UTC (permalink / raw)
  To: bug-gnulib

Posting my response now that I have finally gotten the copyright
assignment, including the employer disclaimer, finalized.

Original patch posted Feb 19, 2019, for reference:

https://lists.gnu.org/archive/html/bug-gnulib/2019-02/msg00041.html

Now responding to Assaf's comments from April of last year.

On 2019-04-07 00:20 PDT, Assaf Gordon wrote:

> Adding new accepted formats is good,
> but rejecting (or changing the meaning of) currently-accepted formats
> is a much more problematic decision due to breaking existing programs.
> 
> I think that before continuing, it should be discussed and decided
> whether this is acceptable or not (Paul, Jim?).

Regarding the "breaking" change. The format that breaks is not included 
in the current test suite, and I think is unlikely to occur in actual 
existing programs:

YYYYMMDD HH-ZZ

Note that the following other related formats are currently rejected or 
give incorrect results anyway:

YYYYMMDD HHMM-ZZ (incorrect result)
YYYYMMDD HHMMSS-ZZ (incorrect result)
YYYYMMDDTHHMM-ZZ (rejected)
YYYYMMDDTHHMMSS-ZZ (rejected)

What my patch does is to allow the currently-rejected ISO-compliant 
strings ("T" separator) to be accepted and interpreted correctly, while 
removing altogether the acceptance of a marginal class of strings (" " 
separator, no dashes or colons, in conjunction with zone offset), which 
is currently handled unreliably anyway.

> If we do go forward with this breaking change, we should make sure
> to announce it and communicate it properly to users of coreutils'
> date(1).

Agreed.


> As a compromise, note that the 'parse_datetime2' function accepts a
> 'flags' parameter (added in recent years). Perhaps a flag could
> specify whether 'T' is always the ISO 8601 separator or always a
> military time zone (and then an option could be added to date(1)).

I would rather not have to set a flag for the sake of this corner case. 
I suppose it would be impossible to confirm that no one is actually 
using this format, but perhaps there is some way we could be satisfied 
that the existing poor support for this marginal format can be dropped.

--Alex


