git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] date.c: allow ISO 8601 reduced precision times
@ 2022-12-16  3:36 Phil Hord
  2022-12-16  4:23 ` Junio C Hamano
  2023-01-11  0:10 ` [PATCH v2] date.c: allow ISO 8601 reduced precision times Đoàn Trần Công Danh
  0 siblings, 2 replies; 12+ messages in thread
From: Phil Hord @ 2022-12-16  3:36 UTC (permalink / raw)
  To: git; +Cc: congdanhqx, plavarre, Phil Hord

From: Phil Hord <phil.hord@gmail.com>

ISO 8601 permits "reduced precision" time representations to omit the
seconds value or both the minutes and the seconds values.  The
abbreviate times could look like 17:45 or 1745 to omit the seconds,
or simply as 17 to omit both the minutes and the seconds.

parse_date_basic accepts the 17:45 format but it rejects the other two.
Fix it to accept 4-digit and 2-digit time values when they follow a
recognized date but no time has yet been parsed.

Add tests for these formats and some others, with and without colons.

Before this change:

$ test-tool date approxidate 2022-12-13T23:00 2022-12-13T2300 2022-12-13T23
2022-12-13T23:00 -> 2022-12-14 07:00:00 +0000
2022-12-13T2300 -> 2022-12-14 03:00:55 +0000
2022-12-13T23 -> 2022-12-14 03:00:55 +0000

After this change:

$ test-tool date approxidate 2022-12-13T23:00 2022-12-13T2300 2022-12-13T23
2022-12-13T23:00 -> 2022-12-14 07:00:00 +0000
2022-12-13T2300 -> 2022-12-14 07:00:00 +0000
2022-12-13T23 -> 2022-12-14 07:00:00 +0000

Note: ISO 8601 also allows reduced precision date strings such as
"2022-12" and "2022". This patch does not attempt to address these.

Reported-by: Pat LaVarre <plavarre@purestorage.com>
Signed-off-by: Phil Hord <phil.hord@gmail.com>
---
 date.c          | 22 +++++++++++++++++++++-
 t/t0006-date.sh |  6 ++++++
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/date.c b/date.c
index 53bd6a7932..b011b9d6b3 100644
--- a/date.c
+++ b/date.c
@@ -638,6 +638,18 @@ static inline int nodate(struct tm *tm)
 		tm->tm_sec) < 0;
 }
 
+/*
+ * Have we filled in any part of the time yet?
+ * We just do a binary 'and' to see if the sign bit
+ * is set in all the values.
+ */
+static inline int notime(struct tm *tm)
+{
+	return (tm->tm_hour &
+		tm->tm_min &
+		tm->tm_sec) < 0;
+}
+
 /*
  * We've seen a digit. Time? Year? Date?
  */
@@ -689,7 +701,11 @@ static int match_digit(const char *date, struct tm *tm, int *offset, int *tm_gmt
 
 	/* 8 digits, compact style of ISO-8601's date: YYYYmmDD */
 	/* 6 digits, compact style of ISO-8601's time: HHMMSS */
-	if (n == 8 || n == 6) {
+	/* 4 digits, compact style of ISO-8601's time: HHMM */
+	/* 2 digits, compact style of ISO-8601's time: HH */
+	if (n == 8 || n == 6 ||
+		(!nodate(tm) && notime(tm) &&
+		(n == 4 || n == 2))) {
 		unsigned int num1 = num / 10000;
 		unsigned int num2 = (num % 10000) / 100;
 		unsigned int num3 = num % 100;
@@ -698,6 +714,10 @@ static int match_digit(const char *date, struct tm *tm, int *offset, int *tm_gmt
 		else if (n == 6 && set_time(num1, num2, num3, tm) == 0 &&
 			 *end == '.' && isdigit(end[1]))
 			strtoul(end + 1, &end, 10);
+		else if (n == 4)
+			set_time(num2, num3, 0, tm);
+		else if (n == 2)
+			set_time(num3, 0, 0, tm);
 		return end - date;
 	}
 
diff --git a/t/t0006-date.sh b/t/t0006-date.sh
index 2490162071..16fb0bf4bd 100755
--- a/t/t0006-date.sh
+++ b/t/t0006-date.sh
@@ -88,6 +88,12 @@ check_parse 2008-02-14 bad
 check_parse '2008-02-14 20:30:45' '2008-02-14 20:30:45 +0000'
 check_parse '2008-02-14 20:30:45 -0500' '2008-02-14 20:30:45 -0500'
 check_parse '2008.02.14 20:30:45 -0500' '2008-02-14 20:30:45 -0500'
+check_parse '20080214T20:30:45' '2008-02-14 20:30:45 +0000'
+check_parse '20080214T20:30' '2008-02-14 20:30:00 +0000'
+check_parse '20080214T20' '2008-02-14 20:00:00 +0000'
+check_parse '20080214T203045' '2008-02-14 20:30:45 +0000'
+check_parse '20080214T2030' '2008-02-14 20:30:00 +0000'
+check_parse '20080214T20' '2008-02-14 20:00:00 +0000'
 check_parse '20080214T203045-04:00' '2008-02-14 20:30:45 -0400'
 check_parse '20080214T203045 -04:00' '2008-02-14 20:30:45 -0400'
 check_parse '20080214T203045.019-04:00' '2008-02-14 20:30:45 -0400'
-- 
2.39.0.56.g57e2c6ebbe


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] date.c: allow ISO 8601 reduced precision times
  2022-12-16  3:36 [PATCH] date.c: allow ISO 8601 reduced precision times Phil Hord
@ 2022-12-16  4:23 ` Junio C Hamano
  2022-12-16 18:38   ` Phil Hord
  2023-01-11  0:10 ` [PATCH v2] date.c: allow ISO 8601 reduced precision times Đoàn Trần Công Danh
  1 sibling, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2022-12-16  4:23 UTC (permalink / raw)
  To: Phil Hord; +Cc: git, congdanhqx, plavarre

Phil Hord <phil.hord@gmail.com> writes:

> From: Phil Hord <phil.hord@gmail.com>
>
> ISO 8601 permits "reduced precision" time representations to omit the
> seconds value or both the minutes and the seconds values.  The
> abbreviate times could look like 17:45 or 1745 to omit the seconds,
> or simply as 17 to omit both the minutes and the seconds.
>
> parse_date_basic accepts the 17:45 format but it rejects the other two.
> Fix it to accept 4-digit and 2-digit time values when they follow a
> recognized date but no time has yet been parsed.

I worry a bit that this may conflict with other approxidate
heuristics.

> $ test-tool date approxidate 2022-12-13T23:00 2022-12-13T2300 2022-12-13T23
> 2022-12-13T23:00 -> 2022-12-14 07:00:00 +0000
> 2022-12-13T2300 -> 2022-12-14 07:00:00 +0000
> 2022-12-13T23 -> 2022-12-14 07:00:00 +0000

All of these may be obvious improvements, but the thing is that
there is nothing in the approxidate parsing code that insists on the
presence of "T" to loosen the rule only for ISO-8601 case.

For example, with only 6 digits, do we still recognise our internal
timestamp format (i.e. seconds since epoch) without the
disambiguating '@' prefix?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] date.c: allow ISO 8601 reduced precision times
  2022-12-16  4:23 ` Junio C Hamano
@ 2022-12-16 18:38   ` Phil Hord
  2023-01-09  6:41     ` Phil Hord
  0 siblings, 1 reply; 12+ messages in thread
From: Phil Hord @ 2022-12-16 18:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, congdanhqx, plavarre

On Thu, Dec 15, 2022 at 8:23 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Phil Hord <phil.hord@gmail.com> writes:
>
> > From: Phil Hord <phil.hord@gmail.com>
> >
> > ISO 8601 permits "reduced precision" time representations to omit the
> > seconds value or both the minutes and the seconds values.  The
> > abbreviate times could look like 17:45 or 1745 to omit the seconds,
> > or simply as 17 to omit both the minutes and the seconds.
> >
> > parse_date_basic accepts the 17:45 format but it rejects the other two.
> > Fix it to accept 4-digit and 2-digit time values when they follow a
> > recognized date but no time has yet been parsed.
>
> I worry a bit that this may conflict with other approxidate
> heuristics.

I share your concern. I tried to make the ISO matching code very
specific to the case where we have already matched a date, we have not
yet matched a time value, and we see a lone 2-digit or 4-digit number
show up.

> > $ test-tool date approxidate 2022-12-13T23:00 2022-12-13T2300 2022-12-13T23
> > 2022-12-13T23:00 -> 2022-12-14 07:00:00 +0000
> > 2022-12-13T2300 -> 2022-12-14 07:00:00 +0000
> > 2022-12-13T23 -> 2022-12-14 07:00:00 +0000
>
> All of these may be obvious improvements, but the thing is that
> there is nothing in the approxidate parsing code that insists on the
> presence of "T" to loosen the rule only for ISO-8601 case.

I considered making the T an explicit marker, but it didn't seem
necessary here. But the looseness of approxidate with regard to spaces
is worrisome. That's why I added the date/no-time constraints.

> For example, with only 6 digits, do we still recognise our internal
> timestamp format (i.e. seconds since epoch) without the
> disambiguating '@' prefix?

I don't grok your example.  This change should not affect the
interpretation of any 6-digit number.

Oh, do you mean if there was _no_ delimiter before the time field?
Like 2022-12-132300?  My change will not recognize this format, and I
believe it was explicitly rejected by ISO-8601-1:2019.

approxidate seems not to recognize fewer than 9 digits as an epoch
number, even with the @ prefix.  But this is not because of my change.

test-tool date approxidate 123456789 12345678
123456789 -> 1973-11-29 21:33:09 +0000
12345678 -> 2022-12-16 18:34:02 +0000

test-tool date approxidate @123456789 @12345678
@123456789 -> 1973-11-29 21:33:09 +0000
@12345678 -> 2022-12-16 18:36:35 +0000

test-tool date parse 123456789 12345678
123456789 -> 1973-11-29 13:33:09 -0800
12345678 -> bad

test-tool date parse @123456789 @12345678
@123456789 -> 1973-11-29 13:33:09 -0800
@12345678 -> bad

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] date.c: allow ISO 8601 reduced precision times
  2022-12-16 18:38   ` Phil Hord
@ 2023-01-09  6:41     ` Phil Hord
  2023-01-09  8:48       ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Phil Hord @ 2023-01-09  6:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, congdanhqx, plavarre

> On Thu, Dec 15, 2022 at 8:23 PM Junio C Hamano <gitster@pobox.com> wrote:
> > All of these may be obvious improvements, but the thing is that
> > there is nothing in the approxidate parsing code that insists on the
> > presence of "T" to loosen the rule only for ISO-8601 case.
>
> I considered making the T an explicit marker, but it didn't seem
> necessary here. But the looseness of approxidate with regard to spaces
> is worrisome. That's why I added the date/no-time constraints.
>
> > For example, with only 6 digits, do we still recognise our internal
> > timestamp format (i.e. seconds since epoch) without the
> > disambiguating '@' prefix?
>
> I don't grok your example.  This change should not affect the
> interpretation of any 6-digit number.
>
> Oh, do you mean if there was _no_ delimiter before the time field?
> Like 2022-12-132300?  My change will not recognize this format, and I
> believe it was explicitly rejected by ISO-8601-1:2019.
>
> approxidate seems not to recognize fewer than 9 digits as an epoch
> number, even with the @ prefix.  But this is not because of my change.
>
> test-tool date approxidate 123456789 12345678
> 123456789 -> 1973-11-29 21:33:09 +0000
> 12345678 -> 2022-12-16 18:34:02 +0000
>
> test-tool date approxidate @123456789 @12345678
> @123456789 -> 1973-11-29 21:33:09 +0000
> @12345678 -> 2022-12-16 18:36:35 +0000
>
> test-tool date parse 123456789 12345678
> 123456789 -> 1973-11-29 13:33:09 -0800
> 12345678 -> bad
>
> test-tool date parse @123456789 @12345678
> @123456789 -> 1973-11-29 13:33:09 -0800
> @12345678 -> bad


Do you have any suggestions about how I can better alleviate your
concerns?  I don't think there are real regressions here and I tried
to explain why.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] date.c: allow ISO 8601 reduced precision times
  2023-01-09  6:41     ` Phil Hord
@ 2023-01-09  8:48       ` Junio C Hamano
  2023-01-09  9:16         ` Junio C Hamano
  2023-01-09 11:29         ` [PATCH] fixup! " Đoàn Trần Công Danh
  0 siblings, 2 replies; 12+ messages in thread
From: Junio C Hamano @ 2023-01-09  8:48 UTC (permalink / raw)
  To: Phil Hord; +Cc: git, congdanhqx, plavarre

Phil Hord <phil.hord@gmail.com> writes:

> Do you have any suggestions about how I can better alleviate your
> concerns?  I don't think there are real regressions here and I tried
> to explain why.

Other than "including it in a released version and waiting for
people to scream", I do not think there is.  The "next" branch was
meant to be a test ground for these new features by letting
volunteer users to use it in their everyday development, and the
hope was that we can catch regressions by cooking risky topics
longer than usual in there, but we haven't been very successful, I
have to say.

Thanks.  Let's queue it and see what happens.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] date.c: allow ISO 8601 reduced precision times
  2023-01-09  8:48       ` Junio C Hamano
@ 2023-01-09  9:16         ` Junio C Hamano
  2023-01-09 18:30           ` Phil Hord
  2023-01-09 11:29         ` [PATCH] fixup! " Đoàn Trần Công Danh
  1 sibling, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2023-01-09  9:16 UTC (permalink / raw)
  To: Phil Hord; +Cc: git, congdanhqx, plavarre

Junio C Hamano <gitster@pobox.com> writes:

> Phil Hord <phil.hord@gmail.com> writes:
>
>> Do you have any suggestions about how I can better alleviate your
>> concerns?  I don't think there are real regressions here and I tried
>> to explain why.
>
> Other than "including it in a released version and waiting for
> people to scream", I do not think there is.  The "next" branch was
> meant to be a test ground for these new features by letting
> volunteer users to use it in their everyday development, and the
> hope was that we can catch regressions by cooking risky topics
> longer than usual in there, but we haven't been very successful, I
> have to say.
>
> Thanks.  Let's queue it and see what happens.

Actually, let's not queue it as-is, because it seems to break many
tests for me.  I won't have time to take further look myself before
later in the week when I come back online again, though.

Test Summary Report
-------------------
t4255-am-submodule.sh                            (Wstat: 256 (exited 1) Tests: 33 Failed: 22)
  Failed tests:  1-6, 11-13, 15-20, 25-27, 30-33
  Non-zero exit status: 1
t4150-am.sh                                      (Wstat: 256 (exited 1) Tests: 87 Failed: 62)
  Failed tests:  3, 5-7, 11-13, 15-16, 18-22, 24-25, 27-32
                34-36, 38-46, 48, 50-52, 54, 57-61, 63-65
                67-77, 82-85
  Non-zero exit status: 1
t4014-format-patch.sh                            (Wstat: 256 (exited 1) Tests: 193 Failed: 20)
  Failed tests:  6-7, 10, 141-157
  Non-zero exit status: 1
t7512-status-help.sh                             (Wstat: 256 (exited 1) Tests: 44 Failed: 1)
  Failed test:  26
  Non-zero exit status: 1
t3901-i18n-patch.sh                              (Wstat: 256 (exited 1) Tests: 20 Failed: 5)
  Failed tests:  16-20
  Non-zero exit status: 1
t4151-am-abort.sh                                (Wstat: 256 (exited 1) Tests: 20 Failed: 7)
  Failed tests:  2-3, 5-6, 15, 19-20
  Non-zero exit status: 1
t4153-am-resume-override-opts.sh                 (Wstat: 256 (exited 1) Tests: 5 Failed: 2)
  Failed tests:  3-4
  Non-zero exit status: 1
t5607-clone-bundle.sh                            (Wstat: 256 (exited 1) Tests: 14 Failed: 1)
  Failed test:  3
  Non-zero exit status: 1
t4152-am-subjects.sh                             (Wstat: 256 (exited 1) Tests: 13 Failed: 9)
  Failed tests:  5-13
  Non-zero exit status: 1
t4253-am-keep-cr-dos.sh                          (Wstat: 256 (exited 1) Tests: 7 Failed: 4)
  Failed tests:  3-4, 6-7
  Non-zero exit status: 1
t4257-am-interactive.sh                          (Wstat: 256 (exited 1) Tests: 4 Failed: 3)
  Failed tests:  2-4
  Non-zero exit status: 1
t4258-am-quoted-cr.sh                            (Wstat: 256 (exited 1) Tests: 4 Failed: 2)
  Failed tests:  3-4
  Non-zero exit status: 1
t4254-am-corrupt.sh                              (Wstat: 256 (exited 1) Tests: 4 Failed: 1)
  Failed test:  3
  Non-zero exit status: 1
t0023-crlf-am.sh                                 (Wstat: 256 (exited 1) Tests: 2 Failed: 1)
  Failed test:  2
  Non-zero exit status: 1
t4256-am-format-flowed.sh                        (Wstat: 256 (exited 1) Tests: 2 Failed: 1)
  Failed test:  2
  Non-zero exit status: 1
Files=987, Tests=28346, 137 wallclock secs (12.84 usr  4.19 sys + 799.53 cusr 1017.07 csys = 1833.63 CPU)
Result: FAIL
gmake[1]: *** [Makefile:62: prove] Error 1
gmake[1]: Leaving directory '/home/gitster/w/buildfarm/seen/t'
gmake: *** [Makefile:3196: test] Error 2
rmdir: failed to remove '/dev/shm/testpen.2086794': Directory not empty

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] fixup! date.c: allow ISO 8601 reduced precision times
  2023-01-09  8:48       ` Junio C Hamano
  2023-01-09  9:16         ` Junio C Hamano
@ 2023-01-09 11:29         ` Đoàn Trần Công Danh
  2023-01-09 12:29           ` [PATCH] date.c: limit less precision ISO-8601 with its marker Đoàn Trần Công Danh
  1 sibling, 1 reply; 12+ messages in thread
From: Đoàn Trần Công Danh @ 2023-01-09 11:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Phil Hord, git, plavarre

On 2023-01-09 17:48:01+0900, Junio C Hamano <gitster@pobox.com> wrote:
> Phil Hord <phil.hord@gmail.com> writes:
> 
> > Do you have any suggestions about how I can better alleviate your
> > concerns?  I don't think there are real regressions here and I tried
> > to explain why.
> 
> Other than "including it in a released version and waiting for
> people to scream", I do not think there is.  The "next" branch was
> meant to be a test ground for these new features by letting
> volunteer users to use it in their everyday development, and the
> hope was that we can catch regressions by cooking risky topics
> longer than usual in there, but we haven't been very successful, I
> have to say.

While I think we shouldn't care much about ISO-8601, we should declare
that we're only conformed to RFC-3339 format instead.

Below fixup could limit the change to only ISO-8601 strings
I'm not entirely sure if this heuristics would break those people with
00:00:00.1234 timestamp or not (the added test cases shows that this
change doesn't break ISO-8601 parsing, but I don't know).

On top of Hord's patch + Junio's next, all tests pass.
----8<----

Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
---
 date.c          | 21 +++++++++++++--------
 t/t0006-date.sh |  3 ++-
 2 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/date.c b/date.c
index b011b9d6b3..19e6787aef 100644
--- a/date.c
+++ b/date.c
@@ -493,6 +493,12 @@ static int match_alpha(const char *date, struct tm *tm, int *offset)
 		return 2;
 	}
 
+	/* ISO-8601 allows yyyymmDD'T'HHMMSS, with less precision */
+	if (*date == 'T' && isdigit(date[1])) {
+		tm->tm_hour = tm->tm_min = tm->tm_sec = 0;
+		return strlen("T");
+	}
+
 	/* BAD CRAP */
 	return skip_alpha(date);
 }
@@ -639,15 +645,14 @@ static inline int nodate(struct tm *tm)
 }
 
 /*
- * Have we filled in any part of the time yet?
- * We just do a binary 'and' to see if the sign bit
- * is set in all the values.
+ * Have we seen an ISO-8601-alike date, i.e. 20220101T0,
+ * In those special case, those fields have been set to 0
  */
-static inline int notime(struct tm *tm)
+static inline int maybeiso8601(struct tm *tm)
 {
-	return (tm->tm_hour &
-		tm->tm_min &
-		tm->tm_sec) < 0;
+	return tm->tm_hour == 0 &&
+		tm->tm_min == 0 &&
+		tm->tm_sec == 0;
 }
 
 /*
@@ -704,7 +709,7 @@ static int match_digit(const char *date, struct tm *tm, int *offset, int *tm_gmt
 	/* 4 digits, compact style of ISO-8601's time: HHMM */
 	/* 2 digits, compact style of ISO-8601's time: HH */
 	if (n == 8 || n == 6 ||
-		(!nodate(tm) && notime(tm) &&
+		(!nodate(tm) && maybeiso8601(tm) &&
 		(n == 4 || n == 2))) {
 		unsigned int num1 = num / 10000;
 		unsigned int num2 = (num % 10000) / 100;
diff --git a/t/t0006-date.sh b/t/t0006-date.sh
index 16fb0bf4bd..130207fc04 100755
--- a/t/t0006-date.sh
+++ b/t/t0006-date.sh
@@ -93,7 +93,8 @@ check_parse '20080214T20:30' '2008-02-14 20:30:00 +0000'
 check_parse '20080214T20' '2008-02-14 20:00:00 +0000'
 check_parse '20080214T203045' '2008-02-14 20:30:45 +0000'
 check_parse '20080214T2030' '2008-02-14 20:30:00 +0000'
-check_parse '20080214T20' '2008-02-14 20:00:00 +0000'
+check_parse '20080214T000000.20' '2008-02-14 00:00:00 +0000'
+check_parse '20080214T00:00:00.20' '2008-02-14 00:00:00 +0000'
 check_parse '20080214T203045-04:00' '2008-02-14 20:30:45 -0400'
 check_parse '20080214T203045 -04:00' '2008-02-14 20:30:45 -0400'
 check_parse '20080214T203045.019-04:00' '2008-02-14 20:30:45 -0400'

-- 
Danh

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH] date.c: limit less precision ISO-8601 with its marker
  2023-01-09 11:29         ` [PATCH] fixup! " Đoàn Trần Công Danh
@ 2023-01-09 12:29           ` Đoàn Trần Công Danh
  2023-01-09 18:57             ` Phil Hord
  0 siblings, 1 reply; 12+ messages in thread
From: Đoàn Trần Công Danh @ 2023-01-09 12:29 UTC (permalink / raw)
  To: git, Phil Hord, plavarre, Junio C Hamano
  Cc: Đoàn Trần Công Danh

The newly added heuristic to parse less precision ISO-8601 conflicts
with other heuristics to parse datetime-strings. E.g.:

	Thu, 7 Apr 2005 15:14:13 -0700

Let's limit the new heuristic to only datetime string with a 'T'
followed immediately by some digits, and if we failed to parse the
upcoming string, rollback the change.

Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
---

Here is a better thought out change, which tried to minimize the impact of
new heuristics.

While I think it's a fixup, but I still needs explaination, I think I may
reword it's as a full patch instead.
Range-diff:
1:  4036e5a944 ! 1:  b703425a57 fixup! date.c: allow ISO 8601 reduced precision times
    @@ Metadata
     Author: Đoàn Trần Công Danh <congdanhqx@gmail.com>
     
      ## Commit message ##
    -    fixup! date.c: allow ISO 8601 reduced precision times
    +    date.c: limit less precision ISO-8601 with its marker
    +
    +    The newly added heuristic to parse less precision ISO-8601 conflicts
    +    with other heuristics to parse datetime-strings. E.g.:
    +
    +            Thu, 7 Apr 2005 15:14:13 -0700
    +
    +    Let's limit the new heuristic to only datetime string with a 'T'
    +    followed immediately by some digits, and if we failed to parse the
    +    upcoming string, rollback the change.
     
         Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
     
    @@ date.c: static int match_alpha(const char *date, struct tm *tm, int *offset)
      	}
      
     +	/* ISO-8601 allows yyyymmDD'T'HHMMSS, with less precision */
    -+	if (*date == 'T' && isdigit(date[1])) {
    -+		tm->tm_hour = tm->tm_min = tm->tm_sec = 0;
    -+		return strlen("T");
    ++	if (*date == 'T' && isdigit(date[1]) && tm->tm_hour == -1) {
    ++		tm->tm_min = tm->tm_sec = 0;
    ++		return 1;
     +	}
     +
      	/* BAD CRAP */
    @@ date.c: static inline int nodate(struct tm *tm)
     - * We just do a binary 'and' to see if the sign bit
     - * is set in all the values.
     + * Have we seen an ISO-8601-alike date, i.e. 20220101T0,
    -+ * In those special case, those fields have been set to 0
    ++ * In which, hour is still unset,
    ++ * and minutes and second has been set to 0.
       */
     -static inline int notime(struct tm *tm)
     +static inline int maybeiso8601(struct tm *tm)
    @@ date.c: static inline int nodate(struct tm *tm)
     -	return (tm->tm_hour &
     -		tm->tm_min &
     -		tm->tm_sec) < 0;
    -+	return tm->tm_hour == 0 &&
    ++	return tm->tm_hour == -1 &&
     +		tm->tm_min == 0 &&
     +		tm->tm_sec == 0;
      }
      
      /*
     @@ date.c: static int match_digit(const char *date, struct tm *tm, int *offset, int *tm_gmt
    - 	/* 4 digits, compact style of ISO-8601's time: HHMM */
    - 	/* 2 digits, compact style of ISO-8601's time: HH */
    - 	if (n == 8 || n == 6 ||
    + 
    + 	/* 8 digits, compact style of ISO-8601's date: YYYYmmDD */
    + 	/* 6 digits, compact style of ISO-8601's time: HHMMSS */
    +-	/* 4 digits, compact style of ISO-8601's time: HHMM */
    +-	/* 2 digits, compact style of ISO-8601's time: HH */
    +-	if (n == 8 || n == 6 ||
     -		(!nodate(tm) && notime(tm) &&
    -+		(!nodate(tm) && maybeiso8601(tm) &&
    - 		(n == 4 || n == 2))) {
    +-		(n == 4 || n == 2))) {
    ++	if (n == 8 || n == 6) {
      		unsigned int num1 = num / 10000;
      		unsigned int num2 = (num % 10000) / 100;
    + 		unsigned int num3 = num % 100;
    +@@ date.c: static int match_digit(const char *date, struct tm *tm, int *offset, int *tm_gmt
    + 		else if (n == 6 && set_time(num1, num2, num3, tm) == 0 &&
    + 			 *end == '.' && isdigit(end[1]))
    + 			strtoul(end + 1, &end, 10);
    +-		else if (n == 4)
    +-			set_time(num2, num3, 0, tm);
    +-		else if (n == 2)
    +-			set_time(num3, 0, 0, tm);
    + 		return end - date;
    + 	}
    + 
    ++	/* reduced precision of ISO-8601's time: HHMM or HH */
    ++	if (maybeiso8601(tm)) {
    ++		unsigned int num1 = num;
    ++		unsigned int num2 = 0;
    ++		if (n == 4) {
    ++			num1 = num / 100;
    ++			num2 = num % 100;
    ++		}
    ++		if ((n == 4 || n == 2) && !nodate(tm) &&
    ++		    set_time(num1, num2, 0, tm) == 0)
    ++			return n;
    ++		/*
    ++		 * We thought this is an ISO-8601 time string,
    ++		 * we set minutes and seconds to 0,
    ++		 * turn out it isn't, rollback the change.
    ++		 */
    ++		tm->tm_min = tm->tm_sec = -1;
    ++	}
    ++
    + 	/* Four-digit year or a timezone? */
    + 	if (n == 4) {
    + 		if (num <= 1400 && *offset == -1) {
     
      ## t/t0006-date.sh ##
     @@ t/t0006-date.sh: check_parse '20080214T20:30' '2008-02-14 20:30:00 +0000'

 date.c          | 49 +++++++++++++++++++++++++++++++++----------------
 t/t0006-date.sh |  3 ++-
 2 files changed, 35 insertions(+), 17 deletions(-)

diff --git a/date.c b/date.c
index b011b9d6b3..6f45eeb356 100644
--- a/date.c
+++ b/date.c
@@ -493,6 +493,12 @@ static int match_alpha(const char *date, struct tm *tm, int *offset)
 		return 2;
 	}
 
+	/* ISO-8601 allows yyyymmDD'T'HHMMSS, with less precision */
+	if (*date == 'T' && isdigit(date[1]) && tm->tm_hour == -1) {
+		tm->tm_min = tm->tm_sec = 0;
+		return 1;
+	}
+
 	/* BAD CRAP */
 	return skip_alpha(date);
 }
@@ -639,15 +645,15 @@ static inline int nodate(struct tm *tm)
 }
 
 /*
- * Have we filled in any part of the time yet?
- * We just do a binary 'and' to see if the sign bit
- * is set in all the values.
+ * Have we seen an ISO-8601-alike date, i.e. 20220101T0,
+ * In which, hour is still unset,
+ * and minutes and second has been set to 0.
  */
-static inline int notime(struct tm *tm)
+static inline int maybeiso8601(struct tm *tm)
 {
-	return (tm->tm_hour &
-		tm->tm_min &
-		tm->tm_sec) < 0;
+	return tm->tm_hour == -1 &&
+		tm->tm_min == 0 &&
+		tm->tm_sec == 0;
 }
 
 /*
@@ -701,11 +707,7 @@ static int match_digit(const char *date, struct tm *tm, int *offset, int *tm_gmt
 
 	/* 8 digits, compact style of ISO-8601's date: YYYYmmDD */
 	/* 6 digits, compact style of ISO-8601's time: HHMMSS */
-	/* 4 digits, compact style of ISO-8601's time: HHMM */
-	/* 2 digits, compact style of ISO-8601's time: HH */
-	if (n == 8 || n == 6 ||
-		(!nodate(tm) && notime(tm) &&
-		(n == 4 || n == 2))) {
+	if (n == 8 || n == 6) {
 		unsigned int num1 = num / 10000;
 		unsigned int num2 = (num % 10000) / 100;
 		unsigned int num3 = num % 100;
@@ -714,13 +716,28 @@ static int match_digit(const char *date, struct tm *tm, int *offset, int *tm_gmt
 		else if (n == 6 && set_time(num1, num2, num3, tm) == 0 &&
 			 *end == '.' && isdigit(end[1]))
 			strtoul(end + 1, &end, 10);
-		else if (n == 4)
-			set_time(num2, num3, 0, tm);
-		else if (n == 2)
-			set_time(num3, 0, 0, tm);
 		return end - date;
 	}
 
+	/* reduced precision of ISO-8601's time: HHMM or HH */
+	if (maybeiso8601(tm)) {
+		unsigned int num1 = num;
+		unsigned int num2 = 0;
+		if (n == 4) {
+			num1 = num / 100;
+			num2 = num % 100;
+		}
+		if ((n == 4 || n == 2) && !nodate(tm) &&
+		    set_time(num1, num2, 0, tm) == 0)
+			return n;
+		/*
+		 * We thought this is an ISO-8601 time string,
+		 * we set minutes and seconds to 0,
+		 * turn out it isn't, rollback the change.
+		 */
+		tm->tm_min = tm->tm_sec = -1;
+	}
+
 	/* Four-digit year or a timezone? */
 	if (n == 4) {
 		if (num <= 1400 && *offset == -1) {
diff --git a/t/t0006-date.sh b/t/t0006-date.sh
index 16fb0bf4bd..130207fc04 100755
--- a/t/t0006-date.sh
+++ b/t/t0006-date.sh
@@ -93,7 +93,8 @@ check_parse '20080214T20:30' '2008-02-14 20:30:00 +0000'
 check_parse '20080214T20' '2008-02-14 20:00:00 +0000'
 check_parse '20080214T203045' '2008-02-14 20:30:45 +0000'
 check_parse '20080214T2030' '2008-02-14 20:30:00 +0000'
-check_parse '20080214T20' '2008-02-14 20:00:00 +0000'
+check_parse '20080214T000000.20' '2008-02-14 00:00:00 +0000'
+check_parse '20080214T00:00:00.20' '2008-02-14 00:00:00 +0000'
 check_parse '20080214T203045-04:00' '2008-02-14 20:30:45 -0400'
 check_parse '20080214T203045 -04:00' '2008-02-14 20:30:45 -0400'
 check_parse '20080214T203045.019-04:00' '2008-02-14 20:30:45 -0400'
-- 
2.39.0.287.g690a66fa66


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] date.c: allow ISO 8601 reduced precision times
  2023-01-09  9:16         ` Junio C Hamano
@ 2023-01-09 18:30           ` Phil Hord
  0 siblings, 0 replies; 12+ messages in thread
From: Phil Hord @ 2023-01-09 18:30 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, congdanhqx, plavarre

On Mon, Jan 9, 2023 at 1:16 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Junio C Hamano <gitster@pobox.com> writes:
>
> > Phil Hord <phil.hord@gmail.com> writes:
> >
> >> Do you have any suggestions about how I can better alleviate your
> >> concerns?  I don't think there are real regressions here and I tried
> >> to explain why.
> >
> > Other than "including it in a released version and waiting for
> > people to scream", I do not think there is.  The "next" branch was
> > meant to be a test ground for these new features by letting
> > volunteer users to use it in their everyday development, and the
> > hope was that we can catch regressions by cooking risky topics
> > longer than usual in there, but we haven't been very successful, I
> > have to say.
> >
> > Thanks.  Let's queue it and see what happens.
>
> Actually, let's not queue it as-is, because it seems to break many
> tests for me.  I won't have time to take further look myself before
> later in the week when I come back online again, though.

Oh, wow. For me as well.  I thought I ran all the tests before
finishing up, but I guess I was too focused on the single test module.
I apologize for the oversight.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] date.c: limit less precision ISO-8601 with its marker
  2023-01-09 12:29           ` [PATCH] date.c: limit less precision ISO-8601 with its marker Đoàn Trần Công Danh
@ 2023-01-09 18:57             ` Phil Hord
  0 siblings, 0 replies; 12+ messages in thread
From: Phil Hord @ 2023-01-09 18:57 UTC (permalink / raw)
  To: Đoàn Trần Công Danh; +Cc: git, plavarre, Junio C Hamano

On Mon, Jan 9, 2023 at 4:29 AM Đoàn Trần Công Danh <congdanhqx@gmail.com> wrote:
>
> The newly added heuristic to parse less precision ISO-8601 conflicts
> with other heuristics to parse datetime-strings. E.g.:
>
>         Thu, 7 Apr 2005 15:14:13 -0700
>
> Let's limit the new heuristic to only datetime string with a 'T'
> followed immediately by some digits, and if we failed to parse the
> upcoming string, rollback the change.
>
> Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
> ---
>
> Here is a better thought out change, which tried to minimize the impact of
> new heuristics.
>
> While I think it's a fixup, but I still needs explaination, I think I may
> reword it's as a full patch instead.
> Range-diff:
> 1:  4036e5a944 ! 1:  b703425a57 fixup! date.c: allow ISO 8601 reduced precision times
>     @@ Metadata
>      Author: Đoàn Trần Công Danh <congdanhqx@gmail.com>
>
>       ## Commit message ##
>     -    fixup! date.c: allow ISO 8601 reduced precision times
>     +    date.c: limit less precision ISO-8601 with its marker
>     +
>     +    The newly added heuristic to parse less precision ISO-8601 conflicts
>     +    with other heuristics to parse datetime-strings. E.g.:
>     +
>     +            Thu, 7 Apr 2005 15:14:13 -0700
>     +
>     +    Let's limit the new heuristic to only datetime string with a 'T'
>     +    followed immediately by some digits, and if we failed to parse the
>     +    upcoming string, rollback the change.
>
>          Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
>
>     @@ date.c: static int match_alpha(const char *date, struct tm *tm, int *offset)
>         }
>
>      +  /* ISO-8601 allows yyyymmDD'T'HHMMSS, with less precision */
>     -+  if (*date == 'T' && isdigit(date[1])) {
>     -+          tm->tm_hour = tm->tm_min = tm->tm_sec = 0;
>     -+          return strlen("T");
>     ++  if (*date == 'T' && isdigit(date[1]) && tm->tm_hour == -1) {
>     ++          tm->tm_min = tm->tm_sec = 0;
>     ++          return 1;
>      +  }
>      +
>         /* BAD CRAP */
>     @@ date.c: static inline int nodate(struct tm *tm)
>      - * We just do a binary 'and' to see if the sign bit
>      - * is set in all the values.
>      + * Have we seen an ISO-8601-alike date, i.e. 20220101T0,
>     -+ * In those special case, those fields have been set to 0
>     ++ * In which, hour is still unset,
>     ++ * and minutes and second has been set to 0.
>        */
>      -static inline int notime(struct tm *tm)
>      +static inline int maybeiso8601(struct tm *tm)
>     @@ date.c: static inline int nodate(struct tm *tm)
>      -  return (tm->tm_hour &
>      -          tm->tm_min &
>      -          tm->tm_sec) < 0;
>     -+  return tm->tm_hour == 0 &&
>     ++  return tm->tm_hour == -1 &&
>      +          tm->tm_min == 0 &&
>      +          tm->tm_sec == 0;
>       }
>
>       /*
>      @@ date.c: static int match_digit(const char *date, struct tm *tm, int *offset, int *tm_gmt
>     -   /* 4 digits, compact style of ISO-8601's time: HHMM */
>     -   /* 2 digits, compact style of ISO-8601's time: HH */
>     -   if (n == 8 || n == 6 ||
>     +
>     +   /* 8 digits, compact style of ISO-8601's date: YYYYmmDD */
>     +   /* 6 digits, compact style of ISO-8601's time: HHMMSS */
>     +-  /* 4 digits, compact style of ISO-8601's time: HHMM */
>     +-  /* 2 digits, compact style of ISO-8601's time: HH */
>     +-  if (n == 8 || n == 6 ||
>      -          (!nodate(tm) && notime(tm) &&
>     -+          (!nodate(tm) && maybeiso8601(tm) &&
>     -           (n == 4 || n == 2))) {
>     +-          (n == 4 || n == 2))) {
>     ++  if (n == 8 || n == 6) {
>                 unsigned int num1 = num / 10000;
>                 unsigned int num2 = (num % 10000) / 100;
>     +           unsigned int num3 = num % 100;
>     +@@ date.c: static int match_digit(const char *date, struct tm *tm, int *offset, int *tm_gmt
>     +           else if (n == 6 && set_time(num1, num2, num3, tm) == 0 &&
>     +                    *end == '.' && isdigit(end[1]))
>     +                   strtoul(end + 1, &end, 10);
>     +-          else if (n == 4)
>     +-                  set_time(num2, num3, 0, tm);
>     +-          else if (n == 2)
>     +-                  set_time(num3, 0, 0, tm);
>     +           return end - date;
>     +   }
>     +
>     ++  /* reduced precision of ISO-8601's time: HHMM or HH */
>     ++  if (maybeiso8601(tm)) {
>     ++          unsigned int num1 = num;
>     ++          unsigned int num2 = 0;
>     ++          if (n == 4) {
>     ++                  num1 = num / 100;
>     ++                  num2 = num % 100;
>     ++          }
>     ++          if ((n == 4 || n == 2) && !nodate(tm) &&
>     ++              set_time(num1, num2, 0, tm) == 0)
>     ++                  return n;
>     ++          /*
>     ++           * We thought this is an ISO-8601 time string,
>     ++           * we set minutes and seconds to 0,
>     ++           * turn out it isn't, rollback the change.
>     ++           */
>     ++          tm->tm_min = tm->tm_sec = -1;
>     ++  }
>     ++
>     +   /* Four-digit year or a timezone? */
>     +   if (n == 4) {
>     +           if (num <= 1400 && *offset == -1) {
>
>       ## t/t0006-date.sh ##
>      @@ t/t0006-date.sh: check_parse '20080214T20:30' '2008-02-14 20:30:00 +0000'
>
>  date.c          | 49 +++++++++++++++++++++++++++++++++----------------
>  t/t0006-date.sh |  3 ++-
>  2 files changed, 35 insertions(+), 17 deletions(-)
>
> diff --git a/date.c b/date.c
> index b011b9d6b3..6f45eeb356 100644
> --- a/date.c
> +++ b/date.c
> @@ -493,6 +493,12 @@ static int match_alpha(const char *date, struct tm *tm, int *offset)
>                 return 2;
>         }
>
> +       /* ISO-8601 allows yyyymmDD'T'HHMMSS, with less precision */
> +       if (*date == 'T' && isdigit(date[1]) && tm->tm_hour == -1) {
> +               tm->tm_min = tm->tm_sec = 0;
> +               return 1;
> +       }
> +
>         /* BAD CRAP */
>         return skip_alpha(date);
>  }
> @@ -639,15 +645,15 @@ static inline int nodate(struct tm *tm)
>  }
>
>  /*
> - * Have we filled in any part of the time yet?
> - * We just do a binary 'and' to see if the sign bit
> - * is set in all the values.
> + * Have we seen an ISO-8601-alike date, i.e. 20220101T0,
> + * In which, hour is still unset,
> + * and minutes and second has been set to 0.
>   */
> -static inline int notime(struct tm *tm)
> +static inline int maybeiso8601(struct tm *tm)
>  {
> -       return (tm->tm_hour &
> -               tm->tm_min &
> -               tm->tm_sec) < 0;
> +       return tm->tm_hour == -1 &&
> +               tm->tm_min == 0 &&
> +               tm->tm_sec == 0;
>  }
>
>  /*
> @@ -701,11 +707,7 @@ static int match_digit(const char *date, struct tm *tm, int *offset, int *tm_gmt
>
>         /* 8 digits, compact style of ISO-8601's date: YYYYmmDD */
>         /* 6 digits, compact style of ISO-8601's time: HHMMSS */
> -       /* 4 digits, compact style of ISO-8601's time: HHMM */
> -       /* 2 digits, compact style of ISO-8601's time: HH */
> -       if (n == 8 || n == 6 ||
> -               (!nodate(tm) && notime(tm) &&
> -               (n == 4 || n == 2))) {
> +       if (n == 8 || n == 6) {
>                 unsigned int num1 = num / 10000;
>                 unsigned int num2 = (num % 10000) / 100;
>                 unsigned int num3 = num % 100;
> @@ -714,13 +716,28 @@ static int match_digit(const char *date, struct tm *tm, int *offset, int *tm_gmt
>                 else if (n == 6 && set_time(num1, num2, num3, tm) == 0 &&
>                          *end == '.' && isdigit(end[1]))
>                         strtoul(end + 1, &end, 10);
> -               else if (n == 4)
> -                       set_time(num2, num3, 0, tm);
> -               else if (n == 2)
> -                       set_time(num3, 0, 0, tm);
>                 return end - date;
>         }
>
> +       /* reduced precision of ISO-8601's time: HHMM or HH */
> +       if (maybeiso8601(tm)) {
> +               unsigned int num1 = num;
> +               unsigned int num2 = 0;
> +               if (n == 4) {
> +                       num1 = num / 100;
> +                       num2 = num % 100;
> +               }
> +               if ((n == 4 || n == 2) && !nodate(tm) &&
> +                   set_time(num1, num2, 0, tm) == 0)
> +                       return n;
> +               /*
> +                * We thought this is an ISO-8601 time string,
> +                * we set minutes and seconds to 0,
> +                * turn out it isn't, rollback the change.
> +                */
> +               tm->tm_min = tm->tm_sec = -1;
> +       }
> +
>         /* Four-digit year or a timezone? */
>         if (n == 4) {
>                 if (num <= 1400 && *offset == -1) {
> diff --git a/t/t0006-date.sh b/t/t0006-date.sh
> index 16fb0bf4bd..130207fc04 100755
> --- a/t/t0006-date.sh
> +++ b/t/t0006-date.sh
> @@ -93,7 +93,8 @@ check_parse '20080214T20:30' '2008-02-14 20:30:00 +0000'
>  check_parse '20080214T20' '2008-02-14 20:00:00 +0000'
>  check_parse '20080214T203045' '2008-02-14 20:30:45 +0000'
>  check_parse '20080214T2030' '2008-02-14 20:30:00 +0000'
> -check_parse '20080214T20' '2008-02-14 20:00:00 +0000'
> +check_parse '20080214T000000.20' '2008-02-14 00:00:00 +0000'
> +check_parse '20080214T00:00:00.20' '2008-02-14 00:00:00 +0000'
>  check_parse '20080214T203045-04:00' '2008-02-14 20:30:45 -0400'
>  check_parse '20080214T203045 -04:00' '2008-02-14 20:30:45 -0400'
>  check_parse '20080214T203045.019-04:00' '2008-02-14 20:30:45 -0400'
> --
> 2.39.0.287.g690a66fa66
>

Thanks, Đoàn.  LGTM, and much safer.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2] date.c: allow ISO 8601 reduced precision times
  2022-12-16  3:36 [PATCH] date.c: allow ISO 8601 reduced precision times Phil Hord
  2022-12-16  4:23 ` Junio C Hamano
@ 2023-01-11  0:10 ` Đoàn Trần Công Danh
  2023-01-13 19:50   ` Junio C Hamano
  1 sibling, 1 reply; 12+ messages in thread
From: Đoàn Trần Công Danh @ 2023-01-11  0:10 UTC (permalink / raw)
  To: git
  Cc: Đoàn Trần Công Danh, Phil Hord, plavarre,
	Junio C Hamano

ISO 8601 permits "reduced precision" time representations to omit the
seconds value or both the minutes and the seconds values.  The
abbreviate times could look like 17:45 or 1745 to omit the seconds,
or simply as 17 to omit both the minutes and the seconds.

parse_date_basic accepts the 17:45 format but it rejects the other two.
Change it to accept 4-digit and 2-digit time values when they follow a
recognized date and a 'T'.

Before this change:

$ TZ=UTC test-tool date approxidate 2022-12-13T23:00 2022-12-13T2300 2022-12-13T23
2022-12-13T23:00 -> 2022-12-13 23:00:00 +0000
2022-12-13T2300 -> 2022-12-13 23:54:13 +0000
2022-12-13T23 -> 2022-12-13 23:54:13 +0000

After this change:

$ TZ=UTC helper/test-tool date approxidate 2022-12-13T23:00 2022-12-13T2300 2022-12-13T23
2022-12-13T23:00 -> 2022-12-13 23:00:00 +0000
2022-12-13T2300 -> 2022-12-13 23:00:00 +0000
2022-12-13T23 -> 2022-12-13 23:00:00 +0000

Note: ISO 8601 also allows reduced precision date strings such as
"2022-12" and "2022". This patch does not attempt to address these.

Reported-by: Pat LaVarre <plavarre@purestorage.com>
Signed-off-by: Phil Hord <phil.hord@gmail.com>
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
---

Since this is a complete re-implementation from Phil Hord's version.
I'm reassigning the author to me.

This version change the implementation to only treat the string as ISO8601 if
a 'T' existed and date has been parsed. I also added a test for parsing
RFC-822, which Hord accidentally broke.

The commit message has been changed:
* The example has been changed to be independent from local timezone
* Remove the mention of adding test-cases, since it's obviously necessary.

 date.c          | 37 +++++++++++++++++++++++++++++++++++++
 t/t0006-date.sh |  8 ++++++++
 2 files changed, 45 insertions(+)

diff --git a/date.c b/date.c
index 53bd6a7932..6f45eeb356 100644
--- a/date.c
+++ b/date.c
@@ -493,6 +493,12 @@ static int match_alpha(const char *date, struct tm *tm, int *offset)
 		return 2;
 	}
 
+	/* ISO-8601 allows yyyymmDD'T'HHMMSS, with less precision */
+	if (*date == 'T' && isdigit(date[1]) && tm->tm_hour == -1) {
+		tm->tm_min = tm->tm_sec = 0;
+		return 1;
+	}
+
 	/* BAD CRAP */
 	return skip_alpha(date);
 }
@@ -638,6 +644,18 @@ static inline int nodate(struct tm *tm)
 		tm->tm_sec) < 0;
 }
 
+/*
+ * Have we seen an ISO-8601-alike date, i.e. 20220101T0,
+ * In which, hour is still unset,
+ * and minutes and second has been set to 0.
+ */
+static inline int maybeiso8601(struct tm *tm)
+{
+	return tm->tm_hour == -1 &&
+		tm->tm_min == 0 &&
+		tm->tm_sec == 0;
+}
+
 /*
  * We've seen a digit. Time? Year? Date?
  */
@@ -701,6 +719,25 @@ static int match_digit(const char *date, struct tm *tm, int *offset, int *tm_gmt
 		return end - date;
 	}
 
+	/* reduced precision of ISO-8601's time: HHMM or HH */
+	if (maybeiso8601(tm)) {
+		unsigned int num1 = num;
+		unsigned int num2 = 0;
+		if (n == 4) {
+			num1 = num / 100;
+			num2 = num % 100;
+		}
+		if ((n == 4 || n == 2) && !nodate(tm) &&
+		    set_time(num1, num2, 0, tm) == 0)
+			return n;
+		/*
+		 * We thought this is an ISO-8601 time string,
+		 * we set minutes and seconds to 0,
+		 * turn out it isn't, rollback the change.
+		 */
+		tm->tm_min = tm->tm_sec = -1;
+	}
+
 	/* Four-digit year or a timezone? */
 	if (n == 4) {
 		if (num <= 1400 && *offset == -1) {
diff --git a/t/t0006-date.sh b/t/t0006-date.sh
index 2490162071..e18b160286 100755
--- a/t/t0006-date.sh
+++ b/t/t0006-date.sh
@@ -88,6 +88,13 @@ check_parse 2008-02-14 bad
 check_parse '2008-02-14 20:30:45' '2008-02-14 20:30:45 +0000'
 check_parse '2008-02-14 20:30:45 -0500' '2008-02-14 20:30:45 -0500'
 check_parse '2008.02.14 20:30:45 -0500' '2008-02-14 20:30:45 -0500'
+check_parse '20080214T20:30:45' '2008-02-14 20:30:45 +0000'
+check_parse '20080214T20:30' '2008-02-14 20:30:00 +0000'
+check_parse '20080214T20' '2008-02-14 20:00:00 +0000'
+check_parse '20080214T203045' '2008-02-14 20:30:45 +0000'
+check_parse '20080214T2030' '2008-02-14 20:30:00 +0000'
+check_parse '20080214T000000.20' '2008-02-14 00:00:00 +0000'
+check_parse '20080214T00:00:00.20' '2008-02-14 00:00:00 +0000'
 check_parse '20080214T203045-04:00' '2008-02-14 20:30:45 -0400'
 check_parse '20080214T203045 -04:00' '2008-02-14 20:30:45 -0400'
 check_parse '20080214T203045.019-04:00' '2008-02-14 20:30:45 -0400'
@@ -99,6 +106,7 @@ check_parse '2008-02-14 20:30:45 -05' '2008-02-14 20:30:45 -0500'
 check_parse '2008-02-14 20:30:45 -:30' '2008-02-14 20:30:45 +0000'
 check_parse '2008-02-14 20:30:45 -05:00' '2008-02-14 20:30:45 -0500'
 check_parse '2008-02-14 20:30:45' '2008-02-14 20:30:45 -0500' EST5
+check_parse 'Thu, 7 Apr 2005 15:14:13 -0700' '2005-04-07 15:14:13 -0700'
 
 check_approxidate() {
 	echo "$1 -> $2 +0000" >expect
-- 
2.39.0.287.g690a66fa66


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] date.c: allow ISO 8601 reduced precision times
  2023-01-11  0:10 ` [PATCH v2] date.c: allow ISO 8601 reduced precision times Đoàn Trần Công Danh
@ 2023-01-13 19:50   ` Junio C Hamano
  0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2023-01-13 19:50 UTC (permalink / raw)
  To: Đoàn Trần Công Danh; +Cc: git, Phil Hord, plavarre

Đoàn Trần Công Danh  <congdanhqx@gmail.com> writes:

> ISO 8601 permits "reduced precision" time representations to omit the
> seconds value or both the minutes and the seconds values.  The
> abbreviate times could look like 17:45 or 1745 to omit the seconds,
> or simply as 17 to omit both the minutes and the seconds.
>
> parse_date_basic accepts the 17:45 format but it rejects the other two.
> Change it to accept 4-digit and 2-digit time values when they follow a
> recognized date and a 'T'.
>
> Before this change:
>
> $ TZ=UTC test-tool date approxidate 2022-12-13T23:00 2022-12-13T2300 2022-12-13T23
> 2022-12-13T23:00 -> 2022-12-13 23:00:00 +0000
> 2022-12-13T2300 -> 2022-12-13 23:54:13 +0000
> 2022-12-13T23 -> 2022-12-13 23:54:13 +0000
>
> After this change:
>
> $ TZ=UTC helper/test-tool date approxidate 2022-12-13T23:00 2022-12-13T2300 2022-12-13T23
> 2022-12-13T23:00 -> 2022-12-13 23:00:00 +0000
> 2022-12-13T2300 -> 2022-12-13 23:00:00 +0000
> 2022-12-13T23 -> 2022-12-13 23:00:00 +0000
>
> Note: ISO 8601 also allows reduced precision date strings such as
> "2022-12" and "2022". This patch does not attempt to address these.
>
> Reported-by: Pat LaVarre <plavarre@purestorage.com>
> Signed-off-by: Phil Hord <phil.hord@gmail.com>
> Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
> ---
>
> Since this is a complete re-implementation from Phil Hord's version.
> I'm reassigning the author to me.
>
> This version change the implementation to only treat the string as ISO8601 if
> a 'T' existed and date has been parsed. I also added a test for parsing
> RFC-822, which Hord accidentally broke.
>
> The commit message has been changed:
> * The example has been changed to be independent from local timezone
> * Remove the mention of adding test-cases, since it's obviously necessary.

Will replace Phil's patch with this (hence even under your
authorship, the topic name will be reused).

Thanks, both.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-01-13 19:51 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-16  3:36 [PATCH] date.c: allow ISO 8601 reduced precision times Phil Hord
2022-12-16  4:23 ` Junio C Hamano
2022-12-16 18:38   ` Phil Hord
2023-01-09  6:41     ` Phil Hord
2023-01-09  8:48       ` Junio C Hamano
2023-01-09  9:16         ` Junio C Hamano
2023-01-09 18:30           ` Phil Hord
2023-01-09 11:29         ` [PATCH] fixup! " Đoàn Trần Công Danh
2023-01-09 12:29           ` [PATCH] date.c: limit less precision ISO-8601 with its marker Đoàn Trần Công Danh
2023-01-09 18:57             ` Phil Hord
2023-01-11  0:10 ` [PATCH v2] date.c: allow ISO 8601 reduced precision times Đoàn Trần Công Danh
2023-01-13 19:50   ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).