* PATCH: Improved support for ISO 8601 timezones @ 2010-05-17 19:07 Marcus Comstedt 2010-05-17 19:07 ` [PATCH 1/2] Added "Z" as an alias for the timezone "UTC" Marcus Comstedt 2010-05-17 19:07 ` [PATCH 2/2] Accept the timezone specifiers [+-]hh:mm and [+-]hh in addition to [+-]hhmm Marcus Comstedt 0 siblings, 2 replies; 6+ messages in thread From: Marcus Comstedt @ 2010-05-17 19:07 UTC (permalink / raw To: git Hi. I discovered that git's date parser does not understand "Z" to mean the "UTC" timezone. This is unfortunate, because the use of "Z" is prescribed by ISO 8601. I made a small patch to add "Z" as an alias for "UTC", which enables standard ISO 8601 timestamps to be parsed correctly. Also, it fixes a bug that at least three characters of the timezone name had to match, which is of course impossible when the name of the timezone is shorter than three characters. There was already such a timezone before ("NT") which could not be selected due to the bug. The second patch, which is perhaps less essential, adds support for the remaining numerical timezone indicators defined by ISO 8601 not already supported by git (only +-hhmm was supported, but ISO 8601 also specifies that +-hh:mm and +-hh are ok as well). Thanks // Marcus ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 1/2] Added "Z" as an alias for the timezone "UTC" 2010-05-17 19:07 PATCH: Improved support for ISO 8601 timezones Marcus Comstedt @ 2010-05-17 19:07 ` Marcus Comstedt 2010-05-17 20:32 ` Jay Soffian 2010-05-17 19:07 ` [PATCH 2/2] Accept the timezone specifiers [+-]hh:mm and [+-]hh in addition to [+-]hhmm Marcus Comstedt 1 sibling, 1 reply; 6+ messages in thread From: Marcus Comstedt @ 2010-05-17 19:07 UTC (permalink / raw To: git; +Cc: Marcus Comstedt The name "Z" for the UTC timezone is required to properly parse ISO 8601 times. Added it to the list of recignozed timezones. Also, fixed the bug that timezone names shorter than 3 characters can never be matched by match_alpha(). Prior to the introduction of the "Z" zone, this affected the timezone "NT" (Nome). Signed-off-by: Marcus Comstedt <marcus@mc.pp.se> --- :100644 100644 002aa3c... 6bae49c... M date.c date.c | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/date.c b/date.c index 002aa3c..6bae49c 100644 --- a/date.c +++ b/date.c @@ -229,6 +229,7 @@ static const struct { { "GMT", 0, 0, }, /* Greenwich Mean */ { "UTC", 0, 0, }, /* Universal (Coordinated) */ + { "Z", 0, 0, }, /* Zulu, alias for UTC */ { "WET", 0, 0, }, /* Western European */ { "BST", 0, 1, }, /* British Summer */ @@ -305,7 +306,7 @@ static int match_alpha(const char *date, struct tm *tm, int *offset) for (i = 0; i < ARRAY_SIZE(timezone_names); i++) { int match = match_string(date, timezone_names[i].name); - if (match >= 3) { + if (match >= 3 || match == strlen(timezone_names[i].name)) { int off = timezone_names[i].offset; /* This is bogus, but we like summer */ -- 1.7.0.4 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 1/2] Added "Z" as an alias for the timezone "UTC" 2010-05-17 19:07 ` [PATCH 1/2] Added "Z" as an alias for the timezone "UTC" Marcus Comstedt @ 2010-05-17 20:32 ` Jay Soffian 0 siblings, 0 replies; 6+ messages in thread From: Jay Soffian @ 2010-05-17 20:32 UTC (permalink / raw To: Marcus Comstedt; +Cc: git On Mon, May 17, 2010 at 3:07 PM, Marcus Comstedt <marcus@mc.pp.se> wrote: > The name "Z" for the UTC timezone is required to properly parse > ISO 8601 times. Added it to the list of recignozed timezones. s/Added/Add/; s/recignozed/recognized/ > Also, fixed the bug that timezone names shorter than 3 characters s/fixed the/fix a/ j. ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 2/2] Accept the timezone specifiers [+-]hh:mm and [+-]hh in addition to [+-]hhmm 2010-05-17 19:07 PATCH: Improved support for ISO 8601 timezones Marcus Comstedt 2010-05-17 19:07 ` [PATCH 1/2] Added "Z" as an alias for the timezone "UTC" Marcus Comstedt @ 2010-05-17 19:07 ` Marcus Comstedt 2010-05-19 14:31 ` Junio C Hamano 1 sibling, 1 reply; 6+ messages in thread From: Marcus Comstedt @ 2010-05-17 19:07 UTC (permalink / raw To: git; +Cc: Marcus Comstedt ISO 8601 specifies three syntaxes for timezones other than "Z". git already supports the +-hhmm syntax. This patch adds support for the other two: +-hh:mm and +-hh. Signed-off-by: Marcus Comstedt <marcus@mc.pp.se> --- :100644 100644 6bae49c... f83e46e... M date.c date.c | 23 +++++++++++++++++++++++ 1 files changed, 23 insertions(+), 0 deletions(-) diff --git a/date.c b/date.c index 6bae49c..f83e46e 100644 --- a/date.c +++ b/date.c @@ -555,6 +555,18 @@ static int match_tz(const char *date, int *offp) int min, hour; int n = end - date - 1; + /* Check for HH:MM format, allowed by ISO 8601 */ + if (n == 2 && date[3] == ':') { + char *end2; + min = strtoul(date+4, &end2, 10); + /* If we have two digits after the colon too, assume HH:MM */ + if (end2 == date+6) { + offset = offset*100 + min; + end = end2; + n = end - date - 1; + } + } + min = offset % 100; hour = offset / 100; @@ -570,6 +582,17 @@ static int match_tz(const char *date, int *offp) *offp = offset; } + /* + * Also accept just the hour, allowed by ISO 8601 + */ + else if (n == 2 && hour == 0 && min < 24) { + offset = min*60; + if (*date == '-') + offset = -offset; + + *offp = offset; + } + return end - date; } -- 1.7.0.4 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] Accept the timezone specifiers [+-]hh:mm and [+-]hh in addition to [+-]hhmm 2010-05-17 19:07 ` [PATCH 2/2] Accept the timezone specifiers [+-]hh:mm and [+-]hh in addition to [+-]hhmm Marcus Comstedt @ 2010-05-19 14:31 ` Junio C Hamano 2010-05-19 17:21 ` Marcus Comstedt 0 siblings, 1 reply; 6+ messages in thread From: Junio C Hamano @ 2010-05-19 14:31 UTC (permalink / raw To: Marcus Comstedt; +Cc: git Marcus Comstedt <marcus@mc.pp.se> writes: > ISO 8601 specifies three syntaxes for timezones other than "Z". > git already supports the +-hhmm syntax. This patch adds support > for the other two: +-hh:mm and +-hh. > > Signed-off-by: Marcus Comstedt <marcus@mc.pp.se> > --- > :100644 100644 6bae49c... f83e46e... M date.c > date.c | 23 +++++++++++++++++++++++ > 1 files changed, 23 insertions(+), 0 deletions(-) > > diff --git a/date.c b/date.c > index 6bae49c..f83e46e 100644 > --- a/date.c > +++ b/date.c > @@ -555,6 +555,18 @@ static int match_tz(const char *date, int *offp) > int min, hour; > int n = end - date - 1; > > + /* Check for HH:MM format, allowed by ISO 8601 */ > + if (n == 2 && date[3] == ':') { > + char *end2; > + min = strtoul(date+4, &end2, 10); > + /* If we have two digits after the colon too, assume HH:MM */ > + if (end2 == date+6) { > + offset = offset*100 + min; > + end = end2; > + n = end - date - 1; > + } > + } > + > min = offset % 100; > hour = offset / 100; > > @@ -570,6 +582,17 @@ static int match_tz(const char *date, int *offp) > > *offp = offset; > } > + /* > + * Also accept just the hour, allowed by ISO 8601 > + */ > + else if (n == 2 && hour == 0 && min < 24) { > + offset = min*60; > + if (*date == '-') > + offset = -offset; > + > + *offp = offset; > + } > + I don't recall seeing in ISO 8601 that +hh or -hh without minute resolution was allowed, but I don't have my copy of ISO 8601 with me (they are packed and are still in transit with my household goods) so I'll take your word for it for now [*1*]. But the placement of this second hunk is somewhat curious. Why doesn't the updated function look like this? int offset = strtoul(date + 1, &end, 10); int min, hour; int n = end - date - 1; if (n == 2 && offset <= 14) { /* +HH:MM (ISO 8601) or +HH (ISO 8601 abbreviated) */ hour = offset; if (n == 2 && date[3] == ':') { min = strtoul(date + 4, &end, 10); if (end != date + 6) return 0; /* Bad CRAP */ } else { min = 0; } } else if (n < 3) { return 0; /* we want at least 3 digits */ } else { min = offset % 100; hour = offset / 100; } if (60 <= min) return 0; /* invalid minute */ offset = hour * 60 + min; if (*date == '-') offset = -offset; *offp = offset; return end - date; [Footnote] *1* Appendix A of RFC3339 seems to agree with you. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] Accept the timezone specifiers [+-]hh:mm and [+-]hh in addition to [+-]hhmm 2010-05-19 14:31 ` Junio C Hamano @ 2010-05-19 17:21 ` Marcus Comstedt 0 siblings, 0 replies; 6+ messages in thread From: Marcus Comstedt @ 2010-05-19 17:21 UTC (permalink / raw To: Junio C Hamano; +Cc: git Hi Junio. Thanks for reviewing this patch. Junio C Hamano <gitster@pobox.com> writes: > I don't recall seeing in ISO 8601 that +hh or -hh without minute > resolution was allowed, but I don't have my copy of ISO 8601 with me (they > are packed and are still in transit with my household goods) so I'll take > your word for it for now [*1*]. In the final draft of 8601:2000 (which is the only version I have), section 5.3.4.1 states that "[...] the representation of the difference can be expressed in hours and minutes, or hours only." Examples of this then follow in that section and the next one. Maybe they changed it in the final version (or it differs from another release of the standard)? I wish you could "git log -S" ISO standards... :-) Wikipedia also agrees that it is allowed by the standard though. > But the placement of this second hunk is somewhat curious. Why doesn't the > updated function look like this? [...] I was perhaps treading a bit over-cautiously. The placement allowed me to leave the existing code both syntactically and semantically unaltered. After all, there was nothing wrong with the old code per se, I was just adding new functionality. I also wanted the two changes independent, in case you wanted one but not the other. I can concede that your variant leaves a more appealing end result though. (Except for the fact that "n == 2" is needlessly tested in the inner if. ;) One thing though: Shouldn't 1 be returned for bad crap rather than 0? Seems to me parse_date will get stuck otherwise, because the sign will never be consumed. In fact, the old code would consume both the sign and the initial sequence of digits in the crap case. Consuming just the sign would leave the digits to be handled by match_digit, which may or may not regard it as non-crap. Good or bad, I don't know. But it might cause regressions. I'll play around a little with the code and perform some new unit tests, and then resubmit a new patch with the suggested structure. // Marcus ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2010-05-19 17:22 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-05-17 19:07 PATCH: Improved support for ISO 8601 timezones Marcus Comstedt 2010-05-17 19:07 ` [PATCH 1/2] Added "Z" as an alias for the timezone "UTC" Marcus Comstedt 2010-05-17 20:32 ` Jay Soffian 2010-05-17 19:07 ` [PATCH 2/2] Accept the timezone specifiers [+-]hh:mm and [+-]hh in addition to [+-]hhmm Marcus Comstedt 2010-05-19 14:31 ` Junio C Hamano 2010-05-19 17:21 ` Marcus Comstedt
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).