user/dev discussion of public-inbox itself
From: ebiederm@xmission.com (Eric W. Biederman)
To: Eric Wong <e@80x24.org>
Cc: meta@public-inbox.org
Subject: [PATCH] MsgTime.pm: Use strptime to compute the time zone
Date: Fri, 06 Jul 2018 16:32:15 -0500
Message-ID: <87efggt5i8.fsf_-_@xmission.com>
In-Reply-To: <87o9flt496.fsf@xmission.com> (Eric W. Biederman's message of "Thu, 05 Jul 2018 22:47:01 -0500")


Recently I had trouble cloning lkml/git/0.git because
git fsck on receive was failing.  The output of git fsck was:
> Checking object directories: 100% (256/256), done.
> warning in commit 59173dc1fe67b113ace4ce83e7f522414b3e0404: badTimezone: invalid author/committer line - bad time zone
> warning in commit ff22aaff22eb4479e49e93f697e385f76db51c55: badTimezone: invalid author/committer line - bad time zone
> warning in commit 609b744909693f5f00aff5ed9928beeeee9ded2e: badTimezone: invalid author/committer line - bad time zone
> warning in commit 084572141db8e0d879428afb278bd338f2dbb053: badTimezone: invalid author/committer line - bad time zone
> warning in commit 789d204de27cd12c6da693d903390a241a1a4bca: badTimezone: invalid author/committer line - bad time zone
> warning in commit 0d9a65948b0c957007ca387cd56b690f9bab9c08: badTimezone: invalid author/committer line - bad time zone
> warning in commit f7468c42b4196ee6323afb373ab9323971c38d69: badTimezone: invalid author/committer line - bad time zone
> warning in commit 85e0cd6dd527cd55ad0440f14384529b83818228: badTimezone: invalid author/committer line - bad time zone
> warning in commit f31e19a2e772c9ed00728ef142af9c550ea5de6a: badTimezone: invalid author/committer line - bad time zone
> warning in commit 56eb7384443ef84e17e29504a304a071b189ae67: badTimezone: invalid author/committer line - bad time zone
> warning in commit e4470030471e6810414b9de5e3b52e16f2245d12: badTimezone: invalid author/committer line - bad time zone
> warning in commit f913b48caa097c3b2cb3f491707944f88d52d89f: badTimezone: invalid author/committer line - bad time zone
> warning in commit 4390f26923d572c6dab6cce8282c7cad5520d785: badTimezone: invalid author/committer line - bad time zone
> warning in commit 0f66db71a06bd7d651a0cd80877d8043b70fda20: badTimezone: invalid author/committer line - bad time zone
> warning in commit d71472c40b36dcdf0396afc9778f6137eea45887: badTimezone: invalid author/committer line - bad time zone
> warning in commit e8d3b19a91a2d86b6a91bd19dc811e851398b519: badTimezone: invalid author/committer line - bad time zone
> warning in commit afd9fc0cc87e56ed7736d633e17d0ef77817b3cc: badTimezone: invalid author/committer line - bad time zone
> warning in commit 811b3217708358cf1b75fba4602a64a426fce0f5: badTimezone: invalid author/committer line - bad time zone
> warning in commit e7a751a597c6f5e4770c61bdee6220d55a37cba9: badTimezone: invalid author/committer line - bad time zone
> warning in commit 3e32ad6192fe093e03e6b9346c3a90b16d9905c0: badTimezone: invalid author/committer line - bad time zone
> warning in commit 5e66b47528e79d3bbb769e137f036a1fa99cccf9: badTimezone: invalid author/committer line - bad time zone
> warning in commit d90d67d94ca47142670dff13fcb81ab7afab07bb: badTimezone: invalid author/committer line - bad time zone
> Checking objects: 100% (1711464/1711464), done.
> Checking connectivity: 1711464, done.

Examining them with git show --pretty=raw showed that every problem
commit had a time zone that was not 4 digits long.  That time zone had
been passed straight from the Date line of the email into the author
line of the commit.
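
For illustration, the rule behind fsck's badTimezone warning can be approximated in Perl as below. This is a hypothetical helper: is_git_tz is not part of public-inbox or git. It assumes that git accepts a '+' or '-' followed by exactly four digits. git's real check is C code inside fsck and may differ in detail.

```perl
use strict;
use warnings;

# Hypothetical helper (not from public-inbox or git): approximates the
# time zone rule behind fsck's badTimezone warning, i.e. a sign followed
# by exactly four ASCII digits.
sub is_git_tz {
	my ($tz) = @_;
	return ($tz =~ /\A[+-][0-9]{4}\z/) ? 1 : 0;
}

# Two well-formed zones, then short zones like those in the Date lines
# exercised by t/msgtime.t in this patch.
for my $tz ('+0100', '-0700', '+1', '-700', '+000', '-00') {
	printf "%-6s %s\n", $tz, is_git_tz($tz) ? 'ok' : 'badTimezone';
}
```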

Looking into that, I discovered that str2time already takes the time
zone into account and is able to parse these unusual time zones.

So get the normalized time zone offset from strptime, and convert it
from seconds from GMT into the hours and minutes ([+-]HHMM) form that
git expects.
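
A minimal sketch of that conversion follows. It is pulled out into a standalone function so it can be tried without Date::Parse. The name off2zone is hypothetical. The arithmetic mirrors str2date_zone() in the patch, assuming $off is the offset in seconds from GMT (positive east), as the patch's comment describes for strptime's return value.

```perl
use strict;
use warnings;

# Hypothetical standalone version of the offset-to-zone step in
# str2date_zone(). $off is the time zone offset in seconds from GMT
# (positive east), e.g. 3600 for +0100.
sub off2zone {
	my ($off) = @_;
	my $sign = ($off < 0) ? '-' : '+';
	my $hour = abs(int($off / 3600));
	# Without "use integer", Perl's % takes the sign of its right
	# operand, so the minutes stay in 0..59 even for negative offsets.
	my $min  = ($off / 60) % 60;
	return sprintf('%s%02d%02d', $sign, $hour, $min);
}

print off2zone($_), "\n" for (3600, -25200, -12600, 0);
```

With those inputs it prints +0100, -0700, -0330 and +0000, one per line. The hours take an absolute value while % normalizes the minutes. A half-hour zone such as -0330 therefore comes out correctly instead of as "-03-30", which is what a remainder that kept the left operand's sign would produce.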

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---

 lib/PublicInbox/MsgTime.pm | 41 ++++++++++--------
 t/msgtime.t                | 87 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 111 insertions(+), 17 deletions(-)
 create mode 100644 t/msgtime.t

diff --git a/lib/PublicInbox/MsgTime.pm b/lib/PublicInbox/MsgTime.pm
index c67a41fff2ef..f3ebb6447a9c 100644
--- a/lib/PublicInbox/MsgTime.pm
+++ b/lib/PublicInbox/MsgTime.pm
@@ -5,19 +5,31 @@ use strict;
 use warnings;
 use base qw(Exporter);
 our @EXPORT_OK = qw(msg_timestamp msg_datestamp);
-use Date::Parse qw(str2time);
-use Time::Zone qw(tz_offset);
+use Date::Parse qw(str2time strptime);
+
+sub str2date_zone ($) {
+	my ($date) = @_;
+
+	my $ts = str2time($date);
+	return undef unless(defined $ts);
+
+	# off is the time zone offset in seconds from GMT
+	my ($ss,$mm,$hh,$day,$month,$year,$off) = strptime($date);
+	return undef unless(defined $off);
+
+	# Compute the time zone from offset
+	my $sign = ($off < 0) ? '-' : '+';
+	my $hour = abs(int($off / 3600));
+	my $min  = ($off / 60) % 60;
+	my $zone = sprintf('%s%02d%02d', $sign, $hour, $min);
 
-sub zone_clamp ($) {
-	my ($zone) = @_;
-	$zone ||= '+0000';
 	# "-1200" is the furthest westermost zone offset,
 	# but git fast-import is liberal so we use "-1400"
 	if ($zone >= 1400 || $zone <= -1400) {
 		warn "bogus TZ offset: $zone, ignoring and assuming +0000\n";
 		$zone = '+0000';
 	}
-	$zone;
+	[$ts, $zone];
 }
 
 sub time_response ($) {
@@ -28,37 +40,32 @@ sub time_response ($) {
 sub msg_received_at ($) {
 	my ($hdr) = @_; # Email::MIME::Header
 	my @recvd = $hdr->header_raw('Received');
-	my ($ts, $zone);
+	my ($ts);
 	foreach my $r (@recvd) {
-		$zone = undef;
 		$r =~ /\s*(\d+\s+[[:alpha:]]+\s+\d{2,4}\s+
 			\d+\D\d+(?:\D\d+)\s+([\+\-]\d+))/sx or next;
-		$zone = $2;
-		$ts = eval { str2time($1) } and last;
+		$ts = eval { str2date_zone($1) } and return $ts;
 		my $mid = $hdr->header_raw('Message-ID');
 		warn "no date in $mid Received: $r\n";
 	}
-	defined $ts ? [ $ts, zone_clamp($zone) ] : undef;
+	undef;
 }
 
 sub msg_date_only ($) {
 	my ($hdr) = @_; # Email::MIME::Header
 	my @date = $hdr->header_raw('Date');
-	my ($ts, $zone);
+	my ($ts);
 	foreach my $d (@date) {
-		$zone = undef;
 		# Y2K problems: 3-digit years
 		$d =~ s!([A-Za-z]{3}) (\d{3}) (\d\d:\d\d:\d\d)!
 			my $yyyy = $2 + 1900; "$1 $yyyy $3"!e;
-		$ts = eval { str2time($d) };
+		$ts = eval { str2date_zone($d) } and return $ts;
 		if ($@) {
 			my $mid = $hdr->header_raw('Message-ID');
 			warn "bad Date: $d in $mid: $@\n";
-		} elsif ($d =~ /\s+([\+\-]\d+)\s*\z/) {
-			$zone = $1;
 		}
 	}
-	defined $ts ? [ $ts, zone_clamp($zone) ] : undef;
+	undef;
 }
 
 # Favors Received header for sorting globally
diff --git a/t/msgtime.t b/t/msgtime.t
new file mode 100644
index 000000000000..c390670ae01f
--- /dev/null
+++ b/t/msgtime.t
@@ -0,0 +1,87 @@
+# Copyright (C) 2016-2018 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use strict;
+use warnings;
+use Test::More;
+use PublicInbox::MIME;
+use PublicInbox::MsgTime;
+
+sub datestamp ($) {
+	my ($date) = @_;
+	local $SIG{__WARN__} = sub {};  # Suppress warnings
+	my $mime = PublicInbox::MIME->create(
+		header => [
+			From => 'a@example.com',
+			To => 'b@example.com',
+			'Content-Type' => 'text/plain',
+			Subject => 'this is a subject',
+			'Message-ID' => '<a@example.com>',
+			Date => $date,
+			'Received' => '(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S932173AbXAVSQY (ORCPT <rfc822;w@1wt.eu>);\n\tMon, 22 Jan 2007 13:16:24 -0500',
+		],
+		body => "hello world\n",
+	    );
+	my @ts = PublicInbox::MsgTime::msg_datestamp($mime->header_obj);
+	return \@ts;
+}
+
+sub timestamp ($) {
+	my ($received) = @_;
+	local $SIG{__WARN__} = sub {};  # Suppress warnings
+	my $mime = PublicInbox::MIME->create(
+		header => [
+			From => 'a@example.com',
+			To => 'b@example.com',
+			'Content-Type' => 'text/plain',
+			Subject => 'this is a subject',
+			'Message-ID' => '<a@example.com>',
+			Date => 'Fri, 02 Oct 1993 00:00:00 +0000',
+			'Received' => '(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S932173AbXAVSQY (ORCPT <rfc822;w@1wt.eu>);\n\t' . $received,
+		],
+		body => "hello world\n",
+	    );
+	my @ts = PublicInbox::MsgTime::msg_timestamp($mime->header_obj);
+	return \@ts;
+}
+
+# Verify that the parser sucks up the timezone for dates
+for (my $min = -1440; $min <= 1440; $min += 30) {
+	my $sign = ($min < 0) ? '-': '+';
+	my $h = abs(int($min / 60));
+	my $m = $min % 60;
+
+	my $ts_expect = 749520000 - ($min * 60);
+	my $tz_expect = sprintf('%s%02d%02d', $sign, $h, $m);
+	if ($tz_expect >= 1400 || $tz_expect <= -1400) {
+		$tz_expect = '+0000';
+	}
+	my $date = sprintf("Fri, 02 Oct 1993 00:00:00 %s%02d%02d",
+			   $sign, $h, $m);
+	my $result = datestamp($date);
+	is_deeply($result, [ $ts_expect, $tz_expect ]);
+}
+
+# Verify that the parser sucks up the timezone for received timestamps
+for (my $min = -1440; $min <= 1440; $min += 30) {
+	my $sign = ($min < 0) ? '-' : '+';
+	my $h = abs(int($min / 60));
+	my $m = $min % 60;
+
+	my $ts_expect = 1169471784 - ($min * 60);
+	my $tz_expect = sprintf('%s%02d%02d', $sign, $h, $m);
+	if ($tz_expect >= 1400 || $tz_expect <= -1400) {
+		$tz_expect = '+0000';
+	}
+	my $received = sprintf('Mon, 22 Jan 2007 13:16:24 %s%02d%02d',
+			       $sign, $h, $m);
+	is_deeply(timestamp($received), [ $ts_expect, $tz_expect ]);
+}
+
+is_deeply(datestamp('Wed, 13 Dec 2006 10:26:38 +1'), [1166001998, '+0100']);
+is_deeply(datestamp('Fri, 3 Feb 2006 18:11:22 -00'), [1138990282, '+0000']);
+is_deeply(datestamp('Thursday, 20 Feb 2003 01:14:34 +000'), [1045703674, '+0000']);
+is_deeply(datestamp('Fri, 28 Jun 2002 12:54:40 -700'), [1025294080, '-0700']);
+is_deeply(datestamp('Sat, 12 Jan 2002 12:52:57 -200'), [1010847177, '-0200']);
+is_deeply(datestamp('Mon, 05 Nov 2001 10:36:16 -800'), [1004985376, '-0800']);
+
+done_testing();
-- 
2.17.1
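
The expected timestamps in t/msgtime.t are the UTC epoch of each fixed date, shifted by the zone offset. The sketch below uses the core Time::Local module to show where 749520000 and 1169471784 come from and why the offset is subtracted. That module is an assumption for illustration; the test itself hard-codes the numbers. expected_ts is a hypothetical helper, not part of the patch.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# timegm() takes (sec, min, hour, mday, mon, year) with a 0-based month.
my $date_base = timegm(0, 0, 0, 2, 9, 1993);      # Oct 2 1993 00:00:00 UTC
my $recv_base = timegm(24, 16, 13, 22, 0, 2007);  # Jan 22 2007 13:16:24 UTC

# Hypothetical helper: a wall-clock time in a zone $min minutes east of
# GMT is a UTC instant $min minutes earlier, hence "base - min * 60".
sub expected_ts {
	my ($base, $min) = @_;
	return $base - $min * 60;
}

print "$date_base $recv_base\n";            # 749520000 1169471784
print expected_ts($date_base, 90), "\n";    # "+0130": 749514600
```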


Thread overview: 13+ messages
2018-07-05  5:40 Warnings from git fsck after lkml import Eric W. Biederman
2018-07-05 23:13 ` Eric Wong
2018-07-06  0:36   ` Eric W. Biederman
2018-07-06  3:47     ` Eric W. Biederman
2018-07-06 21:32       ` Eric W. Biederman [this message]
2018-07-06 22:22         ` [PATCH] MsgTime.pm: Use strptime to compute the time zone Eric Wong
2018-07-07 18:18           ` Eric W. Biederman
2018-07-07 18:22           ` [PATCH] Import: Don't copy nulls from emails into git Eric W. Biederman
2018-07-08  0:07             ` Eric Wong
2018-07-08  1:52               ` Eric W. Biederman
2018-07-12 18:31   ` Warnings from git fsck after lkml import Konstantin Ryabitsev
2018-07-12 22:19     ` Eric W. Biederman
2018-07-12 22:29     ` Eric Wong

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git
