From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN: AS6315 166.70.0.0/16
X-Spam-Status: No, score=-3.7 required=3.0 tests=AWL,BAYES_00,
	RCVD_IN_DNSWL_LOW,SPF_PASS shortcircuit=no autolearn=ham
	autolearn_force=no version=3.4.1
Received: from out01.mta.xmission.com (out01.mta.xmission.com [166.70.13.231])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by dcvr.yhbt.net (Postfix) with ESMTPS id 64EB01F62D;
	Fri, 6 Jul 2018 21:32:31 +0000 (UTC)
Received: from in01.mta.xmission.com ([166.70.13.51])
	by out01.mta.xmission.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128)
	(Exim 4.87)
	(envelope-from )
	id 1fbYL7-0005ki-VU; Fri, 06 Jul 2018 15:32:30 -0600
Received: from [97.119.167.31] (helo=x220.xmission.com)
	by in01.mta.xmission.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128)
	(Exim 4.87)
	(envelope-from )
	id 1fbYL5-00017X-Ji; Fri, 06 Jul 2018 15:32:29 -0600
From: ebiederm@xmission.com (Eric W. Biederman)
To: Eric Wong
Cc: meta@public-inbox.org
References: <87a7r6z1cy.fsf@xmission.com>
	<20180705231346.GA6524@dcvr>
	<87601turnf.fsf@xmission.com>
	<87o9flt496.fsf@xmission.com>
Date: Fri, 06 Jul 2018 16:32:15 -0500
In-Reply-To: <87o9flt496.fsf@xmission.com> (Eric W. Biederman's message of
	"Thu, 05 Jul 2018 22:47:01 -0500")
Message-ID: <87efggt5i8.fsf_-_@xmission.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-XM-SPF: eid=1fbYL5-00017X-Ji;;;mid=<87efggt5i8.fsf_-_@xmission.com>;;;hst=in01.mta.xmission.com;;;ip=97.119.167.31;;;frm=ebiederm@xmission.com;;;spf=neutral
X-XM-AID: U2FsdGVkX1+EjrDJ66Q6LIJO0E4X2NruVU/3UDkhj6w=
X-SA-Exim-Connect-IP: 97.119.167.31
X-SA-Exim-Mail-From: ebiederm@xmission.com
Subject: [PATCH] MsgTime.pm: Use strptime to compute the time zone
X-SA-Exim-Version: 4.2.1 (built Thu, 05 May 2016 13:38:54 -0600)
X-SA-Exim-Scanned: Yes (on in01.mta.xmission.com)
List-Id:

Recently I had trouble cloning lkml/git/0.git because git fsck on
receive was failing. The output of git fsck was:

> Checking object directories: 100% (256/256), done.
> warning in commit 59173dc1fe67b113ace4ce83e7f522414b3e0404: badTimezone: invalid author/committer line - bad time zone
> warning in commit ff22aaff22eb4479e49e93f697e385f76db51c55: badTimezone: invalid author/committer line - bad time zone
> warning in commit 609b744909693f5f00aff5ed9928beeeee9ded2e: badTimezone: invalid author/committer line - bad time zone
> warning in commit 084572141db8e0d879428afb278bd338f2dbb053: badTimezone: invalid author/committer line - bad time zone
> warning in commit 789d204de27cd12c6da693d903390a241a1a4bca: badTimezone: invalid author/committer line - bad time zone
> warning in commit 0d9a65948b0c957007ca387cd56b690f9bab9c08: badTimezone: invalid author/committer line - bad time zone
> warning in commit f7468c42b4196ee6323afb373ab9323971c38d69: badTimezone: invalid author/committer line - bad time zone
> warning in commit 85e0cd6dd527cd55ad0440f14384529b83818228: badTimezone: invalid author/committer line - bad time zone
> warning in commit f31e19a2e772c9ed00728ef142af9c550ea5de6a: badTimezone: invalid author/committer line - bad time zone
> warning in commit 56eb7384443ef84e17e29504a304a071b189ae67: badTimezone: invalid author/committer line - bad time zone
> warning in commit e4470030471e6810414b9de5e3b52e16f2245d12: badTimezone: invalid author/committer line - bad time zone
> warning in commit f913b48caa097c3b2cb3f491707944f88d52d89f: badTimezone: invalid author/committer line - bad time zone
> warning in commit 4390f26923d572c6dab6cce8282c7cad5520d785: badTimezone: invalid author/committer line - bad time zone
> warning in commit 0f66db71a06bd7d651a0cd80877d8043b70fda20: badTimezone: invalid author/committer line - bad time zone
> warning in commit d71472c40b36dcdf0396afc9778f6137eea45887: badTimezone: invalid author/committer line - bad time zone
> warning in commit e8d3b19a91a2d86b6a91bd19dc811e851398b519: badTimezone: invalid author/committer line - bad time zone
> warning in commit afd9fc0cc87e56ed7736d633e17d0ef77817b3cc: badTimezone: invalid author/committer line - bad time zone
> warning in commit 811b3217708358cf1b75fba4602a64a426fce0f5: badTimezone: invalid author/committer line - bad time zone
> warning in commit e7a751a597c6f5e4770c61bdee6220d55a37cba9: badTimezone: invalid author/committer line - bad time zone
> warning in commit 3e32ad6192fe093e03e6b9346c3a90b16d9905c0: badTimezone: invalid author/committer line - bad time zone
> warning in commit 5e66b47528e79d3bbb769e137f036a1fa99cccf9: badTimezone: invalid author/committer line - bad time zone
> warning in commit d90d67d94ca47142670dff13fcb81ab7afab07bb: badTimezone: invalid author/committer line - bad time zone
> Checking objects: 100% (1711464/1711464), done.
> Checking connectivity: 1711464, done.

Upon examination with git show --pretty=raw all of the problem commits
had a time zone that was not 4 digits long. This time zone had been
passed straight from the Date line in the email into the author line of
the commit.
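
For illustration only: as far as I can tell, git fsck accepts a time
zone on the author/committer line only when it is a sign followed by
exactly four digits. A minimal sketch of that shape check follows. The
helper name zone_ok is hypothetical and not part of public-inbox, and
the sample zones are the odd ones exercised by the test cases in this
patch plus one well-formed zone for contrast.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of public-inbox: returns 1 if $zone has
# the shape git fsck is assumed to accept (sign plus exactly 4 digits).
sub zone_ok {
	my ($zone) = @_;
	return $zone =~ /\A[+-]\d{4}\z/ ? 1 : 0;
}

# Zones as they appear at the end of some archived Date: headers.
for my $zone ('+1', '-00', '+000', '-700', '-0500') {
	printf("%-6s %s\n", $zone, zone_ok($zone) ? 'ok' : 'badTimezone');
}
```

Only the last sample passes; the others are the kind of value that ended
up verbatim in the commits listed above.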

Looking into that, I discovered that str2time takes the time zone into
account and was actually able to process these weird time zones. So get
the normalized time zone offset with strptime and convert it from
seconds from GMT to hours and minutes from GMT.

Signed-off-by: "Eric W. Biederman"
---
 lib/PublicInbox/MsgTime.pm | 41 ++++++++++--------
 t/msgtime.t                | 87 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 111 insertions(+), 17 deletions(-)
 create mode 100644 t/msgtime.t

diff --git a/lib/PublicInbox/MsgTime.pm b/lib/PublicInbox/MsgTime.pm
index c67a41fff2ef..f3ebb6447a9c 100644
--- a/lib/PublicInbox/MsgTime.pm
+++ b/lib/PublicInbox/MsgTime.pm
@@ -5,19 +5,31 @@ use strict;
 use warnings;
 use base qw(Exporter);
 our @EXPORT_OK = qw(msg_timestamp msg_datestamp);
-use Date::Parse qw(str2time);
-use Time::Zone qw(tz_offset);
+use Date::Parse qw(str2time strptime);
+
+sub str2date_zone ($) {
+	my ($date) = @_;
+
+	my $ts = str2time($date);
+	return undef unless(defined $ts);
+
+	# off is the time zone offset in seconds from GMT
+	my ($ss,$mm,$hh,$day,$month,$year,$off) = strptime($date);
+	return undef unless(defined $off);
+
+	# Compute the time zone from offset
+	my $sign = ($off < 0) ? '-' : '+';
+	my $hour = abs(int($off / 3600));
+	my $min = int(abs($off) / 60) % 60;
+	my $zone = sprintf('%s%02d%02d', $sign, $hour, $min);
 
-sub zone_clamp ($) {
-	my ($zone) = @_;
-	$zone ||= '+0000';
 	# "-1200" is the furthest westermost zone offset,
 	# but git fast-import is liberal so we use "-1400"
 	if ($zone >= 1400 || $zone <= -1400) {
 		warn "bogus TZ offset: $zone, ignoring and assuming +0000\n";
 		$zone = '+0000';
 	}
-	$zone;
+	[$ts, $zone];
 }
 
 sub time_response ($) {
@@ -28,37 +40,32 @@ sub time_response ($) {
 sub msg_received_at ($) {
 	my ($hdr) = @_; # Email::MIME::Header
 	my @recvd = $hdr->header_raw('Received');
-	my ($ts, $zone);
+	my ($ts);
 	foreach my $r (@recvd) {
-		$zone = undef;
 		$r =~ /\s*(\d+\s+[[:alpha:]]+\s+\d{2,4}\s+
 			\d+\D\d+(?:\D\d+)\s+([\+\-]\d+))/sx or next;
-		$zone = $2;
-		$ts = eval { str2time($1) } and last;
+		$ts = eval { str2date_zone($1) } and return $ts;
 		my $mid = $hdr->header_raw('Message-ID');
 		warn "no date in $mid Received: $r\n";
 	}
-	defined $ts ? [ $ts, zone_clamp($zone) ] : undef;
+	undef;
 }
 
 sub msg_date_only ($) {
 	my ($hdr) = @_; # Email::MIME::Header
 	my @date = $hdr->header_raw('Date');
-	my ($ts, $zone);
+	my ($ts);
 	foreach my $d (@date) {
-		$zone = undef;
 		# Y2K problems: 3-digit years
 		$d =~ s!([A-Za-z]{3}) (\d{3}) (\d\d:\d\d:\d\d)!
 			my $yyyy = $2 + 1900; "$1 $yyyy $3"!e;
-		$ts = eval { str2time($d) };
+		$ts = eval { str2date_zone($d) } and return $ts;
 		if ($@) {
 			my $mid = $hdr->header_raw('Message-ID');
 			warn "bad Date: $d in $mid: $@\n";
-		} elsif ($d =~ /\s+([\+\-]\d+)\s*\z/) {
-			$zone = $1;
 		}
 	}
-	defined $ts ? [ $ts, zone_clamp($zone) ] : undef;
+	undef;
 }
 
 # Favors Received header for sorting globally
diff --git a/t/msgtime.t b/t/msgtime.t
new file mode 100644
index 000000000000..c390670ae01f
--- /dev/null
+++ b/t/msgtime.t
@@ -0,0 +1,87 @@
+# Copyright (C) 2016-2018 all contributors
+# License: AGPL-3.0+
+use strict;
+use warnings;
+use Test::More;
+use PublicInbox::MIME;
+use PublicInbox::MsgTime;
+
+sub datestamp ($) {
+	my ($date) = @_;
+	local $SIG{__WARN__} = sub {}; # Suppress warnings
+	my $mime = PublicInbox::MIME->create(
+		header => [
+			From => 'a@example.com',
+			To => 'b@example.com',
+			'Content-Type' => 'text/plain',
+			Subject => 'this is a subject',
+			'Message-ID' => '',
+			Date => $date,
+			'Received' => '(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S932173AbXAVSQY (ORCPT );\n\tMon, 22 Jan 2007 13:16:24 -0500',
+		],
+		body => "hello world\n",
+	);
+	my @ts = PublicInbox::MsgTime::msg_datestamp($mime->header_obj);
+	return \@ts;
+}
+
+sub timestamp ($) {
+	my ($received) = @_;
+	local $SIG{__WARN__} = sub {}; # Suppress warnings
+	my $mime = PublicInbox::MIME->create(
+		header => [
+			From => 'a@example.com',
+			To => 'b@example.com',
+			'Content-Type' => 'text/plain',
+			Subject => 'this is a subject',
+			'Message-ID' => '',
+			Date => 'Fri, 02 Oct 1993 00:00:00 +0000',
+			'Received' => '(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S932173AbXAVSQY (ORCPT );\n\t' . $received,
+		],
+		body => "hello world\n",
+	);
+	my @ts = PublicInbox::MsgTime::msg_timestamp($mime->header_obj);
+	return \@ts;
+}
+
+# Verify that the parser sucks up the timezone for dates
+for (my $min = -1440; $min <= 1440; $min += 30) {
+	my $sign = ($min < 0) ? '-' : '+';
+	my $h = abs(int($min / 60));
+	my $m = $min % 60;
+
+	my $ts_expect = 749520000 - ($min * 60);
+	my $tz_expect = sprintf('%s%02d%02d', $sign, $h, $m);
+	if ($tz_expect >= 1400 || $tz_expect <= -1400) {
+		$tz_expect = '+0000';
+	}
+	my $date = sprintf("Fri, 02 Oct 1993 00:00:00 %s%02d%02d",
+		$sign, $h, $m);
+	my $result = datestamp($date);
+	is_deeply($result, [ $ts_expect, $tz_expect ]);
+}
+
+# Verify that the parser sucks up the timezone for received timestamps
+for (my $min = -1440; $min <= 1440; $min += 30) {
+	my $sign = ($min < 0) ? '-' : '+';
+	my $h = abs(int($min / 60));
+	my $m = $min % 60;
+
+	my $ts_expect = 1169471784 - ($min * 60);
+	my $tz_expect = sprintf('%s%02d%02d', $sign, $h, $m);
+	if ($tz_expect >= 1400 || $tz_expect <= -1400) {
+		$tz_expect = '+0000';
+	}
+	my $received = sprintf('Mon, 22 Jan 2007 13:16:24 %s%02d%02d',
+		$sign, $h, $m);
+	is_deeply(timestamp($received), [ $ts_expect, $tz_expect ]);
+}
+
+is_deeply(datestamp('Wed, 13 Dec 2006 10:26:38 +1'), [1166001998, '+0100']);
+is_deeply(datestamp('Fri, 3 Feb 2006 18:11:22 -00'), [1138990282, '+0000']);
+is_deeply(datestamp('Thursday, 20 Feb 2003 01:14:34 +000'), [1045703674, '+0000']);
+is_deeply(datestamp('Fri, 28 Jun 2002 12:54:40 -700'), [1025294080, '-0700']);
+is_deeply(datestamp('Sat, 12 Jan 2002 12:52:57 -200'), [1010847177, '-0200']);
+is_deeply(datestamp('Mon, 05 Nov 2001 10:36:16 -800'), [1004985376, '-0800']);
+
+done_testing();
-- 
2.17.1
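
As a side note, the conversion from an offset in seconds to a "+HHMM"
string can be tried on its own, without Date::Parse. This is a minimal
sketch in core Perl, not the code from MsgTime.pm: the helper name
offset_to_zone is hypothetical, and it assumes the offset is an integer
number of seconds east of GMT. It takes the absolute value before
splitting into hours and minutes, because Perl's % with a negative left
operand would otherwise give the wrong minutes for offsets such as
-00:15.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of public-inbox: format an offset in
# seconds east of GMT as a sign followed by two-digit hours and minutes.
sub offset_to_zone {
	my ($off) = @_;
	my $sign = ($off < 0) ? '-' : '+';
	my $abs = abs($off);              # work on the magnitude only
	my $hour = int($abs / 3600);
	my $min = int($abs / 60) % 60;
	return sprintf('%s%02d%02d', $sign, $hour, $min);
}

print offset_to_zone(3600), "\n";   # +0100
print offset_to_zone(-25200), "\n"; # -0700
print offset_to_zone(-900), "\n";   # -0015
print offset_to_zone(0), "\n";      # +0000
```

A zero offset always comes out as "+0000", which matches what the tests
above expect for inputs like "-00".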