user/dev discussion of public-inbox itself
From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH 5/4] msgtime: avoid obviously out-of-range dates (for now)
Date: Sun, 1 Dec 2019 22:04:25 +0000
Message-ID: <20191201220425.GA30161@dcvr>
In-Reply-To: <20191129122508.7708-5-e@80x24.org>

Wacky dates show up in lore for valid messages.  Let's ignore
them and let future generations deal with Y10K and time-travel
problems.
---
 lib/PublicInbox/MsgTime.pm |  6 +++++-
 t/msgtime.t                | 14 ++++++++++++--
 2 files changed, 17 insertions(+), 3 deletions(-)
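
For illustration only (not part of the patch): a minimal Perl sketch of
what narrowing the year field from {2,} to {2,4} does.  The patterns
below are simplified, hypothetical copies of the leading "dd mon YYYY HH:"
part of the str2date_zone() regex, without the /x comments, seconds, or
zone handling.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, simplified copies of the "dd mon YYYY HH:" prefix of the
# str2date_zone() pattern, before and after this patch.
my $old_re = qr/([0-9]+),?\s+([A-Za-z]+)\s+([0-9]{2,})\s+([0-9]+)[:.]/;
my $new_re = qr/([0-9]+),?\s+([A-Za-z]+)\s+([0-9]{2,4})\s+([0-9]+)[:.]/;

# Return the captured year, or undef if the date string does not match.
sub year_of {
	my ($re, $date) = @_;
	return $date =~ $re ? $3 : undef;
}

for my $date ('Fri, 9 Mar 71685 18:45:56 +0000',
		'Mon, 22 Jan 2007 13:16:24 -0500') {
	my $old = year_of($old_re, $date) // 'no match';
	my $new = year_of($new_re, $date) // 'no match';
	print "$date: old=$old new=$new\n";
}
```

With the old bound "71685" is captured as a year; with the new bound
the Date: string no longer matches at all, so the caller has to fall
back to another date source such as a Received: header.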

diff --git a/lib/PublicInbox/MsgTime.pm b/lib/PublicInbox/MsgTime.pm
index 479aaa4ecf132..9f4326442dd11 100644
--- a/lib/PublicInbox/MsgTime.pm
+++ b/lib/PublicInbox/MsgTime.pm
@@ -38,7 +38,7 @@ sub str2date_zone ($) {
 	if ($date =~ /(?:[A-Za-z]+,?\s+)? # day-of-week
 			([0-9]+),?\s+  # dd
 			([A-Za-z]+)\s+ # mon
-			([0-9]{2,})\s+ # YYYY or YY (or YYY :P)
+			([0-9]{2,4})\s+ # YYYY or YY (or YYY :P)
 			([0-9]+)[:\.] # HH:
 				((?:[0-9]{2})|(?:\s?[0-9])) # MM
 				(?:[:\.]((?:[0-9]{2})|(?:\s?[0-9])))? # :SS
@@ -67,6 +67,10 @@ sub str2date_zone ($) {
 
 		$ts = timegm($ss // 0, $mm, $hh, $dd, $mon, $yyyy);
 
+		# 4-digit dates in non-spam from 1900s and 1910s exist in
+		# lore archives
+		return if $ts < 0;
+
 		# Compute the time offset from [+-]HHMM
 		$tz //= 0;
 		my ($tz_hh, $tz_mm);
diff --git a/t/msgtime.t b/t/msgtime.t
index 1452dc97d5b0b..cecad775769e1 100644
--- a/t/msgtime.t
+++ b/t/msgtime.t
@@ -5,7 +5,7 @@ use warnings;
 use Test::More;
 use PublicInbox::MIME;
 use PublicInbox::MsgTime;
-
+our $received_date = 'Mon, 22 Jan 2007 13:16:24 -0500';
 sub datestamp ($) {
 	my ($date) = @_;
 	local $SIG{__WARN__} = sub {};  # Suppress warnings
@@ -17,7 +17,11 @@ sub datestamp ($) {
 			Subject => 'this is a subject',
 			'Message-ID' => '<a@example.com>',
 			Date => $date,
-			'Received' => '(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S932173AbXAVSQY (ORCPT <rfc822;w@1wt.eu>);\n\tMon, 22 Jan 2007 13:16:24 -0500',
+			'Received' => <<EOF,
+(majordomo\@vger.kernel.org) by vger.kernel.org via listexpand
+\tid S932173AbXAVSQY (ORCPT <rfc822;w\@1wt.eu>);
+\t$received_date
+EOF
 		],
 		body => "hello world\n",
 	    );
@@ -104,4 +108,10 @@ for (qw(UT GMT Z)) {
 }
 is_datestamp('Fri, 02 Oct 1993 00:00:00 EDT', [ 749534400, '-0400']);
 
+# fallback to Received: header if Date: is out-of-range:
+is_datestamp('Fri, 1 Jan 1904 10:12:31 +0100',
+	PublicInbox::MsgTime::str2date_zone($received_date));
+is_datestamp('Fri, 9 Mar 71685 18:45:56 +0000', # Y10K is not my problem :P
+	PublicInbox::MsgTime::str2date_zone($received_date));
+
 done_testing();
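
Likewise, a hypothetical sketch (not part of the patch) of the new
"return if $ts < 0" guard.  checked_ts() is an illustrative name, not a
PublicInbox::MsgTime function; as in the hunk, the check runs on the
clock fields before any zone offset is applied.  It uses timegm() from
Time::Local, which takes a 0-based month and treats years above 999 as
literal years, and it assumes a perl whose Time::Local accepts pre-1970
dates (returning a negative epoch for them).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper mirroring the guard: convert broken-down clock
# fields to epoch seconds and reject anything before 1970-01-01 so the
# caller can fall back to a Received: header date instead.
sub checked_ts {
	my ($ss, $mm, $hh, $dd, $mon, $yyyy) = @_; # $mon is 0-based
	my $ts = timegm($ss, $mm, $hh, $dd, $mon, $yyyy);
	return if $ts < 0; # pre-epoch, e.g. the 1900s/1910s dates in lore
	return $ts;
}

my $old = checked_ts(31, 12, 10, 1, 0, 1904);  # 1 Jan 1904 10:12:31
my $ok  = checked_ts(24, 16, 18, 22, 0, 2007); # 22 Jan 2007 18:16:24
print defined $old ? "1904: $old\n" : "1904: rejected\n";
print "2007: $ok\n";
```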


Thread overview: 7+ messages
2019-11-29 12:25 [PATCH 0/4] drop Date::Parse dependency Eric Wong
2019-11-29 12:25 ` [PATCH 1/4] git: async batch interface Eric Wong
2019-11-29 12:25 ` [PATCH 2/4] add msgtime_cmp maintainer test Eric Wong
2019-11-29 12:25 ` [PATCH 3/4] msgtime: drop Date::Parse for RFC2822 Eric Wong
2019-11-29 12:25 ` [PATCH 4/4] Date::Parse is now optional Eric Wong
2019-12-01 22:04   ` Eric Wong [this message]
2019-12-12  3:42     ` [PATCH 5/4] msgtime: avoid obviously out-of-range dates (for now) Eric Wong

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git
