git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Thomas Bock <bockthom@cs.uni-saarland.de>
Cc: Derrick Stolee <derrickstolee@github.com>,
	Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org
Subject: Re: [PATCH v2 0/3] fixing some parse_commit() timestamp corner cases
Date: Tue, 25 Apr 2023 01:54:42 -0400	[thread overview]
Message-ID: <20230425055442.GA4015600@coredump.intra.peff.net> (raw)
In-Reply-To: <20230425055244.GA4014505@coredump.intra.peff.net>

On Tue, Apr 25, 2023 at 01:52:45AM -0400, Jeff King wrote:

> Here's a v2 of my series. The behavior should be identical, but I've
> incorporated some comment and small code tweaks based on feedback from
> the first round.
> 
> I also added a fourth patch which adds a new comment explaining some of
> the cases that were alluded to in the earlier round's patch 3.
> 
>   [1/4]: t4212: avoid putting git on left-hand side of pipe
>   [2/4]: parse_commit(): parse timestamp from end of line
>   [3/4]: parse_commit(): handle broken whitespace-only timestamp
>   [4/4]: parse_commit(): describe more date-parsing failure modes
> 
>  commit.c               | 47 +++++++++++++++++++++++++++++++++++-------
>  t/t4212-log-corrupt.sh | 39 +++++++++++++++++++++++++++++++++--
>  2 files changed, 76 insertions(+), 10 deletions(-)

Whoops, forgot my range-diff (though nothing should be too surprising
based on the round 1 discussion):

1:  07932cf666 = 1:  ac38ce133d t4212: avoid putting git on left-hand side of pipe
2:  7ee34c7d5f ! 2:  f59e61262d parse_commit(): parse timestamp from end of line
    @@ Commit message
         parse back to the final ">". In theory we could use split_ident_line()
         here, but it's actually a bit more strict. In particular, it requires a
         valid time-zone token, too. That should be present, of course, but we
    -    wouldn't want to break --until for malformed cases that are working
    -    currently.
    +    wouldn't want to break --until for cases that are working currently.
     
         We might want to teach split_ident_line() to become more lenient there,
         but it would require checking its many callers (since right now they can
    @@ commit.c: static timestamp_t parse_commit_date(const char *buf, const char *tail
     -	if (buf >= tail)
     +
     +	/*
    -+	 * parse to end-of-line and then walk backwards, which
    -+	 * handles some malformed cases.
    ++	 * Jump to end-of-line so that we can walk backwards to find the
    ++	 * end-of-email ">". This is more forgiving of malformed cases
    ++	 * because unexpected characters tend to be in the name and email
    ++	 * fields.
     +	 */
     +	eol = memchr(buf, '\n', tail - buf);
     +	if (!eol)
      		return 0;
     -	dateptr = buf;
     -	while (buf < tail && *buf++ != '\n')
    -+	for (dateptr = eol; dateptr > buf && dateptr[-1] != '>'; dateptr--)
    - 		/* nada */;
    +-		/* nada */;
     -	if (buf >= tail)
    ++	dateptr = eol;
    ++	while (dateptr > buf && dateptr[-1] != '>')
    ++		dateptr--;
     +	if (dateptr == buf || dateptr == eol)
      		return 0;
     -	/* dateptr < buf && buf[-1] == '\n', so parsing will stop at buf-1 */
3:  e8e94083f5 ! 3:  c62fc59bf1 parse_commit(): handle broken whitespace-only timestamp
    @@ Commit message
         It's not subject to the same bug, because it insists that there be one
         or more digits in the timestamp.
     
    -    We can use the same logic here. If there's a non-whitespace but
    -    non-digit value (say "committer name <email> foo"), then
    -    parse_timestamp() would already have returned 0 anyway. So the only
    -    change should be for this "whitespace only" case.
    -
         Signed-off-by: Jeff King <peff@peff.net>
     
      ## commit.c ##
     @@ commit.c: static timestamp_t parse_commit_date(const char *buf, const char *tail)
    - 	if (dateptr == buf || dateptr == eol)
    + 	dateptr = eol;
    + 	while (dateptr > buf && dateptr[-1] != '>')
    + 		dateptr--;
    +-	if (dateptr == buf || dateptr == eol)
    ++	if (dateptr == buf)
      		return 0;
      
    +-	/* dateptr < eol && *eol == '\n', so parsing will stop at eol */
     +	/*
    -+	 * trim leading whitespace; parse_timestamp() will do this itself, but
    -+	 * it will walk past the newline at eol while doing so. So we insist
    -+	 * that there is at least one digit here.
    ++	 * Trim leading whitespace; parse_timestamp() will do this itself, but
    ++	 * if we have _only_ whitespace, it will walk right past the newline
    ++	 * while doing so.
     +	 */
     +	while (dateptr < eol && isspace(*dateptr))
     +		dateptr++;
    -+	if (!strchr("0123456789", *dateptr))
    ++	if (dateptr == eol)
     +		return 0;
     +
    - 	/* dateptr < eol && *eol == '\n', so parsing will stop at eol */
    ++	/*
    ++	 * We know there is at least one non-whitespace character, so we'll
    ++	 * begin parsing there and stop at worst case at eol.
    ++	 */
      	return parse_timestamp(dateptr, NULL, 10);
      }
    + 
     
      ## t/t4212-log-corrupt.sh ##
     @@ t/t4212-log-corrupt.sh: test_expect_success 'absurdly far-in-future date' '
-:  ---------- > 4:  28ed51a2ca parse_commit(): describe more date-parsing failure modes

  reply	other threads:[~2023-04-25  5:55 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-14 11:37 Weird behavior of 'git log --before' or 'git log --date-order': Commits from 2011 are treated to be before 1980 Thomas Bock
2023-04-15  8:52 ` Jeff King
2023-04-15  8:59   ` Jeff King
2023-04-15 14:10   ` Kristoffer Haugsbakk
2023-04-17  5:40     ` Jeff King
2023-04-17  6:20       ` Kristoffer Haugsbakk
2023-04-17  7:41         ` Jeff King
2023-04-27 22:32           ` Kristoffer Haugsbakk
2023-04-17  9:51   ` Junio C Hamano
2023-04-18  4:12     ` Jeff King
2023-04-18 14:02       ` Derrick Stolee
2023-04-21 14:51         ` Thomas Bock
2023-04-22 13:41           ` [PATCH 0/3] fixing some parse_commit() timestamp corner cases Jeff King
2023-04-22 13:42             ` [PATCH 1/3] t4212: avoid putting git on left-hand side of pipe Jeff King
2023-04-22 13:47             ` [PATCH 2/3] parse_commit(): parse timestamp from end of line Jeff King
2023-04-24 17:05               ` Junio C Hamano
2023-04-25  5:23                 ` Jeff King
2023-04-24 16:39             ` [PATCH 0/3] fixing some parse_commit() timestamp corner cases Junio C Hamano
2023-04-25  5:52             ` [PATCH v2 " Jeff King
2023-04-25  5:54               ` Jeff King [this message]
2023-04-25  5:54               ` [PATCH v2 1/4] t4212: avoid putting git on left-hand side of pipe Jeff King
2023-04-25  5:54               ` [PATCH v2 2/4] parse_commit(): parse timestamp from end of line Jeff King
2023-04-25  5:54               ` [PATCH v2 3/4] parse_commit(): handle broken whitespace-only timestamp Jeff King
2023-04-25 10:11                 ` Phillip Wood
2023-04-25 16:06                   ` Junio C Hamano
2023-04-26 11:36                     ` Jeff King
2023-04-26 15:32                       ` Junio C Hamano
2023-04-27  8:13                         ` [PATCH v3 0/4] fixing some parse_commit() timestamp corner cases Jeff King
2023-04-27  8:14                           ` [PATCH v3 1/4] t4212: avoid putting git on left-hand side of pipe Jeff King
2023-04-27  8:14                           ` [PATCH v3 2/4] parse_commit(): parse timestamp from end of line Jeff King
2023-04-27  8:17                           ` [PATCH v3 3/4] parse_commit(): handle broken whitespace-only timestamp Jeff King
2023-04-27 10:11                             ` Phillip Wood
2023-04-27 11:55                               ` Phillip Wood
2023-04-27 16:46                                 ` Jeff King
2023-04-27 16:20                               ` Junio C Hamano
2023-04-27 16:55                                 ` Jeff King
2023-04-27 16:25                             ` Junio C Hamano
2023-04-27 16:57                               ` Jeff King
2023-04-27  8:17                           ` [PATCH v3 4/4] parse_commit(): describe more date-parsing failure modes Jeff King
2023-04-27  8:18                           ` [PATCH v3 0/4] fixing some parse_commit() timestamp corner cases Jeff King
2023-04-27 16:32                           ` Junio C Hamano
2023-04-26 14:06                     ` [PATCH v2 3/4] parse_commit(): handle broken whitespace-only timestamp Phillip Wood
2023-04-26 14:31                       ` Andreas Schwab
2023-04-26 14:44                         ` Phillip Wood
2023-04-25  5:55               ` [PATCH v2 4/4] parse_commit(): describe more date-parsing failure modes Jeff King
2023-04-22 13:52         ` Weird behavior of 'git log --before' or 'git log --date-order': Commits from 2011 are treated to be before 1980 Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230425055442.GA4015600@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=bockthom@cs.uni-saarland.de \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).