git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Thomas Bock <bockthom@cs.uni-saarland.de>, git@vger.kernel.org
Subject: Re: Weird behavior of 'git log --before' or 'git log --date-order': Commits from 2011 are treated to be before 1980
Date: Tue, 18 Apr 2023 00:12:53 -0400	[thread overview]
Message-ID: <20230418041253.GD60552@coredump.intra.peff.net> (raw)
In-Reply-To: <xmqqa5z6q1jl.fsf@gitster.g>

On Mon, Apr 17, 2023 at 02:51:42AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > There may be also cases where the two diverge. Obviously having two
> > parsers isn't ideal. I think sharing the code may involve a lot of work,
> > though. The pretty-print parser is interested in pulling out more
> > information, and is less focused on performance. Parsing commits is
> > traditionally a hot path, as we historically had to parse every commit,
> > even if we weren't showing it (including non-log operations like
> > computing merge bases, reachability, and so forth).
> >
> > But that may not matter so much. One, we already inflate the whole
> > commit object, not just the header. So even if we spend a few extra
> > instructions on parsing, it may not be noticeable. And two, these days
> > we often cache commit metadata in the commit-graph files, which avoids
> > loading the commit message entirely (and thus this parsing) for most
> > operations.
> 
> Makes readers wonder which parser is used to parse commit objects in
> order to populate the commit-graph files.  If that step screws up,
> we'd record a broken timestamp there X-<.

It should be be the parse_commit() one, since the commit-graph writing
code works off of parsed "struct commit" objects to find its data. Which
is good, since we may silently optimize out a parse in favor of the
commit-graph; we'd want the two to always match. So recording a broken
timestamp there is OK (but see below).

Where it gets trickier is if we then fix the parser to handle this case.
Now the commit-graph stores a broken value, and further writes are
clever enough to say "ah, I already have that commit in the graph; no
need to recompute its values". So once you start using a version of Git
with the more lenient parser, you'd have to manually blow away any graph
files and rebuild to see the benefit.

One thing the commit graph perhaps _could_ do is omit the commit, or
mark it as "this one is broken in some way". And then fall back to
parsing those few instead (which is slower, but if it's a small minority
of commits, that's OK). But I don't think there's any code for that. And
it's kind of tricky anyway, because the parser just sets the date to "0"
with no indication that there was any potential error. So it's probably
solvable, but perhaps not worth the effort. The current rule is just
"whatever we got from parsing is what the commit-graph will return",
which works well in practice unless the parser changes.

-Peff

  reply	other threads:[~2023-04-18  4:13 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-14 11:37 Weird behavior of 'git log --before' or 'git log --date-order': Commits from 2011 are treated to be before 1980 Thomas Bock
2023-04-15  8:52 ` Jeff King
2023-04-15  8:59   ` Jeff King
2023-04-15 14:10   ` Kristoffer Haugsbakk
2023-04-17  5:40     ` Jeff King
2023-04-17  6:20       ` Kristoffer Haugsbakk
2023-04-17  7:41         ` Jeff King
2023-04-27 22:32           ` Kristoffer Haugsbakk
2023-04-17  9:51   ` Junio C Hamano
2023-04-18  4:12     ` Jeff King [this message]
2023-04-18 14:02       ` Derrick Stolee
2023-04-21 14:51         ` Thomas Bock
2023-04-22 13:41           ` [PATCH 0/3] fixing some parse_commit() timestamp corner cases Jeff King
2023-04-22 13:42             ` [PATCH 1/3] t4212: avoid putting git on left-hand side of pipe Jeff King
2023-04-22 13:47             ` [PATCH 2/3] parse_commit(): parse timestamp from end of line Jeff King
2023-04-24 17:05               ` Junio C Hamano
2023-04-25  5:23                 ` Jeff King
2023-04-24 16:39             ` [PATCH 0/3] fixing some parse_commit() timestamp corner cases Junio C Hamano
2023-04-25  5:52             ` [PATCH v2 " Jeff King
2023-04-25  5:54               ` Jeff King
2023-04-25  5:54               ` [PATCH v2 1/4] t4212: avoid putting git on left-hand side of pipe Jeff King
2023-04-25  5:54               ` [PATCH v2 2/4] parse_commit(): parse timestamp from end of line Jeff King
2023-04-25  5:54               ` [PATCH v2 3/4] parse_commit(): handle broken whitespace-only timestamp Jeff King
2023-04-25 10:11                 ` Phillip Wood
2023-04-25 16:06                   ` Junio C Hamano
2023-04-26 11:36                     ` Jeff King
2023-04-26 15:32                       ` Junio C Hamano
2023-04-27  8:13                         ` [PATCH v3 0/4] fixing some parse_commit() timestamp corner cases Jeff King
2023-04-27  8:14                           ` [PATCH v3 1/4] t4212: avoid putting git on left-hand side of pipe Jeff King
2023-04-27  8:14                           ` [PATCH v3 2/4] parse_commit(): parse timestamp from end of line Jeff King
2023-04-27  8:17                           ` [PATCH v3 3/4] parse_commit(): handle broken whitespace-only timestamp Jeff King
2023-04-27 10:11                             ` Phillip Wood
2023-04-27 11:55                               ` Phillip Wood
2023-04-27 16:46                                 ` Jeff King
2023-04-27 16:20                               ` Junio C Hamano
2023-04-27 16:55                                 ` Jeff King
2023-04-27 16:25                             ` Junio C Hamano
2023-04-27 16:57                               ` Jeff King
2023-04-27  8:17                           ` [PATCH v3 4/4] parse_commit(): describe more date-parsing failure modes Jeff King
2023-04-27  8:18                           ` [PATCH v3 0/4] fixing some parse_commit() timestamp corner cases Jeff King
2023-04-27 16:32                           ` Junio C Hamano
2023-04-26 14:06                     ` [PATCH v2 3/4] parse_commit(): handle broken whitespace-only timestamp Phillip Wood
2023-04-26 14:31                       ` Andreas Schwab
2023-04-26 14:44                         ` Phillip Wood
2023-04-25  5:55               ` [PATCH v2 4/4] parse_commit(): describe more date-parsing failure modes Jeff King
2023-04-22 13:52         ` Weird behavior of 'git log --before' or 'git log --date-order': Commits from 2011 are treated to be before 1980 Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230418041253.GD60552@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=bockthom@cs.uni-saarland.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).