git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: "SZEDER Gábor" <szeder.dev@gmail.com>,
	"Junio C Hamano" <gitster@pobox.com>
Cc: Jonathan Tan <jonathantanmy@google.com>,
	git <git@vger.kernel.org>, Jeff King <peff@peff.net>,
	Stefan Beller <sbeller@google.com>
Subject: Re: [PATCH] object_as_type: initialize commit-graph-related fields of 'struct commit'
Date: Sun, 27 Jan 2019 13:40:33 -0500	[thread overview]
Message-ID: <ae9229c7-926c-fb55-4d1d-658c8b6acfc4@gmail.com> (raw)
In-Reply-To: <20190127132854.GI6702@szeder.dev>

On 1/27/2019 8:28 AM, SZEDER Gábor wrote:
> On Sun, Jan 27, 2019 at 02:08:32PM +0100, SZEDER Gábor wrote:
>> When the commit graph and generation numbers were introduced in
>> commits 177722b344 (commit: integrate commit graph with commit
>> parsing, 2018-04-10) and 83073cc994 (commit: add generation number to
>> struct commit, 2018-04-25), they tried to make sure that the
>> corresponding 'graph_pos' and 'generation' fields of 'struct commit'
>> are initialized conservatively, as if the commit were not included in
>> the commit-graph file.
>>
>> Alas, initializing those fields only in alloc_commit_node() missed the
>> case when an object that happens to be a commit is first looked up via
>> lookup_unknown_object(), and is then later converted to a 'struct
>> commit' via the object_as_type() helper function (either calling it
>> directly, or as part of a subsequent lookup_commit() call).
>> Consequently, both of those fields incorrectly remain set to zero,
>> which means e.g. that the commit is present in and is the first entry
>> of the commit-graph file.  This will result in wrong timestamp, parent
>> and root tree hashes, if such a 'struct commit' instance is later
>> filled from the commit-graph.
>>
>> Extract the initialization of 'struct commit's fields from
>> alloc_commit_node() into a helper function, and call it from
>> object_as_type() as well, to make sure that it properly initializes
>> the two commit-graph-related fields, too.  With this helper function
>> it is hopefully less likely that any new fields added to 'struct
>> commit' in the future would remain uninitialized.
>>
>> With this change alloc_commit_index() won't have any remaining callers
>> outside of 'alloc.c', so mark it as static.
>>
>> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
>> ---
>>
>> So, it turns out that ec0c5798ee (revision: use commit graph in
>> get_reference(), 2018-12-04) is not the culprit after all, it merely
>> highlighted a bug that is as old as the commit-graph feature itself.
>> This patch fixes this and all other related issues I reported
>> upthread.
> 
> And how/why does this affect 'git describe --dirty'?
> 
>   - 'git describe' first iterates over all refs, and somewhere deep
>     inside for_each_ref() each commit (well, object) a ref points to
>     is looked up via lookup_unknown_object().  This leaves all fields
>     of the created object zero initialized.
> 
>   - Then it dereferences HEAD for '--dirty' and ec0c5798ee's changes
>     to get_reference() kick in: lookup_commit() doesn't instantiate a
>     brand new and freshly initialized 'struct commit', but returns the
>     object created in the previous step converted into 'struct
>     commit'.  This conversion doesn't set the commit-graph fields in
>     'struct commit', but leaves both as zero.  get_reference() then
>     tries to load HEAD's commit information from the commit-graph,
>     find_commit_in_graph() sees the the still zero 'graph_pos' field
>     and doesn't perform a search through the commit-graph file, and
>     the subsequent fill_commit_in_graph() reads the commit info from
>     the first entry.
> 
>     In case of the failing test I posted earlier, where only the first
>     commit is in the commit-graph but HEAD isn't, this means that the
>     HEAD's 'struct commit' is filled with the info of HEAD^.
> 
>   - Ultimately, the diff machinery then doesn't compare the worktree
>     to HEAD's tree, but to HEAD^'s, finds that they differ, hence the
>     incorrect '-dirty' flag in the output.
> 
> Before ec0c5798ee get_reference() simply called parse_object(), which
> ignored the commit-graph, so the issue could remain hidden.

Thanks for digging in, Szeder. This is a very subtle interaction, and
I'm glad you caught the issue. There are likely other ways this could
become problematic, including hitting BUG() statements regarding
generation numbers.

I recommend this be merged to 'maint' if possible.

Thanks,
-Stolee


  reply	other threads:[~2019-01-27 18:40 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-04 22:42 [PATCH on sb/more-repo-in-api] revision: use commit graph in get_reference() Jonathan Tan
2018-12-04 23:12 ` Stefan Beller
2018-12-06 23:36   ` Jonathan Tan
2018-12-07 13:49     ` Derrick Stolee
2018-12-05  4:54 ` Jeff King
2018-12-06 23:54   ` Jonathan Tan
2018-12-07  8:53     ` Jeff King
2018-12-05 23:15 ` Junio C Hamano
2018-12-07 21:50 ` [PATCH on master v2] " Jonathan Tan
2018-12-09  0:51   ` Junio C Hamano
2018-12-09  1:49     ` Junio C Hamano
2018-12-11 10:54     ` Jeff King
2018-12-12 19:58       ` Jonathan Tan
2018-12-13  1:27         ` Jeff King
2018-12-13 16:20           ` Derrick Stolee
2018-12-13 18:54 ` [PATCH v3] " Jonathan Tan
2018-12-14  3:20   ` Junio C Hamano
2018-12-14  8:45   ` Jeff King
2019-01-25 15:33 ` Regression in: [PATCH on sb/more-repo-in-api] " SZEDER Gábor
2019-01-25 19:56   ` Stefan Beller
2019-01-25 22:01     ` Jonathan Tan
2019-01-25 22:14     ` SZEDER Gábor
2019-01-25 22:21       ` SZEDER Gábor
2019-01-27 13:08         ` [PATCH] object_as_type: initialize commit-graph-related fields of 'struct commit' SZEDER Gábor
2019-01-27 13:28           ` SZEDER Gábor
2019-01-27 18:40             ` Derrick Stolee [this message]
2019-01-28 16:15           ` Jeff King
2019-01-28 16:57           ` Jonathan Tan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ae9229c7-926c-fb55-4d1d-658c8b6acfc4@gmail.com \
    --to=stolee@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    --cc=peff@peff.net \
    --cc=sbeller@google.com \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).