git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Stefan Beller <sbeller@google.com>
To: Jeff King <peff@peff.net>
Cc: git <git@vger.kernel.org>, David Turner <novalis@novalis.org>
Subject: Re: [PATCH 2/6] diff: clear emitted_symbols flag after use
Date: Thu, 24 Jan 2019 10:55:10 -0800	[thread overview]
Message-ID: <CAGZ79kbHLvN252v-gNbcpsyGg8pZ9GPBtyZquX50HwhtYep5oA@mail.gmail.com> (raw)
In-Reply-To: <20190124123240.GB11354@sigill.intra.peff.net>

On Thu, Jan 24, 2019 at 4:32 AM Jeff King <peff@peff.net> wrote:
>
> There's an odd bug when "log --color-moved" is used with the combination
> of "--cc --stat -p": the stat for merge commits is erroneously shown
> with the diff of the _next_ commit.
>
> The included test demonstrates the issue. Our history looks something
> like this:
>
>   A-B-M--D
>    \ /
>     C
>
> When we run "git log --cc --stat -p --color-moved" starting at D, we get
> this sequence of events:
>
>   1. The diff for D is using -p, so diff_flush() calls into
>      diff_flush_patch_all_file_pairs(). There we see that o->color_moved
>      is in effect, so we point o->emitted_symbols to a static local
>      struct, causing diff_flush_patch() to queue the symbols instead of
>      actually writing them out.
>
>      We then do our move detection, emit the symbols, and clear the
>      struct. But we leave o->emitted_symbols pointing to our struct.
>
>   2. Next we compute the diff for M. This is a merge, so we use the
>      combined diff code. In find_paths_generic(), we compute the
>      pairwise diff between each commit and its parent. Normally this is
>      done with DIFF_FORMAT_NO_OUTPUT, since we're just looking for
>      intersecting paths. But since "--stat --cc" shows the first-parent
>      stat, and since we're computing that diff anyway, we enable
>      DIFF_FORMAT_DIFFSTAT for the first parent. This outputs the stat
>      information immediately, saving us from running a separate
>      first-parent diff later.
>
>      But where does that output go? Normally it goes directly to stdout,
>      but because o->emitted_symbols is set, we queue it. As a result, we
>      don't actually print the diffstat for the merge commit (yet),

Thanks for your analysis. As always a pleasant read.
I understand and agree with what is written up to here remembering
the code vaguely.

> which
>      is wrong.

I disagree with this sentiment. If we remember to flush the queued output
this is merely an inefficiency due to implementation details, but not wrong.

We could argue that it is wrong to have o->emitted_symbols set, as
we know we don't need it for producing a diffstat only.

>
>   3. Next we compute the diff for C. We're actually showing a patch
>      again, so we end up in diff_flush_patch_all_file_pairs(), but this
>      time we have the queued stat from step 2 waiting in our struct.

Right, that is how the queueing can produce errors. I wonder if the
test that is included in this patch would work on top of
e6e045f803 ("diff.c: buffer all output if asked to", 2017-06-29)
as that commit specifically wanted to make sure these errors
would be caught.

>
>      We add new elements to it for C's diff, and then flush the whole
>      thing. And we see the diffstat from M as part of C's diff, which is
>      wrong.
>
> So triggering the bug really does require the combination of all of
> those options.

Similarly we can trigger bugs by using options that enable buffering
(so far only --color-moved) and options that do not fully buffer and flush.

>
> To fix it, we can simply restore o->emitted_symbols to NULL after
> flushing it, so that it does not affect anything outside of
> diff_flush_patch_all_file_pairs(). This intuitively makes sense, since
> nobody outside of that function is going to bother flushing it, so we
> would not want them to write to it either.

This would also cause the inefficiency I mentioned after (2) to disappear,
as the merge commits diffstat would be just printed to stdout?

>
> In fact, we could take this a step further and turn the local "esm"
> struct into a non-static variable that goes away after the function
> ends. However, since it contains a dynamically sized array, we benefit
> from amortizing the cost of allocations over many calls. So we'll leave
> it as static to retain that benefit.

okay.

>
> But let's push the zero-ing of esm.nr into the conditional for "if
> (o->emitted_symbols)" to make it clear that we do not expect esm to hold
> any values if we did not just try to use it. With the code as it is
> written now, if we did encounter such a case (which I think would be a
> bug), we'd silently leak those values without even bothering to display
> them. With this change, we'd at least eventually show them, and somebody
> would notice.

Wow. Good call.

>
> Signed-off-by: Jeff King <peff@peff.net>

Reviewed-by: Stefan Beller <sbeller@google.com>

  reply	other threads:[~2019-01-24 18:55 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-24 12:26 [PATCH 0/6] some diff --cc --stat fixes Jeff King
2019-01-24 12:27 ` [PATCH 1/6] t4006: resurrect commented-out tests Jeff King
2019-01-24 18:18   ` Stefan Beller
2019-01-24 12:32 ` [PATCH 2/6] diff: clear emitted_symbols flag after use Jeff King
2019-01-24 18:55   ` Stefan Beller [this message]
2019-01-24 19:11     ` Jeff King
2019-01-24 20:18   ` Junio C Hamano
2019-01-24 20:36     ` Stefan Beller
2019-01-24 21:17       ` Jeff King
2019-01-24 21:15     ` Jeff King
2019-01-24 12:33 ` [PATCH 3/6] combine-diff: factor out stat-format mask Jeff King
2019-01-24 12:34 ` [PATCH 4/6] combine-diff: treat --shortstat like --stat Jeff King
2019-01-24 18:58   ` David Turner
2019-01-24 19:02   ` Stefan Beller
2019-01-24 12:35 ` [PATCH 5/6] combine-diff: treat --summary " Jeff King
2019-01-24 19:14   ` Stefan Beller
2019-01-24 19:23     ` Jeff King
2019-01-24 12:36 ` [PATCH 6/6] combine-diff: treat --dirstat " Jeff King
2019-01-24 19:21 ` [PATCH 0/6] some diff --cc --stat fixes Stefan Beller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAGZ79kbHLvN252v-gNbcpsyGg8pZ9GPBtyZquX50HwhtYep5oA@mail.gmail.com \
    --to=sbeller@google.com \
    --cc=git@vger.kernel.org \
    --cc=novalis@novalis.org \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).