From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: phillip.wood@dunelm.org.uk
Cc: "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>,
git@vger.kernel.org, "Taylor Blau" <me@ttaylorr.com>,
"Phillip Wood" <phillip.wood123@gmail.com>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: Re: [PATCH v2] range-diff: support reading mbox files
Date: Tue, 22 Nov 2022 08:40:07 +0100 (CET) [thread overview]
Message-ID: <rr7sp534-43o9-7n1o-5700-369n5rprq75p@tzk.qr> (raw)
In-Reply-To: <8d40f170-650f-800a-02d7-d279186d5883@dunelm.org.uk>
[-- Attachment #1: Type: text/plain, Size: 7334 bytes --]
Hi Phillip,
On Mon, 21 Nov 2022, Phillip Wood wrote:
> [...]
> > @@ range-diff.c: struct patch_util {
> > +
> > + line = contents.buf;
> > + size = contents.len;
> > -+ for (; size > 0; size -= len, line += len) {
> > ++ for (; size; size -= len, line += len) {
> > + const char *p;
> > +
> > -+ len = find_eol(line, size);
> > ++ len = find_next_line(line, size);
> > +
> > -+ if (state == MBOX_BEFORE_HEADER) {
> > ++ if (state == MBOX_BEFORE_HEADER ||
> > ++ (state == MBOX_IN_DIFF && line[0] == 'F')) {
>
> If we see an 'F' when parsing a diff then it is probably a From: line
> indicating that we've reached the start of a new patch. I think we need to add
> the diff we've just parsed to the list of commits before continuing and change
> state to MBOX_BEFORE_HEADER otherwise we'll hit the continue just below this
> comment with state unchanged.
As always, thank you *so* much for modeling how to perform productive,
actionable and *thorough* code reviews that focus on the relevant.
You spotted something important. It is a bug, as you suspected, and there
is even more: `util` leaks because it is allocated both after the patch
was parsed and when we see a diff header, without releasing the first (but
granted, this leak has been there before).
You have no idea how much I appreciate your diligence; I understand how
much of an effort it takes to study patches in depth until you understand
them enough to spot complex issues like this one. (In Parkinson's
parlance: I am grateful that you focus on the design of the nuclear power
plant, see https://en.wikipedia.org/w/index.php?title=Atwood%27s_duck)
Back to our regularly scheduled programming (pun intended): I indeed made
a mistake here and fixed it, by removing that 'F' check again, and instead
adding a `goto` label and jumping to that upon encountering the end of the
diff.
I also de-duplicated the `util = xcalloc(1, sizeof(*util))` calls so that
we do that _only_ in the `MBOX_BEFORE_HEADER` arm, and no longer at the
end of the `MBOX_IN_DIFF` arm.
Since I already looked at a leak, I looked further and saw that also
`copy` is leaked in `show_range_diff()` in case of an mbox. I've addressed
that, too.
>
> > + if (!skip_prefix(line, "From ", &p))
> > + continue;
> > +
> > @@ range-diff.c: struct patch_util {
> > + author = subject = NULL;
> > +
> > + state = MBOX_IN_HEADER;
> > ++ continue;
> > + }
> > +
> > + if (starts_with(line, "diff --git ")) {
> > @@ range-diff.c: struct patch_util {
> > + strbuf_addch(&buf, '\n');
> > + if (!util->diff_offset)
> > + util->diff_offset = buf.len;
> > -+ line[len - 1] = '\n';
> > ++
> > + orig_len = len;
> > -+ len = parse_git_diff_header(&root, &linenr, 1,
> > line,
> > -+ len, size,
> > &patch);
> > ++ /* `find_next_line()`'s replaced the LF with a
> > NUL */
> > ++ line[len - 1] = '\n';
> > ++ len = len > 1 && line[len - 2] == '\r' ?
> > ++ error(_("cannot handle diff headers
> > with "
> > ++ "CR/LF line endings")) :
> > ++ parse_git_diff_header(&root, &linenr,
> > 1, line,
> > ++ len, size,
> > &patch);
> > + if (len < 0) {
> > + error(_("could not parse git header
> > '%.*s'"),
> > + orig_len, line);
> > ++ release_patch(&patch);
>
> Thank you for using release_patch() rather than the "free random struct
> members" approach that was suggested.
Indeed, it sounded like a specific suggestion to release a specific
`struct` member, but that was of course not the end of the story, and I
completed the analysis myself that I had wished the reviewer would have
completed _before_ sending off a review.
> > @@ t/t3206-range-diff.sh: test_expect_success 'ranges with pathspecs'
> > '
> > +test_expect_success 'compare range vs mbox' '
> > + git format-patch --stdout topic..mode-only-change >mbox &&
> > + git range-diff topic...mode-only-change >expect &&
> > ++
> > + git range-diff mode-only-change..topic mbox:./mbox >actual &&
> > + test_cmp expect actual &&
> > -+ git range-diff mode-only-change..topic mbox:- <mbox >actual &&
> > -+ test_cmp expect actual
> > ++
>
> I'm a bit confused by this sed command, I've annotated it with my probably
> flawed understanding.
>
> > ++ sed -e "/^From: .*/{
> > ++ h
>
> This stores the From: header in the hold space
👍
> > ++ s/.*/From: Bugs Bunny <bugs@bun.ny>/
>
> This changes the From: header in the pattern space
👍
> > ++ :1
> > ++ N
> > ++ /[ -z]$/b1
>
> We loop until we find a line that does not end with a space, letter or number
> adding the lines to the hold space
I would have _loved_ to match on an empty line, i.e. `/[^\n]$/b1`. But
that construct is not understood by the `sed` on macOS.
I even went so far as to search for the source code of a BSD `sed` (and I
found it, and modified the code so that it builds on Linux, see
https://github.com/dscho/sed-bsd) to try a couple of things, but could not
make it work with any variation of `\n`. Therefore, I settled on expecting
all the lines in the commit header to end in printable ASCII characters
between the space and the `z`.
> > ++ G
>
> This appends the hold space to the pattern space, then the pattern space is
> printed.
👍
> Doesn't this mean we end up with two From: headers? Is the in-body From:
> line already present?
There is no in-body `From:` header because the patch author matches the
`GIT_AUTHOR_IDENT` that is in effect while running the `format-patch`
command.
Let me show you what this `sed` call deals with. In the local test run, it
modified an `mbox` starting with this:
From 4d39cb329d3ef4c8e69b43859c2e11adb83f8613 Mon Sep 17 00:00:00 2001
From: Thomas Rast <trast@inf.ethz.ch>
Date: Mon, 22 Jul 2013 11:23:44 +0200
Subject: [PATCH 1/3] s/4/A/ + add other-file
---
file | 2 +-
other-file | 0
[...]
to a modified `mbox` that starts with this:
From 4d39cb329d3ef4c8e69b43859c2e11adb83f8613 Mon Sep 17 00:00:00 2001
From: Bugs Bunny <bugs@bun.ny>
Date: Mon, 22 Jul 2013 11:23:44 +0200
Subject: [PATCH 1/3] s/4/A/ + add other-file
From: Thomas Rast <trast@inf.ethz.ch>
---
file | 2 +-
other-file | 0
[...]
>
> > ++ }" <mbox >mbox.from &&
> > ++ git range-diff mode-only-change..topic mbox:./mbox.from
> > >actual.from &&
> > ++ test_cmp expect actual.from &&
> > ++
> > ++ append_cr <mbox >mbox.cr &&
> > ++ test_must_fail git range-diff \
> > ++ mode-only-change..topic mbox:./mbox.cr 2>err &&
> > ++ grep CR/LF err &&
>
> Thanks for adding that
Thank you for the suggestion to add it!
And thank you again for modeling how to perform actionable, helpful and
productive code reviews,
Dscho
next prev parent reply other threads:[~2022-11-22 7:40 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-15 18:20 [PATCH] range-diff: support reading mbox files Johannes Schindelin via GitGitGadget
2022-11-16 2:09 ` Taylor Blau
2022-11-16 14:40 ` Phillip Wood
2022-11-17 10:26 ` Phillip Wood
2022-11-18 12:53 ` Johannes Schindelin
2022-11-18 13:16 ` Johannes Schindelin
2022-11-17 18:24 ` Ævar Arnfjörð Bjarmason
2022-11-18 11:39 ` Johannes Schindelin
2022-11-19 23:11 ` [PATCH v2] " Johannes Schindelin via GitGitGadget
2022-11-21 10:08 ` Phillip Wood
2022-11-22 7:40 ` Johannes Schindelin [this message]
2022-11-22 14:20 ` Phillip Wood
2022-11-22 9:08 ` [PATCH v3] " Johannes Schindelin via GitGitGadget
2022-11-22 14:23 ` Phillip Wood
2022-11-22 23:58 ` Junio C Hamano
2023-03-03 22:02 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=rr7sp534-43o9-7n1o-5700-369n5rprq75p@tzk.qr \
--to=johannes.schindelin@gmx.de \
--cc=avarab@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=me@ttaylorr.com \
--cc=phillip.wood123@gmail.com \
--cc=phillip.wood@dunelm.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).