git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Johannes Schindelin via GitGitGadget <gitgitgadget@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	Taylor Blau <me@ttaylorr.com>,
	Johannes Schindelin <johannes.schindelin@gmx.de>
Subject: Re: [PATCH v2] merge-tree: fix segmentation fault in read-only repositories
Date: Wed, 21 Sep 2022 15:56:37 -0700	[thread overview]
Message-ID: <CABPp-BGUiDXxv8eQhKQXHcem3ke9e=Q99a_FDExZ5XZYUgir6A@mail.gmail.com> (raw)
In-Reply-To: <pull.1362.v2.git.1663798083240.gitgitgadget@gmail.com>

On Wed, Sep 21, 2022 at 3:08 PM Johannes Schindelin via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> Independent of the question whether we want `git merge-tree` to report
> the tree name even when it failed to write the tree objects in a
> read-only repository, there is no question that we should avoid a
> segmentation fault.

Ah, a read-only repo.  Looks like we didn't bother to check the return
status of write_object_file() in one of the two cases; oops.  (And it
looks like the other case does not correctly propagate the error
upwards far enough...)

> And when we report an invalid tree name (because the tree could not be
> written), we need the exit status to be non-zero.
>
> Let's make it so.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>     merge-tree: fix segmentation fault in read-only repositories
>
>     Turns out that the segmentation fault reported by Taylor
>     [https://lore.kernel.org/git/YyopQD+LvPucnz3w@nand.local/] happened
>     while testing merge-ort in a read-only repository, and that the upstream
>     version of git merge-tree is as affected as GitHub's internal version.
>
>     Note: I briefly considered using the OID of the_hash_algo->empty_tree
>     instead of null_oid() when no tree object could be constructed. However,
>     I have come to the conclusion that this could potentially cause its own
>     set of problems because it would relate to a valid tree object even if
>     we do not have any valid tree object to play with.
>
>     Also note: The question I hinted at in the commit message, namely
>     whether or not to try harder to construct a tree object even if we
>     cannot write it out, maybe merits a longer discussion, one that I think
>     we should have after v2.38.0 is released, so as not to distract from
>     focusing on v2.38.0.

That's fair, but you know I'm going to have a hard time refraining
from commenting on it anyway...  :-)

[...]
> @@ -446,7 +447,8 @@ static int real_merge(struct merge_tree_options *o,
>         if (o->show_messages == -1)
>                 o->show_messages = !result.clean;
>
> -       printf("%s%c", oid_to_hex(&result.tree->object.oid), line_termination);
> +       tree_oid = result.tree ? &result.tree->object.oid : null_oid();
> +       printf("%s%c", oid_to_hex(tree_oid), line_termination);

Perhaps also print a warning to stderr when result.tree is NULL?

>         if (!result.clean) {
>                 struct string_list conflicted_files = STRING_LIST_INIT_NODUP;
>                 const char *last = NULL;
> @@ -473,7 +475,7 @@ static int real_merge(struct merge_tree_options *o,
>                                               &result);
>         }
>         merge_finalize(&opt, &result);
> -       return !result.clean; /* result.clean < 0 handled above */
> +       return !result.tree || !result.clean; /* result.clean < 0 handled above */

Thinking out loud, should this logic be at the merge-ort.c level,
perhaps something like this:

@@ -4940,6 +4941,9 @@ static void
merge_ort_nonrecursive_internal(struct merge_options *opt,
        result->tree = parse_tree_indirect(&working_tree_oid);
        /* existence of conflicted entries implies unclean */
        result->clean &= strmap_empty(&opt->priv->conflicted);
+       if (!result->tree)
+               /* This shouldn't happen, because if we did fail to
write a tree we should have returned early before getting here.  But
just in case we have some bugs... */
+               result->clean = -1;
        if (!opt->priv->call_depth) {
                result->priv = opt->priv;
                result->_properly_initialized = RESULT_INITIALIZED;

That might benefit callers other than merge-tree, though maybe it
makes it harder to print a helpful error message (unless we're fine
with the library always throwing one?)

> diff --git a/t/t4301-merge-tree-write-tree.sh b/t/t4301-merge-tree-write-tree.sh
> index 28ca5c38bb5..013b77144bd 100755
> --- a/t/t4301-merge-tree-write-tree.sh
> +++ b/t/t4301-merge-tree-write-tree.sh
> @@ -810,4 +810,13 @@ test_expect_success 'can override merge of unrelated histories' '
>         test_cmp expect actual
>  '
>
> +test_expect_success SANITY 'merge-ort fails gracefully in a read-only repository' '
> +       git init --bare read-only &&
> +       git push read-only side1 side2 side3 &&
> +       test_when_finished "chmod -R u+w read-only" &&
> +       chmod -R a-w read-only &&
> +       test_must_fail git -C read-only merge-tree side1 side3 &&
> +       test_must_fail git -C read-only merge-tree side1 side2
> +'
> +
>  test_done
>
> base-commit: dda7228a83e2e9ff584bf6adbf55910565b41e14
> --
> gitgitgadget

This is a reasonable workaround from the merge-tree side, if we expect
merge-ort to not correctly notify us of issues (which may be fair
given that a quick glance suggests we have more than one problematic
codepath, which I'll discuss more below).  However, I do think
merge-ort should be returning a negative value for the "clean" status
in such a case.  If merge-ort did that, then we wouldn't even get to
your new code because merge-tree.c already checks for that:

    merge_incore_recursive(&opt, merge_bases, parent1, parent2, &result);
    if (result.clean < 0)
        die(_("failure to merge"));

I have a hack above which sets a negative return value, but it's only
a workaround since it comes from a late detect-after-the-fact
location.  Here's another ugly hack, but one which highlights exactly
where the real problem occurs:

diff --git a/merge-ort.c b/merge-ort.c
index 99dcee2db8..c2144e9220 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -3605,7 +3605,8 @@ static void write_tree(struct object_id *result_oid,
        }

        /* Write this object file out, and record in result_oid */
-       write_object_file(buf.buf, buf.len, OBJ_TREE, result_oid);
+       if (write_object_file(buf.buf, buf.len, OBJ_TREE, result_oid))
+               die(_("Unable to add new tree to database"));
        strbuf_release(&buf);
 }

die()ing, of course, is not a good choice, we should really return -1
instead.  Unfortunately, the callers aren't currently prepared to
propagate such a value upwards (as reflected in the return type of the
function) so any fix in this area is a little more involved than just
modifying these few lines.  Also, there is one other call to
write_object_file() in merge-ort.c which does check and return a value
of -1, though looking around it appears the callers of that other site
do not always correctly check and propagate a return value of -1
further upwards as they should, meaning we are losing track of the
fact that the merge failed to function.

Given the lack of correct propagation of -1 return values in the other
codepath plus the error here, keeping some form of workaround sounds
fair (though it may be nice to print an error that something fishy
happened).

And then, possibly post-v2.38.0 though we may be able to get it in
before, getting correct propagation of a -1 return value from the
source of the error would be good.  Would you like to look into that,
or would you prefer I did?

  parent reply	other threads:[~2022-09-21 22:57 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-21 15:30 [PATCH] merge-tree: fix segmentation fault in read-only repositories Johannes Schindelin via GitGitGadget
2022-09-21 15:42 ` Taylor Blau
2022-09-21 20:12   ` Johannes Schindelin
2022-09-21 18:12 ` Junio C Hamano
2022-09-21 20:13   ` Johannes Schindelin
2022-09-21 22:08 ` [PATCH v2] " Johannes Schindelin via GitGitGadget
2022-09-21 22:24   ` Junio C Hamano
2022-09-22 19:52     ` Johannes Schindelin
2022-09-21 22:56   ` Elijah Newren [this message]
2022-09-21 23:11     ` Junio C Hamano
2022-09-22 17:24     ` Johannes Schindelin
2022-09-22 19:50     ` Johannes Schindelin
2022-09-23  1:47       ` Elijah Newren
2022-09-22 19:46   ` [PATCH v3] merge-ort: " Johannes Schindelin via GitGitGadget
2022-09-23  1:38     ` Elijah Newren
2022-09-23  9:55       ` Johannes Schindelin
2022-09-26 21:55     ` [PATCH v4 0/2] merge-tree: " Johannes Schindelin via GitGitGadget
2022-09-26 21:55       ` [PATCH v4 1/2] merge-ort: " Johannes Schindelin via GitGitGadget
2022-09-26 21:55       ` [PATCH v4 2/2] merge-ort: return early when failing to write a blob Johannes Schindelin via GitGitGadget
2022-09-26 22:07         ` Junio C Hamano
2022-09-27  8:05         ` Elijah Newren
2022-09-27 16:45           ` Junio C Hamano
2022-09-27  8:11       ` [PATCH v4 0/2] merge-tree: fix segmentation fault in read-only repositories Elijah Newren
2022-09-28  7:29       ` [PATCH v5 " Johannes Schindelin via GitGitGadget
2022-09-28  7:29         ` [PATCH v5 1/2] merge-ort: " Johannes Schindelin via GitGitGadget
2022-09-28  7:29         ` [PATCH v5 2/2] merge-ort: return early when failing to write a blob Johannes Schindelin via GitGitGadget
2022-09-28 15:53           ` Junio C Hamano
2022-09-29  1:36             ` Elijah Newren
2022-09-29  1:49               ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CABPp-BGUiDXxv8eQhKQXHcem3ke9e=Q99a_FDExZ5XZYUgir6A@mail.gmail.com' \
    --to=newren@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=johannes.schindelin@gmx.de \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).