git@vger.kernel.org mailing list mirror (one of many)
From: Taylor Blau <me@ttaylorr.com>
To: "SZEDER Gábor" <szeder.dev@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: What's cooking in git.git (Jan 2024, #01; Tue, 2)
Date: Tue, 16 Jan 2024 15:49:24 -0500
Message-ID: <Zabr1Glljjgl/UUB@nand.local>
In-Reply-To: <20240113225157.GD3000857@szeder.dev>

On Sat, Jan 13, 2024 at 11:51:57PM +0100, SZEDER Gábor wrote:
> On a related note, if current git (I tried current master and v2.43.0)
> encounters a commit graph layer containing v2 Bloom filters (created
> by current seen) while writing a new commit graph, then it segfaults
> dereferencing a NULL 'settings' pointer in
> get_or_compute_bloom_filter().
>
> The test below demonstrates this, but it's quite hacky using two
> different git versions: it has to be run by an old git version not yet
> supporting v2 Bloom filters, and a new git version already supporting
> them should be installed at /tmp/git-new/.
>
> diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
> index 2ba0324a69..0464dd68d5 100755
> --- a/t/t4216-log-bloom.sh
> +++ b/t/t4216-log-bloom.sh
> @@ -454,4 +454,33 @@ test_expect_success 'Bloom reader notices out-of-order index offsets' '
>  	test_cmp expect.err err
>  '
>
> +CENT=$(printf "\302\242")
> +test_expect_success 'split commit graph vs changed paths Bloom filter v2 vs old git' '
> +	git init split-v2-old &&
> +	(
> +		cd split-v2-old &&
> +		git commit --allow-empty -m "Bloom filters are written but still ignored for root commits :(" &&
> +		for i in 1 2 3
> +		do
> +			echo $i >$CENT &&
> +			git add $CENT &&
> +			git commit -m "$i" || return 1
> +		done &&
> +		git log --oneline -- $CENT >expect &&
> +
> +		# Here we write a commit graph layer containing v2 changed
> +		# path Bloom filters using a git binary built from current
> +		# 'seen' branch.
> +		git rev-parse HEAD^ |
> +		/tmp/git-new/bin/git -c commitgraph.changedPathsVersion=2 \
> +			commit-graph write --stdin-commits --changed-paths --split &&
> +
> +		# This is current master, and segfaults.
> +		git commit-graph write --reachable --changed-paths &&
> +
> +		git log --oneline -- $CENT >actual &&
> +		test_cmp expect actual
> +	)
> +'
> +
>  test_done

Thanks. The segfault is reproducible on my end, but I don't think that
it is possible to fix this for existing versions of Git. The problem (as
you note in your backtrace) is here:

    #0  0x000055555569c842 in get_or_compute_bloom_filter (
        r=0x5555559c9ce0 <the_repo>, c=0x5555559dffd0, compute_if_not_present=1,
        settings=0x0, computed=0x7fffffffe0f4) at bloom.c:253

That frame tries to dereference ctx->bloom_settings (passed in as
'settings'), which is NULL. Note that we initialize some sensible
defaults for ctx->bloom_settings in
commit-graph.c::write_commit_graph():

    struct bloom_filter_settings bloom_settings = DEFAULT_BLOOM_FILTER_SETTINGS;
    /* ... */
    bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY",
                                                  bloom_settings.bits_per_entry);
    bloom_settings.num_hashes = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_NUM_HASHES",
                                              bloom_settings.num_hashes);
    bloom_settings.max_changed_paths = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS",
                                                     bloom_settings.max_changed_paths);
    ctx->bloom_settings = &bloom_settings;

...but we'll throw those away in favor of whatever is in the topmost
layer of the existing commit-graph chain later on in that same function:

    if (!(flags & COMMIT_GRAPH_NO_WRITE_BLOOM_FILTERS)) {
      struct commit_graph *g;

      g = ctx->r->objects->commit_graph;

      /* We have changed-paths already. Keep them in the next graph */
      if (g && g->chunk_bloom_data) {
        ctx->changed_paths = 1;
        ctx->bloom_settings = g->bloom_filter_settings;
      }
    }

OK, everything seems fine thus far, until we inspect the value of
g->bloom_filter_settings, which is NULL because of this hunk from
commit-graph.c::graph_read_bloom_data():

    if (hash_version != 1)
      return 0;

which terminates the function before we assign g->bloom_filter_settings
for the existing (written with v2 Bloom filters) graph layer.
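
To make that chain concrete, here is a rough, self-contained model of
it. This is only a sketch, not Git's code: the structs are cut-down
stand-ins, the model_* function names are made up for illustration, and
it assumes the reader records chunk_bloom_data before it checks the
hash version (which is what lets the writer's check above succeed).

```c
#include <stddef.h>
#include <stdint.h>

/* Cut-down stand-ins for Git's structs; other fields are omitted. */
struct bloom_filter_settings {
	uint32_t hash_version;
};

struct commit_graph {
	const unsigned char *chunk_bloom_data;
	struct bloom_filter_settings *bloom_filter_settings;
};

/*
 * Hypothetical model of the reader: the chunk pointer is recorded
 * (an assumption), but for any hash version other than 1 we return
 * before the settings are assigned, so they stay NULL.
 */
static int model_read_bloom_data(struct commit_graph *g,
				 const unsigned char *chunk,
				 uint32_t hash_version,
				 struct bloom_filter_settings *storage)
{
	g->chunk_bloom_data = chunk;
	if (hash_version != 1)
		return 0;
	storage->hash_version = hash_version;
	g->bloom_filter_settings = storage;
	return 0;
}

/*
 * Hypothetical model of the writer: if the existing layer has Bloom
 * data, its settings replace the defaults, even when they are NULL.
 */
static const struct bloom_filter_settings *
model_pick_settings(const struct commit_graph *g,
		    const struct bloom_filter_settings *defaults)
{
	if (g && g->chunk_bloom_data)
		return g->bloom_filter_settings;
	return defaults;
}
```

Feeding a layer read with hash_version 2 through model_pick_settings()
yields NULL, which is the pointer that later gets dereferenced.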

I don't think that there is a way to fix this in a backwards compatible
way, but I'm comfortable with that in this instance since we don't
expect users to upgrade to v2 Bloom filters and then write new graph
layers using a version of Git that doesn't support v2.

We can add a warning in the series that I'm working on indicating this,
but I don't think there's much more we can do besides changing this to
emit a warning and bail instead of segfaulting.
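
Roughly, the warn-and-bail shape I have in mind looks like the sketch
below. It is not the actual patch: choose_bloom_settings() is a
hypothetical helper, the structs are cut-down stand-ins, and the
warning text is a placeholder.

```c
#include <stddef.h>
#include <stdio.h>

/* Cut-down stand-ins for Git's structs; other fields are omitted. */
struct bloom_filter_settings {
	unsigned int hash_version;
};

struct commit_graph {
	const unsigned char *chunk_bloom_data;
	struct bloom_filter_settings *bloom_filter_settings;
};

/*
 * Hypothetical helper: store the settings to write with in *out and
 * return 0, or warn and return -1 when the existing layer has Bloom
 * data whose settings could not be read (e.g. an unknown version).
 */
static int choose_bloom_settings(const struct commit_graph *g,
				 const struct bloom_filter_settings *defaults,
				 const struct bloom_filter_settings **out)
{
	*out = defaults;
	if (!g || !g->chunk_bloom_data)
		return 0;	/* no existing filters; keep the defaults */

	if (!g->bloom_filter_settings) {
		/* Placeholder message, not Git's actual wording. */
		fprintf(stderr, "warning: existing commit-graph layer has "
			"changed-path filters with unreadable settings\n");
		*out = NULL;
		return -1;	/* caller bails instead of dereferencing */
	}

	*out = g->bloom_filter_settings;
	return 0;
}
```

The caller would check the return value and stop before computing any
filters, so the NULL never reaches get_or_compute_bloom_filter().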

Thanks,
Taylor

