git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: git@vger.kernel.org, dstolee@microsoft.com, gitster@pobox.com,
	peff@peff.net, szeder.dev@gmail.com
Subject: Re: Making split commit graphs pick up new options (namely --changed-paths)
Date: Thu, 10 Jun 2021 20:50:02 -0400	[thread overview]
Message-ID: <YMKzOgOU9lj9Nt0z@nand.local> (raw)
In-Reply-To: <87mtrx1cdr.fsf@evledraar.gmail.com>

On Fri, Jun 11, 2021 at 01:56:31AM +0200, Ævar Arnfjörð Bjarmason wrote:
> >> Reading the code there seems to be no way to do that, and we have the
> >> "chunk_bloom_data" in the graph, as well as "bloom_filter_settings".
> >>
> >> I'd expect some way to combine the "max_new_filters" and --split with
> >> some eventual-consistency logic so that graphs not matching our current
> >> settings are replaced, or replaced some <limit> at a time.
> >
> > This is asking about something slightly different, Bloom filter
> > settings rather than the existence of chagned-path Bloom filters
> > themselves. The Bloom settings aren't written to the commit-graph
> > although there has been some discussion about doing this in the past.
> >
> > If we ever did encode the Bloom settings, I imagine that accomplishing a
> > sort of "eventually replace all changed-path Bloom filters with these
> > new settings" would be as simple as considering all filters computed
> > under different settings to be "uncomputed".
>
> Yes, I don't have or know of a use-case for this now.
>
> But perhaps as the commit-graph format gets used more as a generic
> container for generated data about commits a thing like this would be
> useful, i.e. regenerating that part of the graph if the settings are
> different, particularly (as has been said) if it's not easy to discover
> the input setting from the data itself.

I wasn't completely correct about this (as Stolee noted and you ACK
below): we do store *some* settings in the format, but we don't store
the maximum number of entries in each filter. So today we would indicate
"computed, too large" for a commit that touches, say, 513 paths (the
limit is 512). But if you later ran:

  GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS=514 \
    git commit-graph write --split=replace --changed-paths

we would propagate the "too large" status forward and not bother to
recompute the filter (even though we could represent it if we did bother
to recompute it).

In the future we should probably include this as part of the filter
settings and recompute all filters in the commit-graph we're writing if
the value differs from the previous commit-graph. We have to recompute
it even if the maximum decreases since we can't ask the filter directly
"how many entries do you have?".

...and we have to assume that the limit is 512 on all commit-graphs that
exist in the wild today which do not explicit include this default
value. (That's not totally correct, of course, because you can set the
GIT_TEST_ variable, but it's close enough that I'd feel comfortable with
that rule in practice).

> So yeah, maybe we can just unlink() them right away, or another way to
> handle the race is that load_commit_graph_chain() could just try again
> from the beginning in such a case, and presumably picking up the fresh
> just-rewritten chain.

I'd probably be in favor of the latter.

Thanks,
Taylor

  reply	other threads:[~2021-06-11  0:50 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-10 10:40 Making split commit graphs pick up new options (namely --changed-paths) Ævar Arnfjörð Bjarmason
2021-06-10 17:22 ` Taylor Blau
2021-06-10 18:21   ` Derrick Stolee
2021-06-10 23:56   ` Ævar Arnfjörð Bjarmason
2021-06-11  0:50     ` Taylor Blau [this message]
2021-06-11 17:47       ` Derrick Stolee
2021-06-11 19:01         ` Taylor Blau
2021-06-15 14:21           ` Derrick Stolee
2021-06-15 14:35             ` Ævar Arnfjörð Bjarmason
2021-06-16  1:45               ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YMKzOgOU9lj9Nt0z@nand.local \
    --to=me@ttaylorr.com \
    --cc=avarab@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).