git@vger.kernel.org mailing list mirror (one of many)
* Making split commit graphs pick up new options (namely --changed-paths)
@ 2021-06-10 10:40 Ævar Arnfjörð Bjarmason
From: Ævar Arnfjörð Bjarmason @ 2021-06-10 10:40 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, peff, szeder.dev


Replying to
http://lore.kernel.org/git/ccb6482feb8d8606d82b5ab97e33184f26d6c5b6.1600279373.git.me@ttaylorr.com
as a starting point for discussion.

On Wed, Sep 16 2020, Taylor Blau wrote:

> Introduce a command-line flag to specify the maximum number of new Bloom
> filters that a 'git commit-graph write' is willing to compute from
> scratch.
>
> Prior to this patch, a commit-graph write with '--changed-paths' would
> compute Bloom filters for all selected commits which haven't already
> been computed (i.e., by a previous commit-graph write with '--split'
> such that a roll-up or replacement is performed).
>
> This behavior can cause prohibitively-long commit-graph writes for a
> variety of reasons:
>
>   * There may be lots of filters whose diffs take a long time to
>     generate (for example, they have close to the maximum number of
>     changes, diffing itself takes a long time, etc).
>
>   * Old-style commit-graphs (which encode filters with too many entries
>     as not having been computed at all) cause us to waste time
>     recomputing filters that appear to have not been computed only to
>     discover that they are too-large.
>
> This can make the upper-bound of the time it takes for 'git commit-graph
> write --changed-paths' to be rather unpredictable.
>
> To make this command behave more predictably, introduce
> '--max-new-filters=<n>' to allow computing at most '<n>' Bloom filters
> from scratch. This lets "computing" already-known filters proceed
> quickly, while bounding the number of slow tasks that Git is willing to
> do.
> [...]
> @@ -67,6 +67,11 @@ this option is given, future commit-graph writes will automatically assume
>  that this option was intended. Use `--no-changed-paths` to stop storing this
>  data.
>  +
> +With the `--max-new-filters=<n>` option, generate at most `n` new Bloom
> +filters (if `--changed-paths` is specified). If `n` is `-1`, no limit is
> +enforced. Commits whose filters are not calculated are stored as a
> +length zero Bloom filter.
> ++
> [...]
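
To make sure I'm reading the quoted documentation the same way as
everyone else, here is a toy model of the budget it describes. This is
only a sketch, not git's code: write_filters(), "known" and "compute"
are names I made up for illustration, and the filters are plain byte
strings.

```python
from typing import Callable, Dict, Iterable


def write_filters(
    commits: Iterable[str],
    known: Dict[str, bytes],
    compute: Callable[[str], bytes],
    max_new: int = -1,
) -> Dict[str, bytes]:
    """Toy model of '--max-new-filters=<n>' (hypothetical helper, not git's API).

    Filters already in `known` are reused and do not count against the
    budget. At most `max_new` filters are computed from scratch; a
    negative `max_new` means no limit. Commits left over once the budget
    is spent get a length-zero filter.
    """
    out: Dict[str, bytes] = {}
    computed = 0
    for commit in commits:
        if commit in known:
            out[commit] = known[commit]       # cheap: already computed earlier
        elif max_new < 0 or computed < max_new:
            out[commit] = compute(commit)     # slow path, counted against the budget
            computed += 1
        else:
            out[commit] = b""                 # budget spent: store an empty filter
    return out
```

With a budget of 2 and one filter already known, the first two unknown
commits get computed and the rest end up with empty filters.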

With an existing --split setup that later starts using --changed-paths,
is there any way to make "add Bloom filters to the graph" eventually
consistent, or is a one-off --split=replace the only way to
grandfather in such a feature?

Reading the code, there seems to be no way to do that, even though we
have "chunk_bloom_data" in the graph, as well as "bloom_filter_settings".

I'd expect some way to combine the "max_new_filters" and --split with
some eventual-consistency logic so that graphs not matching our current
settings are replaced, or replaced some <limit> at a time.
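
Roughly what I have in mind, as a sketch only. Nothing here exists in
git: Layer, rewrite_some() and the (7, 10) settings tuple in the usage
note are placeholders I invented, and "rewriting" a layer is reduced to
stamping it with the wanted settings.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple

# Hypothetical stand-in for a layer's Bloom filter settings.
Settings = Tuple[int, int]


@dataclass(frozen=True)
class Layer:
    """One file in a split commit-graph chain (illustrative model only)."""
    name: str
    bloom_settings: Optional[Settings]  # None: layer was written without Bloom data


def rewrite_some(chain: List[Layer], wanted: Settings, limit: int) -> List[Layer]:
    """Return a new chain in which at most `limit` mismatching layers are rewritten.

    A layer mismatches when it has no Bloom data or was written with
    settings other than `wanted`. Running this on every write converges
    on a chain where all layers match, without a one-off full replace.
    """
    budget = limit
    result: List[Layer] = []
    for layer in chain:
        if layer.bloom_settings != wanted and budget > 0:
            layer = replace(layer, bloom_settings=wanted)  # stands in for a real rewrite
            budget -= 1
        result.append(layer)
    return result
```

Calling rewrite_some(chain, (7, 10), 1) on each write would fix up one
stale layer per run and become a no-op once the whole chain matches.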

Also, am I reading the expire_commit_graphs() logic correctly that we
first write the split graph, and then unlink() things that are too old?
I.e. if you rely on the commit-graph to optimize things, will this make
things slower until the next run of writing the graph?

I expected to find something more gentle there, i.e. marking that file
as obsolete, not making it part of the new chain (replacing it), and
then unlinking only things not part of the current chain of data that
are too old. But perhaps I'm just misreading or misunderstanding the
behavior...
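
To be concrete about the "more gentle" behavior I expected, here is a
sketch. It models what I was hoping to find, not what
expire_commit_graphs() does; expirable() and its arguments are names I
made up.

```python
from typing import Dict, List, Set


def expirable(mtimes: Dict[str, float], current_chain: Set[str], cutoff: float) -> List[str]:
    """Pick graph files that would be safe to unlink (hypothetical helper).

    mtimes        maps each graph file name on disk to its modification time
    current_chain holds the names of files that are part of the chain in use
    cutoff        is the expiry time; files modified before it count as too old

    A file is only selected when it is both outside the current chain and
    older than the cutoff, so nothing still in use is ever removed.
    """
    return sorted(
        name
        for name, mtime in mtimes.items()
        if name not in current_chain and mtime < cutoff
    )
```

An old file that is still part of the chain stays put; it only becomes
a candidate after a later write has replaced it.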
