git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: "SZEDER Gábor" <szeder.dev@gmail.com>,
	"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>,
	git@vger.kernel.org, l.s.r@web.de,
	"Derrick Stolee" <dstolee@microsoft.com>
Subject: Re: [PATCH v4 04/10] commit-graph: persist existence of changed-paths
Date: Fri, 16 Oct 2020 09:52:55 -0400	[thread overview]
Message-ID: <6fe5de28-01f3-fc48-6bc3-aec0777ad889@gmail.com> (raw)
In-Reply-To: <20201016031844.GA2181783@nand.local>

On 10/15/2020 11:18 PM, Taylor Blau wrote:
> On Thu, Oct 15, 2020 at 10:18:47PM -0400, Derrick Stolee wrote:
>> On 10/15/2020 5:41 PM, Taylor Blau wrote:
>>> So, we need to be more aggressively checking the Bloom filter settings
>>> in any layer we want to reuse a Bloom filter out of before reusing it
>>> verbatim in the current layer. The patch below the scissors line is
>>> sufficient to do that, and it causes the third test to start passing.
>>
>> I think there are three things we should keep in mind:
>>
>> 1. Incompatible Bloom filter settings between layers should be seen
>>    as _inconsistent data_ as Git should not be writing incremental
>>    commit-graph files with inconsistent Bloom filter settings. Thus,
>>    when reading the commit-graph chain we should prevent incompatible
>>    filters from being used. One way to do this is to notice different
>>    settings and completely disable Bloom filters. The other way would
>>    be to take the settings from the first layer with filters and then
>>    clear the chunk_bloom_indexes and chunk_bloom_data fields for the
>>    layers that don't agree. This fits with an expectation that lower
>>    layers are larger, so more filters can be used in that situation.
> 
> Sure; I'd be fine with only allowing filters computed with the settings
> present in the lowest or largest layer in the event that multiple layers
> exist with incompatible settings.
> 
> I'm trying to point us towards a direction of not optimizing too far
> along a direction that we're unlikely to take, while also trying to do
> something relatively non-invasive to make it possible for a version of
> Git to change the default Bloom settings. That is, if a user is writing
> split commit-graphs, and we change the default Bloom settings, they
> shouldn't have to recompute or merge down all of their Bloom filters.

They would need to recompute when they merge layers, which introduces
a big question about how we should handle such a case.

> If that's something that we never think is going to happen, I'm fine
> with not thinking too hard about it. But, I also don't want to paint
> ourselves into a corner, so I think something like the patch I wrote in
> the email that you're replying to actually may be worth pursuing
> further. I dunno. Definitely after 2.29, though.

I think the proposed "react properly to this unlikely situation"
is a good way to prevent getting locked into our choices now. It
makes it possible for "old" Git versions (2.30 until we decide to
allow this mix) to interact with the inconsistent settings without
failure.

We don't need to do the 100% "optimal" case of using all filters
in order to enable this choice in the future.
 
[...]

> For what it's worth, I was mainly talking about it to say that it would
> be more effort than it's probably worth to do. There's also nothing that
> we're currently discussing that would prevent us from taking that same
> direction up in six months from now.

Yes, I just want to make sure that everyone agrees there is a
middle ground without saying that inconsistent filter settings
across layers is a "fully supported" feature. If someone wants
to tackle the work to make it a desirable state, then they can
try that (with great care).

Thanks,
-Stolee

  reply	other threads:[~2020-10-16 13:53 UTC|newest]

Thread overview: 76+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-15 20:14 [PATCH 0/8] More commit-graph/Bloom filter improvements Derrick Stolee via GitGitGadget
2020-06-15 20:14 ` [PATCH 1/8] commit-graph: place bloom_settings in context Derrick Stolee via GitGitGadget
2020-06-18 20:30   ` René Scharfe
2020-06-19 12:58     ` Derrick Stolee
2020-06-15 20:14 ` [PATCH 2/8] commit-graph: unify the signatures of all write_graph_chunk_*() functions SZEDER Gábor via GitGitGadget
2020-06-18 20:30   ` René Scharfe
2020-06-15 20:14 ` [PATCH 3/8] commit-graph: simplify chunk writes into loop SZEDER Gábor via GitGitGadget
2020-06-18 20:30   ` René Scharfe
2020-06-15 20:14 ` [PATCH 4/8] commit-graph: check chunk sizes after writing SZEDER Gábor via GitGitGadget
2020-06-15 20:14 ` [PATCH 5/8] commit-graph: check all leading directories in changed path Bloom filters SZEDER Gábor via GitGitGadget
2020-06-18 20:31   ` René Scharfe
2020-06-19  9:14     ` René Scharfe
2020-06-19 17:17   ` Taylor Blau
2020-06-19 17:19     ` Taylor Blau
2020-06-23 13:47     ` Derrick Stolee
2020-06-15 20:14 ` [PATCH 6/8] bloom: enforce a minimum size of 8 bytes Derrick Stolee via GitGitGadget
2020-06-15 20:14 ` [PATCH 7/8] commit-graph: change test to die on parse, not load Derrick Stolee via GitGitGadget
2020-06-15 20:14 ` [PATCH 8/8] commit-graph: persist existence of changed-paths Derrick Stolee via GitGitGadget
2020-06-17 21:21 ` [PATCH 0/8] More commit-graph/Bloom filter improvements Junio C Hamano
2020-06-18  1:46   ` Derrick Stolee
2020-06-23 17:46 ` [PATCH v2 00/11] " Derrick Stolee via GitGitGadget
2020-06-23 17:47   ` [PATCH v2 01/11] commit-graph: place bloom_settings in context Derrick Stolee via GitGitGadget
2020-06-23 17:47   ` [PATCH v2 02/11] commit-graph: change test to die on parse, not load Derrick Stolee via GitGitGadget
2020-06-23 17:47   ` [PATCH v2 03/11] bloom: get_bloom_filter() cleanups Derrick Stolee via GitGitGadget
2020-06-25  7:24     ` René Scharfe
2020-06-23 17:47   ` [PATCH v2 04/11] commit-graph: persist existence of changed-paths Derrick Stolee via GitGitGadget
2020-06-23 17:47   ` [PATCH v2 05/11] commit-graph: unify the signatures of all write_graph_chunk_*() functions SZEDER Gábor via GitGitGadget
2020-06-25  7:25     ` René Scharfe
2020-06-23 17:47   ` [PATCH v2 06/11] commit-graph: simplify chunk writes into loop SZEDER Gábor via GitGitGadget
2020-06-25  7:25     ` René Scharfe
2020-06-25 14:59       ` Derrick Stolee
2020-06-23 17:47   ` [PATCH v2 07/11] commit-graph: check chunk sizes after writing SZEDER Gábor via GitGitGadget
2020-06-25  7:25     ` René Scharfe
2020-06-25 15:02       ` Derrick Stolee
2020-06-23 17:47   ` [PATCH v2 08/11] revision.c: fix whitespace Derrick Stolee via GitGitGadget
2020-06-23 17:47   ` [PATCH v2 09/11] revision: empty pathspecs should not use Bloom filters Taylor Blau via GitGitGadget
2020-06-23 17:47   ` [PATCH v2 10/11] commit-graph: check all leading directories in changed path " SZEDER Gábor via GitGitGadget
2020-06-25  7:25     ` René Scharfe
2020-06-25 15:05       ` Derrick Stolee
2020-06-26  6:34         ` SZEDER Gábor
2020-06-26 14:42           ` Derrick Stolee
2020-06-23 17:47   ` [PATCH v2 11/11] bloom: enforce a minimum size of 8 bytes Derrick Stolee via GitGitGadget
2020-06-24 23:11   ` [PATCH v2 00/11] More commit-graph/Bloom filter improvements Junio C Hamano
2020-06-24 23:32     ` Derrick Stolee
2020-06-25  0:38       ` Junio C Hamano
2020-06-25 13:38         ` Derrick Stolee
2020-06-25 16:34           ` Junio C Hamano
2020-06-26 12:30   ` [PATCH v3 00/10] " Derrick Stolee via GitGitGadget
2020-06-26 12:30     ` [PATCH v3 01/10] commit-graph: place bloom_settings in context Derrick Stolee via GitGitGadget
2020-06-26 12:30     ` [PATCH v3 02/10] commit-graph: change test to die on parse, not load Derrick Stolee via GitGitGadget
2020-06-26 12:30     ` [PATCH v3 03/10] bloom: fix logic in get_bloom_filter() Derrick Stolee via GitGitGadget
2020-06-27 16:33       ` SZEDER Gábor
2020-06-29 13:02         ` Derrick Stolee
2020-06-26 12:30     ` [PATCH v3 04/10] commit-graph: persist existence of changed-paths Derrick Stolee via GitGitGadget
2020-06-26 12:30     ` [PATCH v3 05/10] commit-graph: unify the signatures of all write_graph_chunk_*() functions SZEDER Gábor via GitGitGadget
2020-06-26 12:30     ` [PATCH v3 06/10] commit-graph: simplify chunk writes into loop SZEDER Gábor via GitGitGadget
2020-06-26 12:30     ` [PATCH v3 07/10] commit-graph: check chunk sizes after writing SZEDER Gábor via GitGitGadget
2020-06-26 12:30     ` [PATCH v3 08/10] revision.c: fix whitespace Derrick Stolee via GitGitGadget
2020-06-26 12:30     ` [PATCH v3 09/10] revision: empty pathspecs should not use Bloom filters Taylor Blau via GitGitGadget
2020-06-26 12:30     ` [PATCH v3 10/10] commit-graph: check all leading directories in changed path " SZEDER Gábor via GitGitGadget
2020-07-01 13:27     ` [PATCH v4 00/10] More commit-graph/Bloom filter improvements Derrick Stolee via GitGitGadget
2020-07-01 13:27       ` [PATCH v4 01/10] commit-graph: place bloom_settings in context Derrick Stolee via GitGitGadget
2020-07-01 13:27       ` [PATCH v4 02/10] commit-graph: change test to die on parse, not load Derrick Stolee via GitGitGadget
2020-07-01 13:27       ` [PATCH v4 03/10] bloom: fix logic in get_bloom_filter() Derrick Stolee via GitGitGadget
2020-07-01 13:27       ` [PATCH v4 04/10] commit-graph: persist existence of changed-paths Derrick Stolee via GitGitGadget
2020-10-15 13:21         ` SZEDER Gábor
2020-10-15 21:41           ` Taylor Blau
2020-10-16  2:18             ` Derrick Stolee
2020-10-16  3:18               ` Taylor Blau
2020-10-16 13:52                 ` Derrick Stolee [this message]
2020-07-01 13:27       ` [PATCH v4 05/10] commit-graph: unify the signatures of all write_graph_chunk_*() functions SZEDER Gábor via GitGitGadget
2020-07-01 13:27       ` [PATCH v4 06/10] commit-graph: simplify chunk writes into loop SZEDER Gábor via GitGitGadget
2020-07-01 13:27       ` [PATCH v4 07/10] commit-graph: check chunk sizes after writing SZEDER Gábor via GitGitGadget
2020-07-01 13:27       ` [PATCH v4 08/10] revision.c: fix whitespace Derrick Stolee via GitGitGadget
2020-07-01 13:27       ` [PATCH v4 09/10] revision: empty pathspecs should not use Bloom filters Taylor Blau via GitGitGadget
2020-07-01 13:27       ` [PATCH v4 10/10] commit-graph: check all leading directories in changed path " SZEDER Gábor via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6fe5de28-01f3-fc48-6bc3-aec0777ad889@gmail.com \
    --to=stolee@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=l.s.r@web.de \
    --cc=me@ttaylorr.com \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).