From: Taylor Blau <me@ttaylorr.com>
To: Elijah Newren <newren@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>,
git@vger.kernel.org, "Eric W. Biederman" <ebiederm@gmail.com>,
Jeff King <peff@peff.net>
Subject: Re: [PATCH 7/7] builtin/merge-tree.c: implement support for `--write-pack`
Date: Sun, 8 Oct 2023 12:04:04 -0400
Message-ID: <ZSLS9G1lHruig48a@nand.local>
In-Reply-To: <CABPp-BE+mJ4e==fWNqUNi5RVkoui_xeZN+axnM6vBykDqAzHiA@mail.gmail.com>

On Sun, Oct 08, 2023 at 12:02:27AM -0700, Elijah Newren wrote:
> On Fri, Oct 6, 2023 at 4:02 PM Taylor Blau <me@ttaylorr.com> wrote:
> >
> > On Fri, Oct 06, 2023 at 03:35:25PM -0700, Junio C Hamano wrote:
> > > Taylor Blau <me@ttaylorr.com> writes:
> > >
> > > > When using merge-tree often within a repository[^1], it is possible to
> > > > generate a relatively large number of loose objects, which can result in
> > > > degraded performance, and inode exhaustion in extreme cases.
> > >
> > > Well, be it "git merge-tree" or "git merge", new loose objects tend
> > > to accumulate until "gc" kicks in, so it is not a new problem for
> > > mere mortals, is it?
> >
> > Yeah, I would definitely suspect that this is more of an issue for
> > forges than individual Git users.
>
> It may still be nice to also do this optimization for plain "git
> merge" as well. I had it in my list of ideas somewhere to do a
> "fast-import-like" thing to avoid writing loose objects, as I
> suspected that'd actually be a performance impediment.

I think that would definitely be worth doing. I do worry a little bit
about locking in low-quality deltas (or the lack thereof), but more on
that below...

> Oh, at the contributor summit, Johannes said he only needed pass/fail,
> not the actual commits, which is why I suggested this route. If you
> need to keep the actual commits, then this won't help.

Yep, agreed. Like I said earlier, I think there are some niche scenarios
where we just care about "would this merge cleanly?", but in most other
cases we want to keep the resulting tree around.

> I was interested in the same question as Junio, but from a different
> angle. fast-import documentation points out that the packs it creates
> are suboptimal with poorer delta choices. Are the packs created by
> bulk-checkin prone to the same issues? When I was thinking in terms
> of having "git merge" use fast-import for pack creation instead of
> writing loose objects (an idea I never investigated very far), I was
> wondering if I'd need to mark those packs as "less optimal" and do
> something to make sure they were more likely to be repacked.
>
> I believe geometric repacking didn't exist back when I was thinking
> about this, and perhaps geometric repacking automatically handles
> things nicely for us. Does it, or are we risking retaining
> sub-optimal deltas from the bulk-checkin code?
>
> (I've never really cracked open the pack code, so I have absolutely no
> idea; I'm just curious.)

Yes, the bulk-checkin mechanism suffers from an even worse problem: the
pack it creates contains no deltas whatsoever. Each object is written
into the pack as-is, so no deltification happens at all.

I think Michael Haggerty (?) suggested to me off-list that it might be
interesting to have a flag to mark packs with bad (or no) deltas as
such, so that we don't implicitly trust their contents as having
high-quality deltas.

> > I think that like anything, this is a trade-off. Having lots of packs
> > can be a performance hindrance just like having lots of loose objects.
> > But since we can represent more objects with fewer inodes when packed,
> > storing those objects together in a pack is preferable when (a) you're
> > doing lots of test-merges, and (b) you want to keep those objects
> > around, e.g., because they are reachable.
>
> A couple of the comments earlier in the series suggested this was
> about streaming blobs to a pack in the bulk checkin code. Are tree
> and commit objects also put in the pack, or will those continue to be
> written loosely?

This covers both blobs and trees, since IIUC those are all we need for
merge-tree to be able to write every object it creates into a pack.
AFAIK merge-tree never generates any commit objects. But teaching
'merge' to perform the same bulk-checkin trick would just require
implementing index_commit_bulk_checkin_incore() or similar, which is
straightforward to do on top of the existing code.

Thanks,
Taylor