From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: Elijah Newren <newren@gmail.com>,
"Eric W. Biederman" <ebiederm@gmail.com>,
Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>,
Patrick Steinhardt <ps@pks.im>
Subject: [PATCH v2 0/7] merge-ort: implement support for packing objects together
Date: Tue, 17 Oct 2023 12:31:08 -0400
Message-ID: <cover.1697560266.git.me@ttaylorr.com>
In-Reply-To: <cover.1696629697.git.me@ttaylorr.com>
(Previously based on 'eb/limit-bulk-checkin-to-blobs', which has since
been merged. This series is now based on the tip of 'master', which is
a9ecda2788 (The eighteenth batch, 2023-10-13) at the time of writing.)
This series implements support for a new merge-tree option,
`--write-pack`, which causes any newly written objects to be packed
together instead of being stored individually as loose objects.
Much is unchanged since last time, except for a small tweak to one of
the commit messages in response to feedback from Eric W. Biederman. The
series has also been rebased onto 'master'; the rebase hit a couple of
conflicts, which I resolved, involving:
- 9eb5419799 (bulk-checkin: only support blobs in index_bulk_checkin,
2023-09-26)
- e0b8c84240 (treewide: fix various bugs w/ OpenSSL 3+ EVP API,
2023-09-01)
They were mostly trivial resolutions, and the results can be viewed in
the range-diff included below.
(From last time: the motivating use-case behind these changes is to
better support repositories where merge-tree is invoked frequently,
which can generate a large number of loose objects and, in turn, hurt
performance.)
Thanks in advance for any review!
Taylor Blau (7):
bulk-checkin: factor out `format_object_header_hash()`
bulk-checkin: factor out `prepare_checkpoint()`
bulk-checkin: factor out `truncate_checkpoint()`
bulk-checkin: factor out `finalize_checkpoint()`
bulk-checkin: introduce `index_blob_bulk_checkin_incore()`
bulk-checkin: introduce `index_tree_bulk_checkin_incore()`
builtin/merge-tree.c: implement support for `--write-pack`
Documentation/git-merge-tree.txt | 4 +
builtin/merge-tree.c | 5 +
bulk-checkin.c | 258 ++++++++++++++++++++++++++-----
bulk-checkin.h | 8 +
merge-ort.c | 42 +++--
merge-recursive.h | 1 +
t/t4301-merge-tree-write-tree.sh | 93 +++++++++++
7 files changed, 363 insertions(+), 48 deletions(-)
Range-diff against v1:
1: 37f4072815 ! 1: edf1cbafc1 bulk-checkin: factor out `format_object_header_hash()`
@@ bulk-checkin.c: static void prepare_to_stream(struct bulk_checkin_packfile *stat
}
+static void format_object_header_hash(const struct git_hash_algo *algop,
-+ git_hash_ctx *ctx, enum object_type type,
++ git_hash_ctx *ctx,
++ struct hashfile_checkpoint *checkpoint,
++ enum object_type type,
+ size_t size)
+{
+ unsigned char header[16384];
@@ bulk-checkin.c: static void prepare_to_stream(struct bulk_checkin_packfile *stat
+
+ algop->init_fn(ctx);
+ algop->update_fn(ctx, header, header_len);
++ algop->init_fn(&checkpoint->ctx);
+}
+
static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
@@ bulk-checkin.c: static int deflate_blob_to_pack(struct bulk_checkin_packfile *st
- OBJ_BLOB, size);
- the_hash_algo->init_fn(&ctx);
- the_hash_algo->update_fn(&ctx, obuf, header_len);
-+ format_object_header_hash(the_hash_algo, &ctx, OBJ_BLOB, size);
+- the_hash_algo->init_fn(&checkpoint.ctx);
++ format_object_header_hash(the_hash_algo, &ctx, &checkpoint, OBJ_BLOB,
++ size);
/* Note: idx is non-NULL when we are writing */
if ((flags & HASH_WRITE_OBJECT) != 0)
2: 9cc1f3014a ! 2: b3f89d5853 bulk-checkin: factor out `prepare_checkpoint()`
@@ Commit message
## bulk-checkin.c ##
@@ bulk-checkin.c: static void format_object_header_hash(const struct git_hash_algo *algop,
- algop->update_fn(ctx, header, header_len);
+ algop->init_fn(&checkpoint->ctx);
}
+static void prepare_checkpoint(struct bulk_checkin_packfile *state,
3: f392ed2211 = 3: abe4fb0a59 bulk-checkin: factor out `truncate_checkpoint()`
4: 9c6ca564ad = 4: 0b855a6eb7 bulk-checkin: factor our `finalize_checkpoint()`
5: 30ca7334c7 ! 5: 239bf39bfb bulk-checkin: introduce `index_blob_bulk_checkin_incore()`
@@ bulk-checkin.c: static void finalize_checkpoint(struct bulk_checkin_packfile *st
+static int deflate_obj_contents_to_pack_incore(struct bulk_checkin_packfile *state,
+ git_hash_ctx *ctx,
++ struct hashfile_checkpoint *checkpoint,
+ struct object_id *result_oid,
+ const void *buf, size_t size,
+ enum object_type type,
+ const char *path, unsigned flags)
+{
-+ struct hashfile_checkpoint checkpoint = {0};
+ struct pack_idx_entry *idx = NULL;
+ off_t already_hashed_to = 0;
+
@@ bulk-checkin.c: static void finalize_checkpoint(struct bulk_checkin_packfile *st
+ CALLOC_ARRAY(idx, 1);
+
+ while (1) {
-+ prepare_checkpoint(state, &checkpoint, idx, flags);
++ prepare_checkpoint(state, checkpoint, idx, flags);
+ if (!stream_obj_to_pack_incore(state, ctx, &already_hashed_to,
+ buf, size, type, path, flags))
+ break;
-+ truncate_checkpoint(state, &checkpoint, idx);
++ truncate_checkpoint(state, checkpoint, idx);
+ }
+
-+ finalize_checkpoint(state, ctx, &checkpoint, idx, result_oid);
++ finalize_checkpoint(state, ctx, checkpoint, idx, result_oid);
+
+ return 0;
+}
@@ bulk-checkin.c: static void finalize_checkpoint(struct bulk_checkin_packfile *st
+ const char *path, unsigned flags)
+{
+ git_hash_ctx ctx;
++ struct hashfile_checkpoint checkpoint = {0};
+
-+ format_object_header_hash(the_hash_algo, &ctx, OBJ_BLOB, size);
++ format_object_header_hash(the_hash_algo, &ctx, &checkpoint, OBJ_BLOB,
++ size);
+
-+ return deflate_obj_contents_to_pack_incore(state, &ctx, result_oid,
-+ buf, size, OBJ_BLOB, path,
-+ flags);
++ return deflate_obj_contents_to_pack_incore(state, &ctx, &checkpoint,
++ result_oid, buf, size,
++ OBJ_BLOB, path, flags);
+}
+
static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
6: cb0f79cabb ! 6: 57613807d8 bulk-checkin: introduce `index_tree_bulk_checkin_incore()`
@@ Commit message
Within `deflate_tree_to_pack_incore()`, the changes should be limited
to something like:
+ struct strbuf converted = STRBUF_INIT;
if (the_repository->compat_hash_algo) {
- struct strbuf converted = STRBUF_INIT;
if (convert_object_file(&compat_obj,
the_repository->hash_algo,
the_repository->compat_hash_algo, ...) < 0)
@@ Commit message
format_object_header_hash(the_repository->compat_hash_algo,
OBJ_TREE, size);
-
- strbuf_release(&converted);
}
+ /* compute the converted tree's hash using the compat algorithm */
+ strbuf_release(&converted);
, assuming related changes throughout the rest of the bulk-checkin
machinery necessary to update the hash of the converted object, which
@@ Commit message
## bulk-checkin.c ##
@@ bulk-checkin.c: static int deflate_blob_to_pack_incore(struct bulk_checkin_packfile *state,
- flags);
+ OBJ_BLOB, path, flags);
}
+static int deflate_tree_to_pack_incore(struct bulk_checkin_packfile *state,
@@ bulk-checkin.c: static int deflate_blob_to_pack_incore(struct bulk_checkin_packf
+ const char *path, unsigned flags)
+{
+ git_hash_ctx ctx;
++ struct hashfile_checkpoint checkpoint = {0};
+
-+ format_object_header_hash(the_hash_algo, &ctx, OBJ_TREE, size);
++ format_object_header_hash(the_hash_algo, &ctx, &checkpoint, OBJ_TREE,
++ size);
+
-+ return deflate_obj_contents_to_pack_incore(state, &ctx, result_oid,
-+ buf, size, OBJ_TREE, path,
-+ flags);
++ return deflate_obj_contents_to_pack_incore(state, &ctx, &checkpoint,
++ result_oid, buf, size,
++ OBJ_TREE, path, flags);
+}
+
static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
7: e969210145 ! 7: f21400f56c builtin/merge-tree.c: implement support for `--write-pack`
@@ merge-ort.c
* We have many arrays of size 3. Whenever we have such an array, the
@@ merge-ort.c: static int handle_content_merge(struct merge_options *opt,
if ((merge_status < 0) || !result_buf.ptr)
- ret = err(opt, _("Failed to execute internal merge"));
+ ret = error(_("failed to execute internal merge"));
- if (!ret &&
- write_object_file(result_buf.ptr, result_buf.size,
- OBJ_BLOB, &result->oid))
-- ret = err(opt, _("Unable to add %s to database"),
-- path);
+- ret = error(_("unable to add %s to database"), path);
+ if (!ret) {
+ ret = opt->write_pack
+ ? index_blob_bulk_checkin_incore(&result->oid,
@@ merge-ort.c: static int handle_content_merge(struct merge_options *opt,
+ result_buf.size,
+ OBJ_BLOB, &result->oid);
+ if (ret)
-+ ret = err(opt, _("Unable to add %s to database"),
-+ path);
++ ret = error(_("unable to add %s to database"),
++ path);
+ }
free(result_buf.ptr);
--
2.42.0.405.gdb2a2f287e