From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: Elijah Newren <newren@gmail.com>,
	"Eric W. Biederman" <ebiederm@gmail.com>,
	Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>,
	Patrick Steinhardt <ps@pks.im>
Subject: [PATCH v2 0/7] merge-ort: implement support for packing objects together
Date: Tue, 17 Oct 2023 12:31:08 -0400
Message-ID: <cover.1697560266.git.me@ttaylorr.com>
In-Reply-To: <cover.1696629697.git.me@ttaylorr.com>

(This series was previously based on 'eb/limit-bulk-checkin-to-blobs',
which has since been merged. It is now based on the tip of 'master',
which is a9ecda2788 (The eighteenth batch, 2023-10-13) at the time of
writing.)

This series implements support for a new merge-tree option,
`--write-pack`, which causes any newly-written objects to be packed
together instead of being stored individually as loose.
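
As a rough illustration of what the option changes, the sketch below
models the choice between the two write paths. It is not the code from
this series: `sketch_store`, `sketch_options`, and the
`sketch_write_*()` helpers are hypothetical stand-ins that only count
objects. They take the place of the real calls (`write_object_file()`
for loose objects, and the in-core bulk-checkin functions for packed
ones).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the object store; not git's real API. */
struct sketch_store {
	size_t loose_objects;  /* objects written as one file each */
	size_t packed_objects; /* objects appended to a single pending pack */
};

/* Hypothetical stand-in for the merge options. */
struct sketch_options {
	int write_pack; /* models the flag set by `--write-pack` */
};

/* Models the loose path: every call creates one more loose object. */
static int sketch_write_loose(struct sketch_store *s,
			      const void *buf, size_t len)
{
	(void)buf;
	(void)len;
	s->loose_objects++;
	return 0;
}

/* Models the packed path: every call adds one entry to the same pack. */
static int sketch_write_packed(struct sketch_store *s,
			       const void *buf, size_t len)
{
	(void)buf;
	(void)len;
	s->packed_objects++;
	return 0;
}

/* Dispatch on the option, returning 0 on success like the helpers. */
static int sketch_write_object(const struct sketch_options *opt,
			       struct sketch_store *s,
			       const void *buf, size_t len)
{
	assert(opt && s);
	return opt->write_pack
		? sketch_write_packed(s, buf, len)
		: sketch_write_loose(s, buf, len);
}
```

In this model, a merge that produces N new objects with `write_pack`
set leaves `packed_objects == N` and `loose_objects == 0`, which is the
behavior the option is meant to provide.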

Little has changed since v1, apart from a small tweak to one of the
commit messages in response to feedback from Eric W. Biederman. The
series has also been rebased onto 'master', which produced a couple of
conflicts with the following commits:

  - 9eb5419799 (bulk-checkin: only support blobs in index_bulk_checkin,
    2023-09-26)
  - e0b8c84240 (treewide: fix various bugs w/ OpenSSL 3+ EVP API,
    2023-09-01)

The resolutions were mostly trivial, and the results can be viewed in
the range-diff included below.

(From last time: the motivating use-case behind these changes is to
better support repositories that invoke merge-tree frequently. Doing so
can generate a large number of loose objects, which may in turn hurt
performance.)

Thanks in advance for any review!

Taylor Blau (7):
  bulk-checkin: factor out `format_object_header_hash()`
  bulk-checkin: factor out `prepare_checkpoint()`
  bulk-checkin: factor out `truncate_checkpoint()`
  bulk-checkin: factor our `finalize_checkpoint()`
  bulk-checkin: introduce `index_blob_bulk_checkin_incore()`
  bulk-checkin: introduce `index_tree_bulk_checkin_incore()`
  builtin/merge-tree.c: implement support for `--write-pack`

 Documentation/git-merge-tree.txt |   4 +
 builtin/merge-tree.c             |   5 +
 bulk-checkin.c                   | 258 ++++++++++++++++++++++++++-----
 bulk-checkin.h                   |   8 +
 merge-ort.c                      |  42 +++--
 merge-recursive.h                |   1 +
 t/t4301-merge-tree-write-tree.sh |  93 +++++++++++
 7 files changed, 363 insertions(+), 48 deletions(-)

Range-diff against v1:
1:  37f4072815 ! 1:  edf1cbafc1 bulk-checkin: factor out `format_object_header_hash()`
    @@ bulk-checkin.c: static void prepare_to_stream(struct bulk_checkin_packfile *stat
      }
      
     +static void format_object_header_hash(const struct git_hash_algo *algop,
    -+				      git_hash_ctx *ctx, enum object_type type,
    ++				      git_hash_ctx *ctx,
    ++				      struct hashfile_checkpoint *checkpoint,
    ++				      enum object_type type,
     +				      size_t size)
     +{
     +	unsigned char header[16384];
    @@ bulk-checkin.c: static void prepare_to_stream(struct bulk_checkin_packfile *stat
     +
     +	algop->init_fn(ctx);
     +	algop->update_fn(ctx, header, header_len);
    ++	algop->init_fn(&checkpoint->ctx);
     +}
     +
      static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
    @@ bulk-checkin.c: static int deflate_blob_to_pack(struct bulk_checkin_packfile *st
     -					  OBJ_BLOB, size);
     -	the_hash_algo->init_fn(&ctx);
     -	the_hash_algo->update_fn(&ctx, obuf, header_len);
    -+	format_object_header_hash(the_hash_algo, &ctx, OBJ_BLOB, size);
    +-	the_hash_algo->init_fn(&checkpoint.ctx);
    ++	format_object_header_hash(the_hash_algo, &ctx, &checkpoint, OBJ_BLOB,
    ++				  size);
      
      	/* Note: idx is non-NULL when we are writing */
      	if ((flags & HASH_WRITE_OBJECT) != 0)
2:  9cc1f3014a ! 2:  b3f89d5853 bulk-checkin: factor out `prepare_checkpoint()`
    @@ Commit message
     
      ## bulk-checkin.c ##
     @@ bulk-checkin.c: static void format_object_header_hash(const struct git_hash_algo *algop,
    - 	algop->update_fn(ctx, header, header_len);
    + 	algop->init_fn(&checkpoint->ctx);
      }
      
     +static void prepare_checkpoint(struct bulk_checkin_packfile *state,
3:  f392ed2211 = 3:  abe4fb0a59 bulk-checkin: factor out `truncate_checkpoint()`
4:  9c6ca564ad = 4:  0b855a6eb7 bulk-checkin: factor our `finalize_checkpoint()`
5:  30ca7334c7 ! 5:  239bf39bfb bulk-checkin: introduce `index_blob_bulk_checkin_incore()`
    @@ bulk-checkin.c: static void finalize_checkpoint(struct bulk_checkin_packfile *st
      
     +static int deflate_obj_contents_to_pack_incore(struct bulk_checkin_packfile *state,
     +					       git_hash_ctx *ctx,
    ++					       struct hashfile_checkpoint *checkpoint,
     +					       struct object_id *result_oid,
     +					       const void *buf, size_t size,
     +					       enum object_type type,
     +					       const char *path, unsigned flags)
     +{
    -+	struct hashfile_checkpoint checkpoint = {0};
     +	struct pack_idx_entry *idx = NULL;
     +	off_t already_hashed_to = 0;
     +
    @@ bulk-checkin.c: static void finalize_checkpoint(struct bulk_checkin_packfile *st
     +		CALLOC_ARRAY(idx, 1);
     +
     +	while (1) {
    -+		prepare_checkpoint(state, &checkpoint, idx, flags);
    ++		prepare_checkpoint(state, checkpoint, idx, flags);
     +		if (!stream_obj_to_pack_incore(state, ctx, &already_hashed_to,
     +					       buf, size, type, path, flags))
     +			break;
    -+		truncate_checkpoint(state, &checkpoint, idx);
    ++		truncate_checkpoint(state, checkpoint, idx);
     +	}
     +
    -+	finalize_checkpoint(state, ctx, &checkpoint, idx, result_oid);
    ++	finalize_checkpoint(state, ctx, checkpoint, idx, result_oid);
     +
     +	return 0;
     +}
    @@ bulk-checkin.c: static void finalize_checkpoint(struct bulk_checkin_packfile *st
     +				       const char *path, unsigned flags)
     +{
     +	git_hash_ctx ctx;
    ++	struct hashfile_checkpoint checkpoint = {0};
     +
    -+	format_object_header_hash(the_hash_algo, &ctx, OBJ_BLOB, size);
    ++	format_object_header_hash(the_hash_algo, &ctx, &checkpoint, OBJ_BLOB,
    ++				  size);
     +
    -+	return deflate_obj_contents_to_pack_incore(state, &ctx, result_oid,
    -+						   buf, size, OBJ_BLOB, path,
    -+						   flags);
    ++	return deflate_obj_contents_to_pack_incore(state, &ctx, &checkpoint,
    ++						   result_oid, buf, size,
    ++						   OBJ_BLOB, path, flags);
     +}
     +
      static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
6:  cb0f79cabb ! 6:  57613807d8 bulk-checkin: introduce `index_tree_bulk_checkin_incore()`
    @@ Commit message
         Within `deflate_tree_to_pack_incore()`, the changes should be limited
         to something like:
     
    +        struct strbuf converted = STRBUF_INIT;
             if (the_repository->compat_hash_algo) {
    -          struct strbuf converted = STRBUF_INIT;
               if (convert_object_file(&compat_obj,
                                       the_repository->hash_algo,
                                       the_repository->compat_hash_algo, ...) < 0)
    @@ Commit message
     
               format_object_header_hash(the_repository->compat_hash_algo,
                                         OBJ_TREE, size);
    -
    -          strbuf_release(&converted);
             }
    +        /* compute the converted tree's hash using the compat algorithm */
    +        strbuf_release(&converted);
     
         , assuming related changes throughout the rest of the bulk-checkin
         machinery necessary to update the hash of the converted object, which
    @@ Commit message
     
      ## bulk-checkin.c ##
     @@ bulk-checkin.c: static int deflate_blob_to_pack_incore(struct bulk_checkin_packfile *state,
    - 						   flags);
    + 						   OBJ_BLOB, path, flags);
      }
      
     +static int deflate_tree_to_pack_incore(struct bulk_checkin_packfile *state,
    @@ bulk-checkin.c: static int deflate_blob_to_pack_incore(struct bulk_checkin_packf
     +				       const char *path, unsigned flags)
     +{
     +	git_hash_ctx ctx;
    ++	struct hashfile_checkpoint checkpoint = {0};
     +
    -+	format_object_header_hash(the_hash_algo, &ctx, OBJ_TREE, size);
    ++	format_object_header_hash(the_hash_algo, &ctx, &checkpoint, OBJ_TREE,
    ++				  size);
     +
    -+	return deflate_obj_contents_to_pack_incore(state, &ctx, result_oid,
    -+						   buf, size, OBJ_TREE, path,
    -+						   flags);
    ++	return deflate_obj_contents_to_pack_incore(state, &ctx, &checkpoint,
    ++						   result_oid, buf, size,
    ++						   OBJ_TREE, path, flags);
     +}
     +
      static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
7:  e969210145 ! 7:  f21400f56c builtin/merge-tree.c: implement support for `--write-pack`
    @@ merge-ort.c
       * We have many arrays of size 3.  Whenever we have such an array, the
     @@ merge-ort.c: static int handle_content_merge(struct merge_options *opt,
      		if ((merge_status < 0) || !result_buf.ptr)
    - 			ret = err(opt, _("Failed to execute internal merge"));
    + 			ret = error(_("failed to execute internal merge"));
      
     -		if (!ret &&
     -		    write_object_file(result_buf.ptr, result_buf.size,
     -				      OBJ_BLOB, &result->oid))
    --			ret = err(opt, _("Unable to add %s to database"),
    --				  path);
    +-			ret = error(_("unable to add %s to database"), path);
     +		if (!ret) {
     +			ret = opt->write_pack
     +				? index_blob_bulk_checkin_incore(&result->oid,
    @@ merge-ort.c: static int handle_content_merge(struct merge_options *opt,
     +						    result_buf.size,
     +						    OBJ_BLOB, &result->oid);
     +			if (ret)
    -+				ret = err(opt, _("Unable to add %s to database"),
    -+					  path);
    ++				ret = error(_("unable to add %s to database"),
    ++					    path);
     +		}
      
      		free(result_buf.ptr);
-- 
2.42.0.405.gdb2a2f287e

