From: Junio C Hamano <gitster@pobox.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, Elijah Newren <newren@gmail.com>,
"Eric W. Biederman" <ebiederm@gmail.com>,
Jeff King <peff@peff.net>, Patrick Steinhardt <ps@pks.im>
Subject: Re: [PATCH v2 5/7] bulk-checkin: introduce `index_blob_bulk_checkin_incore()`
Date: Tue, 17 Oct 2023 19:18:04 -0700 [thread overview]
Message-ID: <xmqq5y34wu5f.fsf@gitster.g> (raw)
In-Reply-To: <239bf39bfb21ef621a15839bade34446dcbc3103.1697560266.git.me@ttaylorr.com> (Taylor Blau's message of "Tue, 17 Oct 2023 12:31:26 -0400")
Taylor Blau <me@ttaylorr.com> writes:
> bulk-checkin.c | 118 +++++++++++++++++++++++++++++++++++++++++++++++++
> bulk-checkin.h | 4 ++
> 2 files changed, 122 insertions(+)
Unlike the previous four, which were very clear refactoring to
create reusable helper functions, this step leaves a bad aftertaste
after reading twice, and I think what is disturbing is that many new
lines are pretty much literally copied from stream_blob_to_pack().
I wonder if we can introduce an "input" source abstraction, that
replaces "fd" and "size" (and "path" for error reporting) parameters
to the stream_blob_to_pack(), so that the bulk of the implementation
of stream_blob_to_pack() can call its .read() method to read bytes
up to "size" from such an abstracted interface? That would be a
good sized first half of this change. Then in the second half, you
can add another "input" source that works with in-core "buf" and
"size", whose .read() method will merely be a memcpy().
That way, we will have two functions, one for stream_obj_to_pack()
that reads from an open file descriptor, and the other for
stream_obj_to_pack_incore() that reads from an in-core buffer,
sharing the bulk of the implementation that is extracted from the
current code, which hopefully be easier to audit.
> diff --git a/bulk-checkin.c b/bulk-checkin.c
> index f4914fb6d1..25cd1ffa25 100644
> --- a/bulk-checkin.c
> +++ b/bulk-checkin.c
> @@ -140,6 +140,69 @@ static int already_written(struct bulk_checkin_packfile *state, struct object_id
> return 0;
> }
>
> +static int stream_obj_to_pack_incore(struct bulk_checkin_packfile *state,
> + git_hash_ctx *ctx,
> + off_t *already_hashed_to,
> + const void *buf, size_t size,
> + enum object_type type,
> + const char *path, unsigned flags)
> +{
> + git_zstream s;
> + unsigned char obuf[16384];
> + unsigned hdrlen;
> + int status = Z_OK;
> + int write_object = (flags & HASH_WRITE_OBJECT);
> +
> + git_deflate_init(&s, pack_compression_level);
> +
> + hdrlen = encode_in_pack_object_header(obuf, sizeof(obuf), type, size);
> + s.next_out = obuf + hdrlen;
> + s.avail_out = sizeof(obuf) - hdrlen;
> +
> + if (*already_hashed_to < size) {
> + size_t hsize = size - *already_hashed_to;
> + if (hsize) {
> + the_hash_algo->update_fn(ctx, buf, hsize);
> + }
> + *already_hashed_to = size;
> + }
> + s.next_in = (void *)buf;
> + s.avail_in = size;
> +
> + while (status != Z_STREAM_END) {
> + status = git_deflate(&s, Z_FINISH);
> + if (!s.avail_out || status == Z_STREAM_END) {
> + if (write_object) {
> + size_t written = s.next_out - obuf;
> +
> + /* would we bust the size limit? */
> + if (state->nr_written &&
> + pack_size_limit_cfg &&
> + pack_size_limit_cfg < state->offset + written) {
> + git_deflate_abort(&s);
> + return -1;
> + }
> +
> + hashwrite(state->f, obuf, written);
> + state->offset += written;
> + }
> + s.next_out = obuf;
> + s.avail_out = sizeof(obuf);
> + }
> +
> + switch (status) {
> + case Z_OK:
> + case Z_BUF_ERROR:
> + case Z_STREAM_END:
> + continue;
> + default:
> + die("unexpected deflate failure: %d", status);
> + }
> + }
> + git_deflate_end(&s);
> + return 0;
> +}
> +
> /*
> * Read the contents from fd for size bytes, streaming it to the
> * packfile in state while updating the hash in ctx. Signal a failure
> @@ -316,6 +379,50 @@ static void finalize_checkpoint(struct bulk_checkin_packfile *state,
> }
> }
>
> +static int deflate_obj_contents_to_pack_incore(struct bulk_checkin_packfile *state,
> + git_hash_ctx *ctx,
> + struct hashfile_checkpoint *checkpoint,
> + struct object_id *result_oid,
> + const void *buf, size_t size,
> + enum object_type type,
> + const char *path, unsigned flags)
> +{
> + struct pack_idx_entry *idx = NULL;
> + off_t already_hashed_to = 0;
> +
> + /* Note: idx is non-NULL when we are writing */
> + if (flags & HASH_WRITE_OBJECT)
> + CALLOC_ARRAY(idx, 1);
> +
> + while (1) {
> + prepare_checkpoint(state, checkpoint, idx, flags);
> + if (!stream_obj_to_pack_incore(state, ctx, &already_hashed_to,
> + buf, size, type, path, flags))
> + break;
> + truncate_checkpoint(state, checkpoint, idx);
> + }
> +
> + finalize_checkpoint(state, ctx, checkpoint, idx, result_oid);
> +
> + return 0;
> +}
> +
> +static int deflate_blob_to_pack_incore(struct bulk_checkin_packfile *state,
> + struct object_id *result_oid,
> + const void *buf, size_t size,
> + const char *path, unsigned flags)
> +{
> + git_hash_ctx ctx;
> + struct hashfile_checkpoint checkpoint = {0};
> +
> + format_object_header_hash(the_hash_algo, &ctx, &checkpoint, OBJ_BLOB,
> + size);
> +
> + return deflate_obj_contents_to_pack_incore(state, &ctx, &checkpoint,
> + result_oid, buf, size,
> + OBJ_BLOB, path, flags);
> +}
> +
> static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
> struct object_id *result_oid,
> int fd, size_t size,
> @@ -396,6 +503,17 @@ int index_blob_bulk_checkin(struct object_id *oid,
> return status;
> }
>
> +int index_blob_bulk_checkin_incore(struct object_id *oid,
> + const void *buf, size_t size,
> + const char *path, unsigned flags)
> +{
> + int status = deflate_blob_to_pack_incore(&bulk_checkin_packfile, oid,
> + buf, size, path, flags);
> + if (!odb_transaction_nesting)
> + flush_bulk_checkin_packfile(&bulk_checkin_packfile);
> + return status;
> +}
> +
> void begin_odb_transaction(void)
> {
> odb_transaction_nesting += 1;
> diff --git a/bulk-checkin.h b/bulk-checkin.h
> index aa7286a7b3..1b91daeaee 100644
> --- a/bulk-checkin.h
> +++ b/bulk-checkin.h
> @@ -13,6 +13,10 @@ int index_blob_bulk_checkin(struct object_id *oid,
> int fd, size_t size,
> const char *path, unsigned flags);
>
> +int index_blob_bulk_checkin_incore(struct object_id *oid,
> + const void *buf, size_t size,
> + const char *path, unsigned flags);
> +
> /*
> * Tell the object database to optimize for adding
> * multiple objects. end_odb_transaction must be called
next prev parent reply other threads:[~2023-10-18 2:18 UTC|newest]
Thread overview: 89+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-06 22:01 [PATCH 0/7] merge-ort: implement support for packing objects together Taylor Blau
2023-10-06 22:01 ` [PATCH 1/7] bulk-checkin: factor out `format_object_header_hash()` Taylor Blau
2023-10-06 22:01 ` [PATCH 2/7] bulk-checkin: factor out `prepare_checkpoint()` Taylor Blau
2023-10-06 22:01 ` [PATCH 3/7] bulk-checkin: factor out `truncate_checkpoint()` Taylor Blau
2023-10-06 22:01 ` [PATCH 4/7] bulk-checkin: factor our `finalize_checkpoint()` Taylor Blau
2023-10-06 22:02 ` [PATCH 5/7] bulk-checkin: introduce `index_blob_bulk_checkin_incore()` Taylor Blau
2023-10-06 22:02 ` [PATCH 6/7] bulk-checkin: introduce `index_tree_bulk_checkin_incore()` Taylor Blau
2023-10-07 3:07 ` Eric Biederman
2023-10-09 1:31 ` Taylor Blau
2023-10-06 22:02 ` [PATCH 7/7] builtin/merge-tree.c: implement support for `--write-pack` Taylor Blau
2023-10-06 22:35 ` Junio C Hamano
2023-10-06 23:02 ` Taylor Blau
2023-10-08 7:02 ` Elijah Newren
2023-10-08 16:04 ` Taylor Blau
2023-10-08 17:33 ` Jeff King
2023-10-09 1:37 ` Taylor Blau
2023-10-09 20:21 ` Jeff King
2023-10-09 17:24 ` Junio C Hamano
2023-10-09 10:54 ` Patrick Steinhardt
2023-10-09 16:08 ` Taylor Blau
2023-10-10 6:36 ` Patrick Steinhardt
2023-10-17 16:31 ` [PATCH v2 0/7] merge-ort: implement support for packing objects together Taylor Blau
2023-10-17 16:31 ` [PATCH v2 1/7] bulk-checkin: factor out `format_object_header_hash()` Taylor Blau
2023-10-17 16:31 ` [PATCH v2 2/7] bulk-checkin: factor out `prepare_checkpoint()` Taylor Blau
2023-10-17 16:31 ` [PATCH v2 3/7] bulk-checkin: factor out `truncate_checkpoint()` Taylor Blau
2023-10-17 16:31 ` [PATCH v2 4/7] bulk-checkin: factor our `finalize_checkpoint()` Taylor Blau
2023-10-17 16:31 ` [PATCH v2 5/7] bulk-checkin: introduce `index_blob_bulk_checkin_incore()` Taylor Blau
2023-10-18 2:18 ` Junio C Hamano [this message]
2023-10-18 16:34 ` Taylor Blau
2023-10-17 16:31 ` [PATCH v2 6/7] bulk-checkin: introduce `index_tree_bulk_checkin_incore()` Taylor Blau
2023-10-17 16:31 ` [PATCH v2 7/7] builtin/merge-tree.c: implement support for `--write-pack` Taylor Blau
2023-10-18 17:07 ` [PATCH v3 00/10] merge-ort: implement support for packing objects together Taylor Blau
2023-10-18 17:07 ` [PATCH v3 01/10] bulk-checkin: factor out `format_object_header_hash()` Taylor Blau
2023-10-18 17:07 ` [PATCH v3 02/10] bulk-checkin: factor out `prepare_checkpoint()` Taylor Blau
2023-10-18 17:07 ` [PATCH v3 03/10] bulk-checkin: factor out `truncate_checkpoint()` Taylor Blau
2023-10-18 17:07 ` [PATCH v3 04/10] bulk-checkin: factor out `finalize_checkpoint()` Taylor Blau
2023-10-18 17:08 ` [PATCH v3 05/10] bulk-checkin: extract abstract `bulk_checkin_source` Taylor Blau
2023-10-18 23:10 ` Junio C Hamano
2023-10-19 15:19 ` Taylor Blau
2023-10-19 17:55 ` Junio C Hamano
2023-10-18 17:08 ` [PATCH v3 06/10] bulk-checkin: implement `SOURCE_INCORE` mode for `bulk_checkin_source` Taylor Blau
2023-10-18 17:08 ` [PATCH v3 07/10] bulk-checkin: generify `stream_blob_to_pack()` for arbitrary types Taylor Blau
2023-10-18 17:08 ` [PATCH v3 08/10] bulk-checkin: introduce `index_blob_bulk_checkin_incore()` Taylor Blau
2023-10-18 23:18 ` Junio C Hamano
2023-10-19 15:30 ` Taylor Blau
2023-10-18 17:08 ` [PATCH v3 09/10] bulk-checkin: introduce `index_tree_bulk_checkin_incore()` Taylor Blau
2023-10-18 17:08 ` [PATCH v3 10/10] builtin/merge-tree.c: implement support for `--write-pack` Taylor Blau
2023-10-18 18:32 ` [PATCH v4 00/17] bloom: changed-path Bloom filters v2 (& sundries) Taylor Blau
2023-10-18 18:32 ` [PATCH v4 01/17] t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()` Taylor Blau
2023-10-18 18:32 ` [PATCH v4 02/17] revision.c: consult Bloom filters for root commits Taylor Blau
2023-10-18 18:32 ` [PATCH v4 03/17] commit-graph: ensure Bloom filters are read with consistent settings Taylor Blau
2023-10-18 18:32 ` [PATCH v4 04/17] gitformat-commit-graph: describe version 2 of BDAT Taylor Blau
2023-10-18 18:32 ` [PATCH v4 05/17] t/helper/test-read-graph.c: extract `dump_graph_info()` Taylor Blau
2023-10-18 18:32 ` [PATCH v4 06/17] bloom.h: make `load_bloom_filter_from_graph()` public Taylor Blau
2023-10-18 18:32 ` [PATCH v4 07/17] t/helper/test-read-graph: implement `bloom-filters` mode Taylor Blau
2023-10-18 18:32 ` [PATCH v4 08/17] t4216: test changed path filters with high bit paths Taylor Blau
2023-10-18 18:32 ` [PATCH v4 09/17] repo-settings: introduce commitgraph.changedPathsVersion Taylor Blau
2023-10-18 18:32 ` [PATCH v4 10/17] commit-graph: new filter ver. that fixes murmur3 Taylor Blau
2023-10-18 18:33 ` [PATCH v4 11/17] bloom: annotate filters with hash version Taylor Blau
2023-10-18 18:33 ` [PATCH v4 12/17] bloom: prepare to discard incompatible Bloom filters Taylor Blau
2023-10-18 18:33 ` [PATCH v4 13/17] commit-graph.c: unconditionally load " Taylor Blau
2023-10-18 18:33 ` [PATCH v4 14/17] commit-graph: drop unnecessary `graph_read_bloom_data_context` Taylor Blau
2023-10-18 18:33 ` [PATCH v4 15/17] object.h: fix mis-aligned flag bits table Taylor Blau
2023-10-18 18:33 ` [PATCH v4 16/17] commit-graph: reuse existing Bloom filters where possible Taylor Blau
2023-10-18 18:33 ` [PATCH v4 17/17] bloom: introduce `deinit_bloom_filters()` Taylor Blau
2023-10-18 23:26 ` [PATCH v4 00/17] bloom: changed-path Bloom filters v2 (& sundries) Junio C Hamano
2023-10-20 17:27 ` Taylor Blau
2023-10-23 20:22 ` SZEDER Gábor
2023-10-30 20:24 ` Taylor Blau
2024-01-16 22:08 ` [PATCH v5 " Taylor Blau
2024-01-16 22:09 ` [PATCH v5 01/17] t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()` Taylor Blau
2024-01-16 22:09 ` [PATCH v5 02/17] revision.c: consult Bloom filters for root commits Taylor Blau
2024-01-16 22:09 ` [PATCH v5 03/17] commit-graph: ensure Bloom filters are read with consistent settings Taylor Blau
2024-01-16 22:09 ` [PATCH v5 04/17] gitformat-commit-graph: describe version 2 of BDAT Taylor Blau
2024-01-16 22:09 ` [PATCH v5 05/17] t/helper/test-read-graph.c: extract `dump_graph_info()` Taylor Blau
2024-01-16 22:09 ` [PATCH v5 06/17] bloom.h: make `load_bloom_filter_from_graph()` public Taylor Blau
2024-01-16 22:09 ` [PATCH v5 07/17] t/helper/test-read-graph: implement `bloom-filters` mode Taylor Blau
2024-01-16 22:09 ` [PATCH v5 08/17] t4216: test changed path filters with high bit paths Taylor Blau
2024-01-16 22:09 ` [PATCH v5 09/17] repo-settings: introduce commitgraph.changedPathsVersion Taylor Blau
2024-01-29 21:26 ` SZEDER Gábor
2024-01-29 23:58 ` Taylor Blau
2024-01-16 22:09 ` [PATCH v5 10/17] commit-graph: new Bloom filter version that fixes murmur3 Taylor Blau
2024-01-16 22:09 ` [PATCH v5 11/17] bloom: annotate filters with hash version Taylor Blau
2024-01-16 22:09 ` [PATCH v5 12/17] bloom: prepare to discard incompatible Bloom filters Taylor Blau
2024-01-16 22:09 ` [PATCH v5 13/17] commit-graph.c: unconditionally load " Taylor Blau
2024-01-16 22:09 ` [PATCH v5 14/17] commit-graph: drop unnecessary `graph_read_bloom_data_context` Taylor Blau
2024-01-16 22:09 ` [PATCH v5 15/17] object.h: fix mis-aligned flag bits table Taylor Blau
2024-01-16 22:09 ` [PATCH v5 16/17] commit-graph: reuse existing Bloom filters where possible Taylor Blau
2024-01-16 22:09 ` [PATCH v5 17/17] bloom: introduce `deinit_bloom_filters()` Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqq5y34wu5f.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=ebiederm@gmail.com \
--cc=git@vger.kernel.org \
--cc=me@ttaylorr.com \
--cc=newren@gmail.com \
--cc=peff@peff.net \
--cc=ps@pks.im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).