From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de,
szeder.dev@gmail.com, Chris Torek <chris.torek@gmail.com>,
Derrick Stolee <stolee@gmail.com>,
Derrick Stolee <derrickstolee@github.com>
Subject: [PATCH v3 00/17] Refactor chunk-format into an API
Date: Fri, 05 Feb 2021 14:30:35 +0000
Message-ID: <pull.848.v3.git.1612535452.gitgitgadget@gmail.com>
In-Reply-To: <pull.848.v2.git.1611759716.gitgitgadget@gmail.com>
This is a restart on the topic previously submitted [1] but dropped because
ak/corrected-commit-date was still in progress. This version is based on
that branch.
[1] https://lore.kernel.org/git/pull.804.git.1607012215.gitgitgadget@gmail.com/
This version also changes the approach to use a more dynamic interaction
with a struct chunkfile pointer. This idea is credited to Taylor Blau [2],
but I started again from scratch. I also go further to make struct chunkfile
anonymous to API consumers. It is defined only in chunk-format.c, which
should hopefully deter future users from interacting with that data
directly.
[2] https://lore.kernel.org/git/X8%2FI%2FRzXZksio+ri@nand.local/
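To illustrate the "anonymous struct" idea, here is a minimal single-file
sketch of the opaque-pointer pattern in C. It is not the code from
chunk-format.c: every demo_* name and the growth policy are hypothetical,
chosen only to show how a forward declaration keeps the struct layout out
of reach of consumers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* ---- What a header would expose: a forward declaration only. ---- */
struct demo_chunkfile;
struct demo_chunkfile *demo_init(void);
int demo_add_chunk(struct demo_chunkfile *cf, uint32_t id, size_t size);
size_t demo_num_chunks(const struct demo_chunkfile *cf);
void demo_free(struct demo_chunkfile *cf);

/* ---- What the .c file would keep private: the actual layout. ---- */
struct demo_chunk {
	uint32_t id;
	size_t size;
};

struct demo_chunkfile {
	struct demo_chunk *chunks;
	size_t nr, alloc;
};

struct demo_chunkfile *demo_init(void)
{
	/* Zero-initialized: no chunks, no allocation yet. */
	return calloc(1, sizeof(struct demo_chunkfile));
}

int demo_add_chunk(struct demo_chunkfile *cf, uint32_t id, size_t size)
{
	assert(cf);
	if (cf->nr == cf->alloc) {
		/* Hypothetical growth policy: start at 4, then double. */
		size_t new_alloc = cf->alloc ? cf->alloc * 2 : 4;
		struct demo_chunk *p = realloc(cf->chunks,
					       new_alloc * sizeof(*p));
		if (!p)
			return -1;
		cf->chunks = p;
		cf->alloc = new_alloc;
	}
	cf->chunks[cf->nr].id = id;
	cf->chunks[cf->nr].size = size;
	cf->nr++;
	return 0;
}

size_t demo_num_chunks(const struct demo_chunkfile *cf)
{
	return cf->nr;
}

void demo_free(struct demo_chunkfile *cf)
{
	if (!cf)
		return;
	free(cf->chunks);
	free(cf);
}
```

A consumer that sees only the declarations in the first part can hold a
`struct demo_chunkfile *` and call the functions, but cannot read or write
the members directly.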
This combined API reduces duplicated logic or, more precisely, ensures
that similar file formats have similar protections against bad data.
The multi-pack-index code did not have as many guards as the commit-graph
code did, but now they both share a common base that checks for things like
duplicate chunks or offsets outside the size of the file.
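As a rough illustration of the kind of shared guard described above, the
following sketch validates an in-memory table of contents. It is an
assumption-laden example, not the implementation in chunk-format.c: the
demo_* names, the error codes, and the idea of passing the end offset as a
separate argument are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of one table-of-contents entry. */
struct demo_toc_entry {
	uint32_t id;
	uint64_t offset;
};

enum {
	DEMO_TOC_OK = 0,
	DEMO_TOC_OUT_OF_BOUNDS = -1,
	DEMO_TOC_OUT_OF_ORDER = -2,
	DEMO_TOC_DUPLICATE = -3
};

/*
 * Check that every chunk starts inside the file, that offsets do not
 * decrease, and that no chunk ID appears twice. 'end_offset' is where
 * the last chunk ends.
 */
int demo_validate_toc(const struct demo_toc_entry *toc, size_t nr,
		      uint64_t end_offset, uint64_t file_size)
{
	size_t i, j;

	assert(toc || !nr);

	if (end_offset > file_size)
		return DEMO_TOC_OUT_OF_BOUNDS;

	for (i = 0; i < nr; i++) {
		uint64_t next = (i + 1 < nr) ? toc[i + 1].offset : end_offset;

		if (toc[i].offset > file_size)
			return DEMO_TOC_OUT_OF_BOUNDS;
		if (toc[i].offset > next)
			return DEMO_TOC_OUT_OF_ORDER;

		/* Quadratic scan; fine for a handful of chunks. */
		for (j = 0; j < i; j++)
			if (toc[j].id == toc[i].id)
				return DEMO_TOC_DUPLICATE;
	}

	return DEMO_TOC_OK;
}
```

Because both file formats would go through one such routine, a guard
added for one format automatically protects the other.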
Here are some stats for the end-to-end change:
* 570 insertions(+), 456 deletions(-).
* commit-graph.c: 107 insertions(+), 192 deletions(-)
* midx.c: 164 insertions(+), 260 deletions(-)
While the overall code size increases, the consumers do get smaller.
Boilerplate, such as adapting methods to match chunk_write_fn and
chunk_read_fn, makes up many of these insertions. The "interesting" code
gets a lot smaller and cleaner.
Updates in V3
=============
* API methods use better types, and their parameters are reordered to match
internal data more closely.
* Use hashfile_total() instead of internal data values.
* The implementation of pair_chunk() uses read_chunk().
* init_chunkfile() has an in-code doc comment warning against using the
same struct chunkfile for reads and writes.
* More multiplications are correctly cast in midx.c.
* The chunk-format technical docs are expanded.
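The point about casting multiplications can be shown with a small sketch.
The names and the entry width of 8 are hypothetical, and uint64_t is used
here instead of the size_t/off_t casts in midx.c so that the behavior is
the same regardless of platform word size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-entry width in bytes. */
#define DEMO_ENTRY_WIDTH 8u

/* Both operands are 32 bits wide, so the product wraps modulo 2^32. */
uint32_t demo_size_narrow(uint32_t nr)
{
	return nr * DEMO_ENTRY_WIDTH;
}

/* Widen one operand first so the multiplication happens in 64 bits. */
uint64_t demo_size_wide(uint32_t nr)
{
	uint64_t size = (uint64_t)nr * DEMO_ENTRY_WIDTH;

	/* The widened product cannot wrap, so dividing recovers nr. */
	assert(size / DEMO_ENTRY_WIDTH == nr);
	return size;
}
```

On platforms where unsigned int is 32 bits, calling these with
nr = 0x20000000 gives 0 from the narrow version and 4294967296 from the
wide one, which is why at least one operand needs the wider type.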
Updates in V2
=============
* The method pair_chunk() now automatically sets a pointer while
read_chunk() uses the callback. This greatly reduces the code size.
* Pointer casts are now implicit instead of explicit.
* Extra care is taken to not overflow when verifying chunk sizes on write.
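The relationship between the two read methods can be sketched as follows.
This is a simplified stand-alone model, not the code in chunk-format.c:
the demo_* names, the flat chunk array, and the value of
DEMO_CHUNK_NOT_FOUND are assumptions made for illustration. The callback
shape mirrors the one shown in the range-diff below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "not found" code for this sketch. */
#define DEMO_CHUNK_NOT_FOUND (-2)

typedef int (*demo_read_fn)(const unsigned char *chunk_start,
			    size_t chunk_size, void *data);

struct demo_chunk {
	uint32_t id;
	const unsigned char *start;
	size_t size;
};

struct demo_chunks {
	const struct demo_chunk *items;
	size_t nr;
};

/* Find the chunk by ID and hand its location and size to the callback. */
int demo_read_chunk(const struct demo_chunks *cs, uint32_t id,
		    demo_read_fn fn, void *data)
{
	size_t i;

	assert(cs && fn);
	for (i = 0; i < cs->nr; i++)
		if (cs->items[i].id == id)
			return fn(cs->items[i].start, cs->items[i].size, data);

	return DEMO_CHUNK_NOT_FOUND;
}

/* Callback that only records where the chunk starts. */
static int demo_pair_fn(const unsigned char *chunk_start,
			size_t chunk_size, void *data)
{
	const unsigned char **p = data;

	(void)chunk_size; /* unused when only pairing a pointer */
	*p = chunk_start;
	return 0;
}

/* Pairing is just reading with the trivial callback above. */
int demo_pair_chunk(const struct demo_chunks *cs, uint32_t id,
		    const unsigned char **p)
{
	return demo_read_chunk(cs, id, demo_pair_fn, p);
}
```

When the chunk is missing, the callback never runs, so the caller's
pointer is left untouched and the "not found" code is returned.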
Thanks, -Stolee
Derrick Stolee (17):
commit-graph: anonymize data in chunk_write_fn
chunk-format: create chunk format write API
commit-graph: use chunk-format write API
midx: rename pack_info to write_midx_context
midx: use context in write_midx_pack_names()
midx: add entries to write_midx_context
midx: add pack_perm to write_midx_context
midx: add num_large_offsets to write_midx_context
midx: return success/failure in chunk write methods
midx: drop chunk progress during write
midx: use chunk-format API in write_midx_internal()
chunk-format: create read chunk API
commit-graph: use chunk-format read API
midx: use chunk-format read API
midx: use 64-bit multiplication for chunk sizes
chunk-format: restore duplicate chunk checks
chunk-format: add technical docs
Documentation/technical/chunk-format.txt | 116 +++++
.../technical/commit-graph-format.txt | 3 +
Documentation/technical/pack-format.txt | 3 +
Makefile | 1 +
chunk-format.c | 180 ++++++++
chunk-format.h | 65 +++
commit-graph.c | 299 +++++-------
midx.c | 431 +++++++-----------
t/t5318-commit-graph.sh | 2 +-
t/t5319-multi-pack-index.sh | 6 +-
10 files changed, 648 insertions(+), 458 deletions(-)
create mode 100644 Documentation/technical/chunk-format.txt
create mode 100644 chunk-format.c
create mode 100644 chunk-format.h
base-commit: 5a3b130cad0d5c770f766e3af6d32b41766374c0
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-848%2Fderrickstolee%2Fchunk-format%2Frefactor-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-848/derrickstolee/chunk-format/refactor-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/848
Range-diff vs v2:
1: 243dcec94368 = 1: 243dcec94368 commit-graph: anonymize data in chunk_write_fn
2: 814512f21671 ! 2: 16c37d2370cf chunk-format: create chunk format write API
@@ Commit message
5. free the chunkfile struct using free_chunkfile().
Helped-by: Taylor Blau <me@ttaylorr.com>
+ Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
## Makefile ##
@@ chunk-format.c (new)
+}
+
+void add_chunk(struct chunkfile *cf,
-+ uint64_t id,
-+ chunk_write_fn fn,
-+ size_t size)
++ uint32_t id,
++ size_t size,
++ chunk_write_fn fn)
+{
+ ALLOC_GROW(cf->chunks, cf->chunks_nr + 1, cf->chunks_alloc);
+
@@ chunk-format.c (new)
+int write_chunkfile(struct chunkfile *cf, void *data)
+{
+ int i;
-+ size_t cur_offset = cf->f->offset + cf->f->total;
++ uint64_t cur_offset = hashfile_total(cf->f);
+
+ /* Add the table of contents to the current offset */
+ cur_offset += (cf->chunks_nr + 1) * CHUNK_LOOKUP_WIDTH;
@@ chunk-format.c (new)
+ hashwrite_be64(cf->f, cur_offset);
+
+ for (i = 0; i < cf->chunks_nr; i++) {
-+ uint64_t start_offset = cf->f->total + cf->f->offset;
++ off_t start_offset = hashfile_total(cf->f);
+ int result = cf->chunks[i].write_fn(cf->f, data);
+
+ if (result)
+ return result;
+
-+ if (cf->f->total + cf->f->offset - start_offset != cf->chunks[i].size)
++ if (hashfile_total(cf->f) - start_offset != cf->chunks[i].size)
+ BUG("expected to write %"PRId64" bytes to chunk %"PRIx32", but wrote %"PRId64" instead",
+ cf->chunks[i].size, cf->chunks[i].id,
-+ cf->f->total + cf->f->offset - start_offset);
++ hashfile_total(cf->f) - start_offset);
+ }
+
+ return 0;
@@ chunk-format.h (new)
+struct chunkfile *init_chunkfile(struct hashfile *f);
+void free_chunkfile(struct chunkfile *cf);
+int get_num_chunks(struct chunkfile *cf);
-+typedef int (*chunk_write_fn)(struct hashfile *f,
-+ void *data);
++typedef int (*chunk_write_fn)(struct hashfile *f, void *data);
+void add_chunk(struct chunkfile *cf,
-+ uint64_t id,
-+ chunk_write_fn fn,
-+ size_t size);
++ uint32_t id,
++ size_t size,
++ chunk_write_fn fn);
+int write_chunkfile(struct chunkfile *cf, void *data);
+
+#endif
3: 70af6e3083f4 ! 3: e549e24d79af commit-graph: use chunk-format write API
@@ commit-graph.c: static int write_commit_graph_file(struct write_commit_graph_con
- chunks[2].write_fn = write_graph_chunk_data;
+ cf = init_chunkfile(f);
+
-+ add_chunk(cf, GRAPH_CHUNKID_OIDFANOUT,
-+ write_graph_chunk_fanout, GRAPH_FANOUT_SIZE);
-+ add_chunk(cf, GRAPH_CHUNKID_OIDLOOKUP,
-+ write_graph_chunk_oids, hashsz * ctx->commits.nr);
-+ add_chunk(cf, GRAPH_CHUNKID_DATA,
-+ write_graph_chunk_data, (hashsz + 16) * ctx->commits.nr);
++ add_chunk(cf, GRAPH_CHUNKID_OIDFANOUT, GRAPH_FANOUT_SIZE,
++ write_graph_chunk_fanout);
++ add_chunk(cf, GRAPH_CHUNKID_OIDLOOKUP, hashsz * ctx->commits.nr,
++ write_graph_chunk_oids);
++ add_chunk(cf, GRAPH_CHUNKID_DATA, (hashsz + 16) * ctx->commits.nr,
++ write_graph_chunk_data);
if (git_env_bool(GIT_TEST_COMMIT_GRAPH_NO_GDAT, 0))
ctx->write_generation_data = 0;
@@ commit-graph.c: static int write_commit_graph_file(struct write_commit_graph_con
- }
+ if (ctx->write_generation_data)
+ add_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA,
-+ write_graph_chunk_generation_data,
-+ sizeof(uint32_t) * ctx->commits.nr);
++ sizeof(uint32_t) * ctx->commits.nr,
++ write_graph_chunk_generation_data);
+ if (ctx->num_generation_data_overflows)
+ add_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW,
-+ write_graph_chunk_generation_data_overflow,
-+ sizeof(timestamp_t) * ctx->num_generation_data_overflows);
++ sizeof(timestamp_t) * ctx->num_generation_data_overflows,
++ write_graph_chunk_generation_data_overflow);
+ if (ctx->num_extra_edges)
+ add_chunk(cf, GRAPH_CHUNKID_EXTRAEDGES,
-+ write_graph_chunk_extra_edges,
-+ 4 * ctx->num_extra_edges);
++ 4 * ctx->num_extra_edges,
++ write_graph_chunk_extra_edges);
if (ctx->changed_paths) {
- chunks[num_chunks].id = GRAPH_CHUNKID_BLOOMINDEXES;
- chunks[num_chunks].size = sizeof(uint32_t) * ctx->commits.nr;
@@ commit-graph.c: static int write_commit_graph_file(struct write_commit_graph_con
- chunks[num_chunks].id = 0;
- chunks[num_chunks].size = 0;
+ add_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES,
-+ write_graph_chunk_bloom_indexes,
-+ sizeof(uint32_t) * ctx->commits.nr);
++ sizeof(uint32_t) * ctx->commits.nr,
++ write_graph_chunk_bloom_indexes);
+ add_chunk(cf, GRAPH_CHUNKID_BLOOMDATA,
-+ write_graph_chunk_bloom_data,
+ sizeof(uint32_t) * 3
-+ + ctx->total_bloom_filter_data_size);
++ + ctx->total_bloom_filter_data_size,
++ write_graph_chunk_bloom_data);
+ }
+ if (ctx->num_commit_graphs_after > 1)
+ add_chunk(cf, GRAPH_CHUNKID_BASE,
-+ write_graph_chunk_base,
-+ hashsz * (ctx->num_commit_graphs_after - 1));
++ hashsz * (ctx->num_commit_graphs_after - 1),
++ write_graph_chunk_base);
hashwrite_be32(f, GRAPH_SIGNATURE);
4: 0cac7890bed7 = 4: 66ff49ed9309 midx: rename pack_info to write_midx_context
5: 4a4e90b129ae = 5: 1d7484c0cffa midx: use context in write_midx_pack_names()
6: 30ad423997b7 = 6: ea0e7d40e537 midx: add entries to write_midx_context
7: 2f1c496f3ab5 = 7: b283a38fb775 midx: add pack_perm to write_midx_context
8: c4939548e51c = 8: e7064512ab7f midx: add num_large_offsets to write_midx_context
9: b3cc73c22567 ! 9: 7aa3242e15b7 midx: return success/failure in chunk write methods
@@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack
stop_progress(&progress);
- if (written != chunk_offsets[num_chunks])
-+ if (f->total + f->offset != chunk_offsets[num_chunks])
++ if (hashfile_total(f) != chunk_offsets[num_chunks])
BUG("incorrect final offset %"PRIu64" != %"PRIu64,
- written,
-+ f->total + f->offset,
++ hashfile_total(f),
chunk_offsets[num_chunks]);
finalize_hashfile(f, NULL, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
10: 78744d3b7016 ! 10: 70f68c95e479 midx: drop chunk progress during write
@@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack
}
- stop_progress(&progress);
- if (f->total + f->offset != chunk_offsets[num_chunks])
+ if (hashfile_total(f) != chunk_offsets[num_chunks])
BUG("incorrect final offset %"PRIu64" != %"PRIu64,
11: 07dc0cf8c683 ! 11: 787cd7f18d2e midx: use chunk-format API in write_midx_internal()
@@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack
- case MIDX_CHUNKID_PACKNAMES:
- write_midx_pack_names(f, &ctx);
- break;
-+ add_chunk(cf, MIDX_CHUNKID_PACKNAMES,
-+ write_midx_pack_names, pack_name_concat_len);
-+ add_chunk(cf, MIDX_CHUNKID_OIDFANOUT,
-+ write_midx_oid_fanout, MIDX_CHUNK_FANOUT_SIZE);
++ add_chunk(cf, MIDX_CHUNKID_PACKNAMES, pack_name_concat_len,
++ write_midx_pack_names);
++ add_chunk(cf, MIDX_CHUNKID_OIDFANOUT, MIDX_CHUNK_FANOUT_SIZE,
++ write_midx_oid_fanout);
+ add_chunk(cf, MIDX_CHUNKID_OIDLOOKUP,
-+ write_midx_oid_lookup, ctx.entries_nr * the_hash_algo->rawsz);
++ ctx.entries_nr * the_hash_algo->rawsz,
++ write_midx_oid_lookup);
+ add_chunk(cf, MIDX_CHUNKID_OBJECTOFFSETS,
-+ write_midx_object_offsets,
-+ ctx.entries_nr * MIDX_CHUNK_OFFSET_WIDTH);
++ ctx.entries_nr * MIDX_CHUNK_OFFSET_WIDTH,
++ write_midx_object_offsets);
- case MIDX_CHUNKID_OIDFANOUT:
- write_midx_oid_fanout(f, &ctx);
@@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack
- }
+ if (ctx.large_offsets_needed)
+ add_chunk(cf, MIDX_CHUNKID_LARGEOFFSETS,
-+ write_midx_large_offsets,
-+ ctx.num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH);
++ ctx.num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH,
++ write_midx_large_offsets);
-- if (f->total + f->offset != chunk_offsets[num_chunks])
+- if (hashfile_total(f) != chunk_offsets[num_chunks])
- BUG("incorrect final offset %"PRIu64" != %"PRIu64,
-- f->total + f->offset,
+- hashfile_total(f),
- chunk_offsets[num_chunks]);
+ write_midx_header(f, get_num_chunks(cf), ctx.nr - dropped_packs);
+ write_chunkfile(cf, &ctx);
12: d8d8e9e2aa3f ! 12: 366eb2afee83 chunk-format: create read chunk API
@@ Commit message
read. If the same struct instance was used for both reads and writes,
then there would be failures.
+ Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
## chunk-format.c ##
@@ chunk-format.c: int write_chunkfile(struct chunkfile *cf, void *data)
+ return 0;
+}
+
++static int pair_chunk_fn(const unsigned char *chunk_start,
++ size_t chunk_size,
++ void *data)
++{
++ const unsigned char **p = data;
++ *p = chunk_start;
++ return 0;
++}
++
+int pair_chunk(struct chunkfile *cf,
+ uint32_t chunk_id,
+ const unsigned char **p)
+{
-+ int i;
-+
-+ for (i = 0; i < cf->chunks_nr; i++) {
-+ if (cf->chunks[i].id == chunk_id) {
-+ *p = cf->chunks[i].start;
-+ return 0;
-+ }
-+ }
-+
-+ return CHUNK_NOT_FOUND;
++ return read_chunk(cf, chunk_id, pair_chunk_fn, p);
+}
+
+int read_chunk(struct chunkfile *cf,
@@ chunk-format.c: int write_chunkfile(struct chunkfile *cf, void *data)
+}
## chunk-format.h ##
+@@
+ struct hashfile;
+ struct chunkfile;
+
++/*
++ * Initialize a 'struct chunkfile' for writing _or_ reading a file
++ * with the chunk format.
++ *
++ * If writing a file, supply a non-NULL 'struct hashfile *' that will
++ * be used to write.
++ *
++ * If reading a file, then supply the memory-mapped data to the
++ * pair_chunk() or read_chunk() methods, as appropriate.
++ *
++ * DO NOT MIX THESE MODES. Use different 'struct chunkfile' instances
++ * for reading and writing.
++ */
+ struct chunkfile *init_chunkfile(struct hashfile *f);
+ void free_chunkfile(struct chunkfile *cf);
+ int get_num_chunks(struct chunkfile *cf);
@@ chunk-format.h: void add_chunk(struct chunkfile *cf,
- size_t size);
+ chunk_write_fn fn);
int write_chunkfile(struct chunkfile *cf, void *data);
+int read_table_of_contents(struct chunkfile *cf,
13: 8744d2785965 = 13: 7838ad32e2e0 commit-graph: use chunk-format read API
14: 750c03253c95 ! 14: 6bddd9e63b9b midx: use chunk-format read API
@@ midx.c: struct multi_pack_index *load_multi_pack_index(const char *object_dir, i
m->num_objects = ntohl(m->chunk_oid_fanout[255]);
m->pack_names = xcalloc(m->num_packs, sizeof(*m->pack_names));
+@@ midx.c: struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
+ cleanup_fail:
+ free(m);
+ free(midx_name);
++ free(cf);
+ if (midx_map)
+ munmap(midx_map, midx_size);
+ if (0 <= fd)
## t/t5319-multi-pack-index.sh ##
@@ t/t5319-multi-pack-index.sh: test_expect_success 'verify bad OID version' '
15: 83d292532a0f ! 15: 3cd97f389f1f midx: use 64-bit multiplication for chunk sizes
@@ Commit message
multiplication always. This allows us to properly predict the chunk
sizes without risk of overflow.
+ Other possible overflows were discovered by evaluating each
+ multiplication in midx.c and ensuring that at least one side of the
+ operator was of type size_t or off_t.
+
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
## midx.c ##
+@@ midx.c: static off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos)
+ const unsigned char *offset_data;
+ uint32_t offset32;
+
+- offset_data = m->chunk_object_offsets + pos * MIDX_CHUNK_OFFSET_WIDTH;
++ offset_data = m->chunk_object_offsets + (off_t)pos * MIDX_CHUNK_OFFSET_WIDTH;
+ offset32 = get_be32(offset_data + sizeof(uint32_t));
+
+ if (m->chunk_large_offsets && offset32 & MIDX_LARGE_OFFSET_NEEDED) {
+@@ midx.c: static off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos)
+
+ static uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos)
+ {
+- return get_be32(m->chunk_object_offsets + pos * MIDX_CHUNK_OFFSET_WIDTH);
++ return get_be32(m->chunk_object_offsets +
++ (off_t)pos * MIDX_CHUNK_OFFSET_WIDTH);
+ }
+
+ static int nth_midxed_pack_entry(struct repository *r,
@@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack_index *
- add_chunk(cf, MIDX_CHUNKID_OIDFANOUT,
- write_midx_oid_fanout, MIDX_CHUNK_FANOUT_SIZE);
+ add_chunk(cf, MIDX_CHUNKID_OIDFANOUT, MIDX_CHUNK_FANOUT_SIZE,
+ write_midx_oid_fanout);
add_chunk(cf, MIDX_CHUNKID_OIDLOOKUP,
-- write_midx_oid_lookup, ctx.entries_nr * the_hash_algo->rawsz);
-+ write_midx_oid_lookup, (uint64_t)ctx.entries_nr * the_hash_algo->rawsz);
+- ctx.entries_nr * the_hash_algo->rawsz,
++ (size_t)ctx.entries_nr * the_hash_algo->rawsz,
+ write_midx_oid_lookup);
add_chunk(cf, MIDX_CHUNKID_OBJECTOFFSETS,
- write_midx_object_offsets,
- ctx.entries_nr * MIDX_CHUNK_OFFSET_WIDTH);
-@@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack_index *
+- ctx.entries_nr * MIDX_CHUNK_OFFSET_WIDTH,
++ (size_t)ctx.entries_nr * MIDX_CHUNK_OFFSET_WIDTH,
+ write_midx_object_offsets);
+
if (ctx.large_offsets_needed)
add_chunk(cf, MIDX_CHUNKID_LARGEOFFSETS,
- write_midx_large_offsets,
-- ctx.num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH);
-+ (uint64_t)ctx.num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH);
+- ctx.num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH,
++ (size_t)ctx.num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH,
+ write_midx_large_offsets);
write_midx_header(f, get_num_chunks(cf), ctx.nr - dropped_packs);
- write_chunkfile(cf, &ctx);
16: 669eeec707ab ! 16: b9a1bddf615f chunk-format: restore duplicate chunk checks
@@ Commit message
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
## chunk-format.c ##
-@@ chunk-format.c: struct chunk_info {
- chunk_write_fn write_fn;
-
- const void *start;
-+ unsigned found:1;
- };
-
- struct chunkfile {
@@ chunk-format.c: int read_table_of_contents(struct chunkfile *cf,
uint64_t toc_offset,
int toc_length)
17: 8f3985ab5df3 ! 17: 4c7d751f1e39 chunk-format: add technical docs
@@ Documentation/technical/chunk-format.txt (new)
+
+Functions for working with chunk-based file formats are declared in
+`chunk-format.h`. Using these methods provide extra checks that assist
-+developers when creating new file formats, including:
++developers when creating new file formats.
+
-+ 1. Writing and reading the table of contents.
++Writing chunk-based file formats
++--------------------------------
+
-+ 2. Verifying that the data written in a chunk matches the expected size
-+ that was recorded in the table of contents.
++To write a chunk-based file format, create a `struct chunkfile` by
++calling `init_chunkfile()` and pass a `struct hashfile` pointer. The
++caller is responsible for opening the `hashfile` and writing header
++information so the file format is identifiable before the chunk-based
++format begins.
+
-+ 3. Checking that a table of contents describes offsets properly within
-+ the file boundaries.
++Then, call `add_chunk()` for each chunk that is intended for write. This
++populates the `chunkfile` with information about the order and size of
++each chunk to write. Provide a `chunk_write_fn` function pointer to
++perform the write of the chunk data upon request.
++
++Call `write_chunkfile()` to write the table of contents to the `hashfile`
++followed by each of the chunks. This will verify that each chunk wrote
++the expected amount of data so the table of contents is correct.
++
++Finally, call `free_chunkfile()` to clear the `struct chunkfile` data. The
++caller is responsible for finalizing the `hashfile` by writing the trailing
++hash and closing the file.
++
++Reading chunk-based file formats
++--------------------------------
++
++To read a chunk-based file format, the file must be opened as a
++memory-mapped region. The chunk-format API expects that the entire file
++is mapped as a contiguous memory region.
++
++Initialize a `struct chunkfile` pointer with `init_chunkfile(NULL)`.
++
++After reading the header information from the beginning of the file,
++including the chunk count, call `read_table_of_contents()` to populate
++the `struct chunkfile` with the list of chunks, their offsets, and their
++sizes.
++
++Extract the data information for each chunk using `pair_chunk()` or
++`read_chunk()`:
++
++* `pair_chunk()` assigns a given pointer with the location inside the
++ memory-mapped file corresponding to that chunk's offset. If the chunk
++ does not exist, then the pointer is not modified.
++
++* `read_chunk()` takes a `chunk_read_fn` function pointer and calls it
++ with the appropriate initial pointer and size information. The function
++ is not called if the chunk does not exist. Use this method to read chunks
++ if you need to perform immediate parsing or if you need to execute logic
++ based on the size of the chunk.
++
++After calling these methods, call `free_chunkfile()` to clear the
++`struct chunkfile` data. This will not close the memory-mapped region.
++Callers are expected to own that data for the timeframe the pointers into
++the region are needed.
++
++Examples
++--------
++
++These file formats use the chunk-format API, and can be used as examples
++for future formats:
++
++* *commit-graph:* see `write_commit_graph_file()` and `parse_commit_graph()`
++ in `commit-graph.c` for how the chunk-format API is used to write and
++ parse the commit-graph file format documented in
++ link:technical/commit-graph-format.html[the commit-graph file format].
++
++* *multi-pack-index:* see `write_midx_internal()` and `load_multi_pack_index()`
++ in `midx.c` for how the chunk-format API is used to write and
++ parse the multi-pack-index file format documented in
++ link:technical/pack-format.html[the multi-pack-index file format].
## Documentation/technical/commit-graph-format.txt ##
@@ Documentation/technical/commit-graph-format.txt: CHUNK LOOKUP:
--
gitgitgadget