From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com> To: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> Cc: git@vger.kernel.org, sandals@crustytoothpaste.net, Junio C Hamano <gitster@pobox.com>, Jeff King <peff@peff.net> Subject: Re: [PATCH v2 0/5] Create commit-graph file format v2 Date: Sat, 27 Apr 2019 14:57:50 +0200 Message-ID: <87zhobr4fl.fsf@evledraar.gmail.com> (raw) In-Reply-To: <87a7gdspo4.fsf@evledraar.gmail.com> On Fri, Apr 26 2019, Ævar Arnfjörð Bjarmason wrote: > ...speaking of which, digging up outstanding stuff I have on the > commit-graph I was reminded to finish up my "commit graph on clone" > patch in: > https://public-inbox.org/git/87in2hgzin.fsf@evledraar.gmail.com/ > > And re #1 above: I guess we could also do that "let's make a graph" and > call "gc --auto" if a) we have gc.writeCommitGraph b) we see it's not > the "right" version. As long as older versions always write a "old" one > if they can't grok the "new" one, and newer versions leave existing > graphs alone even if they're older versions, so we don't flip-flop. > > One of the things that would make that "graph on clone/fetch/whatever" > easier is having the graph store the total number of objects while it > was at it, you indicated in > https://public-inbox.org/git/934fa00e-f6df-c333-4968-3e9acffab22d@gmail.com/ > that you already have an internal MSFT implementation of it that does > it. > > Any reason not to make it part of v2 while we're at it? We already find > out how many (packed) objects we have in "add_packed_commits", we just > don't do anything with that information now. I hacked up this plus general tag/tree/blob count stats in the WIP/RFC patch below. I figured once I did objects I might as well do tags (note: annotated tag objects, not # tag refs)/trees/blobs as well. It passes all tests with GIT_TEST_COMMIT_GRAPH=true, it fails on the commit-graph's own test suite, but AFAICT only because the selective corruption tests are thrown off by the location of this new chunk. Since we now skip some commits found in the pack(s) (just duplicates?) the new "num_commits_stat" is not the same as the current "num_commits", but usually really close. It's probably best if we do something like this to make this chunk be of dynamic length, as long as we kept order we could keep adding new stats to the file even within the same "version". This as (ab)using the "commit-graph" to start storing arbitrary stats about stuff we find the the packs during gc. Maybe that sucks, but OTOH it's useful, and just having some new "last gc stats" format/file would be overkill... diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index 1485b4daaf..d9378f23d9 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -114,7 +114,9 @@ static int graph_read(int argc, const char **argv) printf("header: %08x %d %d %d %d\n", ntohl(*(uint32_t*)graph->data), *(unsigned char*)(graph->data + 4), - *(unsigned char*)(graph->data + 5), + (getenv("STAT_ME") + ? *(unsigned char*)(graph->data + 5) + : (*(unsigned char*)(graph->data + 5) - 1)), *(unsigned char*)(graph->data + 6), *(unsigned char*)(graph->data + 7)); @@ -123,8 +125,20 @@ static int graph_read(int argc, const char **argv) get_be32(graph->data + 8)); printf("num_commits: %u\n", graph->num_commits); + if (getenv("STAT_ME")) { + printf(" pack num commits (st): %u\n", graph->num_commits_stat); + printf(" pack num commits - inferred diff (diff = duplicate (I think!)): %u\n", graph->num_commits_stat - graph->num_commits); + printf(" pack num objects: %u\n", graph->num_objects); + printf(" pack num tags: %u\n", graph->num_tags); + printf(" pack num trees: %u\n", graph->num_trees); + printf(" pack num blobs: %u\n", graph->num_blobs); + } printf("chunks:"); + if (getenv("STAT_ME")) { + if (graph->chunk_oid_numbers) + printf(" oid_numbers"); + } if (graph->chunk_oid_fanout) printf(" oid_fanout"); if (graph->chunk_oid_lookup) diff --git a/commit-graph.c b/commit-graph.c index 14d6aebd99..3d0fb5193b 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -18,6 +18,7 @@ #include "progress.h" #define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */ +#define GRAPH_CHUNKID_OIDNUMBERS 0x4f49444e /* "OIDN" */ #define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */ #define GRAPH_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */ #define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */ @@ -32,6 +33,7 @@ #define GRAPH_LAST_EDGE 0x80000000 #define GRAPH_HEADER_SIZE 8 +#define GRAPH_OIDNUMBERS_SIZE (4 * 5) #define GRAPH_FANOUT_SIZE (4 * 256) #define GRAPH_CHUNKLOOKUP_WIDTH 12 #define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \ @@ -127,6 +129,10 @@ static int verify_commit_graph_lite(struct commit_graph *g) * over g->num_commits, or runs a checksum on the commit-graph * itself. */ + if (!g->chunk_oid_numbers) { + error("commit-graph is missing the OID Numbers chunk"); + return 1; + } if (!g->chunk_oid_fanout) { error("commit-graph is missing the OID Fanout chunk"); return 1; @@ -249,6 +255,18 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd, } switch (chunk_id) { + case GRAPH_CHUNKID_OIDNUMBERS: + if (graph->chunk_oid_numbers) { + chunk_repeated = 1; + } else { + graph->chunk_oid_numbers = data + chunk_offset; + graph->num_objects = get_be32(graph->chunk_oid_numbers + 0); + graph->num_commits_stat = get_be32(graph->chunk_oid_numbers + 4); + graph->num_tags = get_be32(graph->chunk_oid_numbers + 8); + graph->num_trees = get_be32(graph->chunk_oid_numbers + 12); + graph->num_blobs = get_be32(graph->chunk_oid_numbers + 16); + } + break; case GRAPH_CHUNKID_OIDFANOUT: if (graph->chunk_oid_fanout) chunk_repeated = 1; @@ -545,6 +563,22 @@ struct tree *get_commit_tree_in_graph(struct repository *r, const struct commit return get_commit_tree_in_graph_one(r, r->objects->commit_graph, c); } +static void write_graph_chunk_numbers(struct hashfile *f, + struct progress *progress, + uint64_t *progress_cnt, + uint32_t num_objects, + uint32_t num_commits, + uint32_t num_tags, + uint32_t num_trees, + uint32_t num_blobs) +{ + hashwrite_be32(f, num_objects); + hashwrite_be32(f, num_commits); + hashwrite_be32(f, num_tags); + hashwrite_be32(f, num_trees); + hashwrite_be32(f, num_blobs); +} + static void write_graph_chunk_fanout(struct hashfile *f, struct commit **commits, int nr_commits, @@ -567,7 +601,6 @@ static void write_graph_chunk_fanout(struct hashfile *f, count++; list++; } - hashwrite_be32(f, count); } } @@ -732,6 +765,11 @@ struct packed_oid_list { int alloc; struct progress *progress; int progress_done; + uint32_t num_objects; + uint32_t num_commits; + uint32_t num_tags; + uint32_t num_trees; + uint32_t num_blobs; }; static int add_packed_commits(const struct object_id *oid, @@ -751,6 +789,21 @@ static int add_packed_commits(const struct object_id *oid, if (packed_object_info(the_repository, pack, offset, &oi) < 0) die(_("unable to get type of object %s"), oid_to_hex(oid)); + /* + * Aggregate object statistics + */ + list->num_objects++; + if (type == OBJ_COMMIT) + list->num_commits++; + else if (type == OBJ_TAG) + list->num_tags++; + else if (type == OBJ_TREE) + list->num_trees++; + else if (type == OBJ_BLOB) + list->num_blobs++; + else + BUG("should not encounter internal-only object_type %d value here!", type); + if (type != OBJ_COMMIT) return 0; @@ -939,6 +992,8 @@ int write_commit_graph(const char *obj_dir, oids.progress = NULL; oids.progress_done = 0; commits.list = NULL; + oids.num_objects = oids.num_commits = oids.num_tags = + oids.num_trees = oids.num_blobs = 0; if (append) { prepare_commit_graph_one(the_repository, obj_dir); @@ -1092,7 +1147,7 @@ int write_commit_graph(const char *obj_dir, commits.nr++; } - num_chunks = num_extra_edges ? 4 : 3; + num_chunks = num_extra_edges ? 5 : 4; stop_progress(&progress); if (commits.nr >= GRAPH_EDGE_LAST_MASK) { @@ -1136,20 +1191,22 @@ int write_commit_graph(const char *obj_dir, break; } - chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT; - chunk_ids[1] = GRAPH_CHUNKID_OIDLOOKUP; - chunk_ids[2] = GRAPH_CHUNKID_DATA; + chunk_ids[0] = GRAPH_CHUNKID_OIDNUMBERS; + chunk_ids[1] = GRAPH_CHUNKID_OIDFANOUT; + chunk_ids[2] = GRAPH_CHUNKID_OIDLOOKUP; + chunk_ids[3] = GRAPH_CHUNKID_DATA; if (num_extra_edges) - chunk_ids[3] = GRAPH_CHUNKID_EXTRAEDGES; + chunk_ids[4] = GRAPH_CHUNKID_EXTRAEDGES; else - chunk_ids[3] = 0; - chunk_ids[4] = 0; + chunk_ids[4] = 0; + chunk_ids[5] = 0; chunk_offsets[0] = header_size + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH; - chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE; - chunk_offsets[2] = chunk_offsets[1] + hashsz * commits.nr; - chunk_offsets[3] = chunk_offsets[2] + (hashsz + 16) * commits.nr; - chunk_offsets[4] = chunk_offsets[3] + 4 * num_extra_edges; + chunk_offsets[1] = chunk_offsets[0] + GRAPH_OIDNUMBERS_SIZE; + chunk_offsets[2] = chunk_offsets[1] + GRAPH_FANOUT_SIZE; + chunk_offsets[3] = chunk_offsets[2] + hashsz * commits.nr; + chunk_offsets[4] = chunk_offsets[3] + (hashsz + 16) * commits.nr; + chunk_offsets[5] = chunk_offsets[4] + 4 * num_extra_edges; for (i = 0; i <= num_chunks; i++) { uint32_t chunk_write[3]; @@ -1170,6 +1227,10 @@ int write_commit_graph(const char *obj_dir, progress_title.buf, num_chunks * commits.nr); } + write_graph_chunk_numbers(f, progress, &progress_cnt, + oids.num_objects, oids.num_commits, + oids.num_tags, oids.num_trees, + oids.num_blobs); write_graph_chunk_fanout(f, commits.list, commits.nr, progress, &progress_cnt); write_graph_chunk_oids(f, hashsz, commits.list, commits.nr, progress, &progress_cnt); write_graph_chunk_data(f, hashsz, commits.list, commits.nr, progress, &progress_cnt); diff --git a/commit-graph.h b/commit-graph.h index 2c461770e8..ef9eb0b6cb 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -45,10 +45,16 @@ struct commit_graph { unsigned char hash_len; unsigned char num_chunks; + uint32_t num_objects; uint32_t num_commits; + uint32_t num_commits_stat; + uint32_t num_tags; + uint32_t num_trees; + uint32_t num_blobs; struct object_id oid; const uint32_t *chunk_oid_fanout; + const unsigned char *chunk_oid_numbers; const unsigned char *chunk_oid_lookup; const unsigned char *chunk_commit_data; const unsigned char *chunk_extra_edges; Opel Vivaro
next prev parent reply other threads:[~2019-04-27 12:57 UTC|newest] Thread overview: 89+ messages / expand[flat|nested] mbox.gz Atom feed top 2019-01-23 21:59 [PATCH 0/6] " Derrick Stolee via GitGitGadget 2019-01-23 21:59 ` [PATCH 1/6] commit-graph: return with errors during write Derrick Stolee via GitGitGadget 2019-01-23 21:59 ` [PATCH 2/6] commit-graph: collapse parameters into flags Derrick Stolee via GitGitGadget 2019-01-23 21:59 ` [PATCH 3/6] commit-graph: create new version flags Derrick Stolee via GitGitGadget 2019-01-23 21:59 ` [PATCH 4/6] commit-graph: add --version=<n> option Derrick Stolee via GitGitGadget 2019-01-24 9:31 ` Ævar Arnfjörð Bjarmason 2019-01-23 21:59 ` [PATCH 5/6] commit-graph: implement file format version 2 Derrick Stolee via GitGitGadget 2019-01-23 23:56 ` Jonathan Tan 2019-01-24 9:40 ` Ævar Arnfjörð Bjarmason 2019-01-24 14:34 ` Derrick Stolee 2019-03-21 9:21 ` Ævar Arnfjörð Bjarmason 2019-01-23 21:59 ` [PATCH 6/6] commit-graph: test verifying a corrupt v2 header Derrick Stolee via GitGitGadget 2019-01-23 23:59 ` Jonathan Tan 2019-01-24 23:05 ` [PATCH 0/6] Create commit-graph file format v2 Junio C Hamano 2019-01-24 23:39 ` Junio C Hamano 2019-01-25 13:54 ` Derrick Stolee 2019-04-24 19:58 ` [PATCH v2 0/5] " Derrick Stolee via GitGitGadget 2019-04-24 19:58 ` [PATCH v2 1/5] commit-graph: return with errors during write Derrick Stolee via GitGitGadget 2019-04-24 19:58 ` [PATCH v2 2/5] commit-graph: collapse parameters into flags Derrick Stolee via GitGitGadget 2019-04-25 5:21 ` Junio C Hamano 2019-04-24 19:58 ` [PATCH v2 3/5] commit-graph: create new version flags Derrick Stolee via GitGitGadget 2019-04-25 5:29 ` Junio C Hamano 2019-04-25 11:09 ` Derrick Stolee 2019-04-25 21:31 ` Ævar Arnfjörð Bjarmason 2019-04-26 2:20 ` Junio C Hamano 2019-04-24 19:58 ` [PATCH v2 4/5] commit-graph: add --version=<n> option Derrick Stolee via GitGitGadget 2019-04-24 19:58 ` [PATCH v2 5/5] commit-graph: implement file format version 2 Derrick Stolee via GitGitGadget 2019-04-25 22:09 ` [PATCH v2 0/5] Create commit-graph file format v2 Ævar Arnfjörð Bjarmason 2019-04-26 2:28 ` Junio C Hamano 2019-04-26 8:33 ` Ævar Arnfjörð Bjarmason 2019-04-26 12:06 ` Derrick Stolee 2019-04-26 13:55 ` Ævar Arnfjörð Bjarmason 2019-04-27 12:57 ` Ævar Arnfjörð Bjarmason [this message] 2019-05-01 13:11 ` [PATCH v3 0/6] " Derrick Stolee via GitGitGadget 2019-05-01 13:11 ` [PATCH v3 1/6] commit-graph: return with errors during write Derrick Stolee via GitGitGadget 2019-05-01 14:46 ` Ævar Arnfjörð Bjarmason 2019-05-01 13:11 ` [PATCH v3 2/6] commit-graph: collapse parameters into flags Derrick Stolee via GitGitGadget 2019-05-01 13:11 ` [PATCH v3 3/6] commit-graph: create new version parameter Derrick Stolee via GitGitGadget 2019-05-01 13:11 ` [PATCH v3 4/6] commit-graph: add --version=<n> option Derrick Stolee via GitGitGadget 2019-05-01 13:11 ` [PATCH v3 5/6] commit-graph: implement file format version 2 Derrick Stolee via GitGitGadget 2019-05-01 19:12 ` Ævar Arnfjörð Bjarmason 2019-05-01 19:56 ` Derrick Stolee 2019-05-01 13:11 ` [PATCH v3 6/6] commit-graph: remove Future Work section Derrick Stolee via GitGitGadget 2019-05-01 14:58 ` Ævar Arnfjörð Bjarmason 2019-05-01 19:59 ` Derrick Stolee 2019-05-01 20:25 ` [PATCH v3 0/6] Create commit-graph file format v2 Ævar Arnfjörð Bjarmason 2019-05-02 13:26 ` Derrick Stolee 2019-05-02 18:02 ` Ævar Arnfjörð Bjarmason 2019-05-03 12:47 ` Derrick Stolee 2019-05-03 13:41 ` Ævar Arnfjörð Bjarmason 2019-05-06 8:27 ` Christian Couder 2019-05-06 13:47 ` Derrick Stolee 2019-05-03 14:16 ` SZEDER Gábor 2019-05-03 15:11 ` Derrick Stolee 2019-05-09 14:22 ` [PATCH v4 00/11] Commit-graph write refactor (was: Create commit-graph file format v2) Derrick Stolee via GitGitGadget 2019-05-09 14:22 ` [PATCH v4 01/11] commit-graph: fix the_repository reference Derrick Stolee via GitGitGadget 2019-05-13 2:56 ` Junio C Hamano 2019-05-09 14:22 ` [PATCH v4 02/11] commit-graph: return with errors during write Derrick Stolee via GitGitGadget 2019-05-13 3:13 ` Junio C Hamano 2019-05-13 11:04 ` Derrick Stolee 2019-05-13 11:22 ` Derrick Stolee 2019-05-09 14:22 ` [PATCH v4 03/11] commit-graph: collapse parameters into flags Derrick Stolee via GitGitGadget 2019-05-13 3:44 ` Junio C Hamano 2019-05-13 11:07 ` Derrick Stolee 2019-05-09 14:22 ` [PATCH v4 04/11] commit-graph: remove Future Work section Derrick Stolee via GitGitGadget 2019-05-09 14:22 ` [PATCH v4 05/11] commit-graph: create write_commit_graph_context Derrick Stolee via GitGitGadget 2019-05-09 14:22 ` [PATCH v4 07/11] commit-graph: extract fill_oids_from_commit_hex() Derrick Stolee via GitGitGadget 2019-05-09 14:22 ` [PATCH v4 06/11] commit-graph: extract fill_oids_from_packs() Derrick Stolee via GitGitGadget 2019-05-13 5:05 ` Junio C Hamano 2019-05-09 14:22 ` [PATCH v4 08/11] commit-graph: extract fill_oids_from_all_packs() Derrick Stolee via GitGitGadget 2019-05-09 14:22 ` [PATCH v4 09/11] commit-graph: extract count_distinct_commits() Derrick Stolee via GitGitGadget 2019-05-09 14:22 ` [PATCH v4 10/11] commit-graph: extract copy_oids_to_commits() Derrick Stolee via GitGitGadget 2019-05-09 14:22 ` [PATCH v4 11/11] commit-graph: extract write_commit_graph_file() Derrick Stolee via GitGitGadget 2019-05-13 5:09 ` Junio C Hamano 2019-05-09 17:58 ` [PATCH v4 00/11] Commit-graph write refactor (was: Create commit-graph file format v2) Josh Steadmon 2019-06-12 13:29 ` [PATCH v5 " Derrick Stolee via GitGitGadget 2019-06-12 13:29 ` [PATCH v5 01/11] commit-graph: fix the_repository reference Derrick Stolee via GitGitGadget 2019-06-12 13:29 ` [PATCH v5 02/11] commit-graph: return with errors during write Derrick Stolee via GitGitGadget 2019-06-29 17:23 ` SZEDER Gábor 2019-07-01 12:19 ` Derrick Stolee 2019-06-12 13:29 ` [PATCH v5 03/11] commit-graph: collapse parameters into flags Derrick Stolee via GitGitGadget 2019-06-12 13:29 ` [PATCH v5 04/11] commit-graph: remove Future Work section Derrick Stolee via GitGitGadget 2019-06-12 13:29 ` [PATCH v5 05/11] commit-graph: create write_commit_graph_context Derrick Stolee via GitGitGadget 2019-06-12 13:29 ` [PATCH v5 06/11] commit-graph: extract fill_oids_from_packs() Derrick Stolee via GitGitGadget 2019-06-12 13:29 ` [PATCH v5 07/11] commit-graph: extract fill_oids_from_commit_hex() Derrick Stolee via GitGitGadget 2019-06-12 13:29 ` [PATCH v5 08/11] commit-graph: extract fill_oids_from_all_packs() Derrick Stolee via GitGitGadget 2019-06-12 13:29 ` [PATCH v5 09/11] commit-graph: extract count_distinct_commits() Derrick Stolee via GitGitGadget 2019-06-12 13:29 ` [PATCH v5 10/11] commit-graph: extract copy_oids_to_commits() Derrick Stolee via GitGitGadget 2019-06-12 13:29 ` [PATCH v5 11/11] commit-graph: extract write_commit_graph_file() Derrick Stolee via GitGitGadget
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style List information: http://vger.kernel.org/majordomo-info.html * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=87zhobr4fl.fsf@evledraar.gmail.com \ --to=avarab@gmail.com \ --cc=git@vger.kernel.org \ --cc=gitgitgadget@gmail.com \ --cc=gitster@pobox.com \ --cc=peff@peff.net \ --cc=sandals@crustytoothpaste.net \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
git@vger.kernel.org list mirror (unofficial, one of many) This inbox may be cloned and mirrored by anyone: git clone --mirror https://public-inbox.org/git git clone --mirror http://ou63pmih66umazou.onion/git git clone --mirror http://czquwvybam4bgbro.onion/git git clone --mirror http://hjrcffqmbrq6wope.onion/git # If you have public-inbox 1.1+ installed, you may # initialize and index your mirror using the following commands: public-inbox-init -V1 git git/ https://public-inbox.org/git \ git@vger.kernel.org public-inbox-index git Example config snippet for mirrors. Newsgroups are available over NNTP: nntp://news.public-inbox.org/inbox.comp.version-control.git nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git nntp://news.gmane.io/gmane.comp.version-control.git note: .onion URLs require Tor: https://www.torproject.org/ code repositories for the project(s) associated with this inbox: https://80x24.org/mirrors/git.git AGPL code for this site: git clone https://public-inbox.org/public-inbox.git