From: Will Chandler <wfc@wfchandler.org>
To: derrickstolee@github.com
Cc: git@vger.kernel.org, peff@peff.net, ps@pks.im
Subject: Re: commit-graph overflow generation chicken and egg
Date: Tue, 05 Jul 2022 17:03:44 -0400 [thread overview]
Message-ID: <DD88D523-0ECA-4474-9AA5-1D4A431E532A@wfchandler.org> (raw)
In-Reply-To: <a154e109-3f4c-c500-3365-d47879abf30d@github.com>
Hi all,
I think I've reproduced this problem on v2.36.1 and main.
First, remove any existing commit graphs:
$ rm -rf .git/objects/info/commit-graph*
Generate a commit graph without GDA2:
$ git -c commitGraph.generationVersion=1 commit-graph write
Peff's bad commit graph included bloom filters, so let's update the
commit graph to include them and add GDA2:
$ git -c commitGraph.generationVersion=2 commit-graph write --changed-paths
The commit graph is now broken:
$ git rev-list --all
fatal: commit-graph requires overflow generation data but has none
The header has a GDA2 section, but no GDO2:
$ xxd .git/objects/info/commit-graphs/*graph | head
00000000: 4347 5048 0101 0700 4f49 4446 0000 0000 CGPH....OIDF....
00000010: 0000 0068 4f49 444c 0000 0000 0000 0468 ...hOIDL.......h
00000020: 4344 4154 0000 0000 0015 43b4 4744 4132 CDAT......C.GDA2
00000030: 0000 0000 003b 8270 4544 4745 0000 0000 .....;.pEDGE....
00000040: 003f c24c 4249 4458 0000 0000 003f c3cc .?.LBIDX.....?..
00000050: 4244 4154 0000 0000 0044 03a8 0000 0000 BDAT.....D......
00000060: 0000 0000 0049 79df 0000 010e 0000 0216 .....Iy.........
00000070: 0000 031f 0000 042f 0000 0535 0000 0623 ......./...5...#
00000080: 0000 0730 0000 086d 0000 096d 0000 0a8c ...0...m...m....
00000090: 0000 0b8e 0000 0c96 0000 0d9c 0000 0ea2 ................
Viewing the GDA2 block, the overflow bit is set for virtually all of the commits:
$ xxd -s 0x003b8270 .git/objects/info/commit-graphs/*.graph | head
003b8270: 8000 0000 8000 0001 8000 0002 8000 0003 ................
003b8280: 8000 0004 8000 0005 8000 0006 8000 0007 ................
003b8290: 8000 0008 8000 0009 8000 000a 8000 000b ................
003b82a0: 8000 000c 8000 000d 8000 000e 8000 000f ................
003b82b0: 8000 0010 8000 0011 8000 0012 8000 0013 ................
003b82c0: 8000 0014 8000 0015 8000 0016 8000 0017 ................
003b82d0: 8000 0018 8000 0019 8000 001a 8000 001b ................
003b82e0: 8000 001c 8000 001d 8000 001e 8000 001f ................
003b82f0: 8000 0020 8000 0021 8000 0022 8000 0023 ... ...!..."...#
003b8300: 8000 0024 8000 0025 8000 0026 8000 0027 ...$...%...&...'
Stepping though, it looks like this is the scenario:
compute_generation_numbers correctly calculates the generation, e.g.:
{
graph_pos = 0,
generation = 1502820840
}
Because the existing commit graph did not include generation data,
graph.read_generation_data is 0. When compute_bloom_filters executes, it
will call fill_commit_graph_info which checks that value and falls back
to the older generation calculation if false:
if (g->read_generation_data) {
offset = (timestamp_t)get_be32(g->chunk_generation_data + sizeof(uint32_t) * lex_index);
if (offset & CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW) {
if (!g->chunk_generation_data_overflow)
die(_("commit-graph requires overflow generation data but has none"));
offset_pos = offset ^ CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW;
graph_data->generation = item->date + get_be64(g->chunk_generation_data_overflow + 8 * offset_pos);
} else
graph_data->generation = item->date + offset;
} else
graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
This re-writes the commit data to:
{
graph_pos = 0,
generation = 17631
}
The smaller generation value underflows when the commit date is
subtracted from it by write_graph_chunk_generation_data, causing the
overflow flag to be set:
static int write_graph_chunk_generation_data(struct hashfile *f,
void *data)
{
struct write_commit_graph_context *ctx = data;
int i, num_generation_data_overflows = 0;
for (i = 0; i < ctx->commits.nr; i++) {
struct commit *c = ctx->commits.list[i];
timestamp_t offset;
repo_parse_commit(ctx->r, c);
offset = commit_graph_data_at(c)->generation - c->date;
display_progress(ctx->progress, ++ctx->progress_cnt);
if (offset > GENERATION_NUMBER_V2_OFFSET_MAX) {
offset = CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW | num_generation_data_overflows;
num_generation_data_overflows++;
}
hashwrite_be32(f, offset);
}
return 0;
}
We've already decided by this point to skip writing the overflow chunk
based on compute_generation_numbers, so the commit graph is now
invalid.
next prev parent reply other threads:[~2022-07-05 21:03 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-08 19:33 commit-graph overflow generation chicken and egg Jeff King
2022-06-08 20:08 ` Derrick Stolee
2022-06-08 23:17 ` Jeff King
2022-07-01 12:06 ` Patrick Steinhardt
2022-07-04 10:46 ` Patrick Steinhardt
2022-07-04 20:50 ` Derrick Stolee
2022-07-05 21:03 ` Will Chandler [this message]
2022-07-05 22:28 ` Taylor Blau
2022-07-06 8:52 ` Jeff King
2022-07-06 9:11 ` Jeff King
2022-06-09 7:49 ` Ævar Arnfjörð Bjarmason
2022-06-09 15:26 ` Jeff King
2022-06-09 15:39 ` Derrick Stolee
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=DD88D523-0ECA-4474-9AA5-1D4A431E532A@wfchandler.org \
--to=wfc@wfchandler.org \
--cc=derrickstolee@github.com \
--cc=git@vger.kernel.org \
--cc=peff@peff.net \
--cc=ps@pks.im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).