git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Will Chandler <wfc@wfchandler.org>
To: derrickstolee@github.com
Cc: git@vger.kernel.org, peff@peff.net, ps@pks.im
Subject: Re: commit-graph overflow generation chicken and egg
Date: Tue, 05 Jul 2022 17:03:44 -0400	[thread overview]
Message-ID: <DD88D523-0ECA-4474-9AA5-1D4A431E532A@wfchandler.org> (raw)
In-Reply-To: <a154e109-3f4c-c500-3365-d47879abf30d@github.com>

Hi all,

I think I've reproduced this problem on v2.36.1 and main.

First, remove any existing commit graphs:

  $ rm -rf .git/objects/info/commit-graph*

Generate a commit graph without GDA2:

  $ git -c commitGraph.generationVersion=1 commit-graph write

Peff's bad commit graph included bloom filters, so let's update the
commit graph to include them and add GDA2:

  $ git -c commitGraph.generationVersion=2 commit-graph write --changed-paths

The commit graph is now broken:

  $ git rev-list --all
  fatal: commit-graph requires overflow generation data but has none

The header has a GDA2 section, but no GDO2:

  $ xxd .git/objects/info/commit-graphs/*graph | head
  00000000: 4347 5048 0101 0700 4f49 4446 0000 0000  CGPH....OIDF....
  00000010: 0000 0068 4f49 444c 0000 0000 0000 0468  ...hOIDL.......h
  00000020: 4344 4154 0000 0000 0015 43b4 4744 4132  CDAT......C.GDA2
  00000030: 0000 0000 003b 8270 4544 4745 0000 0000  .....;.pEDGE....
  00000040: 003f c24c 4249 4458 0000 0000 003f c3cc  .?.LBIDX.....?..
  00000050: 4244 4154 0000 0000 0044 03a8 0000 0000  BDAT.....D......
  00000060: 0000 0000 0049 79df 0000 010e 0000 0216  .....Iy.........
  00000070: 0000 031f 0000 042f 0000 0535 0000 0623  ......./...5...#
  00000080: 0000 0730 0000 086d 0000 096d 0000 0a8c  ...0...m...m....
  00000090: 0000 0b8e 0000 0c96 0000 0d9c 0000 0ea2  ................

Viewing the GDA2 block, the overflow bit is set for virtually all of the commits:

  $ xxd -s 0x003b8270 .git/objects/info/commit-graphs/*.graph | head
  003b8270: 8000 0000 8000 0001 8000 0002 8000 0003  ................
  003b8280: 8000 0004 8000 0005 8000 0006 8000 0007  ................
  003b8290: 8000 0008 8000 0009 8000 000a 8000 000b  ................
  003b82a0: 8000 000c 8000 000d 8000 000e 8000 000f  ................
  003b82b0: 8000 0010 8000 0011 8000 0012 8000 0013  ................
  003b82c0: 8000 0014 8000 0015 8000 0016 8000 0017  ................
  003b82d0: 8000 0018 8000 0019 8000 001a 8000 001b  ................
  003b82e0: 8000 001c 8000 001d 8000 001e 8000 001f  ................
  003b82f0: 8000 0020 8000 0021 8000 0022 8000 0023  ... ...!..."...#
  003b8300: 8000 0024 8000 0025 8000 0026 8000 0027  ...$...%...&...'

Stepping though, it looks like this is the scenario:

compute_generation_numbers correctly calculates the generation, e.g.:

  {
    graph_pos = 0,
    generation = 1502820840
  }

Because the existing commit graph did not include generation data,
graph.read_generation_data is 0. When compute_bloom_filters executes, it
will call fill_commit_graph_info which checks that value and falls back
to the older generation calculation if false:

  if (g->read_generation_data) {
  	offset = (timestamp_t)get_be32(g->chunk_generation_data + sizeof(uint32_t) * lex_index);

  	if (offset & CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW) {
  		if (!g->chunk_generation_data_overflow)
  			die(_("commit-graph requires overflow generation data but has none"));

  		offset_pos = offset ^ CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW;
  		graph_data->generation = item->date + get_be64(g->chunk_generation_data_overflow + 8 * offset_pos);
  	} else
  		graph_data->generation = item->date + offset;
  } else
  	graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2;

This re-writes the commit data to:

  {
    graph_pos = 0,
    generation = 17631
  }

The smaller generation value underflows when the commit date is
subtracted from it by write_graph_chunk_generation_data, causing the
overflow flag to be set:

  static int write_graph_chunk_generation_data(struct hashfile *f,
					     void *data)
  {
  	struct write_commit_graph_context *ctx = data;
  	int i, num_generation_data_overflows = 0;

  	for (i = 0; i < ctx->commits.nr; i++) {
  		struct commit *c = ctx->commits.list[i];
  		timestamp_t offset;
  		repo_parse_commit(ctx->r, c);
  		offset = commit_graph_data_at(c)->generation - c->date;
  		display_progress(ctx->progress, ++ctx->progress_cnt);

  		if (offset > GENERATION_NUMBER_V2_OFFSET_MAX) {
  			offset = CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW | num_generation_data_overflows;
  			num_generation_data_overflows++;
  		}

  		hashwrite_be32(f, offset);
  	}

  	return 0;
  }

We've already decided by this point to skip writing the overflow chunk
based on compute_generation_numbers, so the commit graph is now
invalid.

  reply	other threads:[~2022-07-05 21:03 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-08 19:33 commit-graph overflow generation chicken and egg Jeff King
2022-06-08 20:08 ` Derrick Stolee
2022-06-08 23:17   ` Jeff King
2022-07-01 12:06     ` Patrick Steinhardt
2022-07-04 10:46       ` Patrick Steinhardt
2022-07-04 20:50         ` Derrick Stolee
2022-07-05 21:03           ` Will Chandler [this message]
2022-07-05 22:28             ` Taylor Blau
2022-07-06  8:52               ` Jeff King
2022-07-06  9:11           ` Jeff King
2022-06-09  7:49   ` Ævar Arnfjörð Bjarmason
2022-06-09 15:26     ` Jeff King
2022-06-09 15:39       ` Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DD88D523-0ECA-4474-9AA5-1D4A431E532A@wfchandler.org \
    --to=wfc@wfchandler.org \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).