git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* Commit graph chains with no corresponding files?
@ 2020-06-29 22:07 Jonathan Tan
  2020-06-30  1:51 ` Derrick Stolee
  0 siblings, 1 reply; 6+ messages in thread
From: Jonathan Tan @ 2020-06-29 22:07 UTC (permalink / raw)
  Cc: git, stolee, Jonathan Tan

At $DAYJOB, a few people have reported "warning: unable to find all
commit-graph files" warnings. Their commit-graph-chain files have a few
lines, but they only have one commit graph file with very few commits. I
suspected something happening during fetch, because (as far as I know) a
fetch may cause an incremental commit graph to be written, but I ran a
fetch on a large repository myself and didn't run into this problem.

Has anyone ran into this problem before, and know how to reproduce?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Commit graph chains with no corresponding files?
  2020-06-29 22:07 Commit graph chains with no corresponding files? Jonathan Tan
@ 2020-06-30  1:51 ` Derrick Stolee
  2020-07-16 22:57   ` [FYI] commit-graph: trace expiry of commit graph links Jonathan Tan
  2021-02-25  4:54   ` Commit graph chains with no corresponding files? Bryan Turner
  0 siblings, 2 replies; 6+ messages in thread
From: Derrick Stolee @ 2020-06-30  1:51 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On 6/29/2020 6:07 PM, Jonathan Tan wrote:
> At $DAYJOB, a few people have reported "warning: unable to find all
> commit-graph files" warnings. Their commit-graph-chain files have a few
> lines, but they only have one commit graph file with very few commits. I
> suspected something happening during fetch, because (as far as I know) a
> fetch may cause an incremental commit graph to be written, but I ran a
> fetch on a large repository myself and didn't run into this problem.
> 
> Has anyone ran into this problem before, and know how to reproduce?

The incremental commit-graph code deletes any commit-graph files
that do not appear in the chain. I believe this is done by comparing
the contents of the ".git/objects/info/commit-graphs/" directory to
the contents of the chain file.

These appear to be case-sensitive, full-path comparisons.

It is _possible_ that something like a case switch or a symlink
could be causing a problem here. That's where I would look on
the affected systems.

Likely the full-path comparison in expire_commit_graphs() should
be dropped in favor of local filename comparisons. A case-
sensitive match is less likely to be important here since Git
is writing the paths itself and should get the proper case back
from the directory listing.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [FYI] commit-graph: trace expiry of commit graph links
  2020-06-30  1:51 ` Derrick Stolee
@ 2020-07-16 22:57   ` Jonathan Tan
  2021-02-25  4:54   ` Commit graph chains with no corresponding files? Bryan Turner
  1 sibling, 0 replies; 6+ messages in thread
From: Jonathan Tan @ 2020-07-16 22:57 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, stolee

When an obsolete link in the commit graph chain is deleted, and trace2
is enabled, trace a message that it is deleted, along with the list of
links before and after the current chain refresh.

The messages are emitted using trace2_data_string() and not
trace2_printf() because the latter is not implemented for some trace
targets.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
This patch is just for informational purposes for people who have the
same problem and/or want to help diagnose.

I've tested that this produces the expected trace2 messages using:

  git commit-graph write --reachable --split=replace

> It is _possible_ that something like a case switch or a symlink
> could be causing a problem here. That's where I would look on
> the affected systems.

Indeed a symlink is present - the affected repositories are using the
git-repo [1] local mirror feature, which causes .git/objects (among
other things) to be a symlink - but I couldn't figure out how this
symlink would cause problems. In particular, looking at the code, all
relevant filenames seem to be constructed from ctx->odb->path, so no
matter which names were used to get to the object directory, all names
are built from those names, and this seems to be true in practice as
well.

So I've added some trace2 messages (in this patch), and let's see if I
can figure out what's going on. I'll write back if I find something.

[1] https://gerrit.googlesource.com/git-repo
---
 commit-graph.c | 33 ++++++++++++++++++++++++++++++++-
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index 328ab06fd4..b5bd2ac6de 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -2009,6 +2009,7 @@ static void expire_commit_graphs(struct write_commit_graph_context *ctx)
 	struct dirent *de;
 	size_t dirnamelen;
 	timestamp_t expire_time = time(NULL);
+	int commit_graph_deleted = 0;
 
 	if (ctx->split_opts && ctx->split_opts->expire_time)
 		expire_time = ctx->split_opts->expire_time;
@@ -2050,8 +2051,38 @@ static void expire_commit_graphs(struct write_commit_graph_context *ctx)
 			}
 		}
 
-		if (!found)
+		if (!found) {
+			if (trace2_is_enabled()) {
+				struct strbuf message = STRBUF_INIT;
+
+				strbuf_addf(&message, "Deleting '%s' because it is not in [", path.buf);
+				for (i = 0; i < ctx->num_commit_graphs_after; i++) {
+					if (i != 0)
+						strbuf_addstr(&message, ", ");
+					strbuf_addf(&message, "'%s'", ctx->commit_graph_filenames_after[i]);
+				}
+				strbuf_addstr(&message, "]");
+				trace2_data_string("commit-graph", the_repository, "graph-deletion", message.buf);
+				strbuf_release(&message);
+				commit_graph_deleted = 1;
+			}
 			unlink(path.buf);
+		}
+	}
+
+	if (commit_graph_deleted) {
+		struct strbuf message = STRBUF_INIT;
+		uint32_t i;
+
+		strbuf_addstr(&message, "The commit graphs before were [");
+		for (i = 0; i < ctx->num_commit_graphs_before; i++) {
+			if (i != 0)
+				strbuf_addstr(&message, ", ");
+			strbuf_addf(&message, "'%s'", ctx->commit_graph_filenames_before[i]);
+		}
+		strbuf_addstr(&message, "]");
+		trace2_data_string("commit-graph", the_repository, "graph-before", message.buf);
+		strbuf_release(&message);
 	}
 
 out:
-- 
2.28.0.rc0.105.gf9edc3c819-goog


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Commit graph chains with no corresponding files?
  2020-06-30  1:51 ` Derrick Stolee
  2020-07-16 22:57   ` [FYI] commit-graph: trace expiry of commit graph links Jonathan Tan
@ 2021-02-25  4:54   ` Bryan Turner
  2021-02-25 14:20     ` Derrick Stolee
  1 sibling, 1 reply; 6+ messages in thread
From: Bryan Turner @ 2021-02-25  4:54 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Jonathan Tan, Git Users

On Mon, Jun 29, 2020 at 6:51 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 6/29/2020 6:07 PM, Jonathan Tan wrote:
> > At $DAYJOB, a few people have reported "warning: unable to find all
> > commit-graph files" warnings. Their commit-graph-chain files have a few
> > lines, but they only have one commit graph file with very few commits. I
> > suspected something happening during fetch, because (as far as I know) a
> > fetch may cause an incremental commit graph to be written, but I ran a
> > fetch on a large repository myself and didn't run into this problem.
> >
> > Has anyone ran into this problem before, and know how to reproduce?

I don't have any specific reproduction steps, but we've just run into
our first case of this on Git 2.29. I ended up kicking off a full `git
commit-graph write` to fix it. That displayed the same warning, but
commands run after it no longer do. Prior to writing the new graph, I
had this:
$ ls
commit-graph-chain  graph-88f5fe6e0c659e3742e556982263813d528ead81.graph

Afterward, the `objects/info/commits-graphs` directory still exists
but is empty, and there is now an `objects/commit-graph` that didn't
exist before. `git commit-graph verify` seems happy with the state of
things.

>
> The incremental commit-graph code deletes any commit-graph files
> that do not appear in the chain. I believe this is done by comparing
> the contents of the ".git/objects/info/commit-graphs/" directory to
> the contents of the chain file.
>
> These appear to be case-sensitive, full-path comparisons.
>
> It is _possible_ that something like a case switch or a symlink
> could be causing a problem here. That's where I would look on
> the affected systems.

Are commit graphs potentially problematic in repositories that are
borrowing objects from other repositories via alternates? Have there
been important changes to commit graphs since 2.29?

>
> Likely the full-path comparison in expire_commit_graphs() should
> be dropped in favor of local filename comparisons. A case-
> sensitive match is less likely to be important here since Git
> is writing the paths itself and should get the proper case back
> from the directory listing.
>
> Thanks,
> -Stolee
>

-b

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Commit graph chains with no corresponding files?
  2021-02-25  4:54   ` Commit graph chains with no corresponding files? Bryan Turner
@ 2021-02-25 14:20     ` Derrick Stolee
  2021-02-27  2:49       ` Bryan Turner
  0 siblings, 1 reply; 6+ messages in thread
From: Derrick Stolee @ 2021-02-25 14:20 UTC (permalink / raw)
  To: Bryan Turner; +Cc: Jonathan Tan, Git Users

On 2/24/2021 11:54 PM, Bryan Turner wrote:
> On Mon, Jun 29, 2020 at 6:51 PM Derrick Stolee <stolee@gmail.com> wrote:
>>
>> On 6/29/2020 6:07 PM, Jonathan Tan wrote:
>>> At $DAYJOB, a few people have reported "warning: unable to find all
>>> commit-graph files" warnings. Their commit-graph-chain files have a few
>>> lines, but they only have one commit graph file with very few commits. I
>>> suspected something happening during fetch, because (as far as I know) a
>>> fetch may cause an incremental commit graph to be written, but I ran a
>>> fetch on a large repository myself and didn't run into this problem.
>>>
>>> Has anyone ran into this problem before, and know how to reproduce?
> 
> I don't have any specific reproduction steps, but we've just run into
> our first case of this on Git 2.29. I ended up kicking off a full `git
> commit-graph write` to fix it. That displayed the same warning, but
> commands run after it no longer do. Prior to writing the new graph, I
> had this:
> $ ls
> commit-graph-chain  graph-88f5fe6e0c659e3742e556982263813d528ead81.graph

The contents of the 'commit-graph-chain' file are critical to diagnosing
the problems here. Likely it had multiple lines.

> Afterward, the `objects/info/commits-graphs` directory still exists
> but is empty, and there is now an `objects/commit-graph` that didn't
> exist before. `git commit-graph verify` seems happy with the state of
> things.

Yes, a full rewrite without "--split" will get you to this state.

>> The incremental commit-graph code deletes any commit-graph files
>> that do not appear in the chain. I believe this is done by comparing
>> the contents of the ".git/objects/info/commit-graphs/" directory to
>> the contents of the chain file.
>>
>> These appear to be case-sensitive, full-path comparisons.
>>
>> It is _possible_ that something like a case switch or a symlink
>> could be causing a problem here. That's where I would look on
>> the affected systems.
> 
> Are commit graphs potentially problematic in repositories that are
> borrowing objects from other repositories via alternates?

This was definitely part of the design, with the intention of
working with a common base in the alternate. However, if the
alternate collapses layers, then the repo that is borrowing
from that alternate may have a broken chain.

It is likely a better setup to have the alternate keep a
commit-graph file and leave the dependent repos clear of a
commit-graph. _Or_ the dependent repos should use a full
commit-graph instead of a chain.

If you have a better idea for how to make this work, then there
is room for improvement.

For example, if we ensure during the commit-graph write that
all layers of the chain are within our local repo, then these
dependency issues go away without breaking any old Git versions
that are reading the data.

> Have there
> been important changes to commit graphs since 2.29?

Not in the area of commit-graph chains.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Commit graph chains with no corresponding files?
  2021-02-25 14:20     ` Derrick Stolee
@ 2021-02-27  2:49       ` Bryan Turner
  0 siblings, 0 replies; 6+ messages in thread
From: Bryan Turner @ 2021-02-27  2:49 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Jonathan Tan, Git Users

On Thu, Feb 25, 2021 at 6:20 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 2/24/2021 11:54 PM, Bryan Turner wrote:
> > On Mon, Jun 29, 2020 at 6:51 PM Derrick Stolee <stolee@gmail.com> wrote:
> >>
> >> On 6/29/2020 6:07 PM, Jonathan Tan wrote:
> >>> At $DAYJOB, a few people have reported "warning: unable to find all
> >>> commit-graph files" warnings. Their commit-graph-chain files have a few
> >>> lines, but they only have one commit graph file with very few commits. I
> >>> suspected something happening during fetch, because (as far as I know) a
> >>> fetch may cause an incremental commit graph to be written, but I ran a
> >>> fetch on a large repository myself and didn't run into this problem.
> >>>
> >>> Has anyone ran into this problem before, and know how to reproduce?
> >
> > I don't have any specific reproduction steps, but we've just run into
> > our first case of this on Git 2.29. I ended up kicking off a full `git
> > commit-graph write` to fix it. That displayed the same warning, but
> > commands run after it no longer do. Prior to writing the new graph, I
> > had this:
> > $ ls
> > commit-graph-chain  graph-88f5fe6e0c659e3742e556982263813d528ead81.graph
>
> The contents of the 'commit-graph-chain' file are critical to diagnosing
> the problems here. Likely it had multiple lines.
>
> > Afterward, the `objects/info/commits-graphs` directory still exists
> > but is empty, and there is now an `objects/commit-graph` that didn't
> > exist before. `git commit-graph verify` seems happy with the state of
> > things.
>
> Yes, a full rewrite without "--split" will get you to this state.
>
> >> The incremental commit-graph code deletes any commit-graph files
> >> that do not appear in the chain. I believe this is done by comparing
> >> the contents of the ".git/objects/info/commit-graphs/" directory to
> >> the contents of the chain file.
> >>
> >> These appear to be case-sensitive, full-path comparisons.
> >>
> >> It is _possible_ that something like a case switch or a symlink
> >> could be causing a problem here. That's where I would look on
> >> the affected systems.
> >
> > Are commit graphs potentially problematic in repositories that are
> > borrowing objects from other repositories via alternates?
>
> This was definitely part of the design, with the intention of
> working with a common base in the alternate. However, if the
> alternate collapses layers, then the repo that is borrowing
> from that alternate may have a broken chain.

Thanks for the analysis, Derrick. This seems like a likely culprit for
how the repository got into this state, because it is a fork (of a
fork) and does use a series of alternates.

>
> It is likely a better setup to have the alternate keep a
> commit-graph file and leave the dependent repos clear of a
> commit-graph. _Or_ the dependent repos should use a full
> commit-graph instead of a chain.

Skipping writing the commit graph in forks seems like a reasonable
place for us to start, given the way it currently works, but always
writing full graphs may be another option. If the fork is able to
borrow the commit graph from its origin across the alternate, though,
then that implies there may not be a lot of value in writing commit
graphs in the forks (since they're likely to share the majority of
their refs with their origin).

>
> If you have a better idea for how to make this work, then there
> is room for improvement.
>
> For example, if we ensure during the commit-graph write that
> all layers of the chain are within our local repo, then these
> dependency issues go away without breaking any old Git versions
> that are reading the data.

Naively, this was the way I assumed it already worked--which is why I
was writing commit graphs in forks in the first place.

>
> > Have there
> > been important changes to commit graphs since 2.29?
>
> Not in the area of commit-graph chains.

Thanks again!
-b

>
> Thanks,
> -Stolee

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-02-27  2:55 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-29 22:07 Commit graph chains with no corresponding files? Jonathan Tan
2020-06-30  1:51 ` Derrick Stolee
2020-07-16 22:57   ` [FYI] commit-graph: trace expiry of commit graph links Jonathan Tan
2021-02-25  4:54   ` Commit graph chains with no corresponding files? Bryan Turner
2021-02-25 14:20     ` Derrick Stolee
2021-02-27  2:49       ` Bryan Turner

Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).