From: Taylor Blau <me@ttaylorr.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: Jonathan Nieder <jrnieder@gmail.com>,
git@vger.kernel.org, jonathantanmy@google.com, gitster@pobox.com,
newren@gmail.com, Jay Conrod <jayconrod@google.com>,
Derrick Stolee <stolee@gmail.com>
Subject: Re: [PATCH v2 2/2] shallow.c: use '{commit,rollback}_shallow_file'
Date: Tue, 2 Jun 2020 23:16:31 -0600 [thread overview]
Message-ID: <20200603051631.GA20678@syl.local> (raw)
In-Reply-To: <20200603045248.GA20266@syl.local>
On Tue, Jun 02, 2020 at 10:52:48PM -0600, Taylor Blau wrote:
> Hi Jonathan,
>
> On Tue, Jun 02, 2020 at 08:42:13PM -0700, Jonathan Nieder wrote:
> > Hi,
> >
> > Taylor Blau wrote:
> >
> > > Signed-off-by: Taylor Blau <me@ttaylorr.com>
> > > ---
> > > builtin/receive-pack.c | 4 ++--
> > > commit.h | 2 ++
> > > fetch-pack.c | 10 +++++-----
> > > shallow.c | 30 +++++++++++++++++++++---------
> > > t/t5537-fetch-shallow.sh | 29 +++++++++++++++++++++++++++++
> > > 5 files changed, 59 insertions(+), 16 deletions(-)
> >
> > I haven't investigated the cause yet, but I've run into an interesting
> > bug that bisects to this commit. Jay Conrod (cc-ed) reports:
> >
> > | I believe this is also the cause of Go toolchain test failures we've
> > | been seeing. Go uses git to fetch dependencies.
> > |
> > | The problem we're seeing can be reproduced with the script below. It
> > | should print "success". Instead, the git merge-base command fails
> > | because the commit 7303f77963648d5f1ec5e55eccfad8e14035866c
> > | (origin/master) has no history.
> >
> > -- 8< --
> > #!/bin/bash
> >
> > set -euxo pipefail
> > if [ -d legacytest ]; then
> > echo "legacytest directory already exists" >&2
> > exit 1
> > fi
> > mkdir legacytest
> > cd legacytest
> > git init --bare
> > git config protocol.version 2
> > git config fetch.writeCommitGraph true
> > git remote add origin -- https://github.com/rsc/legacytest
> > git fetch -f --depth=1 origin refs/heads/master:refs/heads/master
> > git fetch -f origin 'refs/heads/*:refs/heads/*' 'refs/tags/*:refs/tags/*'
> > git fetch --unshallow -f origin
> > git merge-base --is-ancestor -- v2.0.0 7303f77963648d5f1ec5e55eccfad8e14035866c
> > echo success
> > -- >8 --
>
> Thanks to you and Jay for the report and reproduction script. Indeed, I
> can reproduce this on the tip of master (which is equivalent to v2.27.0
> at the time of writing).
>
> > The fetch.writeCommitGraph part is interesting. When does a commit
> > graph file get written in this sequence of operations? In an
> > unshallow operation, does the usual guard against writing a commit
> > graph in a shallow repo get missed?
>
> The last 'git fetch' is the one that writes the commit-graph. You can
> verify this by sticking a 'ls objects/info' after each 'git' invocation
> in your script.
>
> Here's where things get weird, though. Prior to this patch, Git would
> pick up that the repository is shallow before unshallowing, but never
> invalidate this fact after unshallowing. That means that once we got to
> 'write_commit_graph', we'd exit immediately since it appears as if the
> repository is shallow.
>
> In this patch, we don't do that anymore, since we rightly unset the fact
> that we are (were) shallow.
>
> In a debugger, I ran your script and a 'git commit-graph write --split
> --reachable' side-by-side, and found an interesting discrepancy: some
> commits (loaded from 'copy_oids_to_commits') *don't* have their parents
> set when invoked from 'git fetch', but *do* when invoked as 'git
> commit-graph write ...'.
>
> I'm not an expert in the object cache, but my hunch is that when we
> fetch these objects they're marked as parsed without having loaded their
> parents. When we load them again via 'lookup_object', we get objects
> that look parsed, but don't have parents where they otherwise should.
Ah, this only sort of has to do with the object cache. In
'parse_commit_buffer()' we stop parsing parents in the case that the
repository is shallow (this goes back to 7f3140cd23 (git repack: keep
commits hidden by a graft, 2009-07-23)).
That makes me somewhat nervous. We're going to keep any objects opened
prior to unshallowing in the cache, along with their hidden parents. I
suspect that this is why Git has kept the shallow bit as sticky for so
long.
I'm not quite sure what to do here. I think that any of the following
would work:
* Keep the shallow bit sticky, at least for fetch.writeCommitGraph
(i.e., pretend as if fetch.writecommitgraph=0 in the case of
'--unshallow').
* Dump the object cache upon un-shallowing, forcing us to re-discover
the parents when they are no longer hidden behind a graft.
The latter seems like the most complete feasible fix. The former should
work fine to address this case, but I wonder if there are other
call-sites that are affected by this behavior. My hunch is that this is
a unique case, since it requires going from shallow to unshallow in the
same process.
I have yet to create a smaller test case, but the following should be
sufficient to dump the cache of parsed objects upon shallowing or
un-shallowing:
diff --git a/shallow.c b/shallow.c
index b826de9b67..06db857f53 100644
--- a/shallow.c
+++ b/shallow.c
@@ -90,6 +90,9 @@ static void reset_repository_shallow(struct repository *r)
{
r->parsed_objects->is_shallow = -1;
stat_validity_clear(r->parsed_objects->shallow_stat);
+
+ parsed_object_pool_clear(r->parsed_objects);
+ r->parsed_objects = parsed_object_pool_new();
}
int commit_shallow_file(struct repository *r, struct shallow_lock *lk)
Is this something we want to go forward with? Are there some
far-reaching implications that I'm missing?
> I'm going to CC Stolee to see if he has any thoughts on how to handle
> this and/or if my idea is on the right track.
>
> > "rm -fr objects/info/commit-graphs" recovers the full history in the
> > repo, so this is not a case of writing the wrong shallows --- it's
> > only a commit graph issue.
> >
> > I'll take a closer look, but thought I'd give others a chance to look
> > to in case there's something obvious. :)
> >
> > Thanks,
> > Jonathan
>
> Thanks,
> Taylor
Thanks,
Taylor
next prev parent reply other threads:[~2020-06-03 5:17 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-04-21 18:09 [PATCH] shallow.c: use 'reset_repository_shallow' when appropriate Taylor Blau
2020-04-21 20:41 ` Junio C Hamano
2020-04-21 20:45 ` Taylor Blau
2020-04-21 20:52 ` Junio C Hamano
2020-04-21 22:21 ` Taylor Blau
2020-04-21 23:06 ` Junio C Hamano
2020-04-22 18:05 ` Jonathan Tan
2020-04-22 18:02 ` Jonathan Tan
2020-04-22 18:15 ` Junio C Hamano
2020-04-23 0:14 ` Taylor Blau
2020-04-23 0:25 ` [PATCH v2 0/2] shallow.c: reset shallow-ness after updating Taylor Blau
2020-04-23 0:25 ` [PATCH v2 1/2] t5537: use test_write_lines, indented heredocs for readability Taylor Blau
2020-04-23 1:14 ` Jonathan Nieder
2020-04-24 17:11 ` Taylor Blau
2020-04-24 17:17 ` Jonathan Nieder
2020-04-24 20:45 ` Junio C Hamano
2020-04-23 0:25 ` [PATCH v2 2/2] shallow.c: use '{commit,rollback}_shallow_file' Taylor Blau
2020-04-23 1:23 ` Jonathan Nieder
2020-04-23 18:09 ` Jonathan Tan
2020-04-23 20:40 ` Junio C Hamano
2020-04-24 17:13 ` Taylor Blau
2020-06-03 3:42 ` Jonathan Nieder
2020-06-03 4:52 ` Taylor Blau
2020-06-03 5:16 ` Taylor Blau [this message]
2020-06-03 13:08 ` Derrick Stolee
2020-06-03 19:26 ` Taylor Blau
2020-06-03 21:23 ` Jonathan Nieder
2020-06-03 20:51 ` Jonathan Nieder
2020-06-03 22:14 ` Taylor Blau
2020-06-03 23:06 ` Jonathan Nieder
2020-06-04 17:45 ` Taylor Blau
2020-04-23 19:05 ` [PATCH] shallow.c: use 'reset_repository_shallow' when appropriate Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200603051631.GA20678@syl.local \
--to=me@ttaylorr.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jayconrod@google.com \
--cc=jonathantanmy@google.com \
--cc=jrnieder@gmail.com \
--cc=newren@gmail.com \
--cc=stolee@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).