git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 1/2] repack: point out a bug handling stale shallow info
  2018-07-13 20:18 [PATCH 0/2] repack -ad: fix after `fetch --prune` in a shallow repository Johannes Schindelin via GitGitGadget
@ 2018-07-11 22:17 ` Johannes Schindelin via GitGitGadget
  2018-07-11 22:23 ` [PATCH 2/2] repack -ad: prune the list of shallow commits Johannes Schindelin via GitGitGadget
  2018-07-17 13:51 ` [PATCH v2 0/2] repack -ad: fix after `fetch --prune` in a shallow repository Johannes Schindelin via GitGitGadget
  2 siblings, 0 replies; 47+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2018-07-11 22:17 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

A `git fetch --prune` can turn previously-reachable objects unreachable,
even commits that are in the `shallow` list. A subsequent `git repack
-ad` will then unceremoniously drop those unreachable commits, and the
`shallow` list will become stale. This means that when we try to fetch
with a larger `--depth` the next time, we may end up with:

	fatal: error in object: unshallow <commit-hash>

Reported by Alejandro Pauly.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t5537-fetch-shallow.sh | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/t/t5537-fetch-shallow.sh b/t/t5537-fetch-shallow.sh
index 943231af9..561485d31 100755
--- a/t/t5537-fetch-shallow.sh
+++ b/t/t5537-fetch-shallow.sh
@@ -186,4 +186,31 @@ EOF
 	test_cmp expect actual
 '
 
+test_expect_failure '.git/shallow is edited by repack' '
+	git init shallow-server &&
+	test_commit -C shallow-server A &&
+	test_commit -C shallow-server B &&
+	git -C shallow-server checkout -b branch &&
+	test_commit -C shallow-server C &&
+	test_commit -C shallow-server E &&
+	test_commit -C shallow-server D &&
+	d="$(git -C shallow-server rev-parse --verify D)" &&
+	git -C shallow-server checkout master &&
+
+	git clone --depth=1 --no-tags --no-single-branch \
+		"file://$PWD/shallow-server" shallow-client &&
+
+	: now remove the branch and fetch with prune &&
+	git -C shallow-server branch -D branch &&
+	git -C shallow-client fetch --prune --depth=1 \
+		origin "+refs/heads/*:refs/remotes/origin/*" &&
+	git -C shallow-client repack -adfl &&
+	test_must_fail git -C shallow-client rev-parse --verify $d^0 &&
+	! grep $d shallow-client/.git/shallow &&
+
+	git -C shallow-server branch branch-orig D^0 &&
+	git -C shallow-client fetch --prune --depth=2 \
+		origin "+refs/heads/*:refs/remotes/origin/*"
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 2/2] repack -ad: prune the list of shallow commits
  2018-07-13 20:18 [PATCH 0/2] repack -ad: fix after `fetch --prune` in a shallow repository Johannes Schindelin via GitGitGadget
  2018-07-11 22:17 ` [PATCH 1/2] repack: point out a bug handling stale shallow info Johannes Schindelin via GitGitGadget
@ 2018-07-11 22:23 ` Johannes Schindelin via GitGitGadget
  2018-07-13 20:31   ` Jeff King
  2018-07-17 16:39   ` Duy Nguyen
  2018-07-17 13:51 ` [PATCH v2 0/2] repack -ad: fix after `fetch --prune` in a shallow repository Johannes Schindelin via GitGitGadget
  2 siblings, 2 replies; 47+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2018-07-11 22:23 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

While it is true that we never add unreachable commits into pack files
intentionally (as `git repack`'s documentation states), we must not
forget that a `git fetch --prune` (or even a `git fetch` when a ref was
force-pushed in the meantime) can make a commit unreachable that was
reachable before.

Therefore it is not safe to assume that a `git repack -adlf` will keep
unreachable commits alone (under the assumption that they had not been
packed in the first place).

This is particularly important to keep in mind when looking at the
`.git/shallow` file: if any commits listed in that file become
unreachable, it is not a problem, but if they go missing, it *is* a
problem. One symptom of this problem is that a deepening fetch may now
fail with

	fatal: error in object: unshallow <commit-hash>

To avoid this problem, let's prune the shallow list in `git repack` when
the `-d` option is passed, unless `-A` is passed, too (which would force
the now-unreachable objects to be turned into loose objects instead of
being deleted).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 builtin/repack.c         | 4 ++++
 t/t5537-fetch-shallow.sh | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 6c636e159..45f321b23 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -444,6 +444,10 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		if (!quiet && isatty(2))
 			opts |= PRUNE_PACKED_VERBOSE;
 		prune_packed_objects(opts);
+
+		if (!(pack_everything & LOOSEN_UNREACHABLE) &&
+		    is_repository_shallow())
+			prune_shallow(0);
 	}
 
 	if (!no_update_server_info)
diff --git a/t/t5537-fetch-shallow.sh b/t/t5537-fetch-shallow.sh
index 561485d31..d32ba20f9 100755
--- a/t/t5537-fetch-shallow.sh
+++ b/t/t5537-fetch-shallow.sh
@@ -186,7 +186,7 @@ EOF
 	test_cmp expect actual
 '
 
-test_expect_failure '.git/shallow is edited by repack' '
+test_expect_success '.git/shallow is edited by repack' '
 	git init shallow-server &&
 	test_commit -C shallow-server A &&
 	test_commit -C shallow-server B &&
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 0/2] repack -ad: fix after `fetch --prune` in a shallow repository
@ 2018-07-13 20:18 Johannes Schindelin via GitGitGadget
  2018-07-11 22:17 ` [PATCH 1/2] repack: point out a bug handling stale shallow info Johannes Schindelin via GitGitGadget
                   ` (2 more replies)
  0 siblings, 3 replies; 47+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2018-07-13 20:18 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

Under certain circumstances, commits that were reachable can be made unreachable, e.g. via `git fetch --prune`. These commits might have been packed already, in which case `git repack -adlf` will just drop them without giving them the usual grace period before an eventual `git prune` (via `git gc`) prunes them.

This is a problem when the `shallow` file still lists them, which is the reason why `git prune` edits that file. And with the proposed changes, `git repack -ad` will now do the same.

Reported by Alejandro Pauly.

Johannes Schindelin (2):
  repack: point out a bug handling stale shallow info
  repack -ad: prune the list of shallow commits

 builtin/repack.c         |  4 ++++
 t/t5537-fetch-shallow.sh | 27 +++++++++++++++++++++++++++
 2 files changed, 31 insertions(+)


base-commit: e3331758f12da22f4103eec7efe1b5304a9be5e9
Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-9%2Fdscho%2Fshallow-and-fetch-prune-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-9/dscho/shallow-and-fetch-prune-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/9
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2] repack -ad: prune the list of shallow commits
  2018-07-11 22:23 ` [PATCH 2/2] repack -ad: prune the list of shallow commits Johannes Schindelin via GitGitGadget
@ 2018-07-13 20:31   ` Jeff King
  2018-07-14 21:56     ` Johannes Schindelin
  2018-07-17 16:39   ` Duy Nguyen
  1 sibling, 1 reply; 47+ messages in thread
From: Jeff King @ 2018-07-13 20:31 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Junio C Hamano, Johannes Schindelin

On Thu, Jul 12, 2018 at 12:23:28AM +0200, Johannes Schindelin via GitGitGadget wrote:

> This is particularly important to keep in mind when looking at the
> `.git/shallow` file: if any commits listed in that file become
> unreachable, it is not a problem, but if they go missing, it *is* a
> problem. One symptom of this problem is that a deepening fetch may now
> fail with
> 
> 	fatal: error in object: unshallow <commit-hash>
> 
> To avoid this problem, let's prune the shallow list in `git repack` when
> the `-d` option is passed, unless `-A` is passed, too (which would force
> the now-unreachable objects to be turned into loose objects instead of
> being deleted).

I'm not sure if this covers all cases:

 - even with "-A", we may still drop objects subject to
   --unpack-unreachable. So if your pack has an old mtime (e.g., because
   you haven't packed in a while) I think you'd see the same problem.

 - if you use "-adk", we'd keep all objects, and this pruning would not
   be necessary

> diff --git a/builtin/repack.c b/builtin/repack.c
> index 6c636e159..45f321b23 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -444,6 +444,10 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>  		if (!quiet && isatty(2))
>  			opts |= PRUNE_PACKED_VERBOSE;
>  		prune_packed_objects(opts);
> +
> +		if (!(pack_everything & LOOSEN_UNREACHABLE) &&
> +		    is_repository_shallow())
> +			prune_shallow(0);
>  	}

I understand how this solves your immediate problem, but it feels like a
weird layering violation (which I think is a result of existing layering
violations ;) ).

I.e., it seems unexpected that "git repack" is going to tweak your
shallow lists. If we were designing from scratch, the sane behavior
seems to me to be:

  1. Shallow pruning should be its own separate command (distinct from
     either repacking or loose object pruning), and should be triggered
     as part of a normal git-gc.

  AND ONE OF:

  2a. Objects mentions in the shallow file are important, and therefore
      _are_ considered reachable on their own. Neither repack nor prune
      needs to know or care.

  OR

  2b. It's OK for shallow objects to be missing, and the shallow code
      should be more resilient to missing objects. Neither repack nor
      prune needs to know or care.

I don't know how hard it would be to get there from the current code
state, though.

-Peff

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2] repack -ad: prune the list of shallow commits
  2018-07-13 20:31   ` Jeff King
@ 2018-07-14 21:56     ` Johannes Schindelin
  2018-07-16 17:36       ` Jeff King
  0 siblings, 1 reply; 47+ messages in thread
From: Johannes Schindelin @ 2018-07-14 21:56 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin via GitGitGadget, git, Junio C Hamano

Hi Peff,

On Fri, 13 Jul 2018, Jeff King wrote:

> On Thu, Jul 12, 2018 at 12:23:28AM +0200, Johannes Schindelin via
> GitGitGadget wrote:
> 
> > This is particularly important to keep in mind when looking at the
> > `.git/shallow` file: if any commits listed in that file become
> > unreachable, it is not a problem, but if they go missing, it *is* a
> > problem. One symptom of this problem is that a deepening fetch may now
> > fail with
> > 
> > 	fatal: error in object: unshallow <commit-hash>
> > 
> > To avoid this problem, let's prune the shallow list in `git repack` when
> > the `-d` option is passed, unless `-A` is passed, too (which would force
> > the now-unreachable objects to be turned into loose objects instead of
> > being deleted).
> 
> I'm not sure if this covers all cases:
> 
>  - even with "-A", we may still drop objects subject to
>    --unpack-unreachable. So if your pack has an old mtime (e.g., because
>    you haven't packed in a while) I think you'd see the same problem.
> 
>  - if you use "-adk", we'd keep all objects, and this pruning would not
>    be necessary

Sure. I can add those cases.

> > diff --git a/builtin/repack.c b/builtin/repack.c
> > index 6c636e159..45f321b23 100644
> > --- a/builtin/repack.c
> > +++ b/builtin/repack.c
> > @@ -444,6 +444,10 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
> >  		if (!quiet && isatty(2))
> >  			opts |= PRUNE_PACKED_VERBOSE;
> >  		prune_packed_objects(opts);
> > +
> > +		if (!(pack_everything & LOOSEN_UNREACHABLE) &&
> > +		    is_repository_shallow())
> > +			prune_shallow(0);
> >  	}
> 
> I understand how this solves your immediate problem, but it feels like a
> weird layering violation (which I think is a result of existing layering
> violations ;) ).

Okay, but please don't punish me for those existing layering violations by
forcing me to address them, instead of a quick bug fix for a very real bug
that was reported to me privately, and that I would like to see fixed
relatively quickly.

> I.e., it seems unexpected that "git repack" is going to tweak your
> shallow lists. If we were designing from scratch, the sane behavior
> seems to me to be:
> 
>   1. Shallow pruning should be its own separate command (distinct from
>      either repacking or loose object pruning), and should be triggered
>      as part of a normal git-gc.

I fail to see the value in a separate command.

And: `git gc` already calls `git prune`, which *does* prune the shallow
list.

>   AND ONE OF:
> 
>   2a. Objects mentions in the shallow file are important, and therefore
>       _are_ considered reachable on their own. Neither repack nor prune
>       needs to know or care.

If we were to do that, we would never be able to gc any stale shallow
commits.

That would be rather bad a design, don't you agree?

>   OR
> 
>   2b. It's OK for shallow objects to be missing, and the shallow code
>       should be more resilient to missing objects. Neither repack nor
>       prune needs to know or care.

That would be possible. Kind of like saying: we do have this list of
shallow commits, but oh, BTW, it is likely incorrect, so we painstakingly
verify for every entry during every fetch and push that those commits
objects are still there.

It looks to me like a rather bad design, too, as the whole idea of having
a `git gc` is to update such information *then*.

Sadly, we also allow `git repack` to drop objects, and it is really the
responsibility of a command that drops objects to update things like the
`shallow` list. Because that is exactly the time when its contents can go
stale.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2] repack -ad: prune the list of shallow commits
  2018-07-14 21:56     ` Johannes Schindelin
@ 2018-07-16 17:36       ` Jeff King
  2018-07-17 16:25         ` Junio C Hamano
  2018-07-17 17:28         ` Duy Nguyen
  0 siblings, 2 replies; 47+ messages in thread
From: Jeff King @ 2018-07-16 17:36 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Johannes Schindelin via GitGitGadget, git, Junio C Hamano

On Sat, Jul 14, 2018 at 11:56:57PM +0200, Johannes Schindelin wrote:

> > > To avoid this problem, let's prune the shallow list in `git repack` when
> > > the `-d` option is passed, unless `-A` is passed, too (which would force
> > > the now-unreachable objects to be turned into loose objects instead of
> > > being deleted).
> > 
> > I'm not sure if this covers all cases:
> > 
> >  - even with "-A", we may still drop objects subject to
> >    --unpack-unreachable. So if your pack has an old mtime (e.g., because
> >    you haven't packed in a while) I think you'd see the same problem.
> > 
> >  - if you use "-adk", we'd keep all objects, and this pruning would not
> >    be necessary
> 
> Sure. I can add those cases.

Thanks.

> > I understand how this solves your immediate problem, but it feels like a
> > weird layering violation (which I think is a result of existing layering
> > violations ;) ).
> 
> Okay, but please don't punish me for those existing layering violations by
> forcing me to address them, instead of a quick bug fix for a very real bug
> that was reported to me privately, and that I would like to see fixed
> relatively quickly.

I'm OK with having a partial fix, or one that fixes immediate pain
without doing a big cleanup, as long as it doesn't make anything _worse_
in a user-visible way. And that's really my question: is pruning here
going to bite people unexpectedly (not rhetorical -- I really don't
know)?

For instance, at GitHub we do not ever run "git gc", but have our
maintenance regimen that builds around "git repack". I don't think this
patch will matter to us either way, because we would never have a
shallow repository in the first place. But I'm wondering if people in a
similar situation might.

I'm also not entirely sure if people _care_ if their shallow list is
pruned. Maybe it's not a big deal, and should just be quietly cleaned
up.

And I know that you're probably coming at it from the opposite angle.
Somebody isn't running git-gc but doing a "repack -adl" and they _do_
want these shallow files cleaned up (because their repo is broken after
dropping the objects). I just want to make sure we don't regress some
other case.

> > I.e., it seems unexpected that "git repack" is going to tweak your
> > shallow lists. If we were designing from scratch, the sane behavior
> > seems to me to be:
> > 
> >   1. Shallow pruning should be its own separate command (distinct from
> >      either repacking or loose object pruning), and should be triggered
> >      as part of a normal git-gc.
> 
> I fail to see the value in a separate command.

The value of a separate command is that you can run those other commands
without triggering this behavior. I actually think "git prune" does too
much already (e.g., imagine that I do not ever want to prune objects,
but I _do_ want to get rid of tmp_pack and tmp_obj cruft; what command
do I run?). But that is definitely not a new problem. And I'm OK with
not fixing it for now. My main concern is that we are using the presence
of that mistake to justify making it again.

> And: `git gc` already calls `git prune`, which *does* prune the shallow
> list.

Right, I was trying to propose that each individual cleanup step which
_could_ be done independently should be done so, but that "gc" should
continue to do them all.

I think in the case of git-prune and prune_shallow(), it's a little
tricky, though. It looks like prune_shallow() relies on somebody having
marked the SEEN flag, which implies doing a full reachability walk. So
it's really amortizing the work already being done by prune.

Speaking of which, I don't see how your patch can work as-is. The repack
command does not do such a walk, so it has no idea which commits are
SEEN or not, and will delete all of them! Try this:

  [shallow of clone of git.git (or any repo)]
  $ git clone --no-local --depth=1 /path/to/git tmp
  ...
  $ cd tmp

  [we have a commit]
  $ git log --oneline -1
  de46fca (grafted, HEAD) repack -ad: prune the list of shallow commits

  [repacking with existing git is fine]
  $ git repack -adl
  ...
  $ git log --oneline -1
  de46fca (grafted, HEAD) repack -ad: prune the list of shallow commits

  [repacking with your patch empties the shallow file, which
   makes the repository look corrupt]
  $ /path/to/git repack -adl
  $ git log --oneline -1
  error: Could not read a2df50675979af639cf9490f7fc9b86fa18fd907
  fatal: Failed to traverse parents of commit de46fca5b43f47f3c5c6f9232a17490d39fc80b1

So either we have to do a reachability walk (which is expensive), or we
have to rely on some other command (like prune) that is doing a
reachability walk and reuse that work.

That also probably means my "separate command" suggestion is a bad one.
If we're trying to amortize work, then we're better to have a single
command with options to enable/disable certain operations (so in theory
I could use "git prune --no-prune-objects" to just clean up tempfiles,
and users of "repack -adl" could do something similar to _just_ prune
the shallow if they really didn't want to delete loose objects.

> >   AND ONE OF:
> > 
> >   2a. Objects mentions in the shallow file are important, and therefore
> >       _are_ considered reachable on their own. Neither repack nor prune
> >       needs to know or care.
> 
> If we were to do that, we would never be able to gc any stale shallow
> commits.
> 
> That would be rather bad a design, don't you agree?

I had meant for this to be coupled with my (1), which would presumably
remove unreachable commits from the shallow list in a separate step
(based on what criteria, I don't know -- again, I'm not really a shallow
user).

> >   2b. It's OK for shallow objects to be missing, and the shallow code
> >       should be more resilient to missing objects. Neither repack nor
> >       prune needs to know or care.
> 
> That would be possible. Kind of like saying: we do have this list of
> shallow commits, but oh, BTW, it is likely incorrect, so we painstakingly
> verify for every entry during every fetch and push that those commits
> objects are still there.

Obviously verifying reachability on each fetch is a bad idea. But my
understanding of the shallow list is that it says "this is a graft point
where we do not have any of the parents". If we find that we do not have
such an object, would it be OK to quietly ignore that? We used to claim
"we do not have the parents of X", but now we know "well, we do not have
X anymore either".

Again, I may be showing my ignorance of the shallow code here.

> It looks to me like a rather bad design, too, as the whole idea of having
> a `git gc` is to update such information *then*.

Right, I think "git gc" is absolutely the place to do such cleanups. My
only complaint was that having it as part of repack may be unexpected.

-Peff

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v2 0/2] repack -ad: fix after `fetch --prune` in a shallow repository
  2018-07-13 20:18 [PATCH 0/2] repack -ad: fix after `fetch --prune` in a shallow repository Johannes Schindelin via GitGitGadget
  2018-07-11 22:17 ` [PATCH 1/2] repack: point out a bug handling stale shallow info Johannes Schindelin via GitGitGadget
  2018-07-11 22:23 ` [PATCH 2/2] repack -ad: prune the list of shallow commits Johannes Schindelin via GitGitGadget
@ 2018-07-17 13:51 ` Johannes Schindelin via GitGitGadget
  2018-07-17 13:51   ` [PATCH v2 1/2] repack: point out a bug handling stale shallow info Johannes Schindelin via GitGitGadget
                     ` (3 more replies)
  2 siblings, 4 replies; 47+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2018-07-17 13:51 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

Under certain circumstances, commits that were reachable can be made unreachable, e.g. via `git fetch --prune`. These commits might have been packed already, in which case `git repack -adlf` will just drop them without giving them the usual grace period before an eventual `git prune` (via `git gc`) prunes them.

This is a problem when the `shallow` file still lists them, which is the reason why `git prune` edits that file. And with the proposed changes, `git repack -ad` will now do the same.

Reported by Alejandro Pauly.

Changes since v1:

- Also trigger `prune_shallow()` when `--unpack-unreachable=<approxidate>` was passed to `git repack`.
- No need to trigger `prune_shallow()` when `git repack` was called with `-k`.

Johannes Schindelin (2):
  repack: point out a bug handling stale shallow info
  repack -ad: prune the list of shallow commits

 builtin/repack.c         |  6 ++++++
 t/t5537-fetch-shallow.sh | 27 +++++++++++++++++++++++++++
 2 files changed, 33 insertions(+)


base-commit: e3331758f12da22f4103eec7efe1b5304a9be5e9
Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-9%2Fdscho%2Fshallow-and-fetch-prune-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-9/dscho/shallow-and-fetch-prune-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/9

Range-diff vs v1:

 1:  d2be40131 = 1:  d2be40131 repack: point out a bug handling stale shallow info
 2:  b4e01a963 ! 2:  c7ee6e008 repack -ad: prune the list of shallow commits
     @@ -23,7 +23,8 @@
          To avoid this problem, let's prune the shallow list in `git repack` when
          the `-d` option is passed, unless `-A` is passed, too (which would force
          the now-unreachable objects to be turned into loose objects instead of
     -    being deleted).
     +    being deleted). Additionally, e also need to take `--keep-reachable` and
     +    `--unpack-unreachable=<date>` into account.
      
          Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
      
     @@ -35,7 +36,9 @@
       			opts |= PRUNE_PACKED_VERBOSE;
       		prune_packed_objects(opts);
      +
     -+		if (!(pack_everything & LOOSEN_UNREACHABLE) &&
     ++		if (!keep_unreachable &&
     ++		    (!(pack_everything & LOOSEN_UNREACHABLE) ||
     ++		     unpack_unreachable) &&
      +		    is_repository_shallow())
      +			prune_shallow(0);
       	}

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v2 1/2] repack: point out a bug handling stale shallow info
  2018-07-17 13:51 ` [PATCH v2 0/2] repack -ad: fix after `fetch --prune` in a shallow repository Johannes Schindelin via GitGitGadget
@ 2018-07-17 13:51   ` Johannes Schindelin via GitGitGadget
  2018-07-17 13:51   ` [PATCH v2 2/2] repack -ad: prune the list of shallow commits Johannes Schindelin via GitGitGadget
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 47+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2018-07-17 13:51 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

A `git fetch --prune` can turn previously-reachable objects unreachable,
even commits that are in the `shallow` list. A subsequent `git repack
-ad` will then unceremoniously drop those unreachable commits, and the
`shallow` list will become stale. This means that when we try to fetch
with a larger `--depth` the next time, we may end up with:

	fatal: error in object: unshallow <commit-hash>

Reported by Alejandro Pauly.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t5537-fetch-shallow.sh | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/t/t5537-fetch-shallow.sh b/t/t5537-fetch-shallow.sh
index 943231af9..561485d31 100755
--- a/t/t5537-fetch-shallow.sh
+++ b/t/t5537-fetch-shallow.sh
@@ -186,4 +186,31 @@ EOF
 	test_cmp expect actual
 '
 
+test_expect_failure '.git/shallow is edited by repack' '
+	git init shallow-server &&
+	test_commit -C shallow-server A &&
+	test_commit -C shallow-server B &&
+	git -C shallow-server checkout -b branch &&
+	test_commit -C shallow-server C &&
+	test_commit -C shallow-server E &&
+	test_commit -C shallow-server D &&
+	d="$(git -C shallow-server rev-parse --verify D)" &&
+	git -C shallow-server checkout master &&
+
+	git clone --depth=1 --no-tags --no-single-branch \
+		"file://$PWD/shallow-server" shallow-client &&
+
+	: now remove the branch and fetch with prune &&
+	git -C shallow-server branch -D branch &&
+	git -C shallow-client fetch --prune --depth=1 \
+		origin "+refs/heads/*:refs/remotes/origin/*" &&
+	git -C shallow-client repack -adfl &&
+	test_must_fail git -C shallow-client rev-parse --verify $d^0 &&
+	! grep $d shallow-client/.git/shallow &&
+
+	git -C shallow-server branch branch-orig D^0 &&
+	git -C shallow-client fetch --prune --depth=2 \
+		origin "+refs/heads/*:refs/remotes/origin/*"
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v2 2/2] repack -ad: prune the list of shallow commits
  2018-07-17 13:51 ` [PATCH v2 0/2] repack -ad: fix after `fetch --prune` in a shallow repository Johannes Schindelin via GitGitGadget
  2018-07-17 13:51   ` [PATCH v2 1/2] repack: point out a bug handling stale shallow info Johannes Schindelin via GitGitGadget
@ 2018-07-17 13:51   ` Johannes Schindelin via GitGitGadget
  2018-07-17 17:45     ` Eric Sunshine
  2018-07-17 19:15   ` [PATCH v2 0/2] repack -ad: fix after `fetch --prune` in a shallow repository Jeff King
  2018-10-22 22:05   ` [PATCH v3 0/3] repack -ad: fix after fetch --prune " Johannes Schindelin via GitGitGadget
  3 siblings, 1 reply; 47+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2018-07-17 13:51 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

While it is true that we never add unreachable commits into pack files
intentionally (as `git repack`'s documentation states), we must not
forget that a `git fetch --prune` (or even a `git fetch` when a ref was
force-pushed in the meantime) can make a commit unreachable that was
reachable before.

Therefore it is not safe to assume that a `git repack -adlf` will keep
unreachable commits alone (under the assumption that they had not been
packed in the first place).

This is particularly important to keep in mind when looking at the
`.git/shallow` file: if any commits listed in that file become
unreachable, it is not a problem, but if they go missing, it *is* a
problem. One symptom of this problem is that a deepening fetch may now
fail with

	fatal: error in object: unshallow <commit-hash>

To avoid this problem, let's prune the shallow list in `git repack` when
the `-d` option is passed, unless `-A` is passed, too (which would force
the now-unreachable objects to be turned into loose objects instead of
being deleted). Additionally, e also need to take `--keep-reachable` and
`--unpack-unreachable=<date>` into account.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 builtin/repack.c         | 6 ++++++
 t/t5537-fetch-shallow.sh | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 6c636e159..4caf57221 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -444,6 +444,12 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		if (!quiet && isatty(2))
 			opts |= PRUNE_PACKED_VERBOSE;
 		prune_packed_objects(opts);
+
+		if (!keep_unreachable &&
+		    (!(pack_everything & LOOSEN_UNREACHABLE) ||
+		     unpack_unreachable) &&
+		    is_repository_shallow())
+			prune_shallow(0);
 	}
 
 	if (!no_update_server_info)
diff --git a/t/t5537-fetch-shallow.sh b/t/t5537-fetch-shallow.sh
index 561485d31..d32ba20f9 100755
--- a/t/t5537-fetch-shallow.sh
+++ b/t/t5537-fetch-shallow.sh
@@ -186,7 +186,7 @@ EOF
 	test_cmp expect actual
 '
 
-test_expect_failure '.git/shallow is edited by repack' '
+test_expect_success '.git/shallow is edited by repack' '
 	git init shallow-server &&
 	test_commit -C shallow-server A &&
 	test_commit -C shallow-server B &&
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2] repack -ad: prune the list of shallow commits
  2018-07-16 17:36       ` Jeff King
@ 2018-07-17 16:25         ` Junio C Hamano
  2018-07-19 16:42           ` Johannes Schindelin
  2018-07-17 17:28         ` Duy Nguyen
  1 sibling, 1 reply; 47+ messages in thread
From: Junio C Hamano @ 2018-07-17 16:25 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin, Johannes Schindelin via GitGitGadget, git

Jeff King <peff@peff.net> writes:

> I'm OK with having a partial fix, or one that fixes immediate pain
> without doing a big cleanup, as long as it doesn't make anything _worse_
> in a user-visible way. And that's really my question: is pruning here
> going to bite people unexpectedly (not rhetorical -- I really don't
> know)?

Yeah, that matches the general guideline I follow when reviewing a
patch that claims to make existing things better.  And I do not
think I can explain to a third person why pruning here is a good
idea and won't cause problems, after seeing these patches and
the discussion from the sideline.

> Right, I think "git gc" is absolutely the place to do such cleanups. My
> only complaint was that having it as part of repack may be unexpected.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2] repack -ad: prune the list of shallow commits
  2018-07-11 22:23 ` [PATCH 2/2] repack -ad: prune the list of shallow commits Johannes Schindelin via GitGitGadget
  2018-07-13 20:31   ` Jeff King
@ 2018-07-17 16:39   ` Duy Nguyen
  2018-07-17 16:48     ` Duy Nguyen
  1 sibling, 1 reply; 47+ messages in thread
From: Duy Nguyen @ 2018-07-17 16:39 UTC (permalink / raw)
  To: gitgitgadget; +Cc: Git Mailing List, Junio C Hamano, Johannes Schindelin

On Fri, Jul 13, 2018 at 10:19 PM Johannes Schindelin via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> While it is true that we never add unreachable commits into pack files
> intentionally (as `git repack`'s documentation states), we must not
> forget that a `git fetch --prune` (or even a `git fetch` when a ref was
> force-pushed in the meantime) can make a commit unreachable that was
> reachable before.
>
> Therefore it is not safe to assume that a `git repack -adlf` will keep
> unreachable commits alone (under the assumption that they had not been
> packed in the first place).
>
> This is particularly important to keep in mind when looking at the
> `.git/shallow` file: if any commits listed in that file become
> unreachable, it is not a problem, but if they go missing, it *is* a
> problem. One symptom of this problem is that a deepening fetch may now
> fail with
>
>         fatal: error in object: unshallow <commit-hash>
>

Could you elaborate a bit more? I don't quite see the connection
between the reachability in the first two paragraphs and .git/shallow
in the third one. I'm guessing we drop objects in between because
"they go missing"? Where? How does prune_shallow() in prune.c play any
role in this (if it does)?
-- 
Duy

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2] repack -ad: prune the list of shallow commits
  2018-07-17 16:39   ` Duy Nguyen
@ 2018-07-17 16:48     ` Duy Nguyen
  2018-07-19 17:50       ` Johannes Schindelin
  0 siblings, 1 reply; 47+ messages in thread
From: Duy Nguyen @ 2018-07-17 16:48 UTC (permalink / raw)
  To: gitgitgadget; +Cc: Git Mailing List, Junio C Hamano, Johannes Schindelin

On Tue, Jul 17, 2018 at 6:39 PM Duy Nguyen <pclouds@gmail.com> wrote:
>
> On Fri, Jul 13, 2018 at 10:19 PM Johannes Schindelin via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> >
> > While it is true that we never add unreachable commits into pack files
> > intentionally (as `git repack`'s documentation states), we must not
> > forget that a `git fetch --prune` (or even a `git fetch` when a ref was
> > force-pushed in the meantime) can make a commit unreachable that was
> > reachable before.
> >
> > Therefore it is not safe to assume that a `git repack -adlf` will keep
> > unreachable commits alone (under the assumption that they had not been
> > packed in the first place).
> >
> > This is particularly important to keep in mind when looking at the
> > `.git/shallow` file: if any commits listed in that file become
> > unreachable, it is not a problem, but if they go missing, it *is* a
> > problem. One symptom of this problem is that a deepening fetch may now
> > fail with
> >
> >         fatal: error in object: unshallow <commit-hash>
> >
>
> Could you elaborate a bit more?

Never mind. The problem is described in another patch.. sigh.. I
understand you want to flip that "failure" to "success" but personally
I don't think it worth it to look back in history and have "what?"
moments like when I read this patch alone.
-- 
Duy

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2] repack -ad: prune the list of shallow commits
  2018-07-16 17:36       ` Jeff King
  2018-07-17 16:25         ` Junio C Hamano
@ 2018-07-17 17:28         ` Duy Nguyen
  2018-07-17 19:41           ` Jeff King
  1 sibling, 1 reply; 47+ messages in thread
From: Duy Nguyen @ 2018-07-17 17:28 UTC (permalink / raw)
  To: Jeff King
  Cc: Johannes Schindelin, gitgitgadget, Git Mailing List,
	Junio C Hamano

On Mon, Jul 16, 2018 at 7:38 PM Jeff King <peff@peff.net> wrote:
> in a user-visible way. And that's really my question: is pruning here
> going to bite people unexpectedly (not rhetorical -- I really don't
> know)?

If shallow points are no longer reachable, removing them should not
bite anybody, I think.

> For instance, at GitHub we do not ever run "git gc", but have our
> maintenance regimen that builds around "git repack". I don't think this
> patch will matter to us either way, because we would never have a
> shallow repository in the first place. But I'm wondering if people in a
> similar situation might.
>
> I'm also not entirely sure if people _care_ if their shallow list is
> pruned. Maybe it's not a big deal, and should just be quietly cleaned
> up.
>
> And I know that you're probably coming at it from the opposite angle.
> Somebody isn't running git-gc but doing a "repack -adl" and they _do_
> want these shallow files cleaned up (because their repo is broken after
> dropping the objects). I just want to make sure we don't regress some
> other case.
>
> > > I.e., it seems unexpected that "git repack" is going to tweak your
> > > shallow lists. If we were designing from scratch, the sane behavior
> > > seems to me to be:
> > >
> > >   1. Shallow pruning should be its own separate command (distinct from
> > >      either repacking or loose object pruning), and should be triggered
> > >      as part of a normal git-gc.
> >
> > I fail to see the value in a separate command.
>
> The value of a separate command is that you can run those other commands
> without triggering this behavior. I actually think "git prune" does too
> much already (e.g., imagine that I do not ever want to prune objects,
> but I _do_ want to get rid of tmp_pack and tmp_obj cruft; what command
> do I run?). But that is definitely not a new problem. And I'm OK with
> not fixing it for now. My main concern is that we are using the presence
> of that mistake to justify making it again.
>
> > And: `git gc` already calls `git prune`, which *does* prune the shallow
> > list.
>
> Right, I was trying to propose that each individual cleanup step which
> _could_ be done independently should be done so, but that "gc" should
> continue to do them all.
>
> I think in the case of git-prune and prune_shallow(), it's a little
> tricky, though. It looks like prune_shallow() relies on somebody having
> marked the SEEN flag, which implies doing a full reachability walk.

Sorry, my bad for hiding this SEEN flag deep in library code like this
in eab3296c7e (prune: clean .git/shallow after pruning objects -
2013-12-05) But in my defense I didn't realize the horror of sharing
object flags a year later and added the "object flag allocation" in
object.h

> So it's really amortizing the work already being done by prune.
>
> Speaking of which, I don't see how your patch can work as-is. The repack
> command does not do such a walk, so it has no idea which commits are
> SEEN or not, and will delete all of them! Try this:
>
>   [shallow of clone of git.git (or any repo)]
>   $ git clone --no-local --depth=1 /path/to/git tmp
>   ...
>   $ cd tmp
>
>   [we have a commit]
>   $ git log --oneline -1
>   de46fca (grafted, HEAD) repack -ad: prune the list of shallow commits
>
>   [repacking with existing git is fine]
>   $ git repack -adl
>   ...
>   $ git log --oneline -1
>   de46fca (grafted, HEAD) repack -ad: prune the list of shallow commits
>
>   [repacking with your patch empties the shallow file, which
>    makes the repository look corrupt]
>   $ /path/to/git repack -adl
>   $ git log --oneline -1
>   error: Could not read a2df50675979af639cf9490f7fc9b86fa18fd907
>   fatal: Failed to traverse parents of commit de46fca5b43f47f3c5c6f9232a17490d39fc80b1
>
> So either we have to do a reachability walk (which is expensive), or we
> have to rely on some other command (like prune) that is doing a
> reachability walk and reuse that work.

Since "git prune" took care of loose objects, if we're going to delete
some redundant packs, we can perform a reasonably cheap "reachability"
test in repack, I think. We have the list of new packs from
pack-objects which should contain all reachable objects (and even some
unreachable ones). If we traverse all of their idx files and mark as
SEEN, then whatever shallow points that are not marked SEEN _and_ not
loose objects could be safely removed.

> > >   2b. It's OK for shallow objects to be missing, and the shallow code
> > >       should be more resilient to missing objects. Neither repack nor
> > >       prune needs to know or care.
> >
> > That would be possible. Kind of like saying: we do have this list of
> > shallow commits, but oh, BTW, it is likely incorrect, so we painstakingly
> > verify for every entry during every fetch and push that those commits
> > objects are still there.
>
> Obviously verifying reachability on each fetch is a bad idea. But my
> understanding of the shallow list is that it says "this is a graft point
> where we do not have any of the parents". If we find that we do not have
> such an object, would it be OK to quietly ignore that? We used to claim
> "we do not have the parents of X", but now we know "well, we do not have
> X anymore either".
>
> Again, I may be showing my ignorance of the shallow code here.

I don't see any problem with this either, but then I've not worked on
shallow code for quite some time. The only potential problem is where
we do this check. If we check (and drop) shallow points when we read
.git/info/shallow, then when we write it down some time in the future
we accidentally prune them. Which might be ok but I feel safer when we
only prune at one place (prune/gc/repack) where we can be more careful
and can do more testing.

If we read the shallow list as-is, then just not send "shallow" lines
in fetch-pack for shallow points that do not exist, I don't see a
problem, but it may be a bit more work and we could get to some
confusing state where upload-pack gives us the same shallow point that
we have but ignore because we don't have objects behind it. Hm...
actually I might see a problem :)
-- 
Duy

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 2/2] repack -ad: prune the list of shallow commits
  2018-07-17 13:51   ` [PATCH v2 2/2] repack -ad: prune the list of shallow commits Johannes Schindelin via GitGitGadget
@ 2018-07-17 17:45     ` Eric Sunshine
  0 siblings, 0 replies; 47+ messages in thread
From: Eric Sunshine @ 2018-07-17 17:45 UTC (permalink / raw)
  To: gitgitgadget; +Cc: Git List, Junio C Hamano, Johannes Schindelin

On Tue, Jul 17, 2018 at 9:51 AM Johannes Schindelin via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> While it is true that we never add unreachable commits into pack files
> intentionally (as `git repack`'s documentation states), we must not
> forget that a `git fetch --prune` (or even a `git fetch` when a ref was
> force-pushed in the meantime) can make a commit unreachable that was
> reachable before.
>
> Therefore it is not safe to assume that a `git repack -adlf` will keep
> unreachable commits alone (under the assumption that they had not been
> packed in the first place).
>
> This is particularly important to keep in mind when looking at the
> `.git/shallow` file: if any commits listed in that file become
> unreachable, it is not a problem, but if they go missing, it *is* a
> problem. One symptom of this problem is that a deepening fetch may now
> fail with
>
>         fatal: error in object: unshallow <commit-hash>
>
> To avoid this problem, let's prune the shallow list in `git repack` when
> the `-d` option is passed, unless `-A` is passed, too (which would force
> the now-unreachable objects to be turned into loose objects instead of
> being deleted). Additionally, e also need to take `--keep-reachable` and

s/, e/, we/

> `--unpack-unreachable=<date>` into account.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 0/2] repack -ad: fix after `fetch --prune` in a shallow repository
  2018-07-17 13:51 ` [PATCH v2 0/2] repack -ad: fix after `fetch --prune` in a shallow repository Johannes Schindelin via GitGitGadget
  2018-07-17 13:51   ` [PATCH v2 1/2] repack: point out a bug handling stale shallow info Johannes Schindelin via GitGitGadget
  2018-07-17 13:51   ` [PATCH v2 2/2] repack -ad: prune the list of shallow commits Johannes Schindelin via GitGitGadget
@ 2018-07-17 19:15   ` Jeff King
  2018-07-17 19:20     ` Jeff King
  2018-10-22 22:05   ` [PATCH v3 0/3] repack -ad: fix after fetch --prune " Johannes Schindelin via GitGitGadget
  3 siblings, 1 reply; 47+ messages in thread
From: Jeff King @ 2018-07-17 19:15 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Junio C Hamano

On Tue, Jul 17, 2018 at 06:51:39AM -0700, Johannes Schindelin via GitGitGadget wrote:

> Under certain circumstances, commits that were reachable can be made
> unreachable, e.g. via `git fetch --prune`. These commits might have
> been packed already, in which case `git repack -adlf` will just drop
> them without giving them the usual grace period before an eventual
> `git prune` (via `git gc`) prunes them.
> 
> This is a problem when the `shallow` file still lists them, which is
> the reason why `git prune` edits that file. And with the proposed
> changes, `git repack -ad` will now do the same.
> 
> Reported by Alejandro Pauly.
> 
> Changes since v1:
> 
> - Also trigger `prune_shallow()` when `--unpack-unreachable=<approxidate>` was passed to `git repack`.
> - No need to trigger `prune_shallow()` when `git repack` was called with `-k`.

I think you might have missed the bigger problem I pointed at, as it was
buried deep within my later reply. Try applying this patch on top, which
tests the opposite case (reachable shallow commits are retained):

diff --git a/t/t5537-fetch-shallow.sh b/t/t5537-fetch-shallow.sh
index d32ba20f9d..911e457ae1 100755
--- a/t/t5537-fetch-shallow.sh
+++ b/t/t5537-fetch-shallow.sh
@@ -186,17 +186,20 @@ EOF
 	test_cmp expect actual
 '
 
-test_expect_success '.git/shallow is edited by repack' '
+test_expect_success 'set up shallow server' '
 	git init shallow-server &&
 	test_commit -C shallow-server A &&
 	test_commit -C shallow-server B &&
 	git -C shallow-server checkout -b branch &&
 	test_commit -C shallow-server C &&
 	test_commit -C shallow-server E &&
 	test_commit -C shallow-server D &&
 	d="$(git -C shallow-server rev-parse --verify D)" &&
-	git -C shallow-server checkout master &&
+	git -C shallow-server checkout master
+'
 
+test_expect_success 'repack drops unreachable objects from .git/shallow' '
+	test_when_finished "rm -rf shallow-client" &&
 	git clone --depth=1 --no-tags --no-single-branch \
 		"file://$PWD/shallow-server" shallow-client &&
 
@@ -213,4 +216,13 @@ test_expect_success '.git/shallow is edited by repack' '
 		origin "+refs/heads/*:refs/remotes/origin/*"
 '
 
+test_expect_success 'repack keeps reachable objects in .git/shallow' '
+	test_when_finished "rm -rf shallow-client" &&
+	git clone --depth=1 --no-tags --no-single-branch \
+		"file://$PWD/shallow-server" shallow-client &&
+
+	git -C shallow-client repack -adfl &&
+	grep $d shallow-client/.git/shallow
+'
+
 test_done

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 0/2] repack -ad: fix after `fetch --prune` in a shallow repository
  2018-07-17 19:15   ` [PATCH v2 0/2] repack -ad: fix after `fetch --prune` in a shallow repository Jeff King
@ 2018-07-17 19:20     ` Jeff King
  2018-07-19 17:48       ` Johannes Schindelin
  0 siblings, 1 reply; 47+ messages in thread
From: Jeff King @ 2018-07-17 19:20 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, Junio C Hamano

On Tue, Jul 17, 2018 at 03:15:31PM -0400, Jeff King wrote:

> > - Also trigger `prune_shallow()` when `--unpack-unreachable=<approxidate>` was passed to `git repack`.
> > - No need to trigger `prune_shallow()` when `git repack` was called with `-k`.
> 
> I think you might have missed the bigger problem I pointed at, as it was
> buried deep within my later reply. Try applying this patch on top, which
> tests the opposite case (reachable shallow commits are retained):

By the way, I notice that the patches themselves are cc'd to you, but
the cover letter isn't. So my reply went only to gitgitgadget, which
(AFAIK) has not yet learned to work as a mail-to-comment gateway.

So I'm copying this to you directly to make sure you see it, but also
because I'm not sure if the gitgitgadget cc behavior is intentional or
not.

-Peff

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2] repack -ad: prune the list of shallow commits
  2018-07-17 17:28         ` Duy Nguyen
@ 2018-07-17 19:41           ` Jeff King
  2018-07-18 17:31             ` Duy Nguyen
  0 siblings, 1 reply; 47+ messages in thread
From: Jeff King @ 2018-07-17 19:41 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Johannes Schindelin, gitgitgadget, Git Mailing List,
	Junio C Hamano

On Tue, Jul 17, 2018 at 07:28:07PM +0200, Duy Nguyen wrote:

> On Mon, Jul 16, 2018 at 7:38 PM Jeff King <peff@peff.net> wrote:
> > in a user-visible way. And that's really my question: is pruning here
> > going to bite people unexpectedly (not rhetorical -- I really don't
> > know)?
> 
> If shallow points are no longer reachable, removing them should not
> bite anybody, I think.

I slept on this to see if I could brainstorm any other ways.

The only thing I really came up with is that removing shallows is racy
with respect to the reachability check.  For instance, if "prune" runs
at the same as an incoming shallow fetch, we'll compute the reachability
without holding the shallow lock. Somebody else may write an entry for
an object we've never heard of (because it just showed up), and we'd
erase it, making the repository appear corrupt.

But note that I used "prune" in the above example, because this bug
already exists. So probably we're not changing anything material here.
Though if we wanted to fix it, I think it would require holding the
shallow lock during the reachability analysis, which is probably not
something that repack would want to do. Unlike prune, it then does
a lot of other expensive operations, like delta compression and writing
out the pack, before it would get to the shallow prune.

Although even holding the lock for the duration of "prune" is more than
we need. Shallow commits must be commits, so we don't need to do a full
graph walk (and in my experience there's often an order-of-magnitude
difference between the two; even more so if we're using Stolee's
commit-graph cache).

> > I think in the case of git-prune and prune_shallow(), it's a little
> > tricky, though. It looks like prune_shallow() relies on somebody having
> > marked the SEEN flag, which implies doing a full reachability walk.
> 
> Sorry, my bad for hiding this SEEN flag deep in library code like this
> in eab3296c7e (prune: clean .git/shallow after pruning objects -
> 2013-12-05) But in my defense I didn't realize the horror of sharing
> object flags a year later and added the "object flag allocation" in
> object.h

Sort of an aside to the patch under discussion, but I think it may make
sense for prune_shallow() to take a callback function for determining
whether a commit is reachable.

I have an old patch that teaches git-prune to lazily do the reachability
check, since in many cases "git repack" will have just packed all of the
loose objects. But it just occurred to me that this patch is totally
broken with respect to prune_shallow(), because it would not set the
SEEN flag (I've literally been running with it for years, which goes to
show how often I use the shallow feature).

And if we were to have repack do a prune_shallow(), it may want to use a
different method than traversing and marking each object SEEN.

> > So either we have to do a reachability walk (which is expensive), or we
> > have to rely on some other command (like prune) that is doing a
> > reachability walk and reuse that work.
> 
> Since "git prune" took care of loose objects, if we're going to delete
> some redundant packs, we can perform a reasonably cheap "reachability"
> test in repack, I think. We have the list of new packs from
> pack-objects which should contain all reachable objects (and even some
> unreachable ones). If we traverse all of their idx files and mark as
> SEEN, then whatever shallow points that are not marked SEEN _and_ not
> loose objects could be safely removed.

Hmm. I don't immediately see any reason that would not work with the
current code. But I am a little uncomfortable adding these sorts of
subtle dependencies and assumptions. It feels like we're building a
house of cards.

> I don't see any problem with this either, but then I've not worked on
> shallow code for quite some time. The only potential problem is where
> we do this check. If we check (and drop) shallow points when we read
> .git/info/shallow, then when we write it down some time in the future
> we accidentally prune them. Which might be ok but I feel safer when we
> only prune at one place (prune/gc/repack) where we can be more careful
> and can do more testing.
> 
> If we read the shallow list as-is, then just not send "shallow" lines
> in fetch-pack for shallow points that do not exist, I don't see a
> problem, but it may be a bit more work and we could get to some
> confusing state where upload-pack gives us the same shallow point that
> we have but ignore because we don't have objects behind it. Hm...
> actually I might see a problem :)

Yeah, I agree if we were to do this it would definitely be in a
"read-only" way: let the current command skip those grafts for its
operation, but do not impact the on-disk shallow file. Then races or
other assumptions can at worst impact that operation, and not cause a
lasting corruption (and in particular I think we'd be subject to the
race I described at the start of this mail).

-Peff

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2] repack -ad: prune the list of shallow commits
  2018-07-17 19:41           ` Jeff King
@ 2018-07-18 17:31             ` Duy Nguyen
  2018-07-18 17:45               ` Jeff King
  0 siblings, 1 reply; 47+ messages in thread
From: Duy Nguyen @ 2018-07-18 17:31 UTC (permalink / raw)
  To: Jeff King, Johannes Schindelin
  Cc: gitgitgadget, Git Mailing List, Junio C Hamano

On Tue, Jul 17, 2018 at 9:41 PM Jeff King <peff@peff.net> wrote:
> I slept on this to see if I could brainstorm any other ways.
>
> ..
>
> Sort of an aside to the patch under discussion, but I think it may make
> sense for prune_shallow() to take a callback function for determining
> whether a commit is reachable.
>
> I have an old patch that teaches git-prune to lazily do the reachability
> check, since in many cases "git repack" will have just packed all of the
> loose objects. But it just occurred to me that this patch is totally
> broken with respect to prune_shallow(), because it would not set the
> SEEN flag (I've literally been running with it for years, which goes to
> show how often I use the shallow feature).
>
> And if we were to have repack do a prune_shallow(), it may want to use a
> different method than traversing and marking each object SEEN.

All of this sounds good. The only thing I'd like to do a bit
differently is to pass the reachable commits in prune_shallow() as a
commit-slab instead of taking a callback function. I'll refactor this
code, move prune_shallow() to a separate command prune-shallow and do
the locking thing.

Johannes I could take over this bug too (since I introduced the
shallow pruning and missed this part) if you are busy, but if you want
it to be done fast, I can't help. I still have a list of things to go
through.
-- 
Duy

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2] repack -ad: prune the list of shallow commits
  2018-07-18 17:31             ` Duy Nguyen
@ 2018-07-18 17:45               ` Jeff King
  2018-07-18 17:48                 ` Duy Nguyen
  0 siblings, 1 reply; 47+ messages in thread
From: Jeff King @ 2018-07-18 17:45 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Johannes Schindelin, gitgitgadget, Git Mailing List,
	Junio C Hamano

On Wed, Jul 18, 2018 at 07:31:40PM +0200, Duy Nguyen wrote:

> > Sort of an aside to the patch under discussion, but I think it may make
> > sense for prune_shallow() to take a callback function for determining
> > whether a commit is reachable.
> >
> > I have an old patch that teaches git-prune to lazily do the reachability
> > check, since in many cases "git repack" will have just packed all of the
> > loose objects. But it just occurred to me that this patch is totally
> > broken with respect to prune_shallow(), because it would not set the
> > SEEN flag (I've literally been running with it for years, which goes to
> > show how often I use the shallow feature).
> >
> > And if we were to have repack do a prune_shallow(), it may want to use a
> > different method than traversing and marking each object SEEN.
> 
> All of this sounds good. The only thing I'd like to do a bit
> differently is to pass the reachable commits in prune_shallow() as a
> commit-slab instead of taking a callback function. I'll refactor this
> code, move prune_shallow() to a separate command prune-shallow and do
> the locking thing.

I think using a slab is much nicer than the current global-struct flags.
But it's not as flexible as a callback, for two reasons:

  - in the lazy case, the caller might not even have loaded the slab
    yet. On the other hand, it might be sufficient to just be broad
    there, and just always pre-populate the slab when
    is_repository_shallow(), before we even call into prune_shallow().
    If we have _any_ entries in the shallow file, we'd need to compute
    reachability.

  - it precludes any optimizations that compute partial reachability.
    Asking "is XYZ reachable" is a much easier question to answer than
    "show me the full reachability graph." At the least, it lets you
    stop the graph traversal early. And with generation numbers, it can
    even avoid traversing down unproductive segments of the graph.

I think it's OK to switch to a slab for now if you don't want to do the
callback thing. But I think a callback is probably long-term the right
thing (and I actually think it may be _less_ work to implement, since
then prune would probably be OK sticking with the global struct flags).

-Peff

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2] repack -ad: prune the list of shallow commits
  2018-07-18 17:45               ` Jeff King
@ 2018-07-18 17:48                 ` Duy Nguyen
  0 siblings, 0 replies; 47+ messages in thread
From: Duy Nguyen @ 2018-07-18 17:48 UTC (permalink / raw)
  To: Jeff King
  Cc: Johannes Schindelin, gitgitgadget, Git Mailing List,
	Junio C Hamano

On Wed, Jul 18, 2018 at 7:45 PM Jeff King <peff@peff.net> wrote:
>
> On Wed, Jul 18, 2018 at 07:31:40PM +0200, Duy Nguyen wrote:
>
> > > Sort of an aside to the patch under discussion, but I think it may make
> > > sense for prune_shallow() to take a callback function for determining
> > > whether a commit is reachable.
> > >
> > > I have an old patch that teaches git-prune to lazily do the reachability
> > > check, since in many cases "git repack" will have just packed all of the
> > > loose objects. But it just occurred to me that this patch is totally
> > > broken with respect to prune_shallow(), because it would not set the
> > > SEEN flag (I've literally been running with it for years, which goes to
> > > show how often I use the shallow feature).
> > >
> > > And if we were to have repack do a prune_shallow(), it may want to use a
> > > different method than traversing and marking each object SEEN.
> >
> > All of this sounds good. The only thing I'd like to do a bit
> > differently is to pass the reachable commits in prune_shallow() as a
> > commit-slab instead of taking a callback function. I'll refactor this
> > code, move prune_shallow() to a separate command prune-shallow and do
> > the locking thing.
>
> I think using a slab is much nicer than the current global-struct flags.
> But it's not as flexible as a callback, for two reasons:
>
>   - in the lazy case, the caller might not even have loaded the slab
>     yet. On the other hand, it might be sufficient to just be broad
>     there, and just always pre-populate the slab when
>     is_repository_shallow(), before we even call into prune_shallow().
>     If we have _any_ entries in the shallow file, we'd need to compute
>     reachability.
>
>   - it precludes any optimizations that compute partial reachability.
>     Asking "is XYZ reachable" is a much easier question to answer than
>     "show me the full reachability graph." At the least, it lets you
>     stop the graph traversal early. And with generation numbers, it can
>     even avoid traversing down unproductive segments of the graph.

Both good points. Callback it is then.
-- 
Duy

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2] repack -ad: prune the list of shallow commits
  2018-07-17 16:25         ` Junio C Hamano
@ 2018-07-19 16:42           ` Johannes Schindelin
  2018-07-19 20:49             ` Junio C Hamano
  0 siblings, 1 reply; 47+ messages in thread
From: Johannes Schindelin @ 2018-07-19 16:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Johannes Schindelin via GitGitGadget, git

Hi Junio,

On Tue, 17 Jul 2018, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > I'm OK with having a partial fix, or one that fixes immediate pain
> > without doing a big cleanup, as long as it doesn't make anything _worse_
> > in a user-visible way. And that's really my question: is pruning here
> > going to bite people unexpectedly (not rhetorical -- I really don't
> > know)?
> 
> Yeah, that matches the general guideline I follow when reviewing a
> patch that claims to make existing things better.  And I do not
> think I can explain to a third person why pruning here is a good
> idea and won't cause problems, after seeing these patches and
> the discussion from the sideline.

It is very easy to explain: `git repack` can drop unreachable commits
without further warning, making the corresponding entries in
`.git/shallow` invalid, which causes serious problems when deepening the
branches.

The solution is easy: drop also the now-invalid entries in `.git/shallow`
after dropping unreachable commits unceremoniously.

While I am sympathetic to Peff's attempt to make this a bit more general,
I think he overthinks it, and we do not even need to complexify the code
for now.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 0/2] repack -ad: fix after `fetch --prune` in a shallow repository
  2018-07-17 19:20     ` Jeff King
@ 2018-07-19 17:48       ` Johannes Schindelin
  0 siblings, 0 replies; 47+ messages in thread
From: Johannes Schindelin @ 2018-07-19 17:48 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano

Hi Peff,

On Tue, 17 Jul 2018, Jeff King wrote:

> On Tue, Jul 17, 2018 at 03:15:31PM -0400, Jeff King wrote:
> 
> > > - Also trigger `prune_shallow()` when `--unpack-unreachable=<approxidate>` was passed to `git repack`.
> > > - No need to trigger `prune_shallow()` when `git repack` was called with `-k`.
> > 
> > I think you might have missed the bigger problem I pointed at, as it was
> > buried deep within my later reply. Try applying this patch on top, which
> > tests the opposite case (reachable shallow commits are retained):
> 
> By the way, I notice that the patches themselves are cc'd to you, but
> the cover letter isn't. So my reply went only to gitgitgadget, which
> (AFAIK) has not yet learned to work as a mail-to-comment gateway.
> 
> So I'm copying this to you directly to make sure you see it, but also
> because I'm not sure if the gitgitgadget cc behavior is intentional or
> not.

It is one of the existing TODOs to Cc: the person who hit "/submit". So I
am aware of it, and it is recorded in the TODO.md file.

But as the GitHub user might not be associated with a public email, this
will need an internal map that will be extended to whoever is in the list
of allowed users.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2] repack -ad: prune the list of shallow commits
  2018-07-17 16:48     ` Duy Nguyen
@ 2018-07-19 17:50       ` Johannes Schindelin
  0 siblings, 0 replies; 47+ messages in thread
From: Johannes Schindelin @ 2018-07-19 17:50 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: gitgitgadget, Git Mailing List, Junio C Hamano

Hi,

On Tue, 17 Jul 2018, Duy Nguyen wrote:

> On Tue, Jul 17, 2018 at 6:39 PM Duy Nguyen <pclouds@gmail.com> wrote:
> >
> > On Fri, Jul 13, 2018 at 10:19 PM Johannes Schindelin via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> > >
> > > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> > >
> > > While it is true that we never add unreachable commits into pack files
> > > intentionally (as `git repack`'s documentation states), we must not
> > > forget that a `git fetch --prune` (or even a `git fetch` when a ref was
> > > force-pushed in the meantime) can make a commit unreachable that was
> > > reachable before.
> > >
> > > Therefore it is not safe to assume that a `git repack -adlf` will keep
> > > unreachable commits alone (under the assumption that they had not been
> > > packed in the first place).
> > >
> > > This is particularly important to keep in mind when looking at the
> > > `.git/shallow` file: if any commits listed in that file become
> > > unreachable, it is not a problem, but if they go missing, it *is* a
> > > problem. One symptom of this problem is that a deepening fetch may now
> > > fail with
> > >
> > >         fatal: error in object: unshallow <commit-hash>
> > >
> >
> > Could you elaborate a bit more?
> 
> Never mind. The problem is described in another patch.. sigh.. I
> understand you want to flip that "failure" to "success" but personally
> I don't think it worth it to look back in history and have "what?"
> moments like when I read this patch alone.

The split was not meant for your benefit, but for the benefit of verifying
that I indeed fixed a problem.

I don't think it is a wise idea to squash them together.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2] repack -ad: prune the list of shallow commits
  2018-07-19 16:42           ` Johannes Schindelin
@ 2018-07-19 20:49             ` Junio C Hamano
  2018-07-20  9:30               ` Junio C Hamano
  0 siblings, 1 reply; 47+ messages in thread
From: Junio C Hamano @ 2018-07-19 20:49 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Jeff King, Johannes Schindelin via GitGitGadget, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> On Tue, 17 Jul 2018, Junio C Hamano wrote:
>
>> Jeff King <peff@peff.net> writes:
>> 
>> > I'm OK with having a partial fix, or one that fixes immediate pain
>> > without doing a big cleanup, as long as it doesn't make anything _worse_
>> > in a user-visible way. And that's really my question: is pruning here
>> > going to bite people unexpectedly (not rhetorical -- I really don't
>> > know)?
>> 
>> Yeah, that matches the general guideline I follow when reviewing a
>> patch that claims to make existing things better.  And I do not
>> think I can explain to a third person why pruning here is a good
>> idea and won't cause problems, after seeing these patches and
>> the discussion from the sideline.
>
> It is very easy to explain: `git repack` can drop unreachable commits
> without further warning, making the corresponding entries in
> `.git/shallow` invalid, which causes serious problems when deepening the
> branches.

That explains why you wrote the patch very clearly.

> The solution is easy: drop also the now-invalid entries in `.git/shallow`
> after dropping unreachable commits unceremoniously.

Sorry, but I do not think I can relay that as an explanation why it
won't cause problems to a third person.

The entries in shallow file says that history behind them may not
exist in the repository due to its shallowness but history after
them are supposed to be traversable (otherwise we have a repository
corruption).  It is true that an entry that itself no longer exists
in this repository should not be in shallow file, as the presence of
that entry breaks that promise the file is making---that commit
ought to exist and it is safe to traverse down to it, so keeping the
entry in the file is absolutely a wrong thing to do.

But that does not automatically mean that just simply removing it
makes the resulting repository good, does it?  Wouldn't the solution
for that corruption be to set a new entry to stop history traversal
before reaching that (now-missing) commit?  If your solution and
explanatoin were like that, then I can understand why it won't cause
problems, because the resulting repository would be shallower than
it originally was, as if you cloned with a smaller depth, but it is
not corrupt.

Perhaps your rationale is that by trading one shape of corrupt
repository (where a commit that does not even exist is in the
shallow file, breaking the early start-up sequence when it tries to
read the entries) with another shape of corrupt repsitory (where
shallow does not completely tell where to stop, and traversing the
history can eventually hit a missing commit because no entry in the
shallow file stops such a traversal), at least deepening fetch can
start (instead of dying while trying to see how shallow the
repository currently is) and that can be used to recover the corrupt
repository back into a usable state?  If that is the justification,
I can fully buy that, but that is not what I am hearing in your easy
to explain answer, so I am still puzzled.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2] repack -ad: prune the list of shallow commits
  2018-07-19 20:49             ` Junio C Hamano
@ 2018-07-20  9:30               ` Junio C Hamano
  2018-07-20 19:31                 ` Jeff King
  0 siblings, 1 reply; 47+ messages in thread
From: Junio C Hamano @ 2018-07-20  9:30 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Jeff King, Johannes Schindelin via GitGitGadget, git

Junio C Hamano <gitster@pobox.com> writes:

> Sorry, but I do not think I can relay that as an explanation why it
> won't cause problems to a third person.

OK, ignore this. I was being stupid.

> The entries in shallow file says that history behind them may not
> exist in the repository due to its shallowness but history after
> them are supposed to be traversable (otherwise we have a repository
> corruption).  It is true that an entry that itself no longer exists
> in this repository should not be in shallow file, as the presence of
> that entry breaks that promise the file is making---that commit
> ought to exist and it is safe to traverse down to it, so keeping the
> entry in the file is absolutely a wrong thing to do.
>
> But that does not automatically mean that just simply removing it
> makes the resulting repository good, does it?  Wouldn't the solution
> for that corruption be to set a new entry to stop history traversal
> before reaching that (now-missing) commit?

The above is overly pessimistic and worried about an impossible
situation, I would think.  The reason why a commit that used to be
in the shallow file is being pruned during a "repack" is because it
has become unreachable.  By definition, no future history traversal
that wants to enumerate reachable commits needs to be stopped from
finding that commits that are older than this commit being pruned
are missing by having this in the shallow list.  If there is a ref
or a reflog entry from which such a problematic traversal starts at,
we wouldn't be pruing this commit in the first place, because the
commit has not become unreachable yet.

So a repository does not become corrupt by pruning the commit *and*
removing it from the shallow file at the same time.



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/2] repack -ad: prune the list of shallow commits
  2018-07-20  9:30               ` Junio C Hamano
@ 2018-07-20 19:31                 ` Jeff King
  0 siblings, 0 replies; 47+ messages in thread
From: Jeff King @ 2018-07-20 19:31 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, Johannes Schindelin via GitGitGadget, git

On Fri, Jul 20, 2018 at 02:30:23AM -0700, Junio C Hamano wrote:

> > The entries in shallow file says that history behind them may not
> > exist in the repository due to its shallowness but history after
> > them are supposed to be traversable (otherwise we have a repository
> > corruption).  It is true that an entry that itself no longer exists
> > in this repository should not be in shallow file, as the presence of
> > that entry breaks that promise the file is making---that commit
> > ought to exist and it is safe to traverse down to it, so keeping the
> > entry in the file is absolutely a wrong thing to do.
> >
> > But that does not automatically mean that just simply removing it
> > makes the resulting repository good, does it?  Wouldn't the solution
> > for that corruption be to set a new entry to stop history traversal
> > before reaching that (now-missing) commit?
> 
> The above is overly pessimistic and worried about an impossible
> situation, I would think.  The reason why a commit that used to be
> in the shallow file is being pruned during a "repack" is because it
> has become unreachable.  By definition, no future history traversal
> that wants to enumerate reachable commits needs to be stopped from
> finding that commits that are older than this commit being pruned
> are missing by having this in the shallow list.  If there is a ref
> or a reflog entry from which such a problematic traversal starts at,
> we wouldn't be pruing this commit in the first place, because the
> commit has not become unreachable yet.
> 
> So a repository does not become corrupt by pruning the commit *and*
> removing it from the shallow file at the same time.

Right. I think a lot of this is rethinking how shallow pruning works,
too, which is not something Dscho is trying to change. The simplest
argument (which I think Dscho has made elsewhere, too) is: this is
necessary in the current shallow code when dropping objects. We do it
therefore from prune, but miss the case when git-repack is run itself
outside of git-gc.

I do still think the gc/prune architecture is a bit muddled, but at this
point in the discussion I feel OK saying that people running "git repack
-ad" would not be upset to have their shallows pruned.

But the patch is still not OK as-is because prune_shallow() requires the
SEEN flag on each reachable object struct, which we have not set in the
repack process (hence the failing test I posted earlier).  So we need a
solution for that, which may impact ideas about how the call works.
E.g., some possible solutions are:

 - teach pack-objects to optionally trigger the shallow prune based on
   its internal walk

 - have repack use the just-completed pack as a hint about reachability

 - introduce a mechanism to trigger the shallow prune based on a
   commit-only reachability check, and run that from repack (or from gc
   and document that it must be run if you are using repack as a manual
   gc replacement)

I'm not advocating any particular solution there, but just showing that
there's an array of them (and probably more that I didn't mention).

-Peff

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v3 0/3] repack -ad: fix after fetch --prune in a shallow repository
  2018-07-17 13:51 ` [PATCH v2 0/2] repack -ad: fix after `fetch --prune` in a shallow repository Johannes Schindelin via GitGitGadget
                     ` (2 preceding siblings ...)
  2018-07-17 19:15   ` [PATCH v2 0/2] repack -ad: fix after `fetch --prune` in a shallow repository Jeff King
@ 2018-10-22 22:05   ` Johannes Schindelin via GitGitGadget
  2018-10-22 22:05     ` [PATCH v3 1/3] repack: point out a bug handling stale shallow info Johannes Schindelin via GitGitGadget
                       ` (4 more replies)
  3 siblings, 5 replies; 47+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2018-10-22 22:05 UTC (permalink / raw)
  To: git; +Cc: Duy Nguyen, Jeff King, Junio C Hamano

Under certain circumstances, commits that were reachable can be made
unreachable, e.g. via git fetch --prune. These commits might have been
packed already, in which case git repack -adlf will just drop them without
giving them the usual grace period before an eventual git prune (via git gc)
prunes them.

This is a problem when the shallow file still lists them, which is the
reason why git prune edits that file. And with the proposed changes, git
repack -ad will now do the same.

Reported by Alejandro Pauly.

Changes since v2:

 * Fixed a typo in the last commit message.
 * Added an explanation to the last commit message why we do not simply skip
   non-existing shallow commits at fetch time.
 * Introduced a new, "quick prune" mode where prune_shallow() does not try
   to drop unreachable commits, but only non-existing ones.
 * Rebased to current master because there were too many merge conflicts for
   my liking otherwise.

Changes since v1:

 * Also trigger prune_shallow() when --unpack-unreachable=<approxidate> was
   passed to git repack.
 * No need to trigger prune_shallow() when git repack was called with -k.

Johannes Schindelin (3):
  repack: point out a bug handling stale shallow info
  shallow: offer to prune only non-existing entries
  repack -ad: prune the list of shallow commits

 builtin/prune.c          |  2 +-
 builtin/repack.c         |  6 ++++++
 commit.h                 |  2 +-
 shallow.c                | 22 +++++++++++++++++-----
 t/t5537-fetch-shallow.sh | 27 +++++++++++++++++++++++++++
 5 files changed, 52 insertions(+), 7 deletions(-)


base-commit: c4df23f7927d8d00e666a3c8d1b3375f1dc8a3c1
Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-9%2Fdscho%2Fshallow-and-fetch-prune-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-9/dscho/shallow-and-fetch-prune-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/9

Range-diff vs v2:

 1:  d2be40131 ! 1:  ed8559b91 repack: point out a bug handling stale shallow info
     @@ -48,4 +48,6 @@
      +		origin "+refs/heads/*:refs/remotes/origin/*"
      +'
      +
     - test_done
     + . "$TEST_DIRECTORY"/lib-httpd.sh
     + start_httpd
     + 
 -:  --------- > 2:  f085eb4f7 shallow: offer to prune only non-existing entries
 2:  c7ee6e008 ! 3:  1f9ff57d5 repack -ad: prune the list of shallow commits
     @@ -2,15 +2,19 @@
      
          repack -ad: prune the list of shallow commits
      
     -    While it is true that we never add unreachable commits into pack files
     -    intentionally (as `git repack`'s documentation states), we must not
     -    forget that a `git fetch --prune` (or even a `git fetch` when a ref was
     +    `git repack` can drop unreachable commits without further warning,
     +    making the corresponding entries in `.git/shallow` invalid, which causes
     +    serious problems when deepening the branches.
     +
     +    One scenario where unreachable commits are dropped by `git repack` is
     +    when a `git fetch --prune` (or even a `git fetch` when a ref was
          force-pushed in the meantime) can make a commit unreachable that was
          reachable before.
      
          Therefore it is not safe to assume that a `git repack -adlf` will keep
          unreachable commits alone (under the assumption that they had not been
     -    packed in the first place).
     +    packed in the first place, which is an assumption at least some of Git's
     +    code seems to make).
      
          This is particularly important to keep in mind when looking at the
          `.git/shallow` file: if any commits listed in that file become
     @@ -23,8 +27,16 @@
          To avoid this problem, let's prune the shallow list in `git repack` when
          the `-d` option is passed, unless `-A` is passed, too (which would force
          the now-unreachable objects to be turned into loose objects instead of
     -    being deleted). Additionally, e also need to take `--keep-reachable` and
     -    `--unpack-unreachable=<date>` into account.
     +    being deleted). Additionally, we also need to take `--keep-reachable`
     +    and `--unpack-unreachable=<date>` into account.
     +
     +    Note: an alternative solution discussed during the review of this patch
     +    was to teach `git fetch` to simply ignore entries in .git/shallow if the
     +    corresponding commits do not exist locally. A quick test, however,
     +    revealed that the .git/shallow file is written during a shallow *clone*,
     +    in which case the commits do not exist, either, but the "shallow" line
     +    *does* need to be sent. Therefore, this approach would be a lot more
     +    finicky than the approach presented by the this patch.
      
          Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
      
     @@ -32,15 +44,15 @@
      --- a/builtin/repack.c
      +++ b/builtin/repack.c
      @@
     - 		if (!quiet && isatty(2))
     + 		if (!po_args.quiet && isatty(2))
       			opts |= PRUNE_PACKED_VERBOSE;
       		prune_packed_objects(opts);
      +
      +		if (!keep_unreachable &&
      +		    (!(pack_everything & LOOSEN_UNREACHABLE) ||
      +		     unpack_unreachable) &&
     -+		    is_repository_shallow())
     -+			prune_shallow(0);
     ++		    is_repository_shallow(the_repository))
     ++			prune_shallow(0, 1);
       	}
       
       	if (!no_update_server_info)

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v3 1/3] repack: point out a bug handling stale shallow info
  2018-10-22 22:05   ` [PATCH v3 0/3] repack -ad: fix after fetch --prune " Johannes Schindelin via GitGitGadget
@ 2018-10-22 22:05     ` Johannes Schindelin via GitGitGadget
  2018-10-24  3:39       ` Junio C Hamano
  2018-10-22 22:05     ` [PATCH v3 2/3] shallow: offer to prune only non-existing entries Johannes Schindelin via GitGitGadget
                       ` (3 subsequent siblings)
  4 siblings, 1 reply; 47+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2018-10-22 22:05 UTC (permalink / raw)
  To: git; +Cc: Duy Nguyen, Jeff King, Junio C Hamano, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

A `git fetch --prune` can turn previously-reachable objects unreachable,
even commits that are in the `shallow` list. A subsequent `git repack
-ad` will then unceremoniously drop those unreachable commits, and the
`shallow` list will become stale. This means that when we try to fetch
with a larger `--depth` the next time, we may end up with:

	fatal: error in object: unshallow <commit-hash>

Reported by Alejandro Pauly.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t5537-fetch-shallow.sh | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/t/t5537-fetch-shallow.sh b/t/t5537-fetch-shallow.sh
index 7045685e2..2d0031703 100755
--- a/t/t5537-fetch-shallow.sh
+++ b/t/t5537-fetch-shallow.sh
@@ -186,6 +186,33 @@ EOF
 	test_cmp expect actual
 '
 
+test_expect_failure '.git/shallow is edited by repack' '
+	git init shallow-server &&
+	test_commit -C shallow-server A &&
+	test_commit -C shallow-server B &&
+	git -C shallow-server checkout -b branch &&
+	test_commit -C shallow-server C &&
+	test_commit -C shallow-server E &&
+	test_commit -C shallow-server D &&
+	d="$(git -C shallow-server rev-parse --verify D)" &&
+	git -C shallow-server checkout master &&
+
+	git clone --depth=1 --no-tags --no-single-branch \
+		"file://$PWD/shallow-server" shallow-client &&
+
+	: now remove the branch and fetch with prune &&
+	git -C shallow-server branch -D branch &&
+	git -C shallow-client fetch --prune --depth=1 \
+		origin "+refs/heads/*:refs/remotes/origin/*" &&
+	git -C shallow-client repack -adfl &&
+	test_must_fail git -C shallow-client rev-parse --verify $d^0 &&
+	! grep $d shallow-client/.git/shallow &&
+
+	git -C shallow-server branch branch-orig D^0 &&
+	git -C shallow-client fetch --prune --depth=2 \
+		origin "+refs/heads/*:refs/remotes/origin/*"
+'
+
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v3 2/3] shallow: offer to prune only non-existing entries
  2018-10-22 22:05   ` [PATCH v3 0/3] repack -ad: fix after fetch --prune " Johannes Schindelin via GitGitGadget
  2018-10-22 22:05     ` [PATCH v3 1/3] repack: point out a bug handling stale shallow info Johannes Schindelin via GitGitGadget
@ 2018-10-22 22:05     ` Johannes Schindelin via GitGitGadget
  2018-10-24  3:47       ` Junio C Hamano
  2018-10-22 22:05     ` [PATCH v3 3/3] repack -ad: prune the list of shallow commits Johannes Schindelin via GitGitGadget
                       ` (2 subsequent siblings)
  4 siblings, 1 reply; 47+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2018-10-22 22:05 UTC (permalink / raw)
  To: git; +Cc: Duy Nguyen, Jeff King, Junio C Hamano, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

The `prune_shallow()` function wants a full reachability check to be
completed before it goes to work, to ensure that all unreachable entries
are removed from the shallow file.

However, in the upcoming patch we do not even want to go that far. We
really only need to remove entries corresponding to pruned commits, i.e.
to commits that no longer exist.

Let's support that use case.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 builtin/prune.c |  2 +-
 commit.h        |  2 +-
 shallow.c       | 22 +++++++++++++++++-----
 3 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/builtin/prune.c b/builtin/prune.c
index 41230f821..6d6ab6cf1 100644
--- a/builtin/prune.c
+++ b/builtin/prune.c
@@ -161,7 +161,7 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
 	free(s);
 
 	if (is_repository_shallow(the_repository))
-		prune_shallow(show_only);
+		prune_shallow(show_only, 0);
 
 	return 0;
 }
diff --git a/commit.h b/commit.h
index 1d260d62f..ff34447ab 100644
--- a/commit.h
+++ b/commit.h
@@ -249,7 +249,7 @@ extern void assign_shallow_commits_to_refs(struct shallow_info *info,
 					   uint32_t **used,
 					   int *ref_status);
 extern int delayed_reachability_test(struct shallow_info *si, int c);
-extern void prune_shallow(int show_only);
+extern void prune_shallow(int show_only, int quick_prune);
 extern struct trace_key trace_shallow;
 
 extern int interactive_add(int argc, const char **argv, const char *prefix, int patch);
diff --git a/shallow.c b/shallow.c
index 732e18d54..0a2671bc2 100644
--- a/shallow.c
+++ b/shallow.c
@@ -247,6 +247,7 @@ static void check_shallow_file_for_update(struct repository *r)
 
 #define SEEN_ONLY 1
 #define VERBOSE   2
+#define QUICK_PRUNE 4
 
 struct write_shallow_data {
 	struct strbuf *out;
@@ -261,7 +262,11 @@ static int write_one_shallow(const struct commit_graft *graft, void *cb_data)
 	const char *hex = oid_to_hex(&graft->oid);
 	if (graft->nr_parent != -1)
 		return 0;
-	if (data->flags & SEEN_ONLY) {
+	if (data->flags & QUICK_PRUNE) {
+		struct commit *c = lookup_commit(the_repository, &graft->oid);
+		if (!c || parse_commit(c))
+			return 0;
+	} else if (data->flags & SEEN_ONLY) {
 		struct commit *c = lookup_commit(the_repository, &graft->oid);
 		if (!c || !(c->object.flags & SEEN)) {
 			if (data->flags & VERBOSE)
@@ -371,16 +376,23 @@ void advertise_shallow_grafts(int fd)
 
 /*
  * mark_reachable_objects() should have been run prior to this and all
- * reachable commits marked as "SEEN".
+ * reachable commits marked as "SEEN", except when quick_prune is non-zero,
+ * in which case lines are excised from the shallow file if they refer to
+ * commits that do not exist (any longer).
  */
-void prune_shallow(int show_only)
+void prune_shallow(int show_only, int quick_prune)
 {
 	struct lock_file shallow_lock = LOCK_INIT;
 	struct strbuf sb = STRBUF_INIT;
+	unsigned flags = SEEN_ONLY;
 	int fd;
 
+	if (quick_prune)
+		flags |= QUICK_PRUNE;
+
 	if (show_only) {
-		write_shallow_commits_1(&sb, 0, NULL, SEEN_ONLY | VERBOSE);
+		flags |= VERBOSE;
+		write_shallow_commits_1(&sb, 0, NULL, flags);
 		strbuf_release(&sb);
 		return;
 	}
@@ -388,7 +400,7 @@ void prune_shallow(int show_only)
 				       git_path_shallow(the_repository),
 				       LOCK_DIE_ON_ERROR);
 	check_shallow_file_for_update(the_repository);
-	if (write_shallow_commits_1(&sb, 0, NULL, SEEN_ONLY)) {
+	if (write_shallow_commits_1(&sb, 0, NULL, flags)) {
 		if (write_in_full(fd, sb.buf, sb.len) < 0)
 			die_errno("failed to write to %s",
 				  get_lock_file_path(&shallow_lock));
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v3 3/3] repack -ad: prune the list of shallow commits
  2018-10-22 22:05   ` [PATCH v3 0/3] repack -ad: fix after fetch --prune " Johannes Schindelin via GitGitGadget
  2018-10-22 22:05     ` [PATCH v3 1/3] repack: point out a bug handling stale shallow info Johannes Schindelin via GitGitGadget
  2018-10-22 22:05     ` [PATCH v3 2/3] shallow: offer to prune only non-existing entries Johannes Schindelin via GitGitGadget
@ 2018-10-22 22:05     ` Johannes Schindelin via GitGitGadget
  2018-10-24  3:56       ` Junio C Hamano
  2018-10-23 10:15     ` [PATCH v3 0/3] repack -ad: fix after fetch --prune in a shallow repository Johannes Schindelin
  2018-10-24 15:56     ` [PATCH v4 " Johannes Schindelin via GitGitGadget
  4 siblings, 1 reply; 47+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2018-10-22 22:05 UTC (permalink / raw)
  To: git; +Cc: Duy Nguyen, Jeff King, Junio C Hamano, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

`git repack` can drop unreachable commits without further warning,
making the corresponding entries in `.git/shallow` invalid, which causes
serious problems when deepening the branches.

One scenario where unreachable commits are dropped by `git repack` is
when a `git fetch --prune` (or even a `git fetch` when a ref was
force-pushed in the meantime) can make a commit unreachable that was
reachable before.

Therefore it is not safe to assume that a `git repack -adlf` will keep
unreachable commits alone (under the assumption that they had not been
packed in the first place, which is an assumption at least some of Git's
code seems to make).

This is particularly important to keep in mind when looking at the
`.git/shallow` file: if any commits listed in that file become
unreachable, it is not a problem, but if they go missing, it *is* a
problem. One symptom of this problem is that a deepening fetch may now
fail with

	fatal: error in object: unshallow <commit-hash>

To avoid this problem, let's prune the shallow list in `git repack` when
the `-d` option is passed, unless `-A` is passed, too (which would force
the now-unreachable objects to be turned into loose objects instead of
being deleted). Additionally, we also need to take `--keep-reachable`
and `--unpack-unreachable=<date>` into account.

Note: an alternative solution discussed during the review of this patch
was to teach `git fetch` to simply ignore entries in .git/shallow if the
corresponding commits do not exist locally. A quick test, however,
revealed that the .git/shallow file is written during a shallow *clone*,
in which case the commits do not exist, either, but the "shallow" line
*does* need to be sent. Therefore, this approach would be a lot more
finicky than the approach presented by the this patch.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 builtin/repack.c         | 6 ++++++
 t/t5537-fetch-shallow.sh | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index c6a7943d5..9217fc832 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -549,6 +549,12 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		if (!po_args.quiet && isatty(2))
 			opts |= PRUNE_PACKED_VERBOSE;
 		prune_packed_objects(opts);
+
+		if (!keep_unreachable &&
+		    (!(pack_everything & LOOSEN_UNREACHABLE) ||
+		     unpack_unreachable) &&
+		    is_repository_shallow(the_repository))
+			prune_shallow(0, 1);
 	}
 
 	if (!no_update_server_info)
diff --git a/t/t5537-fetch-shallow.sh b/t/t5537-fetch-shallow.sh
index 2d0031703..777c9d1dc 100755
--- a/t/t5537-fetch-shallow.sh
+++ b/t/t5537-fetch-shallow.sh
@@ -186,7 +186,7 @@ EOF
 	test_cmp expect actual
 '
 
-test_expect_failure '.git/shallow is edited by repack' '
+test_expect_success '.git/shallow is edited by repack' '
 	git init shallow-server &&
 	test_commit -C shallow-server A &&
 	test_commit -C shallow-server B &&
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 0/3] repack -ad: fix after fetch --prune in a shallow repository
  2018-10-22 22:05   ` [PATCH v3 0/3] repack -ad: fix after fetch --prune " Johannes Schindelin via GitGitGadget
                       ` (2 preceding siblings ...)
  2018-10-22 22:05     ` [PATCH v3 3/3] repack -ad: prune the list of shallow commits Johannes Schindelin via GitGitGadget
@ 2018-10-23 10:15     ` Johannes Schindelin
  2018-10-24 15:56     ` [PATCH v4 " Johannes Schindelin via GitGitGadget
  4 siblings, 0 replies; 47+ messages in thread
From: Johannes Schindelin @ 2018-10-23 10:15 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Duy Nguyen, Jeff King, Junio C Hamano

Hi,

On Mon, 22 Oct 2018, Johannes Schindelin via GitGitGadget wrote:

> Under certain circumstances, commits that were reachable can be made
> unreachable, e.g. via git fetch --prune. These commits might have been
> packed already, in which case git repack -adlf will just drop them without
> giving them the usual grace period before an eventual git prune (via git gc)
> prunes them.
> 
> This is a problem when the shallow file still lists them, which is the
> reason why git prune edits that file. And with the proposed changes, git
> repack -ad will now do the same.
> 
> Reported by Alejandro Pauly.
> 
> Changes since v2:
> 
>  * Fixed a typo in the last commit message.
>  * Added an explanation to the last commit message why we do not simply skip
>    non-existing shallow commits at fetch time.
>  * Introduced a new, "quick prune" mode where prune_shallow() does not try
>    to drop unreachable commits, but only non-existing ones.

BTW this was supposed to address Peff's concern that the SEEN flag would
not be marking all reachable shallow commits at the time when
`cmd_repack()` calls `prune_shallow()`, i.e. the last concern about this
stop-gap patch series.

Ciao,
Dscho

>  * Rebased to current master because there were too many merge conflicts for
>    my liking otherwise.
> 
> Changes since v1:
> 
>  * Also trigger prune_shallow() when --unpack-unreachable=<approxidate> was
>    passed to git repack.
>  * No need to trigger prune_shallow() when git repack was called with -k.
> 
> Johannes Schindelin (3):
>   repack: point out a bug handling stale shallow info
>   shallow: offer to prune only non-existing entries
>   repack -ad: prune the list of shallow commits
> 
>  builtin/prune.c          |  2 +-
>  builtin/repack.c         |  6 ++++++
>  commit.h                 |  2 +-
>  shallow.c                | 22 +++++++++++++++++-----
>  t/t5537-fetch-shallow.sh | 27 +++++++++++++++++++++++++++
>  5 files changed, 52 insertions(+), 7 deletions(-)
> 
> 
> base-commit: c4df23f7927d8d00e666a3c8d1b3375f1dc8a3c1
> Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-9%2Fdscho%2Fshallow-and-fetch-prune-v3
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-9/dscho/shallow-and-fetch-prune-v3
> Pull-Request: https://github.com/gitgitgadget/git/pull/9
> 
> Range-diff vs v2:
> 
>  1:  d2be40131 ! 1:  ed8559b91 repack: point out a bug handling stale shallow info
>      @@ -48,4 +48,6 @@
>       +		origin "+refs/heads/*:refs/remotes/origin/*"
>       +'
>       +
>      - test_done
>      + . "$TEST_DIRECTORY"/lib-httpd.sh
>      + start_httpd
>      + 
>  -:  --------- > 2:  f085eb4f7 shallow: offer to prune only non-existing entries
>  2:  c7ee6e008 ! 3:  1f9ff57d5 repack -ad: prune the list of shallow commits
>      @@ -2,15 +2,19 @@
>       
>           repack -ad: prune the list of shallow commits
>       
>      -    While it is true that we never add unreachable commits into pack files
>      -    intentionally (as `git repack`'s documentation states), we must not
>      -    forget that a `git fetch --prune` (or even a `git fetch` when a ref was
>      +    `git repack` can drop unreachable commits without further warning,
>      +    making the corresponding entries in `.git/shallow` invalid, which causes
>      +    serious problems when deepening the branches.
>      +
>      +    One scenario where unreachable commits are dropped by `git repack` is
>      +    when a `git fetch --prune` (or even a `git fetch` when a ref was
>           force-pushed in the meantime) can make a commit unreachable that was
>           reachable before.
>       
>           Therefore it is not safe to assume that a `git repack -adlf` will keep
>           unreachable commits alone (under the assumption that they had not been
>      -    packed in the first place).
>      +    packed in the first place, which is an assumption at least some of Git's
>      +    code seems to make).
>       
>           This is particularly important to keep in mind when looking at the
>           `.git/shallow` file: if any commits listed in that file become
>      @@ -23,8 +27,16 @@
>           To avoid this problem, let's prune the shallow list in `git repack` when
>           the `-d` option is passed, unless `-A` is passed, too (which would force
>           the now-unreachable objects to be turned into loose objects instead of
>      -    being deleted). Additionally, e also need to take `--keep-reachable` and
>      -    `--unpack-unreachable=<date>` into account.
>      +    being deleted). Additionally, we also need to take `--keep-reachable`
>      +    and `--unpack-unreachable=<date>` into account.
>      +
>      +    Note: an alternative solution discussed during the review of this patch
>      +    was to teach `git fetch` to simply ignore entries in .git/shallow if the
>      +    corresponding commits do not exist locally. A quick test, however,
>      +    revealed that the .git/shallow file is written during a shallow *clone*,
>      +    in which case the commits do not exist, either, but the "shallow" line
>      +    *does* need to be sent. Therefore, this approach would be a lot more
>      +    finicky than the approach presented by the this patch.
>       
>           Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
>       
>      @@ -32,15 +44,15 @@
>       --- a/builtin/repack.c
>       +++ b/builtin/repack.c
>       @@
>      - 		if (!quiet && isatty(2))
>      + 		if (!po_args.quiet && isatty(2))
>        			opts |= PRUNE_PACKED_VERBOSE;
>        		prune_packed_objects(opts);
>       +
>       +		if (!keep_unreachable &&
>       +		    (!(pack_everything & LOOSEN_UNREACHABLE) ||
>       +		     unpack_unreachable) &&
>      -+		    is_repository_shallow())
>      -+			prune_shallow(0);
>      ++		    is_repository_shallow(the_repository))
>      ++			prune_shallow(0, 1);
>        	}
>        
>        	if (!no_update_server_info)
> 
> -- 
> gitgitgadget
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 1/3] repack: point out a bug handling stale shallow info
  2018-10-22 22:05     ` [PATCH v3 1/3] repack: point out a bug handling stale shallow info Johannes Schindelin via GitGitGadget
@ 2018-10-24  3:39       ` Junio C Hamano
  2018-10-24  8:12         ` Johannes Schindelin
  0 siblings, 1 reply; 47+ messages in thread
From: Junio C Hamano @ 2018-10-24  3:39 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Duy Nguyen, Jeff King, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> A `git fetch --prune` can turn previously-reachable objects unreachable,
> even commits that are in the `shallow` list. A subsequent `git repack
> -ad` will then unceremoniously drop those unreachable commits, and the
> `shallow` list will become stale. This means that when we try to fetch
> with a larger `--depth` the next time, we may end up with:
>
> 	fatal: error in object: unshallow <commit-hash>

Nicely analysed and described.  One minor thing nagged me in the
implementation but it is not a big issue.

> +...
> +	d="$(git -C shallow-server rev-parse --verify D)" &&
> +	git -C shallow-server checkout master &&
> +
> +...
> +	! grep $d shallow-client/.git/shallow &&

We know D (and $d) is not a tag, but perhaps the place that assigns
to $d (way above) can do the rev-parse of D^0, not D, to make it
more clear what is going on, especially given that...

> +	git -C shallow-server branch branch-orig D^0 &&

... this does that D^0 thing here to makes us wonder if D needs
unwrapping before using it as a commit (not commit-ish). 

If we did so, we could use $d here instead of D^0.

> +	git -C shallow-client fetch --prune --depth=2 \
> +		origin "+refs/heads/*:refs/remotes/origin/*"
> +'
> +
>  . "$TEST_DIRECTORY"/lib-httpd.sh
>  start_httpd


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 2/3] shallow: offer to prune only non-existing entries
  2018-10-22 22:05     ` [PATCH v3 2/3] shallow: offer to prune only non-existing entries Johannes Schindelin via GitGitGadget
@ 2018-10-24  3:47       ` Junio C Hamano
  2018-10-24  8:01         ` Johannes Schindelin
  0 siblings, 1 reply; 47+ messages in thread
From: Junio C Hamano @ 2018-10-24  3:47 UTC (permalink / raw)
  To: Jonathan Tan, Christian Couder
  Cc: git, Duy Nguyen, Jeff King, Johannes Schindelin

Jonathan, do you see any issues with the use of lookup_commit() in
this change wrt lazy clone?  I am wondering what happens when the
commit in question is at, an immediate parent of, or an immediate
child of a promisor object.  I _think_ this change won't make it
worse for two features in playing together, but thought that it
would be better to double check.

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> The `prune_shallow()` function wants a full reachability check to be
> completed before it goes to work, to ensure that all unreachable entries
> are removed from the shallow file.
>
> However, in the upcoming patch we do not even want to go that far. We
> really only need to remove entries corresponding to pruned commits, i.e.
> to commits that no longer exist.
>
> Let's support that use case.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  builtin/prune.c |  2 +-
>  commit.h        |  2 +-
>  shallow.c       | 22 +++++++++++++++++-----
>  3 files changed, 19 insertions(+), 7 deletions(-)
>
> diff --git a/builtin/prune.c b/builtin/prune.c
> index 41230f821..6d6ab6cf1 100644
> --- a/builtin/prune.c
> +++ b/builtin/prune.c
> @@ -161,7 +161,7 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
>  	free(s);
>  
>  	if (is_repository_shallow(the_repository))
> -		prune_shallow(show_only);
> +		prune_shallow(show_only, 0);
>  
>  	return 0;
>  }
> diff --git a/commit.h b/commit.h
> index 1d260d62f..ff34447ab 100644
> --- a/commit.h
> +++ b/commit.h
> @@ -249,7 +249,7 @@ extern void assign_shallow_commits_to_refs(struct shallow_info *info,
>  					   uint32_t **used,
>  					   int *ref_status);
>  extern int delayed_reachability_test(struct shallow_info *si, int c);
> -extern void prune_shallow(int show_only);
> +extern void prune_shallow(int show_only, int quick_prune);
>  extern struct trace_key trace_shallow;
>  
>  extern int interactive_add(int argc, const char **argv, const char *prefix, int patch);
> diff --git a/shallow.c b/shallow.c
> index 732e18d54..0a2671bc2 100644
> --- a/shallow.c
> +++ b/shallow.c
> @@ -247,6 +247,7 @@ static void check_shallow_file_for_update(struct repository *r)
>  
>  #define SEEN_ONLY 1
>  #define VERBOSE   2
> +#define QUICK_PRUNE 4
>  
>  struct write_shallow_data {
>  	struct strbuf *out;
> @@ -261,7 +262,11 @@ static int write_one_shallow(const struct commit_graft *graft, void *cb_data)
>  	const char *hex = oid_to_hex(&graft->oid);
>  	if (graft->nr_parent != -1)
>  		return 0;
> -	if (data->flags & SEEN_ONLY) {
> +	if (data->flags & QUICK_PRUNE) {
> +		struct commit *c = lookup_commit(the_repository, &graft->oid);
> +		if (!c || parse_commit(c))
> +			return 0;
> +	} else if (data->flags & SEEN_ONLY) {
>  		struct commit *c = lookup_commit(the_repository, &graft->oid);
>  		if (!c || !(c->object.flags & SEEN)) {
>  			if (data->flags & VERBOSE)
> @@ -371,16 +376,23 @@ void advertise_shallow_grafts(int fd)
>  
>  /*
>   * mark_reachable_objects() should have been run prior to this and all
> - * reachable commits marked as "SEEN".
> + * reachable commits marked as "SEEN", except when quick_prune is non-zero,
> + * in which case lines are excised from the shallow file if they refer to
> + * commits that do not exist (any longer).
>   */
> -void prune_shallow(int show_only)
> +void prune_shallow(int show_only, int quick_prune)
>  {
>  	struct lock_file shallow_lock = LOCK_INIT;
>  	struct strbuf sb = STRBUF_INIT;
> +	unsigned flags = SEEN_ONLY;
>  	int fd;
>  
> +	if (quick_prune)
> +		flags |= QUICK_PRUNE;
> +
>  	if (show_only) {
> -		write_shallow_commits_1(&sb, 0, NULL, SEEN_ONLY | VERBOSE);
> +		flags |= VERBOSE;
> +		write_shallow_commits_1(&sb, 0, NULL, flags);
>  		strbuf_release(&sb);
>  		return;
>  	}
> @@ -388,7 +400,7 @@ void prune_shallow(int show_only)
>  				       git_path_shallow(the_repository),
>  				       LOCK_DIE_ON_ERROR);
>  	check_shallow_file_for_update(the_repository);
> -	if (write_shallow_commits_1(&sb, 0, NULL, SEEN_ONLY)) {
> +	if (write_shallow_commits_1(&sb, 0, NULL, flags)) {
>  		if (write_in_full(fd, sb.buf, sb.len) < 0)
>  			die_errno("failed to write to %s",
>  				  get_lock_file_path(&shallow_lock));

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 3/3] repack -ad: prune the list of shallow commits
  2018-10-22 22:05     ` [PATCH v3 3/3] repack -ad: prune the list of shallow commits Johannes Schindelin via GitGitGadget
@ 2018-10-24  3:56       ` Junio C Hamano
  2018-10-24  8:02         ` Johannes Schindelin
  0 siblings, 1 reply; 47+ messages in thread
From: Junio C Hamano @ 2018-10-24  3:56 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Duy Nguyen, Jeff King, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> diff --git a/builtin/repack.c b/builtin/repack.c
> index c6a7943d5..9217fc832 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -549,6 +549,12 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>  		if (!po_args.quiet && isatty(2))
>  			opts |= PRUNE_PACKED_VERBOSE;
>  		prune_packed_objects(opts);
> +
> +		if (!keep_unreachable &&
> +		    (!(pack_everything & LOOSEN_UNREACHABLE) ||
> +		     unpack_unreachable) &&
> +		    is_repository_shallow(the_repository))
> +			prune_shallow(0, 1);

The logic looks correct (and the reasoning behind it, which was
explained in the log message, was sound).  prune_shallow(0, 1)
however is not easily understandable.

There are only two callsites of this function after these three
patches, and the areas of the code that have these calls are in
relatively quiescent parts in the whole tree, so it shouldn't be too
distracting to do an update to make the function take a flag word,
so that we can make callsites more immediately obvious, i.e.

	prune_shallow(PRUNE_SHALLOW_QUICK)

It of course can be left as a low-hanging fruit loose-end.

> diff --git a/t/t5537-fetch-shallow.sh b/t/t5537-fetch-shallow.sh
> index 2d0031703..777c9d1dc 100755
> --- a/t/t5537-fetch-shallow.sh
> +++ b/t/t5537-fetch-shallow.sh
> @@ -186,7 +186,7 @@ EOF
>  	test_cmp expect actual
>  '
>  
> -test_expect_failure '.git/shallow is edited by repack' '
> +test_expect_success '.git/shallow is edited by repack' '
>  	git init shallow-server &&
>  	test_commit -C shallow-server A &&
>  	test_commit -C shallow-server B &&

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 2/3] shallow: offer to prune only non-existing entries
  2018-10-24  3:47       ` Junio C Hamano
@ 2018-10-24  8:01         ` Johannes Schindelin
  2018-10-24 15:56           ` Johannes Schindelin
  0 siblings, 1 reply; 47+ messages in thread
From: Johannes Schindelin @ 2018-10-24  8:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Tan, Christian Couder, git, Duy Nguyen, Jeff King

Hi Junio & Jonathan,

On Wed, 24 Oct 2018, Junio C Hamano wrote:

> Jonathan, do you see any issues with the use of lookup_commit() in
> this change wrt lazy clone?  I am wondering what happens when the
> commit in question is at, an immediate parent of, or an immediate
> child of a promisor object.  I _think_ this change won't make it
> worse for two features in playing together, but thought that it
> would be better to double check.

Good point.

Instinctively, I would say that no promised object can be a shallow
commit. The entire idea of a shallow commit is that it *is* present, but
none of its parents.

Also, I would claim that the shallow feature does not make sense with lazy
clones, as lazy clones offer a superset of shallow clone's functionality.

However, I am curious whether there is a better way to check for the
presence of a local commit? Do we have an API function for that, that I
missed? (I do not really want to parse the commit, after all, just verify
that it is not pruned.)

Thanks,
Dscho

> 
> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
> 
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> >
> > The `prune_shallow()` function wants a full reachability check to be
> > completed before it goes to work, to ensure that all unreachable entries
> > are removed from the shallow file.
> >
> > However, in the upcoming patch we do not even want to go that far. We
> > really only need to remove entries corresponding to pruned commits, i.e.
> > to commits that no longer exist.
> >
> > Let's support that use case.
> >
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ---
> >  builtin/prune.c |  2 +-
> >  commit.h        |  2 +-
> >  shallow.c       | 22 +++++++++++++++++-----
> >  3 files changed, 19 insertions(+), 7 deletions(-)
> >
> > diff --git a/builtin/prune.c b/builtin/prune.c
> > index 41230f821..6d6ab6cf1 100644
> > --- a/builtin/prune.c
> > +++ b/builtin/prune.c
> > @@ -161,7 +161,7 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
> >  	free(s);
> >  
> >  	if (is_repository_shallow(the_repository))
> > -		prune_shallow(show_only);
> > +		prune_shallow(show_only, 0);
> >  
> >  	return 0;
> >  }
> > diff --git a/commit.h b/commit.h
> > index 1d260d62f..ff34447ab 100644
> > --- a/commit.h
> > +++ b/commit.h
> > @@ -249,7 +249,7 @@ extern void assign_shallow_commits_to_refs(struct shallow_info *info,
> >  					   uint32_t **used,
> >  					   int *ref_status);
> >  extern int delayed_reachability_test(struct shallow_info *si, int c);
> > -extern void prune_shallow(int show_only);
> > +extern void prune_shallow(int show_only, int quick_prune);
> >  extern struct trace_key trace_shallow;
> >  
> >  extern int interactive_add(int argc, const char **argv, const char *prefix, int patch);
> > diff --git a/shallow.c b/shallow.c
> > index 732e18d54..0a2671bc2 100644
> > --- a/shallow.c
> > +++ b/shallow.c
> > @@ -247,6 +247,7 @@ static void check_shallow_file_for_update(struct repository *r)
> >  
> >  #define SEEN_ONLY 1
> >  #define VERBOSE   2
> > +#define QUICK_PRUNE 4
> >  
> >  struct write_shallow_data {
> >  	struct strbuf *out;
> > @@ -261,7 +262,11 @@ static int write_one_shallow(const struct commit_graft *graft, void *cb_data)
> >  	const char *hex = oid_to_hex(&graft->oid);
> >  	if (graft->nr_parent != -1)
> >  		return 0;
> > -	if (data->flags & SEEN_ONLY) {
> > +	if (data->flags & QUICK_PRUNE) {
> > +		struct commit *c = lookup_commit(the_repository, &graft->oid);
> > +		if (!c || parse_commit(c))
> > +			return 0;
> > +	} else if (data->flags & SEEN_ONLY) {
> >  		struct commit *c = lookup_commit(the_repository, &graft->oid);
> >  		if (!c || !(c->object.flags & SEEN)) {
> >  			if (data->flags & VERBOSE)
> > @@ -371,16 +376,23 @@ void advertise_shallow_grafts(int fd)
> >  
> >  /*
> >   * mark_reachable_objects() should have been run prior to this and all
> > - * reachable commits marked as "SEEN".
> > + * reachable commits marked as "SEEN", except when quick_prune is non-zero,
> > + * in which case lines are excised from the shallow file if they refer to
> > + * commits that do not exist (any longer).
> >   */
> > -void prune_shallow(int show_only)
> > +void prune_shallow(int show_only, int quick_prune)
> >  {
> >  	struct lock_file shallow_lock = LOCK_INIT;
> >  	struct strbuf sb = STRBUF_INIT;
> > +	unsigned flags = SEEN_ONLY;
> >  	int fd;
> >  
> > +	if (quick_prune)
> > +		flags |= QUICK_PRUNE;
> > +
> >  	if (show_only) {
> > -		write_shallow_commits_1(&sb, 0, NULL, SEEN_ONLY | VERBOSE);
> > +		flags |= VERBOSE;
> > +		write_shallow_commits_1(&sb, 0, NULL, flags);
> >  		strbuf_release(&sb);
> >  		return;
> >  	}
> > @@ -388,7 +400,7 @@ void prune_shallow(int show_only)
> >  				       git_path_shallow(the_repository),
> >  				       LOCK_DIE_ON_ERROR);
> >  	check_shallow_file_for_update(the_repository);
> > -	if (write_shallow_commits_1(&sb, 0, NULL, SEEN_ONLY)) {
> > +	if (write_shallow_commits_1(&sb, 0, NULL, flags)) {
> >  		if (write_in_full(fd, sb.buf, sb.len) < 0)
> >  			die_errno("failed to write to %s",
> >  				  get_lock_file_path(&shallow_lock));
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 3/3] repack -ad: prune the list of shallow commits
  2018-10-24  3:56       ` Junio C Hamano
@ 2018-10-24  8:02         ` Johannes Schindelin
  0 siblings, 0 replies; 47+ messages in thread
From: Johannes Schindelin @ 2018-10-24  8:02 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin via GitGitGadget, git, Duy Nguyen, Jeff King

Hi Junio,

On Wed, 24 Oct 2018, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
> 
> > diff --git a/builtin/repack.c b/builtin/repack.c
> > index c6a7943d5..9217fc832 100644
> > --- a/builtin/repack.c
> > +++ b/builtin/repack.c
> > @@ -549,6 +549,12 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
> >  		if (!po_args.quiet && isatty(2))
> >  			opts |= PRUNE_PACKED_VERBOSE;
> >  		prune_packed_objects(opts);
> > +
> > +		if (!keep_unreachable &&
> > +		    (!(pack_everything & LOOSEN_UNREACHABLE) ||
> > +		     unpack_unreachable) &&
> > +		    is_repository_shallow(the_repository))
> > +			prune_shallow(0, 1);
> 
> The logic looks correct (and the reasoning behind it, which was
> explained in the log message, was sound).  prune_shallow(0, 1)
> however is not easily understandable.
> 
> There are only two callsites of this function after these three
> patches, and the areas of the code that have these calls are in
> relatively quiescent parts in the whole tree, so it shouldn't be too
> distracting to do an update to make the function take a flag word,
> so that we can make callsites more immediately obvious, i.e.
> 
> 	prune_shallow(PRUNE_SHALLOW_QUICK)
> 
> It of course can be left as a low-hanging fruit loose-end.

I almost did that, but then decided not to clutter the previous patch with
this change (and global constant).

Having said that, since you also had this idea, I will make that change.
It will read nicer, you are right.

Ciao,
Dscho

> 
> > diff --git a/t/t5537-fetch-shallow.sh b/t/t5537-fetch-shallow.sh
> > index 2d0031703..777c9d1dc 100755
> > --- a/t/t5537-fetch-shallow.sh
> > +++ b/t/t5537-fetch-shallow.sh
> > @@ -186,7 +186,7 @@ EOF
> >  	test_cmp expect actual
> >  '
> >  
> > -test_expect_failure '.git/shallow is edited by repack' '
> > +test_expect_success '.git/shallow is edited by repack' '
> >  	git init shallow-server &&
> >  	test_commit -C shallow-server A &&
> >  	test_commit -C shallow-server B &&
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 1/3] repack: point out a bug handling stale shallow info
  2018-10-24  3:39       ` Junio C Hamano
@ 2018-10-24  8:12         ` Johannes Schindelin
  2018-10-24  8:38           ` Johannes Schindelin
  0 siblings, 1 reply; 47+ messages in thread
From: Johannes Schindelin @ 2018-10-24  8:12 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin via GitGitGadget, git, Duy Nguyen, Jeff King

Hi Junio,

On Wed, 24 Oct 2018, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
> 
> > +...
> > +	d="$(git -C shallow-server rev-parse --verify D)" &&
> > +	git -C shallow-server checkout master &&
> > +
> > +...
> > +	! grep $d shallow-client/.git/shallow &&
> 
> We know D (and $d) is not a tag,


Actually, it is... the `test_commit D` creates that tag, and that is what
I use here.

> but perhaps the place that assigns to $d (way above) can do the
> rev-parse of D^0, not D, to make it more clear what is going on,
> especially given that...

... that the `grep` really wants to test for the absence of the *commit*,
not the *tag* in .git/shallow?

Yes, you are right ;-)

So why did my test do the right thing, if it looked at a tag, but did not
dereference it via ^0? The answer is: the `test_commit` function creates
light-weight tags, i.e. no tag objects. And therefore, the $d^0 you found
below, that's just confusing.

I will change it so that the `rev-parse` call uses ^0 (even if it is
technically not necessary), to avoid said confusion.

Thanks,
Dscho

> 
> > +	git -C shallow-server branch branch-orig D^0 &&
> 
> ... this does that D^0 thing here to makes us wonder if D needs
> unwrapping before using it as a commit (not commit-ish). 
> 
> If we did so, we could use $d here instead of D^0.
> 
> > +	git -C shallow-client fetch --prune --depth=2 \
> > +		origin "+refs/heads/*:refs/remotes/origin/*"
> > +'
> > +
> >  . "$TEST_DIRECTORY"/lib-httpd.sh
> >  start_httpd
> 
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 1/3] repack: point out a bug handling stale shallow info
  2018-10-24  8:12         ` Johannes Schindelin
@ 2018-10-24  8:38           ` Johannes Schindelin
  0 siblings, 0 replies; 47+ messages in thread
From: Johannes Schindelin @ 2018-10-24  8:38 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin via GitGitGadget, git, Duy Nguyen, Jeff King

Hi Junio,

me again. I was wrong.

On Wed, 24 Oct 2018, Johannes Schindelin wrote:

> On Wed, 24 Oct 2018, Junio C Hamano wrote:
> 
> > "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> > writes:
> > 
> > > +...
> > > +	d="$(git -C shallow-server rev-parse --verify D)" &&
> > > +	git -C shallow-server checkout master &&
> > > +
> > > +...
> > > +	! grep $d shallow-client/.git/shallow &&
> > 
> > We know D (and $d) is not a tag,
> 
> Actually, it is... the `test_commit D` creates that tag, and that is what
> I use here.
> 
> > but perhaps the place that assigns to $d (way above) can do the
> > rev-parse of D^0, not D, to make it more clear what is going on,
> > especially given that...
> 
> ... that the `grep` really wants to test for the absence of the *commit*,
> not the *tag* in .git/shallow?
> 
> Yes, you are right ;-)
> 
> So why did my test do the right thing, if it looked at a tag, but did not
> dereference it via ^0? The answer is: the `test_commit` function creates
> light-weight tags, i.e. no tag objects. And therefore, the $d^0 you found
> below, that's just confusing.

What I was referring to was the call

	test_must_fail git -C shallow-client rev-parse --verify $d^0

However, here we *have* to append ^0, otherwise `rev-parse --verify` will
(and quite confusingly so) *succeed*. I *repeatedly* fall into that trap
that `git rev-parse --verify 0000000000000000000000000000000000000000`
will succeed. Why? Because that is a valid 40-digit hex string. Not
because the object exists. Because it does not.

So I managed to confuse myself again into believing that I only need to
append ^0 to the earlier rev-parse call, but can remove it from this one,
when in reality, I have to append it to both. In the first case, to avoid
having to think about dereferencing a tag, in the second case, to force
rev-parse to look for the object.

Ciao,
Dscho

> 
> I will change it so that the `rev-parse` call uses ^0 (even if it is
> technically not necessary), to avoid said confusion.
> 
> Thanks,
> Dscho
> 
> > 
> > > +	git -C shallow-server branch branch-orig D^0 &&
> > 
> > ... this does that D^0 thing here to makes us wonder if D needs
> > unwrapping before using it as a commit (not commit-ish). 
> > 
> > If we did so, we could use $d here instead of D^0.
> > 
> > > +	git -C shallow-client fetch --prune --depth=2 \
> > > +		origin "+refs/heads/*:refs/remotes/origin/*"
> > > +'
> > > +
> > >  . "$TEST_DIRECTORY"/lib-httpd.sh
> > >  start_httpd
> > 
> > 
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v4 0/3] repack -ad: fix after fetch --prune in a shallow repository
  2018-10-22 22:05   ` [PATCH v3 0/3] repack -ad: fix after fetch --prune " Johannes Schindelin via GitGitGadget
                       ` (3 preceding siblings ...)
  2018-10-23 10:15     ` [PATCH v3 0/3] repack -ad: fix after fetch --prune in a shallow repository Johannes Schindelin
@ 2018-10-24 15:56     ` Johannes Schindelin via GitGitGadget
  2018-10-24 15:56       ` [PATCH v4 1/3] repack: point out a bug handling stale shallow info Johannes Schindelin via GitGitGadget
                         ` (2 more replies)
  4 siblings, 3 replies; 47+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2018-10-24 15:56 UTC (permalink / raw)
  To: git; +Cc: Duy Nguyen, Jeff King, Junio C Hamano

Under certain circumstances, commits that were reachable can be made
unreachable, e.g. via git fetch --prune. These commits might have been
packed already, in which case git repack -adlf will just drop them without
giving them the usual grace period before an eventual git prune (via git gc)
prunes them.

This is a problem when the shallow file still lists them, which is the
reason why git prune edits that file. And with the proposed changes, git
repack -ad will now do the same.

Reported by Alejandro Pauly.

Changes since v3:

 * Made the regression test less confusing with regards to tags that might
   need to be dereferenced.
 * Introduced new global constants PRUNE_SHOW_ONLY and PRUNE_QUICK instead
   of extending the signature of prune_shallow() with Boolean parameters.
 * While at it, renamed the file-local constant from QUICK_PRUNE to QUICK.
 * Replaced the lookup_commit() && parse_commit() cadence (that wants to
   test for the existence of a commit) to has_object_file(), for ease of
   readability, and also to make it more obvious how to add a call to 
   is_promisor_object() if we ever figure out that that would be necessary.

Changes since v2:

 * Fixed a typo in the last commit message.
 * Added an explanation to the last commit message why we do not simply skip
   non-existing shallow commits at fetch time.
 * Introduced a new, "quick prune" mode where prune_shallow() does not try
   to drop unreachable commits, but only non-existing ones.
 * Rebased to current master because there were too many merge conflicts for
   my liking otherwise.

Changes since v1:

 * Also trigger prune_shallow() when --unpack-unreachable=<approxidate> was
   passed to git repack.
 * No need to trigger prune_shallow() when git repack was called with -k.

Johannes Schindelin (3):
  repack: point out a bug handling stale shallow info
  shallow: offer to prune only non-existing entries
  repack -ad: prune the list of shallow commits

 builtin/prune.c          |  2 +-
 builtin/repack.c         |  6 ++++++
 commit.h                 |  4 +++-
 shallow.c                | 23 +++++++++++++++++------
 t/t5537-fetch-shallow.sh | 27 +++++++++++++++++++++++++++
 5 files changed, 54 insertions(+), 8 deletions(-)


base-commit: c4df23f7927d8d00e666a3c8d1b3375f1dc8a3c1
Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-9%2Fdscho%2Fshallow-and-fetch-prune-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-9/dscho/shallow-and-fetch-prune-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/9

Range-diff vs v3:

 1:  ed8559b91 ! 1:  d9720bd25 repack: point out a bug handling stale shallow info
     @@ -29,7 +29,7 @@
      +	test_commit -C shallow-server C &&
      +	test_commit -C shallow-server E &&
      +	test_commit -C shallow-server D &&
     -+	d="$(git -C shallow-server rev-parse --verify D)" &&
     ++	d="$(git -C shallow-server rev-parse --verify D^0)" &&
      +	git -C shallow-server checkout master &&
      +
      +	git clone --depth=1 --no-tags --no-single-branch \
     @@ -43,7 +43,7 @@
      +	test_must_fail git -C shallow-client rev-parse --verify $d^0 &&
      +	! grep $d shallow-client/.git/shallow &&
      +
     -+	git -C shallow-server branch branch-orig D^0 &&
     ++	git -C shallow-server branch branch-orig $d &&
      +	git -C shallow-client fetch --prune --depth=2 \
      +		origin "+refs/heads/*:refs/remotes/origin/*"
      +'
 2:  f085eb4f7 ! 2:  18308c13e shallow: offer to prune only non-existing entries
     @@ -12,6 +12,10 @@
      
          Let's support that use case.
      
     +    Rather than extending the signature of `prune_shallow()` to accept
     +    another Boolean, let's turn it into a bit field and declare constants,
     +    for readability.
     +
          Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
      
      diff --git a/builtin/prune.c b/builtin/prune.c
     @@ -22,7 +26,7 @@
       
       	if (is_repository_shallow(the_repository))
      -		prune_shallow(show_only);
     -+		prune_shallow(show_only, 0);
     ++		prune_shallow(show_only ? PRUNE_SHOW_ONLY : 0);
       
       	return 0;
       }
     @@ -35,7 +39,9 @@
       					   int *ref_status);
       extern int delayed_reachability_test(struct shallow_info *si, int c);
      -extern void prune_shallow(int show_only);
     -+extern void prune_shallow(int show_only, int quick_prune);
     ++#define PRUNE_SHOW_ONLY 1
     ++#define PRUNE_QUICK 2
     ++extern void prune_shallow(unsigned options);
       extern struct trace_key trace_shallow;
       
       extern int interactive_add(int argc, const char **argv, const char *prefix, int patch);
     @@ -47,7 +53,7 @@
       
       #define SEEN_ONLY 1
       #define VERBOSE   2
     -+#define QUICK_PRUNE 4
     ++#define QUICK 4
       
       struct write_shallow_data {
       	struct strbuf *out;
     @@ -56,9 +62,8 @@
       	if (graft->nr_parent != -1)
       		return 0;
      -	if (data->flags & SEEN_ONLY) {
     -+	if (data->flags & QUICK_PRUNE) {
     -+		struct commit *c = lookup_commit(the_repository, &graft->oid);
     -+		if (!c || parse_commit(c))
     ++	if (data->flags & QUICK) {
     ++		if (!has_object_file(&graft->oid))
      +			return 0;
      +	} else if (data->flags & SEEN_ONLY) {
       		struct commit *c = lookup_commit(the_repository, &graft->oid);
     @@ -74,18 +79,19 @@
      + * commits that do not exist (any longer).
        */
      -void prune_shallow(int show_only)
     -+void prune_shallow(int show_only, int quick_prune)
     ++void prune_shallow(unsigned options)
       {
       	struct lock_file shallow_lock = LOCK_INIT;
       	struct strbuf sb = STRBUF_INIT;
      +	unsigned flags = SEEN_ONLY;
       	int fd;
       
     -+	if (quick_prune)
     -+		flags |= QUICK_PRUNE;
     -+
     - 	if (show_only) {
     +-	if (show_only) {
      -		write_shallow_commits_1(&sb, 0, NULL, SEEN_ONLY | VERBOSE);
     ++	if (options & PRUNE_QUICK)
     ++		flags |= QUICK;
     ++
     ++	if (options & PRUNE_SHOW_ONLY) {
      +		flags |= VERBOSE;
      +		write_shallow_commits_1(&sb, 0, NULL, flags);
       		strbuf_release(&sb);
 3:  1f9ff57d5 ! 3:  2858bc886 repack -ad: prune the list of shallow commits
     @@ -52,7 +52,7 @@
      +		    (!(pack_everything & LOOSEN_UNREACHABLE) ||
      +		     unpack_unreachable) &&
      +		    is_repository_shallow(the_repository))
     -+			prune_shallow(0, 1);
     ++			prune_shallow(PRUNE_QUICK);
       	}
       
       	if (!no_update_server_info)

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v4 1/3] repack: point out a bug handling stale shallow info
  2018-10-24 15:56     ` [PATCH v4 " Johannes Schindelin via GitGitGadget
@ 2018-10-24 15:56       ` Johannes Schindelin via GitGitGadget
  2018-10-24 15:56       ` [PATCH v4 2/3] shallow: offer to prune only non-existing entries Johannes Schindelin via GitGitGadget
  2018-10-24 15:56       ` [PATCH v4 3/3] repack -ad: prune the list of shallow commits Johannes Schindelin via GitGitGadget
  2 siblings, 0 replies; 47+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2018-10-24 15:56 UTC (permalink / raw)
  To: git; +Cc: Duy Nguyen, Jeff King, Junio C Hamano, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

A `git fetch --prune` can turn previously-reachable objects unreachable,
even commits that are in the `shallow` list. A subsequent `git repack
-ad` will then unceremoniously drop those unreachable commits, and the
`shallow` list will become stale. This means that when we try to fetch
with a larger `--depth` the next time, we may end up with:

	fatal: error in object: unshallow <commit-hash>

Reported by Alejandro Pauly.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t5537-fetch-shallow.sh | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/t/t5537-fetch-shallow.sh b/t/t5537-fetch-shallow.sh
index 7045685e2..686fdc928 100755
--- a/t/t5537-fetch-shallow.sh
+++ b/t/t5537-fetch-shallow.sh
@@ -186,6 +186,33 @@ EOF
 	test_cmp expect actual
 '
 
+test_expect_failure '.git/shallow is edited by repack' '
+	git init shallow-server &&
+	test_commit -C shallow-server A &&
+	test_commit -C shallow-server B &&
+	git -C shallow-server checkout -b branch &&
+	test_commit -C shallow-server C &&
+	test_commit -C shallow-server E &&
+	test_commit -C shallow-server D &&
+	d="$(git -C shallow-server rev-parse --verify D^0)" &&
+	git -C shallow-server checkout master &&
+
+	git clone --depth=1 --no-tags --no-single-branch \
+		"file://$PWD/shallow-server" shallow-client &&
+
+	: now remove the branch and fetch with prune &&
+	git -C shallow-server branch -D branch &&
+	git -C shallow-client fetch --prune --depth=1 \
+		origin "+refs/heads/*:refs/remotes/origin/*" &&
+	git -C shallow-client repack -adfl &&
+	test_must_fail git -C shallow-client rev-parse --verify $d^0 &&
+	! grep $d shallow-client/.git/shallow &&
+
+	git -C shallow-server branch branch-orig $d &&
+	git -C shallow-client fetch --prune --depth=2 \
+		origin "+refs/heads/*:refs/remotes/origin/*"
+'
+
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v4 2/3] shallow: offer to prune only non-existing entries
  2018-10-24 15:56     ` [PATCH v4 " Johannes Schindelin via GitGitGadget
  2018-10-24 15:56       ` [PATCH v4 1/3] repack: point out a bug handling stale shallow info Johannes Schindelin via GitGitGadget
@ 2018-10-24 15:56       ` Johannes Schindelin via GitGitGadget
  2018-10-24 15:56       ` [PATCH v4 3/3] repack -ad: prune the list of shallow commits Johannes Schindelin via GitGitGadget
  2 siblings, 0 replies; 47+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2018-10-24 15:56 UTC (permalink / raw)
  To: git; +Cc: Duy Nguyen, Jeff King, Junio C Hamano, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

The `prune_shallow()` function wants a full reachability check to be
completed before it goes to work, to ensure that all unreachable entries
are removed from the shallow file.

However, in the upcoming patch we do not even want to go that far. We
really only need to remove entries corresponding to pruned commits, i.e.
to commits that no longer exist.

Let's support that use case.

Rather than extending the signature of `prune_shallow()` to accept
another Boolean, let's turn it into a bit field and declare constants,
for readability.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 builtin/prune.c |  2 +-
 commit.h        |  4 +++-
 shallow.c       | 23 +++++++++++++++++------
 3 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/builtin/prune.c b/builtin/prune.c
index 41230f821..1ec9ddd75 100644
--- a/builtin/prune.c
+++ b/builtin/prune.c
@@ -161,7 +161,7 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
 	free(s);
 
 	if (is_repository_shallow(the_repository))
-		prune_shallow(show_only);
+		prune_shallow(show_only ? PRUNE_SHOW_ONLY : 0);
 
 	return 0;
 }
diff --git a/commit.h b/commit.h
index 1d260d62f..b80d6fdb5 100644
--- a/commit.h
+++ b/commit.h
@@ -249,7 +249,9 @@ extern void assign_shallow_commits_to_refs(struct shallow_info *info,
 					   uint32_t **used,
 					   int *ref_status);
 extern int delayed_reachability_test(struct shallow_info *si, int c);
-extern void prune_shallow(int show_only);
+#define PRUNE_SHOW_ONLY 1
+#define PRUNE_QUICK 2
+extern void prune_shallow(unsigned options);
 extern struct trace_key trace_shallow;
 
 extern int interactive_add(int argc, const char **argv, const char *prefix, int patch);
diff --git a/shallow.c b/shallow.c
index 732e18d54..02fdbfc55 100644
--- a/shallow.c
+++ b/shallow.c
@@ -247,6 +247,7 @@ static void check_shallow_file_for_update(struct repository *r)
 
 #define SEEN_ONLY 1
 #define VERBOSE   2
+#define QUICK 4
 
 struct write_shallow_data {
 	struct strbuf *out;
@@ -261,7 +262,10 @@ static int write_one_shallow(const struct commit_graft *graft, void *cb_data)
 	const char *hex = oid_to_hex(&graft->oid);
 	if (graft->nr_parent != -1)
 		return 0;
-	if (data->flags & SEEN_ONLY) {
+	if (data->flags & QUICK) {
+		if (!has_object_file(&graft->oid))
+			return 0;
+	} else if (data->flags & SEEN_ONLY) {
 		struct commit *c = lookup_commit(the_repository, &graft->oid);
 		if (!c || !(c->object.flags & SEEN)) {
 			if (data->flags & VERBOSE)
@@ -371,16 +375,23 @@ void advertise_shallow_grafts(int fd)
 
 /*
  * mark_reachable_objects() should have been run prior to this and all
- * reachable commits marked as "SEEN".
+ * reachable commits marked as "SEEN", except when quick_prune is non-zero,
+ * in which case lines are excised from the shallow file if they refer to
+ * commits that do not exist (any longer).
  */
-void prune_shallow(int show_only)
+void prune_shallow(unsigned options)
 {
 	struct lock_file shallow_lock = LOCK_INIT;
 	struct strbuf sb = STRBUF_INIT;
+	unsigned flags = SEEN_ONLY;
 	int fd;
 
-	if (show_only) {
-		write_shallow_commits_1(&sb, 0, NULL, SEEN_ONLY | VERBOSE);
+	if (options & PRUNE_QUICK)
+		flags |= QUICK;
+
+	if (options & PRUNE_SHOW_ONLY) {
+		flags |= VERBOSE;
+		write_shallow_commits_1(&sb, 0, NULL, flags);
 		strbuf_release(&sb);
 		return;
 	}
@@ -388,7 +399,7 @@ void prune_shallow(int show_only)
 				       git_path_shallow(the_repository),
 				       LOCK_DIE_ON_ERROR);
 	check_shallow_file_for_update(the_repository);
-	if (write_shallow_commits_1(&sb, 0, NULL, SEEN_ONLY)) {
+	if (write_shallow_commits_1(&sb, 0, NULL, flags)) {
 		if (write_in_full(fd, sb.buf, sb.len) < 0)
 			die_errno("failed to write to %s",
 				  get_lock_file_path(&shallow_lock));
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v4 3/3] repack -ad: prune the list of shallow commits
  2018-10-24 15:56     ` [PATCH v4 " Johannes Schindelin via GitGitGadget
  2018-10-24 15:56       ` [PATCH v4 1/3] repack: point out a bug handling stale shallow info Johannes Schindelin via GitGitGadget
  2018-10-24 15:56       ` [PATCH v4 2/3] shallow: offer to prune only non-existing entries Johannes Schindelin via GitGitGadget
@ 2018-10-24 15:56       ` Johannes Schindelin via GitGitGadget
  2 siblings, 0 replies; 47+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2018-10-24 15:56 UTC (permalink / raw)
  To: git; +Cc: Duy Nguyen, Jeff King, Junio C Hamano, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

`git repack` can drop unreachable commits without further warning,
making the corresponding entries in `.git/shallow` invalid, which causes
serious problems when deepening the branches.

One scenario where unreachable commits are dropped by `git repack` is
when a `git fetch --prune` (or even a `git fetch` when a ref was
force-pushed in the meantime) can make a commit unreachable that was
reachable before.

Therefore it is not safe to assume that a `git repack -adlf` will keep
unreachable commits alone (under the assumption that they had not been
packed in the first place, which is an assumption at least some of Git's
code seems to make).

This is particularly important to keep in mind when looking at the
`.git/shallow` file: if any commits listed in that file become
unreachable, it is not a problem, but if they go missing, it *is* a
problem. One symptom of this problem is that a deepening fetch may now
fail with

	fatal: error in object: unshallow <commit-hash>

To avoid this problem, let's prune the shallow list in `git repack` when
the `-d` option is passed, unless `-A` is passed, too (which would force
the now-unreachable objects to be turned into loose objects instead of
being deleted). Additionally, we also need to take `--keep-reachable`
and `--unpack-unreachable=<date>` into account.

Note: an alternative solution discussed during the review of this patch
was to teach `git fetch` to simply ignore entries in .git/shallow if the
corresponding commits do not exist locally. A quick test, however,
revealed that the .git/shallow file is written during a shallow *clone*,
in which case the commits do not exist, either, but the "shallow" line
*does* need to be sent. Therefore, this approach would be a lot more
finicky than the approach presented by the this patch.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 builtin/repack.c         | 6 ++++++
 t/t5537-fetch-shallow.sh | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index c6a7943d5..acd43c75d 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -549,6 +549,12 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		if (!po_args.quiet && isatty(2))
 			opts |= PRUNE_PACKED_VERBOSE;
 		prune_packed_objects(opts);
+
+		if (!keep_unreachable &&
+		    (!(pack_everything & LOOSEN_UNREACHABLE) ||
+		     unpack_unreachable) &&
+		    is_repository_shallow(the_repository))
+			prune_shallow(PRUNE_QUICK);
 	}
 
 	if (!no_update_server_info)
diff --git a/t/t5537-fetch-shallow.sh b/t/t5537-fetch-shallow.sh
index 686fdc928..6faf17e17 100755
--- a/t/t5537-fetch-shallow.sh
+++ b/t/t5537-fetch-shallow.sh
@@ -186,7 +186,7 @@ EOF
 	test_cmp expect actual
 '
 
-test_expect_failure '.git/shallow is edited by repack' '
+test_expect_success '.git/shallow is edited by repack' '
 	git init shallow-server &&
 	test_commit -C shallow-server A &&
 	test_commit -C shallow-server B &&
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 2/3] shallow: offer to prune only non-existing entries
  2018-10-24  8:01         ` Johannes Schindelin
@ 2018-10-24 15:56           ` Johannes Schindelin
  2018-10-25 18:54             ` Jonathan Tan
  0 siblings, 1 reply; 47+ messages in thread
From: Johannes Schindelin @ 2018-10-24 15:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Tan, Christian Couder, git, Duy Nguyen, Jeff King

Hi,

On Wed, 24 Oct 2018, Johannes Schindelin wrote:

> On Wed, 24 Oct 2018, Junio C Hamano wrote:
> 
> > Jonathan, do you see any issues with the use of lookup_commit() in
> > this change wrt lazy clone?  I am wondering what happens when the
> > commit in question is at, an immediate parent of, or an immediate
> > child of a promisor object.  I _think_ this change won't make it
> > worse for two features in playing together, but thought that it
> > would be better to double check.
> 
> Good point.
> 
> Instinctively, I would say that no promised object can be a shallow
> commit. The entire idea of a shallow commit is that it *is* present, but
> none of its parents.
> 
> Also, I would claim that the shallow feature does not make sense with lazy
> clones, as lazy clones offer a superset of shallow clone's functionality.
> 
> However, I am curious whether there is a better way to check for the
> presence of a local commit? Do we have an API function for that, that I
> missed? (I do not really want to parse the commit, after all, just verify
> that it is not pruned.)

Okay, I looked around a bit. It seems that there is an
`is_promisor_object(oid)` function in `pu` to see whether an object was
promised. If need be (and I am still not convinced that there is a need),
then we can always add a call to that function to the condition.

Coming back to my question whether there is a better way to check for the
presence of a local commit, I figured that I can use `has_object_file()`
instead of looking up and parsing the commit, as this code does not really
need to verify that the shallow entry refers to a commit, but only that it
refers to a local object.

So I'll send another iteration shortly, with this diff applied to 2/3:

-- snip --
diff --git a/shallow.c b/shallow.c
index 9c3faa8af243..02fdbfc554c4 100644
--- a/shallow.c
+++ b/shallow.c
@@ -263,8 +263,7 @@ static int write_one_shallow(const struct commit_graft *graft, void *cb_data)
 	if (graft->nr_parent != -1)
 		return 0;
 	if (data->flags & QUICK) {
-		struct commit *c = lookup_commit(the_repository, &graft->oid);
-		if (!c || parse_commit(c))
+		if (!has_object_file(&graft->oid))
 			return 0;
 	} else if (data->flags & SEEN_ONLY) {
 		struct commit *c = lookup_commit(the_repository, &graft->oid);
--snap --

Ciao,
Dscho

> 
> > 
> > "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> > writes:
> > 
> > > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> > >
> > > The `prune_shallow()` function wants a full reachability check to be
> > > completed before it goes to work, to ensure that all unreachable entries
> > > are removed from the shallow file.
> > >
> > > However, in the upcoming patch we do not even want to go that far. We
> > > really only need to remove entries corresponding to pruned commits, i.e.
> > > to commits that no longer exist.
> > >
> > > Let's support that use case.
> > >
> > > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > > ---
> > >  builtin/prune.c |  2 +-
> > >  commit.h        |  2 +-
> > >  shallow.c       | 22 +++++++++++++++++-----
> > >  3 files changed, 19 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/builtin/prune.c b/builtin/prune.c
> > > index 41230f821..6d6ab6cf1 100644
> > > --- a/builtin/prune.c
> > > +++ b/builtin/prune.c
> > > @@ -161,7 +161,7 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
> > >  	free(s);
> > >  
> > >  	if (is_repository_shallow(the_repository))
> > > -		prune_shallow(show_only);
> > > +		prune_shallow(show_only, 0);
> > >  
> > >  	return 0;
> > >  }
> > > diff --git a/commit.h b/commit.h
> > > index 1d260d62f..ff34447ab 100644
> > > --- a/commit.h
> > > +++ b/commit.h
> > > @@ -249,7 +249,7 @@ extern void assign_shallow_commits_to_refs(struct shallow_info *info,
> > >  					   uint32_t **used,
> > >  					   int *ref_status);
> > >  extern int delayed_reachability_test(struct shallow_info *si, int c);
> > > -extern void prune_shallow(int show_only);
> > > +extern void prune_shallow(int show_only, int quick_prune);
> > >  extern struct trace_key trace_shallow;
> > >  
> > >  extern int interactive_add(int argc, const char **argv, const char *prefix, int patch);
> > > diff --git a/shallow.c b/shallow.c
> > > index 732e18d54..0a2671bc2 100644
> > > --- a/shallow.c
> > > +++ b/shallow.c
> > > @@ -247,6 +247,7 @@ static void check_shallow_file_for_update(struct repository *r)
> > >  
> > >  #define SEEN_ONLY 1
> > >  #define VERBOSE   2
> > > +#define QUICK_PRUNE 4
> > >  
> > >  struct write_shallow_data {
> > >  	struct strbuf *out;
> > > @@ -261,7 +262,11 @@ static int write_one_shallow(const struct commit_graft *graft, void *cb_data)
> > >  	const char *hex = oid_to_hex(&graft->oid);
> > >  	if (graft->nr_parent != -1)
> > >  		return 0;
> > > -	if (data->flags & SEEN_ONLY) {
> > > +	if (data->flags & QUICK_PRUNE) {
> > > +		struct commit *c = lookup_commit(the_repository, &graft->oid);
> > > +		if (!c || parse_commit(c))
> > > +			return 0;
> > > +	} else if (data->flags & SEEN_ONLY) {
> > >  		struct commit *c = lookup_commit(the_repository, &graft->oid);
> > >  		if (!c || !(c->object.flags & SEEN)) {
> > >  			if (data->flags & VERBOSE)
> > > @@ -371,16 +376,23 @@ void advertise_shallow_grafts(int fd)
> > >  
> > >  /*
> > >   * mark_reachable_objects() should have been run prior to this and all
> > > - * reachable commits marked as "SEEN".
> > > + * reachable commits marked as "SEEN", except when quick_prune is non-zero,
> > > + * in which case lines are excised from the shallow file if they refer to
> > > + * commits that do not exist (any longer).
> > >   */
> > > -void prune_shallow(int show_only)
> > > +void prune_shallow(int show_only, int quick_prune)
> > >  {
> > >  	struct lock_file shallow_lock = LOCK_INIT;
> > >  	struct strbuf sb = STRBUF_INIT;
> > > +	unsigned flags = SEEN_ONLY;
> > >  	int fd;
> > >  
> > > +	if (quick_prune)
> > > +		flags |= QUICK_PRUNE;
> > > +
> > >  	if (show_only) {
> > > -		write_shallow_commits_1(&sb, 0, NULL, SEEN_ONLY | VERBOSE);
> > > +		flags |= VERBOSE;
> > > +		write_shallow_commits_1(&sb, 0, NULL, flags);
> > >  		strbuf_release(&sb);
> > >  		return;
> > >  	}
> > > @@ -388,7 +400,7 @@ void prune_shallow(int show_only)
> > >  				       git_path_shallow(the_repository),
> > >  				       LOCK_DIE_ON_ERROR);
> > >  	check_shallow_file_for_update(the_repository);
> > > -	if (write_shallow_commits_1(&sb, 0, NULL, SEEN_ONLY)) {
> > > +	if (write_shallow_commits_1(&sb, 0, NULL, flags)) {
> > >  		if (write_in_full(fd, sb.buf, sb.len) < 0)
> > >  			die_errno("failed to write to %s",
> > >  				  get_lock_file_path(&shallow_lock));
> > 
> 

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 2/3] shallow: offer to prune only non-existing entries
  2018-10-24 15:56           ` Johannes Schindelin
@ 2018-10-25 18:54             ` Jonathan Tan
  2018-10-26  7:59               ` Johannes Schindelin
  0 siblings, 1 reply; 47+ messages in thread
From: Jonathan Tan @ 2018-10-25 18:54 UTC (permalink / raw)
  To: Johannes.Schindelin
  Cc: gitster, jonathantanmy, christian.couder, git, pclouds, peff

> On Wed, 24 Oct 2018, Johannes Schindelin wrote:
> 
> > On Wed, 24 Oct 2018, Junio C Hamano wrote:
> > 
> > > Jonathan, do you see any issues with the use of lookup_commit() in
> > > this change wrt lazy clone?  I am wondering what happens when the
> > > commit in question is at, an immediate parent of, or an immediate
> > > child of a promisor object.  I _think_ this change won't make it
> > > worse for two features in playing together, but thought that it
> > > would be better to double check.
> > 
> > Good point.
> > 
> > Instinctively, I would say that no promised object can be a shallow
> > commit. The entire idea of a shallow commit is that it *is* present, but
> > none of its parents.

I'm envisioning a client repo with a single branch, cloned both with
--depth=1 and with --filter=<foo>, that has just fetched to the same
branch also with --depth=1 resulting in a fast-forward from A to B.

If A is B's parent, then A would be known to be promised. (Incidentally,
the problem is greater in current Git, because for performance reasons,
we do not check promisor status when lazily fetching - so it doesn't
matter here whether an object is known to be promised or not.)

When pruning shallow and checking the existence of A, this would trigger
a fetch for A, which would download all commits and trees reachable from
it.

It sounds safer to me to use the fast approach in this patch when the
repository is not partial, and stick to the slow approach when it is.

> > However, I am curious whether there is a better way to check for the
> > presence of a local commit? Do we have an API function for that, that I
> > missed? (I do not really want to parse the commit, after all, just verify
> > that it is not pruned.)
> 
> Okay, I looked around a bit. It seems that there is an
> `is_promisor_object(oid)` function in `pu` to see whether an object was
> promised. If need be (and I am still not convinced that there is a need),
> then we can always add a call to that function to the condition.

I don't think there is a need for is_promisor_object() either - as
described above, it doesn't completely solve the problem.

> Coming back to my question whether there is a better way to check for the
> presence of a local commit, I figured that I can use `has_object_file()`
> instead of looking up and parsing the commit, as this code does not really
> need to verify that the shallow entry refers to a commit, but only that it
> refers to a local object.

Note that has_object_file() also triggers the lazy fetch if needed, but
I agree that it's better because you don't really need to parse the
commit.

There is the possibility of just checking for loose objects (which does
not trigger any lazy fetches), which works for builtin/prune since it
only prunes loose objects, but doesn't work in the general case, I
guess.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 2/3] shallow: offer to prune only non-existing entries
  2018-10-25 18:54             ` Jonathan Tan
@ 2018-10-26  7:59               ` Johannes Schindelin
  2018-10-26 20:49                 ` Jonathan Tan
  0 siblings, 1 reply; 47+ messages in thread
From: Johannes Schindelin @ 2018-10-26  7:59 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: gitster, christian.couder, git, pclouds, peff

Hi Jonathan,

On Thu, 25 Oct 2018, Jonathan Tan wrote:

> > On Wed, 24 Oct 2018, Johannes Schindelin wrote:
> > 
> > Coming back to my question whether there is a better way to check for
> > the presence of a local commit, I figured that I can use
> > `has_object_file()` instead of looking up and parsing the commit, as
> > this code does not really need to verify that the shallow entry refers
> > to a commit, but only that it refers to a local object.
> 
> Note that has_object_file() also triggers the lazy fetch if needed, but
> I agree that it's better because you don't really need to parse the
> commit.

Thanks for confirming.

So even better would be to use `is_promisor_object(oid) ||
has_object_file(oid)`, right?

This is something that is probably not even needed: as I mentioned, the
shallow commits are *expected* to be local. It should not ever happen that
they are fetched.

> There is the possibility of just checking for loose objects (which does
> not trigger any lazy fetches), which works for builtin/prune since it
> only prunes loose objects, but doesn't work in the general case, I
> guess.

In the test case I added, the commit object is actually packed. And then,
because it became unreachable, it is dropped.

If I only checked for loose objects here, the shallow line would already
be removed when the commit gets packed, which would be the wrong thing to
do.

In short: thank you for confirming that the current version of the patch
seems to be the best we can do for now.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 2/3] shallow: offer to prune only non-existing entries
  2018-10-26  7:59               ` Johannes Schindelin
@ 2018-10-26 20:49                 ` Jonathan Tan
  2018-10-29 20:45                   ` Johannes Schindelin
  0 siblings, 1 reply; 47+ messages in thread
From: Jonathan Tan @ 2018-10-26 20:49 UTC (permalink / raw)
  To: Johannes.Schindelin
  Cc: jonathantanmy, gitster, christian.couder, git, pclouds, peff

> Thanks for confirming.
> 
> So even better would be to use `is_promisor_object(oid) ||
> has_object_file(oid)`, right?
> 
> This is something that is probably not even needed: as I mentioned, the
> shallow commits are *expected* to be local. It should not ever happen that
> they are fetched.

That would help, but I don't think it would help in the "fast-forward
from A to B where A is B's parent" case I describe in [1].

My suggestion was:

> It sounds safer to me to use the fast approach in this patch when the
> repository is not partial, and stick to the slow approach when it is.

which can be done by replacing "prune_shallow(0, 1)" in patch 3 with
"prune_shallow(0, !repository_format_partial_clone)", possibly with a comment
that the fast method checks object existence for each shallow line directly,
which is undesirable when the repository is a partial clone.
(repository_format_partial_clone is non-NULL with the name of the promisor
remote if the repository is a partial clone, and NULL otherwise).

[1] https://public-inbox.org/git/20181025185459.206127-1-jonathantanmy@google.com/

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v3 2/3] shallow: offer to prune only non-existing entries
  2018-10-26 20:49                 ` Jonathan Tan
@ 2018-10-29 20:45                   ` Johannes Schindelin
  0 siblings, 0 replies; 47+ messages in thread
From: Johannes Schindelin @ 2018-10-29 20:45 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: gitster, christian.couder, git, pclouds, peff

Hi Jonathan,

On Fri, 26 Oct 2018, Jonathan Tan wrote:

> > So even better would be to use `is_promisor_object(oid) ||
> > has_object_file(oid)`, right?
> > 
> > This is something that is probably not even needed: as I mentioned,
> > the shallow commits are *expected* to be local. It should not ever
> > happen that they are fetched.
> 
> That would help, but I don't think it would help in the "fast-forward
> from A to B where A is B's parent" case I describe in [1].

I don't think that that analysis is correct. It assumes that there could
be a promised commit that is also listed as shallow. I do not think that
is a possible scenario.

And even if it were, why would asking for a promised commit object
download not only that object but also all of its trees and all of its
ancestors? That's not how I understood the idea of the lazy clone: I
understood promised objects to be downloaded on demand, individually.

> My suggestion was:
> 
> > It sounds safer to me to use the fast approach in this patch when the
> > repository is not partial, and stick to the slow approach when it is.
> 
> which can be done by replacing "prune_shallow(0, 1)" in patch 3 with
> "prune_shallow(0, !repository_format_partial_clone)", possibly with a comment
> that the fast method checks object existence for each shallow line directly,
> which is undesirable when the repository is a partial clone.

I am afraid that that would re-introduce the bug pointed out by Peff: you
*really* would need to traverse all reachable commits to use the non-fast
pruning method. And we simply don't.

Ciao,
Dscho

> (repository_format_partial_clone is non-NULL with the name of the promisor
> remote if the repository is a partial clone, and NULL otherwise).
> 
> [1] https://public-inbox.org/git/20181025185459.206127-1-jonathantanmy@google.com/
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2018-10-29 20:45 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-13 20:18 [PATCH 0/2] repack -ad: fix after `fetch --prune` in a shallow repository Johannes Schindelin via GitGitGadget
2018-07-11 22:17 ` [PATCH 1/2] repack: point out a bug handling stale shallow info Johannes Schindelin via GitGitGadget
2018-07-11 22:23 ` [PATCH 2/2] repack -ad: prune the list of shallow commits Johannes Schindelin via GitGitGadget
2018-07-13 20:31   ` Jeff King
2018-07-14 21:56     ` Johannes Schindelin
2018-07-16 17:36       ` Jeff King
2018-07-17 16:25         ` Junio C Hamano
2018-07-19 16:42           ` Johannes Schindelin
2018-07-19 20:49             ` Junio C Hamano
2018-07-20  9:30               ` Junio C Hamano
2018-07-20 19:31                 ` Jeff King
2018-07-17 17:28         ` Duy Nguyen
2018-07-17 19:41           ` Jeff King
2018-07-18 17:31             ` Duy Nguyen
2018-07-18 17:45               ` Jeff King
2018-07-18 17:48                 ` Duy Nguyen
2018-07-17 16:39   ` Duy Nguyen
2018-07-17 16:48     ` Duy Nguyen
2018-07-19 17:50       ` Johannes Schindelin
2018-07-17 13:51 ` [PATCH v2 0/2] repack -ad: fix after `fetch --prune` in a shallow repository Johannes Schindelin via GitGitGadget
2018-07-17 13:51   ` [PATCH v2 1/2] repack: point out a bug handling stale shallow info Johannes Schindelin via GitGitGadget
2018-07-17 13:51   ` [PATCH v2 2/2] repack -ad: prune the list of shallow commits Johannes Schindelin via GitGitGadget
2018-07-17 17:45     ` Eric Sunshine
2018-07-17 19:15   ` [PATCH v2 0/2] repack -ad: fix after `fetch --prune` in a shallow repository Jeff King
2018-07-17 19:20     ` Jeff King
2018-07-19 17:48       ` Johannes Schindelin
2018-10-22 22:05   ` [PATCH v3 0/3] repack -ad: fix after fetch --prune " Johannes Schindelin via GitGitGadget
2018-10-22 22:05     ` [PATCH v3 1/3] repack: point out a bug handling stale shallow info Johannes Schindelin via GitGitGadget
2018-10-24  3:39       ` Junio C Hamano
2018-10-24  8:12         ` Johannes Schindelin
2018-10-24  8:38           ` Johannes Schindelin
2018-10-22 22:05     ` [PATCH v3 2/3] shallow: offer to prune only non-existing entries Johannes Schindelin via GitGitGadget
2018-10-24  3:47       ` Junio C Hamano
2018-10-24  8:01         ` Johannes Schindelin
2018-10-24 15:56           ` Johannes Schindelin
2018-10-25 18:54             ` Jonathan Tan
2018-10-26  7:59               ` Johannes Schindelin
2018-10-26 20:49                 ` Jonathan Tan
2018-10-29 20:45                   ` Johannes Schindelin
2018-10-22 22:05     ` [PATCH v3 3/3] repack -ad: prune the list of shallow commits Johannes Schindelin via GitGitGadget
2018-10-24  3:56       ` Junio C Hamano
2018-10-24  8:02         ` Johannes Schindelin
2018-10-23 10:15     ` [PATCH v3 0/3] repack -ad: fix after fetch --prune in a shallow repository Johannes Schindelin
2018-10-24 15:56     ` [PATCH v4 " Johannes Schindelin via GitGitGadget
2018-10-24 15:56       ` [PATCH v4 1/3] repack: point out a bug handling stale shallow info Johannes Schindelin via GitGitGadget
2018-10-24 15:56       ` [PATCH v4 2/3] shallow: offer to prune only non-existing entries Johannes Schindelin via GitGitGadget
2018-10-24 15:56       ` [PATCH v4 3/3] repack -ad: prune the list of shallow commits Johannes Schindelin via GitGitGadget

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).