git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Duy Nguyen <pclouds@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	gitgitgadget@gmail.com, Git Mailing List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 2/2] repack -ad: prune the list of shallow commits
Date: Tue, 17 Jul 2018 19:28:07 +0200	[thread overview]
Message-ID: <CACsJy8BV3ii3MkYf6UObsX_JdDbT9ovY_K9dCCeYvQ3FWqnRUQ@mail.gmail.com> (raw)
In-Reply-To: <20180716173636.GA18636@sigill.intra.peff.net>

On Mon, Jul 16, 2018 at 7:38 PM Jeff King <peff@peff.net> wrote:
> in a user-visible way. And that's really my question: is pruning here
> going to bite people unexpectedly (not rhetorical -- I really don't
> know)?

If shallow points are no longer reachable, removing them should not
bite anybody, I think.

> For instance, at GitHub we do not ever run "git gc", but have our
> maintenance regimen that builds around "git repack". I don't think this
> patch will matter to us either way, because we would never have a
> shallow repository in the first place. But I'm wondering if people in a
> similar situation might.
>
> I'm also not entirely sure if people _care_ if their shallow list is
> pruned. Maybe it's not a big deal, and should just be quietly cleaned
> up.
>
> And I know that you're probably coming at it from the opposite angle.
> Somebody isn't running git-gc but doing a "repack -adl" and they _do_
> want these shallow files cleaned up (because their repo is broken after
> dropping the objects). I just want to make sure we don't regress some
> other case.
>
> > > I.e., it seems unexpected that "git repack" is going to tweak your
> > > shallow lists. If we were designing from scratch, the sane behavior
> > > seems to me to be:
> > >
> > >   1. Shallow pruning should be its own separate command (distinct from
> > >      either repacking or loose object pruning), and should be triggered
> > >      as part of a normal git-gc.
> >
> > I fail to see the value in a separate command.
>
> The value of a separate command is that you can run those other commands
> without triggering this behavior. I actually think "git prune" does too
> much already (e.g., imagine that I do not ever want to prune objects,
> but I _do_ want to get rid of tmp_pack and tmp_obj cruft; what command
> do I run?). But that is definitely not a new problem. And I'm OK with
> not fixing it for now. My main concern is that we are using the presence
> of that mistake to justify making it again.
>
> > And: `git gc` already calls `git prune`, which *does* prune the shallow
> > list.
>
> Right, I was trying to propose that each individual cleanup step which
> _could_ be done independently should be done so, but that "gc" should
> continue to do them all.
>
> I think in the case of git-prune and prune_shallow(), it's a little
> tricky, though. It looks like prune_shallow() relies on somebody having
> marked the SEEN flag, which implies doing a full reachability walk.

Sorry, my bad for hiding this SEEN flag deep in library code like this
in eab3296c7e (prune: clean .git/shallow after pruning objects -
2013-12-05) But in my defense I didn't realize the horror of sharing
object flags a year later and added the "object flag allocation" in
object.h

> So it's really amortizing the work already being done by prune.
>
> Speaking of which, I don't see how your patch can work as-is. The repack
> command does not do such a walk, so it has no idea which commits are
> SEEN or not, and will delete all of them! Try this:
>
>   [shallow of clone of git.git (or any repo)]
>   $ git clone --no-local --depth=1 /path/to/git tmp
>   ...
>   $ cd tmp
>
>   [we have a commit]
>   $ git log --oneline -1
>   de46fca (grafted, HEAD) repack -ad: prune the list of shallow commits
>
>   [repacking with existing git is fine]
>   $ git repack -adl
>   ...
>   $ git log --oneline -1
>   de46fca (grafted, HEAD) repack -ad: prune the list of shallow commits
>
>   [repacking with your patch empties the shallow file, which
>    makes the repository look corrupt]
>   $ /path/to/git repack -adl
>   $ git log --oneline -1
>   error: Could not read a2df50675979af639cf9490f7fc9b86fa18fd907
>   fatal: Failed to traverse parents of commit de46fca5b43f47f3c5c6f9232a17490d39fc80b1
>
> So either we have to do a reachability walk (which is expensive), or we
> have to rely on some other command (like prune) that is doing a
> reachability walk and reuse that work.

Since "git prune" took care of loose objects, if we're going to delete
some redundant packs, we can perform a reasonably cheap "reachability"
test in repack, I think. We have the list of new packs from
pack-objects which should contain all reachable objects (and even some
unreachable ones). If we traverse all of their idx files and mark as
SEEN, then whatever shallow points that are not marked SEEN _and_ not
loose objects could be safely removed.

> > >   2b. It's OK for shallow objects to be missing, and the shallow code
> > >       should be more resilient to missing objects. Neither repack nor
> > >       prune needs to know or care.
> >
> > That would be possible. Kind of like saying: we do have this list of
> > shallow commits, but oh, BTW, it is likely incorrect, so we painstakingly
> > verify for every entry during every fetch and push that those commits
> > objects are still there.
>
> Obviously verifying reachability on each fetch is a bad idea. But my
> understanding of the shallow list is that it says "this is a graft point
> where we do not have any of the parents". If we find that we do not have
> such an object, would it be OK to quietly ignore that? We used to claim
> "we do not have the parents of X", but now we know "well, we do not have
> X anymore either".
>
> Again, I may be showing my ignorance of the shallow code here.

I don't see any problem with this either, but then I've not worked on
shallow code for quite some time. The only potential problem is where
we do this check. If we check (and drop) shallow points when we read
.git/info/shallow, then when we write it down some time in the future
we accidentally prune them. Which might be ok but I feel safer when we
only prune at one place (prune/gc/repack) where we can be more careful
and can do more testing.

If we read the shallow list as-is, then just not send "shallow" lines
in fetch-pack for shallow points that do not exist, I don't see a
problem, but it may be a bit more work and we could get to some
confusing state where upload-pack gives us the same shallow point that
we have but ignore because we don't have objects behind it. Hm...
actually I might see a problem :)
-- 
Duy

  parent reply	other threads:[~2018-07-17 17:28 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-13 20:18 [PATCH 0/2] repack -ad: fix after `fetch --prune` in a shallow repository Johannes Schindelin via GitGitGadget
2018-07-11 22:17 ` [PATCH 1/2] repack: point out a bug handling stale shallow info Johannes Schindelin via GitGitGadget
2018-07-11 22:23 ` [PATCH 2/2] repack -ad: prune the list of shallow commits Johannes Schindelin via GitGitGadget
2018-07-13 20:31   ` Jeff King
2018-07-14 21:56     ` Johannes Schindelin
2018-07-16 17:36       ` Jeff King
2018-07-17 16:25         ` Junio C Hamano
2018-07-19 16:42           ` Johannes Schindelin
2018-07-19 20:49             ` Junio C Hamano
2018-07-20  9:30               ` Junio C Hamano
2018-07-20 19:31                 ` Jeff King
2018-07-17 17:28         ` Duy Nguyen [this message]
2018-07-17 19:41           ` Jeff King
2018-07-18 17:31             ` Duy Nguyen
2018-07-18 17:45               ` Jeff King
2018-07-18 17:48                 ` Duy Nguyen
2018-07-17 16:39   ` Duy Nguyen
2018-07-17 16:48     ` Duy Nguyen
2018-07-19 17:50       ` Johannes Schindelin
2018-07-17 13:51 ` [PATCH v2 0/2] repack -ad: fix after `fetch --prune` in a shallow repository Johannes Schindelin via GitGitGadget
2018-07-17 13:51   ` [PATCH v2 1/2] repack: point out a bug handling stale shallow info Johannes Schindelin via GitGitGadget
2018-07-17 13:51   ` [PATCH v2 2/2] repack -ad: prune the list of shallow commits Johannes Schindelin via GitGitGadget
2018-07-17 17:45     ` Eric Sunshine
2018-07-17 19:15   ` [PATCH v2 0/2] repack -ad: fix after `fetch --prune` in a shallow repository Jeff King
2018-07-17 19:20     ` Jeff King
2018-07-19 17:48       ` Johannes Schindelin
2018-10-22 22:05   ` [PATCH v3 0/3] repack -ad: fix after fetch --prune " Johannes Schindelin via GitGitGadget
2018-10-22 22:05     ` [PATCH v3 1/3] repack: point out a bug handling stale shallow info Johannes Schindelin via GitGitGadget
2018-10-24  3:39       ` Junio C Hamano
2018-10-24  8:12         ` Johannes Schindelin
2018-10-24  8:38           ` Johannes Schindelin
2018-10-22 22:05     ` [PATCH v3 2/3] shallow: offer to prune only non-existing entries Johannes Schindelin via GitGitGadget
2018-10-24  3:47       ` Junio C Hamano
2018-10-24  8:01         ` Johannes Schindelin
2018-10-24 15:56           ` Johannes Schindelin
2018-10-25 18:54             ` Jonathan Tan
2018-10-26  7:59               ` Johannes Schindelin
2018-10-26 20:49                 ` Jonathan Tan
2018-10-29 20:45                   ` Johannes Schindelin
2018-10-22 22:05     ` [PATCH v3 3/3] repack -ad: prune the list of shallow commits Johannes Schindelin via GitGitGadget
2018-10-24  3:56       ` Junio C Hamano
2018-10-24  8:02         ` Johannes Schindelin
2018-10-23 10:15     ` [PATCH v3 0/3] repack -ad: fix after fetch --prune in a shallow repository Johannes Schindelin
2018-10-24 15:56     ` [PATCH v4 " Johannes Schindelin via GitGitGadget
2018-10-24 15:56       ` [PATCH v4 1/3] repack: point out a bug handling stale shallow info Johannes Schindelin via GitGitGadget
2018-10-24 15:56       ` [PATCH v4 2/3] shallow: offer to prune only non-existing entries Johannes Schindelin via GitGitGadget
2018-10-24 15:56       ` [PATCH v4 3/3] repack -ad: prune the list of shallow commits Johannes Schindelin via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CACsJy8BV3ii3MkYf6UObsX_JdDbT9ovY_K9dCCeYvQ3FWqnRUQ@mail.gmail.com \
    --to=pclouds@gmail.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).