git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	gitgitgadget@gmail.com, Git Mailing List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 2/2] repack -ad: prune the list of shallow commits
Date: Tue, 17 Jul 2018 15:41:48 -0400	[thread overview]
Message-ID: <20180717194148.GC30594@sigill.intra.peff.net> (raw)
In-Reply-To: <CACsJy8BV3ii3MkYf6UObsX_JdDbT9ovY_K9dCCeYvQ3FWqnRUQ@mail.gmail.com>

On Tue, Jul 17, 2018 at 07:28:07PM +0200, Duy Nguyen wrote:

> On Mon, Jul 16, 2018 at 7:38 PM Jeff King <peff@peff.net> wrote:
> > in a user-visible way. And that's really my question: is pruning here
> > going to bite people unexpectedly (not rhetorical -- I really don't
> > know)?
> 
> If shallow points are no longer reachable, removing them should not
> bite anybody, I think.

I slept on this to see if I could brainstorm any other ways.

The only thing I really came up with is that removing shallows is racy
with respect to the reachability check.  For instance, if "prune" runs
at the same as an incoming shallow fetch, we'll compute the reachability
without holding the shallow lock. Somebody else may write an entry for
an object we've never heard of (because it just showed up), and we'd
erase it, making the repository appear corrupt.

But note that I used "prune" in the above example, because this bug
already exists. So probably we're not changing anything material here.
Though if we wanted to fix it, I think it would require holding the
shallow lock during the reachability analysis, which is probably not
something that repack would want to do. Unlike prune, it then does
a lot of other expensive operations, like delta compression and writing
out the pack, before it would get to the shallow prune.

Although even holding the lock for the duration of "prune" is more than
we need. Shallow commits must be commits, so we don't need to do a full
graph walk (and in my experience there's often an order-of-magnitude
difference between the two; even more so if we're using Stolee's
commit-graph cache).

> > I think in the case of git-prune and prune_shallow(), it's a little
> > tricky, though. It looks like prune_shallow() relies on somebody having
> > marked the SEEN flag, which implies doing a full reachability walk.
> 
> Sorry, my bad for hiding this SEEN flag deep in library code like this
> in eab3296c7e (prune: clean .git/shallow after pruning objects -
> 2013-12-05) But in my defense I didn't realize the horror of sharing
> object flags a year later and added the "object flag allocation" in
> object.h

Sort of an aside to the patch under discussion, but I think it may make
sense for prune_shallow() to take a callback function for determining
whether a commit is reachable.

I have an old patch that teaches git-prune to lazily do the reachability
check, since in many cases "git repack" will have just packed all of the
loose objects. But it just occurred to me that this patch is totally
broken with respect to prune_shallow(), because it would not set the
SEEN flag (I've literally been running with it for years, which goes to
show how often I use the shallow feature).

And if we were to have repack do a prune_shallow(), it may want to use a
different method than traversing and marking each object SEEN.

> > So either we have to do a reachability walk (which is expensive), or we
> > have to rely on some other command (like prune) that is doing a
> > reachability walk and reuse that work.
> 
> Since "git prune" took care of loose objects, if we're going to delete
> some redundant packs, we can perform a reasonably cheap "reachability"
> test in repack, I think. We have the list of new packs from
> pack-objects which should contain all reachable objects (and even some
> unreachable ones). If we traverse all of their idx files and mark as
> SEEN, then whatever shallow points that are not marked SEEN _and_ not
> loose objects could be safely removed.

Hmm. I don't immediately see any reason that would not work with the
current code. But I am a little uncomfortable adding these sorts of
subtle dependencies and assumptions. It feels like we're building a
house of cards.

> I don't see any problem with this either, but then I've not worked on
> shallow code for quite some time. The only potential problem is where
> we do this check. If we check (and drop) shallow points when we read
> .git/info/shallow, then when we write it down some time in the future
> we accidentally prune them. Which might be ok but I feel safer when we
> only prune at one place (prune/gc/repack) where we can be more careful
> and can do more testing.
> 
> If we read the shallow list as-is, then just not send "shallow" lines
> in fetch-pack for shallow points that do not exist, I don't see a
> problem, but it may be a bit more work and we could get to some
> confusing state where upload-pack gives us the same shallow point that
> we have but ignore because we don't have objects behind it. Hm...
> actually I might see a problem :)

Yeah, I agree if we were to do this it would definitely be in a
"read-only" way: let the current command skip those grafts for its
operation, but do not impact the on-disk shallow file. Then races or
other assumptions can at worst impact that operation, and not cause a
lasting corruption (and in particular I think we'd be subject to the
race I described at the start of this mail).

-Peff

  reply	other threads:[~2018-07-17 19:41 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-13 20:18 [PATCH 0/2] repack -ad: fix after `fetch --prune` in a shallow repository Johannes Schindelin via GitGitGadget
2018-07-11 22:17 ` [PATCH 1/2] repack: point out a bug handling stale shallow info Johannes Schindelin via GitGitGadget
2018-07-11 22:23 ` [PATCH 2/2] repack -ad: prune the list of shallow commits Johannes Schindelin via GitGitGadget
2018-07-13 20:31   ` Jeff King
2018-07-14 21:56     ` Johannes Schindelin
2018-07-16 17:36       ` Jeff King
2018-07-17 16:25         ` Junio C Hamano
2018-07-19 16:42           ` Johannes Schindelin
2018-07-19 20:49             ` Junio C Hamano
2018-07-20  9:30               ` Junio C Hamano
2018-07-20 19:31                 ` Jeff King
2018-07-17 17:28         ` Duy Nguyen
2018-07-17 19:41           ` Jeff King [this message]
2018-07-18 17:31             ` Duy Nguyen
2018-07-18 17:45               ` Jeff King
2018-07-18 17:48                 ` Duy Nguyen
2018-07-17 16:39   ` Duy Nguyen
2018-07-17 16:48     ` Duy Nguyen
2018-07-19 17:50       ` Johannes Schindelin
2018-07-17 13:51 ` [PATCH v2 0/2] repack -ad: fix after `fetch --prune` in a shallow repository Johannes Schindelin via GitGitGadget
2018-07-17 13:51   ` [PATCH v2 1/2] repack: point out a bug handling stale shallow info Johannes Schindelin via GitGitGadget
2018-07-17 13:51   ` [PATCH v2 2/2] repack -ad: prune the list of shallow commits Johannes Schindelin via GitGitGadget
2018-07-17 17:45     ` Eric Sunshine
2018-07-17 19:15   ` [PATCH v2 0/2] repack -ad: fix after `fetch --prune` in a shallow repository Jeff King
2018-07-17 19:20     ` Jeff King
2018-07-19 17:48       ` Johannes Schindelin
2018-10-22 22:05   ` [PATCH v3 0/3] repack -ad: fix after fetch --prune " Johannes Schindelin via GitGitGadget
2018-10-22 22:05     ` [PATCH v3 1/3] repack: point out a bug handling stale shallow info Johannes Schindelin via GitGitGadget
2018-10-24  3:39       ` Junio C Hamano
2018-10-24  8:12         ` Johannes Schindelin
2018-10-24  8:38           ` Johannes Schindelin
2018-10-22 22:05     ` [PATCH v3 2/3] shallow: offer to prune only non-existing entries Johannes Schindelin via GitGitGadget
2018-10-24  3:47       ` Junio C Hamano
2018-10-24  8:01         ` Johannes Schindelin
2018-10-24 15:56           ` Johannes Schindelin
2018-10-25 18:54             ` Jonathan Tan
2018-10-26  7:59               ` Johannes Schindelin
2018-10-26 20:49                 ` Jonathan Tan
2018-10-29 20:45                   ` Johannes Schindelin
2018-10-22 22:05     ` [PATCH v3 3/3] repack -ad: prune the list of shallow commits Johannes Schindelin via GitGitGadget
2018-10-24  3:56       ` Junio C Hamano
2018-10-24  8:02         ` Johannes Schindelin
2018-10-23 10:15     ` [PATCH v3 0/3] repack -ad: fix after fetch --prune in a shallow repository Johannes Schindelin
2018-10-24 15:56     ` [PATCH v4 " Johannes Schindelin via GitGitGadget
2018-10-24 15:56       ` [PATCH v4 1/3] repack: point out a bug handling stale shallow info Johannes Schindelin via GitGitGadget
2018-10-24 15:56       ` [PATCH v4 2/3] shallow: offer to prune only non-existing entries Johannes Schindelin via GitGitGadget
2018-10-24 15:56       ` [PATCH v4 3/3] repack -ad: prune the list of shallow commits Johannes Schindelin via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180717194148.GC30594@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=pclouds@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).