git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Martin Fick <mfick@codeaurora.org>
Cc: repo-discuss@googlegroups.com, jmelvin@codeaurora.org,
	jgit-dev@eclipse.org, git@vger.kernel.org
Subject: Re: Preserve/Prune Old Pack Files
Date: Mon, 9 Jan 2017 01:21:37 -0500
Message-ID: <20170109062137.zghmurndlbts5x44@sigill.intra.peff.net>
In-Reply-To: <5172470.bsscxDU4yv@mfick1-lnx>

On Wed, Jan 04, 2017 at 09:11:55AM -0700, Martin Fick wrote:

> I am replying to this email across lists because I wanted to 
> highlight to the git community this jgit change to repacking 
> that we have up for review
> 
>  https://git.eclipse.org/r/#/c/87969/
> 
> This change introduces a new convention for how to preserve 
> old pack files in a staging area 
> (.git/objects/packs/preserved) before deleting them.  I 
> wanted to ensure that the new proposed convention would be 
> done in a way that would be satisfactory to the git 
> community as a whole so that it would be more easy to 
> provide the same behavior in git eventually.  The preserved 
> pack files (and accompanying index and bitmap files), are not 
> only moved, but they are also renamed so that they no longer 
> will match recursive finds looking for pack files.

It looks like objects/pack/pack-123.pack becomes
objects/pack/preserved/pack-123.old-pack, and so forth.
Which seems reasonable, and I'm happy that:

  find objects/pack -name '*.pack'

would not find it. :)
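
For illustration only, the rename could be sketched in shell roughly as
follows. This is not the jgit implementation: the `preserve_pack` helper
name is hypothetical, the `.old-pack` suffix is taken from the example
above, and the accompanying index and bitmap files are left out because
their preserved names are not spelled out here. The demonstration uses a
scratch directory with an empty dummy file rather than a real pack.

```shell
#!/bin/sh
# Hypothetical helper: move one pack into the "preserved" area under a
# name that no longer ends in ".pack".
# Usage: preserve_pack <objects-dir> <pack-name-without-extension>
preserve_pack() {
    objdir=$1
    base=$2
    mkdir -p "$objdir/pack/preserved" &&
    mv "$objdir/pack/$base.pack" "$objdir/pack/preserved/$base.old-pack"
}

# Demonstration against a scratch directory with a dummy (empty) pack file.
scratch=$(mktemp -d)
mkdir -p "$scratch/objects/pack"
: >"$scratch/objects/pack/pack-123.pack"

preserve_pack "$scratch/objects" pack-123

# The recursive find from above no longer matches the preserved file.
found=$(find "$scratch/objects/pack" -name '*.pack')
echo "matches after preserving: '$found'"
```

Because the file keeps its pack-123 stem, the original name can still be
recovered from the preserved one.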

I suspect the name-change will break a few tools that you might want to
use to look at a preserved pack (like verify-pack). I know that's not
your primary use case, but it seems plausible that somebody may one day
want to use a preserved pack to try to recover from corruption. I think
"git index-pack --stdin <objects/pack/preserved/pack-123.old-pack"
could always be a last-resort for re-admitting the objects to the
repository.
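
As a rough sketch of that last resort applied to every preserved pack at
once, something like the following might do. The `readmit_preserved`
helper is hypothetical, and it assumes the objects/pack/preserved layout
and the `.old-pack` suffix from the rename example; only the
`git index-pack --stdin` call itself comes from the suggestion above.

```shell
#!/bin/sh
# Hypothetical helper: feed every preserved pack back through index-pack
# so its objects become visible to the repository again.
# Usage: readmit_preserved <git-dir>
readmit_preserved() {
    gitdir=$1
    for p in "$gitdir"/objects/pack/preserved/*.old-pack
    do
        # With no preserved packs the glob stays unexpanded; skip it.
        [ -e "$p" ] || continue
        git --git-dir="$gitdir" index-pack --stdin <"$p" || return 1
    done
}
```

It would be called as `readmit_preserved .git` from the top of a
non-bare repository. The preserved files are left in place, so they
would still need to be cleaned up separately.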

I notice this doesn't do anything for loose objects. I think they
technically suffer the same issue, though the race window is much
shorter (we mmap them and zlib inflate immediately, whereas packfiles
may stay mapped across many object requests).

I have one other thought that's tangentially related.

I've wondered if we could make object pruning more atomic by
speculatively moving items to be deleted into some kind of "outgoing"
object area. Right now you can have a case like:

  0. We have a pack that has commit X, which is reachable, and commit Y,
     which is not.

  1. Process A is repacking. It walks the object graph and finds that X
     is reachable. It begins creating a new pack with X and its
     dependent objects.

  2. Meanwhile, process B pushes up a merge of X and Y, and updates a
     ref to point to it.

  3. Process A finishes writing the new pack, and deletes the old one,
     removing Y. The repository is now corrupt.
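
The steps above can be replayed in a toy model, sketched below. It is
not real git: each "pack" is a directory, each object is a file named
after its id, and the file lists the ids that object refers to. The
`check_repo` helper, the pack names, and the single `ref` file are all
assumptions made for the illustration, and the two processes are run
one after the other rather than concurrently.

```shell
#!/bin/sh
# Toy model, not real git: a "pack" is a directory, an object is a file named
# after its id, and the file's contents list the ids that object refers to.
repo=$(mktemp -d)
mkdir -p "$repo/packs/old" "$repo/packs/new" "$repo/packs/pushed"

# Print "ok" if everything reachable from the ref exists in some pack;
# otherwise print "missing <id>" and return non-zero.
check_repo() {
    todo=$(cat "$repo/ref")
    while [ -n "$todo" ]; do
        next=
        for id in $todo; do
            f=$(find "$repo/packs" -type f -name "$id" | head -n 1)
            if [ -z "$f" ]; then
                echo "missing $id"
                return 1
            fi
            next="$next $(cat "$f")"
        done
        todo=$next
    done
    echo ok
}

# 0. The old pack holds X (reachable through the ref) and Y (unreachable).
: >"$repo/packs/old/X"
: >"$repo/packs/old/Y"
echo X >"$repo/ref"

# 1. Process A walks the graph, sees only X, and writes the new pack.
cp "$repo/packs/old/X" "$repo/packs/new/X"

# 2. Process B pushes merge M of X and Y and points the ref at it. Only M
#    is stored, since X and Y already exist in the object store.
echo "X Y" >"$repo/packs/pushed/M"
echo M >"$repo/ref"
before=$(check_repo || true)

# 3. Process A deletes the old pack, taking Y with it.
rm -r "$repo/packs/old"
after=$(check_repo || true)

echo "before delete: $before; after delete: $after"
```

The point of the model is only that the push in step 2 is valid at the
moment it happens, and becomes invalid through a deletion that was
decided on before the push existed.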

I don't have a solution here.  I don't think we want to solve it by
locking the repository for updates during a repack. I have a vague sense
that a solution could be crafted around moving the old pack into a
holding area instead of deleting (during which time nobody else would
see the objects, and thus not reference them), while the repacking
process checks to see if the actual deletion would break any references
(and rolls back the deletion if it would).
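
A very rough sketch of that holding-area idea, in a toy model rather
than real git, might look like the following. Packs are directories,
objects are files listing the ids they refer to, and the `retire_pack`
and `check_repo` helpers and the `holding` directory are all
hypothetical names. The sketch ignores anything that could happen
between the check and the final removal, so it is an illustration of
the shape of the idea and not a finished solution.

```shell
#!/bin/sh
# Toy model, not real git: a "pack" is a directory, an object is a file named
# after its id, and the file's contents list the ids that object refers to.
repo=$(mktemp -d)
mkdir -p "$repo/packs/old" "$repo/packs/new" "$repo/packs/pushed" "$repo/holding"

# Print "ok" if everything reachable from the ref exists in a visible pack;
# otherwise print "missing <id>" and return non-zero.
check_repo() {
    todo=$(cat "$repo/ref")
    while [ -n "$todo" ]; do
        next=
        for id in $todo; do
            f=$(find "$repo/packs" -type f -name "$id" | head -n 1)
            if [ -z "$f" ]; then
                echo "missing $id"
                return 1
            fi
            next="$next $(cat "$f")"
        done
        todo=$next
    done
    echo ok
}

# Hypothetical helper: hide the pack in the holding area first, then delete
# it for real only if nothing reachable went missing; otherwise roll back.
retire_pack() {
    name=$1
    mv "$repo/packs/$name" "$repo/holding/$name"
    if check_repo >/dev/null; then
        rm -r "$repo/holding/$name"
        echo deleted
    else
        mv "$repo/holding/$name" "$repo/packs/$name"
        echo "rolled back"
    fi
}

# Same state as in the race: the old pack has X and Y, the repack copied X,
# and a pushed merge M of X and Y is what the ref now points to.
: >"$repo/packs/old/X"
: >"$repo/packs/old/Y"
cp "$repo/packs/old/X" "$repo/packs/new/X"
echo "X Y" >"$repo/packs/pushed/M"
echo M >"$repo/ref"

result=$(retire_pack old)
state=$(check_repo || true)
echo "retire_pack: $result; repository: $state"
```

In this model the deletion is rolled back because M still needs Y, and
the repository stays consistent.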

That's _way_ more complicated than your problem, and as I said, I do not
have a finished solution. But it seems like they touch on a similar
concept (a post-delete holding area for objects). So I thought I'd
mention it in case it spurs any brilliance.

-Peff
