git@vger.kernel.org mailing list mirror (one of many)
From: Junio C Hamano <gitster@pobox.com>
To: Jeff King <peff@peff.net>
Cc: Shawn Pearce <spearce@spearce.org>, git <git@vger.kernel.org>,
	Duy Nguyen <pclouds@gmail.com>
Subject: Re: [PATCH 4/6] introduce a commit metapack
Date: Sat, 02 Feb 2013 09:49:01 -0800
Message-ID: <7vfw1ebo8i.fsf@alter.siamese.dyndns.org>
In-Reply-To: 20130201094237.GE30644@sigill.intra.peff.net

Jeff King <peff@peff.net> writes:

> On Thu, Jan 31, 2013 at 09:03:26AM -0800, Shawn O. Pearce wrote:
> ...
>> If we are going to change the index to support extension sections and
>> I have to modify JGit to grok this new format, it needs to be index v3
>> not index v2. If we are making index v3 we should just put index v3 on
>> the end of the pack file.
>
> I'm not sure what you mean by your last sentence here.

I am not Shawn, but here is a summary of what I think I discussed
with him in person, lest I forget.

You could imagine a new pack system (from pack-objects and
index-pack down to the read_packed_sha1() call) that works with a
packfile that

 * is a single file, whose name is pack-$SHA1.$sfx (where $sfx
   is something other than 'pack', perhaps);

 * has the pack data stream, including the concluding checksum of
   the stream contents at the end, at the beginning of the file; and

 * has the index v3 data blob appended to the pack data stream.
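For illustration, here is a minimal Python sketch of that layout. It assumes only non-delta blob entries and treats the index v3 data as an opaque placeholder, since its format is not defined here; the function names are hypothetical. The pack part follows the existing version 2 stream format: a 12-byte header ("PACK", version, object count), the entries, and a trailing SHA-1 of everything before it.

```python
import hashlib
import struct
import zlib

OBJ_BLOB = 3  # type code of a blob entry in the pack v2 stream format

def encode_entry_header(obj_type: int, size: int) -> bytes:
    """Entry header: 3-bit type plus the size as a base-128 varint
    (4 size bits in the first byte, 7 bits in each following byte)."""
    byte = (obj_type << 4) | (size & 0x0F)
    size >>= 4
    out = bytearray()
    while size:
        out.append(byte | 0x80)  # high bit set: another size byte follows
        byte = size & 0x7F
        size >>= 7
    out.append(byte)
    return bytes(out)

def build_pack_stream(blobs: list[bytes]) -> bytes:
    """Header, zlib-compressed entries, then SHA-1 over all preceding bytes."""
    body = bytearray(b"PACK" + struct.pack(">II", 2, len(blobs)))
    for data in blobs:
        body += encode_entry_header(OBJ_BLOB, len(data))
        body += zlib.compress(data)
    return bytes(body) + hashlib.sha1(body).digest()

def build_single_file(blobs: list[bytes], index_blob: bytes) -> bytes:
    """Hypothetical single-file layout: the unchanged pack stream first,
    then the (not yet specified) index v3 data appended after its checksum."""
    return build_pack_stream(blobs) + index_blob
```

Because the pack stream comes first and is byte-for-byte what goes over the wire, a sender can stream the prefix of the file as-is and simply stop at the trailer.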

The pack data is streamed over the wire exactly the same way,
interoperating with existing software.  When receiving, the new
index-pack can read such a pack stream and append the index at the end.
When re-indexing an existing pack (think: upgrading existing
packfiles from the current system), the index-pack can read from the
packfile and do what it does currently (notably, it knows where the
pack stream ends and can stop reading at that point, so even if you
feed the new pack to it, it will stop at the end of the pack data,
ignoring the index v3 already at the end of the input).
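How a reader "knows where the pack stream ends" can be sketched as follows: walk the entry headers, skip delta base references, and let zlib report where each compressed payload stops. This is a simplified Python illustration (hypothetical function name, no delta resolution or object hashing), not index-pack's actual code.

```python
import hashlib
import struct
import zlib

OBJ_OFS_DELTA = 6
OBJ_REF_DELTA = 7
HASH_LEN = 20  # length of a SHA-1: the pack trailer and a REF_DELTA base name

def pack_stream_end(buf: bytes) -> int:
    """Return the offset just past the pack trailer checksum, i.e. where any
    data appended after the pack stream (such as an index) would begin."""
    if buf[:4] != b"PACK":
        raise ValueError("not a pack stream")
    version, count = struct.unpack(">II", buf[4:12])
    if version not in (2, 3):
        raise ValueError(f"unsupported pack version {version}")
    pos = 12
    for _ in range(count):
        # Entry header: type in bits 4-6 of the first byte, then the rest of
        # the size varint while the high bit is set (the size is not needed).
        c = buf[pos]
        obj_type = (c >> 4) & 0x07
        if obj_type in (0, 5):
            raise ValueError(f"invalid object type {obj_type} at {pos}")
        pos += 1
        while c & 0x80:
            c = buf[pos]
            pos += 1
        if obj_type == OBJ_OFS_DELTA:
            # Skip the base-offset varint; only its length matters here.
            while buf[pos] & 0x80:
                pos += 1
            pos += 1
        elif obj_type == OBJ_REF_DELTA:
            pos += HASH_LEN  # skip the 20-byte base object name
        # The payload is a zlib stream; zlib reports what follows its end.
        d = zlib.decompressobj()
        d.decompress(buf[pos:])
        if not d.eof:
            raise ValueError("truncated object data")
        pos = len(buf) - len(d.unused_data)
    end = pos + HASH_LEN
    if hashlib.sha1(buf[:pos]).digest() != buf[pos:end]:
        raise ValueError("pack checksum mismatch")
    return end
```

Whatever follows the returned offset (an appended index v3 or nothing at all) is never interpreted as pack data, which is why feeding the new single file to the reindexing path is harmless.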

One potential advantage of using a single file, instead of the
primary .pack file with 3 (or 47) auxiliary files, is that it lets
you repack without having to deal with this sequence, which happens
currently when you repack:

 * create a new .pack file and the corresponding auxiliary files
   under temporary filename;

 * move existing pack files that describe the same set of objects
   away;

 * rename these new files, one at a time, to their final names,
   making sure that you rename .idx last, because that happens
   to be the key to the pack-aware programs.

Instead you can rename only one thing (the new one) to the final
name (possibly atomically replacing the existing one).  With the
current system, when you need to replace a pack with a new pack with
the same packname (e.g. you repack everything with a better pack
parameter in a repository that has everything packed into one),
there is a very small window during which other concurrent users
will not find the object data: between the time when you rename the
old ones away and the time when you move the new ones in.  The hairy
logic between "Ok we have prepared all new packfiles" and "End of
pack replacement" can be replaced by a single rename(2) of the new
packfile (which contains everything) to the final name, which
atomically replaces the old one.
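A minimal sketch of that single-step installation, assuming POSIX rename(2) semantics (Python's os.replace() uses it on POSIX) and a temporary file created in the same directory, hence on the same filesystem, as the final name. The function name and the "tmp_pack_" prefix are hypothetical.

```python
import os
import tempfile

def install_pack_atomically(data: bytes, final_path: str) -> None:
    """Write the whole single-file pack under a temporary name next to
    final_path, then rename it over final_path in one step."""
    directory = os.path.dirname(os.path.abspath(final_path))
    fd, tmp_path = tempfile.mkstemp(prefix="tmp_pack_", dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make the contents durable before the rename
        # One rename: a concurrent reader opening final_path sees either the
        # old file or the new one, never a moment where the name is missing.
        os.replace(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Readers that already have the old file open keep reading the old inode until they close it, so there is no need to move anything "away" first.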

This will become even safer if we pick $SHA1 (the name of the
packfile) to represent the hash of the whole thing, not the hash of
the sorted object names in the pack: a file that already exists under
that name is then known to have identical contents, so there is no
need to even "replace" it.
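A sketch of the difference between the two naming schemes, under stated assumptions: name_from_object_ids() is modelled on the current naming (SHA-1 over the sorted binary object names), content_addressed_name() hashes the whole file, and "pack3" is only a placeholder for the undecided $sfx. All function names are hypothetical.

```python
import hashlib
import os
import tempfile

DEFAULT_SUFFIX = "pack3"  # hypothetical stand-in for the undecided $sfx

def name_from_object_ids(object_ids: list[bytes]) -> str:
    """Current-style name: depends only on which objects are in the pack,
    so repacking the same objects differently yields the same name."""
    h = hashlib.sha1()
    for oid in sorted(object_ids):
        h.update(oid)
    return h.hexdigest()

def content_addressed_name(data: bytes, suffix: str = DEFAULT_SUFFIX) -> str:
    """Proposed-style name: the hash of the whole file's bytes."""
    return f"pack-{hashlib.sha1(data).hexdigest()}.{suffix}"

def install_if_new(data: bytes, pack_dir: str, suffix: str = DEFAULT_SUFFIX) -> bool:
    """Install data under its content-addressed name; skip if already present."""
    final = os.path.join(pack_dir, content_addressed_name(data, suffix))
    if os.path.exists(final):
        return False  # same name implies same bytes: nothing to replace
    fd, tmp = tempfile.mkstemp(dir=pack_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.replace(tmp, final)
    return True
```

With the object-name scheme, a better-compressed repack of the same objects collides with the old name and must overwrite it; with the whole-content hash, different bytes always get a different name.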
