git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Shawn Pearce <spearce@spearce.org>
Cc: Nguyen Thai Ngoc Duy <pclouds@gmail.com>,
	git <git@vger.kernel.org>, Colby Ranger <cranger@google.com>,
	David Barr <barr@github.com>
Subject: Re: Using bitmaps to accelerate fetch and clone
Date: Thu, 27 Sep 2012 16:18:09 -0400
Message-ID: <20120927201809.GA11772@sigill.intra.peff.net>
In-Reply-To: <20120927185229.GD2519@sigill.intra.peff.net>

On Thu, Sep 27, 2012 at 02:52:29PM -0400, Jeff King wrote:

> > No. The pack file name is composed from the SHA-1 of the sorted SHA-1s
> > in the pack. Any change in compression settings or delta windows or
> > even just random scheduling variations when repacking can cause
> > offsets to slide, even if the set of objects being repacked has not
> > differed. The resulting pack and index will have the same file names
> > (as its the same set of objects), but the offset information and
> > ordering is now different.
> 
> Are you sure? The trailer is computed over the sha1 of the actual pack
> data (ordering, delta choices, and all), and is computed and written to
> the packfile via sha1close (see pack-objects.c, ll. 753-763). That
> trailer sha1 is fed into finish_tmp_packfile (l. 793).  That function
> feeds it to write_idx_file, which starts a new sha1 computation that
> includes the sorted sha1 list and other index info. But before we
> sha1close that computation, we write the _original_ trailer sha1, adding
> it to the new sha1 calculation. See pack-write.c, ll. 178-180.
> 
> And then that sha1 gets returned to finish_tmp_packfile, which uses it
> to name the resulting files.
> 
> Am I reading the code wrong?

And the answer is...yes. I'm blind.

The final bit of code in write_idx_file is:

        sha1write(f, sha1, 20);
        sha1close(f, NULL, ((opts->flags & WRITE_IDX_VERIFY)
                            ? CSUM_CLOSE : CSUM_FSYNC));
        git_SHA1_Final(sha1, &ctx);

So we write the trailer, but the sha1 we pull out is _not_ the sha1 over
the index format. It is from "ctx", not "f"; and the former is from the
object list. Just like you said. :)
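
To illustrate the distinction, here is a minimal sketch (not git's code) in
which the "name" hash covers the sorted object ids and the "trailer" hash
covers them in on-disk order. Everything in it is an assumption made for
illustration: toy_hash_update() is a 64-bit FNV-1a stand-in for SHA-1, and
name_hash() and trailer_hash() are hypothetical helpers. Real packs hash the
full object data, not just the ids.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ID_LEN 20
#define TOY_HASH_INIT 14695981039346656037ULL

/* Toy stand-in for SHA-1 (64-bit FNV-1a). Assumption: NOT what git uses. */
static uint64_t toy_hash_update(uint64_t h, const unsigned char *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= buf[i];
		h *= 1099511628211ULL;
	}
	return h;
}

static int cmp_id(const void *a, const void *b)
{
	return memcmp(a, b, ID_LEN);
}

/*
 * Hash the ids in sorted order. This models the value that names the
 * pack and index files: it depends only on the set of objects.
 * `ids` points at n consecutive ID_LEN-byte object ids.
 */
static uint64_t name_hash(const unsigned char *ids, size_t n)
{
	unsigned char *sorted;
	uint64_t h = TOY_HASH_INIT;
	size_t i;

	assert(ids != NULL || n == 0);
	sorted = malloc(n ? n * ID_LEN : 1);
	if (!sorted)
		abort();
	if (n)
		memcpy(sorted, ids, n * ID_LEN);
	qsort(sorted, n, ID_LEN, cmp_id);
	for (i = 0; i < n; i++)
		h = toy_hash_update(h, sorted + i * ID_LEN, ID_LEN);
	free(sorted);
	return h;
}

/*
 * Hash the ids in the order they were written. This models the pack
 * trailer, which changes whenever the layout changes.
 */
static uint64_t trailer_hash(const unsigned char *ids, size_t n)
{
	assert(ids != NULL || n == 0);
	return toy_hash_update(TOY_HASH_INIT, ids, n * ID_LEN);
}
```

Two buffers holding the same ids in a different order give equal
name_hash() values but different trailer_hash() values, which is the
situation described above.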

So yeah, we would want to put the pack trailer sha1 into the
supplementary index file, and check that it matches when we open it.
It's a slight annoyance, but it's O(1).
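
A rough sketch of what that check could look like, assuming the
supplementary file stores a 20-byte copy of the pack trailer. The names
read_trailer() and supplementary_is_fresh() are hypothetical and do not
exist in git. The cost is one seek and one 20-byte read, independent of
pack size.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TRAILER_LEN 20

/*
 * Read the final TRAILER_LEN bytes of the file at `path` into `out`.
 * Returns 0 on success, -1 on error (including a file that is too short).
 */
static int read_trailer(const char *path, unsigned char out[TRAILER_LEN])
{
	FILE *f;
	int ok;

	assert(path != NULL);
	f = fopen(path, "rb");
	if (!f)
		return -1;
	ok = fseek(f, -(long)TRAILER_LEN, SEEK_END) == 0 &&
	     fread(out, 1, TRAILER_LEN, f) == TRAILER_LEN;
	fclose(f);
	return ok ? 0 : -1;
}

/*
 * Assumption: `recorded` is the trailer copy stored in the supplementary
 * file when it was built. Returns 1 if it still matches the pack on
 * disk, 0 if it does not (or if the pack cannot be read).
 */
static int supplementary_is_fresh(const char *pack_path,
				  const unsigned char recorded[TRAILER_LEN])
{
	unsigned char actual[TRAILER_LEN];

	if (read_trailer(pack_path, actual) < 0)
		return 0;
	return memcmp(actual, recorded, TRAILER_LEN) == 0;
}
```

A caller would run supplementary_is_fresh() once when opening the
supplementary file and ignore the file if it returns 0.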

Anything which rewrote the pack and index would also want to rewrite
these supplementary files. So the worst case would be:

  1. Pack with a new version which builds the supplementary file.

  2. Repack with an old version which generates a pack with identical
     objects, but different ordering. It does not regenerate the
     supplementary file, because it does not know about it.

  3. Try to read with newer git.

Without the extra trailer check, we get a wrong answer. With the check,
we notice that the supplementary file is bogus, and fall back to the slow
path. Which I think is OK, considering that this scenario is reasonably
unlikely to come up (and it is no slower than it would be if you
generated a _new_ packfile in step 2).
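
The three-step scenario can be sketched as a small decision function. This
is only an illustration under assumed data structures: struct pack_info,
struct supp_info, and choose_path() are hypothetical names, not anything in
git.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HASH_LEN 20

enum lookup_path { FAST_PATH, SLOW_PATH };

/* Hypothetical summary of a pack as seen by the reader. */
struct pack_info {
	unsigned char name[HASH_LEN];    /* hash of sorted object ids (file name) */
	unsigned char trailer[HASH_LEN]; /* hash of the actual pack bytes */
};

/* Hypothetical header of the supplementary file. */
struct supp_info {
	unsigned char pack_name[HASH_LEN];    /* pack it was built for */
	unsigned char pack_trailer[HASH_LEN]; /* trailer of that pack at build time */
};

/*
 * Use the supplementary file only if it was built against exactly the
 * pack bytes that are on disk now; otherwise take the slow path.
 * `s` may be NULL when no supplementary file exists.
 */
static enum lookup_path choose_path(const struct pack_info *p,
				    const struct supp_info *s)
{
	assert(p != NULL);
	if (!s)
		return SLOW_PATH;
	if (memcmp(p->name, s->pack_name, HASH_LEN) != 0)
		return SLOW_PATH;
	/* Same name but different trailer: same objects, different layout. */
	if (memcmp(p->trailer, s->pack_trailer, HASH_LEN) != 0)
		return SLOW_PATH;
	return FAST_PATH;
}
```

After step 2 the name still matches but the trailer does not, so the
reader in step 3 ends up on SLOW_PATH instead of trusting stale offsets.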

-Peff
