git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Nguyen Thai Ngoc Duy <pclouds@gmail.com>
Cc: Shawn Pearce <spearce@spearce.org>, git <git@vger.kernel.org>,
	Colby Ranger <cranger@google.com>, David Barr <barr@github.com>
Subject: Re: Using bitmaps to accelerate fetch and clone
Date: Thu, 27 Sep 2012 13:20:37 -0400	[thread overview]
Message-ID: <20120927172037.GB1547@sigill.intra.peff.net> (raw)
In-Reply-To: <CACsJy8D0vkyEArNChXE0igUkanH6PwjmPitq22a9sudfmWF4kA@mail.gmail.com>

On Thu, Sep 27, 2012 at 07:17:42PM +0700, Nguyen Thai Ngoc Duy wrote:

> > Operation                   Index V2               Index VE003
> > Clone                       37530ms (524.06 MiB)     82ms (524.06 MiB)
> > Fetch (1 commit back)          75ms                 107ms
> > Fetch (10 commits back)       456ms (269.51 KiB)    341ms (265.19 KiB)
> > Fetch (100 commits back)      449ms (269.91 KiB)    337ms (267.28 KiB)
> > Fetch (1000 commits back)    2229ms ( 14.75 MiB)    189ms ( 14.42 MiB)
> > Fetch (10000 commits back)   2177ms ( 16.30 MiB)    254ms ( 15.88 MiB)
> > Fetch (100000 commits back) 14340ms (185.83 MiB)   1655ms (189.39 MiB)
> 
> Beautiful. And curious, why do 100->1000 and 10000->10000 have such
> big leaps in time (V2)?

Agreed. I'm very excited about these numbers.

> >   Defines the new E003 index format and the bit set
> >   implementation logic.
> [...]
> It seems the bitmap data follows directly after regular index content.
> I'd like to see some sort of extension mechanism like in
> $GIT_DIR/index, so that we don't have to increase pack index version
> often. What I have in mind is optional commit cache to speed up
> rev-list and merge, which could be stored in pack index too.

As I understand it, both the bitmaps and a commit cache are
theoretically optional. That is, git can do the job without them, but
they speed things up. If that is the case, do we need to bump the index
version at all? Why not store a plain v2 index, and then store an
additional file "pack-XXX.reachable" that contains the bitmaps and an
independent version number.

The sha1 in the filename makes sure that the reachability file is always
in sync with the actual pack data and index.  Old readers won't know
about the new file, and will ignore it. For new readers, if the file is
there they can use it; if it's missing (or its version is not
understood), they can fall back to the regular index.

I haven't looked at the details of the format change yet. If it is
purely an extra chunk of data at the end, this Just Works. If there are
changes to the earlier parts of the pack (e.g., I seem to recall the
commit cache idea wanted separate indices for each object type), we may
still need a v3. But it would be nice if we could make those changes
generic (e.g., just the separate indices, which might support many
different enhancements), and then let the actual feature work happen in
the separate files.

> Definitely :-). I have shown my interest in this topic before. So I
> should probably say that I'm going to work on this on C Git, but
> sllloooowwwly. As this benefits the server side greatly, perhaps a
> GitHubber ;-) might want to work on this on C Git, for GitHub itself
> of course, and, as a side effect, make the rest of us happy?

Yeah, GitHub is definitely interested in this. I may take a shot at it,
but I know David Barr (cc'd) is also interested in such things.

-Peff

  parent reply	other threads:[~2012-09-27 17:20 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-09-27  0:47 Using bitmaps to accelerate fetch and clone Shawn Pearce
2012-09-27 12:17 ` Nguyen Thai Ngoc Duy
2012-09-27 14:33   ` Shawn Pearce
2012-09-28  1:37     ` Nguyen Thai Ngoc Duy
2012-09-27 17:20   ` Jeff King [this message]
2012-09-27 17:35     ` Shawn Pearce
2012-09-27 18:22       ` Jeff King
2012-09-27 18:36         ` Shawn Pearce
2012-09-27 18:52           ` Jeff King
2012-09-27 20:18             ` Jeff King
2012-09-27 21:33               ` Junio C Hamano
2012-09-27 21:36                 ` Jeff King
2012-09-27 19:47     ` David Michael Barr
2012-09-28  1:38     ` Nguyen Thai Ngoc Duy
2012-09-28 12:00 ` Nguyen Thai Ngoc Duy
2012-10-01  1:07   ` Shawn Pearce
2012-10-01  1:59     ` Nguyen Thai Ngoc Duy
2012-10-01  2:26       ` Shawn Pearce
2012-10-01 12:48         ` Nguyen Thai Ngoc Duy
2012-10-02 15:00           ` Shawn Pearce

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120927172037.GB1547@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=barr@github.com \
    --cc=cranger@google.com \
    --cc=git@vger.kernel.org \
    --cc=pclouds@gmail.com \
    --cc=spearce@spearce.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).