git@vger.kernel.org mailing list mirror (one of many)
* Using bitmaps to accelerate fetch and clone
@ 2012-09-27  0:47 Shawn Pearce
From: Shawn Pearce @ 2012-09-27  0:47 UTC (permalink / raw)
  To: git; +Cc: Colby Ranger

Google has published a series of patches (see links below) to JGit to
improve fetch and clone performance by adding compressed bitmaps to
the pack-*.idx structure.

Operation                   Index V2               Index VE003
Clone                       37530ms (524.06 MiB)     82ms (524.06 MiB)
Fetch (1 commit back)          75ms                 107ms
Fetch (10 commits back)       456ms (269.51 KiB)    341ms (265.19 KiB)
Fetch (100 commits back)      449ms (269.91 KiB)    337ms (267.28 KiB)
Fetch (1000 commits back)    2229ms ( 14.75 MiB)    189ms ( 14.42 MiB)
Fetch (10000 commits back)   2177ms ( 16.30 MiB)    254ms ( 15.88 MiB)
Fetch (100000 commits back) 14340ms (185.83 MiB)   1655ms (189.39 MiB)

The repository tested for the table above was Android's
platform/frameworks/base. The time shown is the time spent in the
"Counting objects" phase of creating a pack for a client using the
git:// protocol. The byte size shown is the size of the pack
transferred to the client, and "commits back" describes how far the
client was behind the server when it started the fetch. In all test
runs the client was git-core and the server was JGit on the same
machine.

The amount of disk space used by the compressed bitmaps is tunable,
but averages 10-15% of the pack-*.idx file size, so about 8 MiB of
additional space for this repository. A repository owner can reduce
the worst case time seen in the 100000 commits back case by using
slightly more disk and placing bitmaps more frequently throughout
history. The code doesn't do this by default because the
expectation is that a client is probably not 100k commits behind.
Instead it populates bitmaps at all branch and tag tips, densely
(every few hundred commits) near the tips, and more sparsely
the further back in history it goes. We assume the older history is
accessed less often, so it isn't worth spending additional disk space
or precious buffer cache on it.
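
To make the spacing idea concrete, here is a rough sketch of one way
such a placement policy could look. It is not the JGit code: the
function name, the parameters (dense_span, dense_step, growth) and the
doubling rule are all assumptions made up for illustration, and it
only handles a single linear history measured in commits back from
one tip.

```python
def select_bitmap_positions(num_commits, dense_span=2000, dense_step=200, growth=2):
    """Return distances from the tip (0 = tip) at which to store a bitmap.

    Hypothetical policy, not taken from JGit: place a bitmap every
    `dense_step` commits until `dense_span` commits back, then multiply
    the spacing by `growth` after each further bitmap.
    """
    positions = [0]          # always keep a bitmap at the tip itself
    distance = 0
    step = dense_step
    while True:
        distance += step
        if distance >= num_commits:
            break
        positions.append(distance)
        if distance >= dense_span:
            step *= growth   # older history gets sparser coverage
    return positions


if __name__ == "__main__":
    print(select_bitmap_positions(100000))
```

Lowering growth or raising dense_span in this sketch trades disk
space for a shorter walk in the "far behind" cases, which is the
tuning knob described above.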

The basic gist of the implementation is that each bitmap has a 1 bit
set for each object that is reachable from the commit the bitmap is
associated with. An index file may have a unique bitmap for hundreds
of commits in the corresponding pack file. The set of objects to send
is determined by a simple computation:

  OR (all want lines) AND NOT OR (all have lines)
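
As a small illustration of that expression, the sketch below uses
plain Python integers as uncompressed bit sets, where bit i stands for
the i-th object in the pack. The real patches use compressed bitmaps;
the function names and the example bit patterns here are invented for
the example and are not part of JGit.

```python
from functools import reduce
from operator import or_


def objects_to_send(want_bitmaps, have_bitmaps):
    """Compute OR(wants) AND NOT OR(haves) over integer bit sets."""
    wanted = reduce(or_, want_bitmaps, 0)   # everything reachable from any want
    had = reduce(or_, have_bitmaps, 0)      # everything the client already has
    return wanted & ~had


def bit_positions(bitmap):
    """List the object positions whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if bitmap >> i & 1]


if __name__ == "__main__":
    # Hypothetical reachability bitmaps for two wants and two haves.
    wants = [0b011111, 0b100111]
    haves = [0b000011, 0b000100]
    print(bit_positions(objects_to_send(wants, haves)))
```

With these made-up inputs the wants cover objects 0-5 and the haves
cover objects 0-2, so only objects 3, 4 and 5 would need to be sent.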

There are two key patches in the series that implement the file format
change and logic involved:

* https://git.eclipse.org/r/7939

  Defines the new E003 index format and the bit set
  implementation logic.

* https://git.eclipse.org/r/7940

  Uses E003 indexes when available to make packs, and adds
  the logic required to make E003 format indexes during GC.

:-)

