git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Eric Wong <e@80x24.org>
Cc: git@vger.kernel.org
Subject: Re: bitmaps by default? [was: prune: use bitmaps for reachability traversal]
Date: Sun, 10 Mar 2019 19:39:57 -0400	[thread overview]
Message-ID: <20190310233956.GB3059@sigill.intra.peff.net> (raw)
In-Reply-To: <20190309024944.zcbwgvn52jsw2a2e@dcvr>

On Sat, Mar 09, 2019 at 02:49:44AM +0000, Eric Wong wrote:

> Jeff King <peff@peff.net> wrote:
> > Pruning generally has to traverse the whole commit graph in order to
> > see which objects are reachable. This is the exact problem that
> > reachability bitmaps were meant to solve, so let's use them (if they're
> > available, of course).
> 
> Perhaps this is good impetus for doing bitmaps by default?

I'm actually not sure it is, because the prune costs less than making
the bitmaps. Here are some timings on linux.git. This full-graph
traversal is roughly the same cost as the reachability walk that prune
would do internally:

	$ time git rev-list --objects --all >/dev/null
	real	0m47.714s
	user	0m47.113s
	sys	0m0.600s

Here's a normal noop repack as a baseline.

	$ time git repack -ad
	real	1m26.922s
	user	1m20.029s
	sys	0m7.878s

And here's another immediately after with bitmap generation. Generating
the bitmaps takes about 100s, compared to the 47s it would save us on
the prune.

	$ time git repack -adb
	real	3m5.915s
	user	2m59.377s
	sys	0m7.718s

Things are a little rosier if you generate the bitmaps a second time:

	$ time git repack -adb
	real	1m43.571s
	user	1m37.403s
	sys	0m8.179s

We can reuse some of the old bitmaps and it only takes 20 extra seconds,
making it a net win. But I'm not sure how realistic that is. There were
literally no new objects introduced between those two command. If this
were a "real" repack occurring after we'd accumulated a week or two
worth of objects, how long would it take?

A few other random observations:

  - I do suspect there are some real inefficiencies in the way we
    generate bitmaps. It _should_ be about as expensive as the graph
    traversal, but clearly it's not. I think this is because of the way
    the current bitmap code picks commits to bitmap, and then walks
    somewhat haphazardly over the history, trying to accumulate the set
    of objects for each commit. IOW, I believe it may sometimes
    traverse over some sequences of history more than once. So if we
    could make that faster, then the balance would shift in its favor.

  - This is comparing the cost of generating the bitmaps to the time
    saved for _one_ operation. On a repo serving many fetches, the cost
    to generate it is amortized over many requests. But for a normal
    end-user, that's not true (they'd presumably push out their work,
    but that usually only needs to walk a small bit of history anyway).

    The balance would change if we had more operations that used bitmaps
    (e.g., --contains can use them, as can ahead/behind checks). We
    don't do those things yet, but we could. However, those algorithms
    are also using other commit-graph optimizations, and we've discussed
    revamping the bitmap format as part of that work (one problem in
    particular is that to use the current bitmaps you have to parse the
    whole .bitmap file, making it sometimes a net negative to use the
    bitmaps). So I'd consider holding off any decision like "let's make
    this the default" until we see where that work goes.

> It would make life easier for people new to hosting git servers
> (and hopefully reduce centralization :)

I do think they're a net win for people hosting git servers. But if
that's the goal, I think at most you'd want to make bitmaps the default
for bare repos. They're really not much help for normal end-user repos
at this point.

> I started working on it, but t0410-partial-clone.sh fails with
> "Failed to write bitmap index. Packfile doesn't have full
> closure"; so more work needs to be done w.r.t. default behavior
> on partial clones...

Yeah, you can't use bitmaps at all in an incomplete clone. Shallow
clones would probably have the same issue (though in general we just
disable bitmaps entirely in shallow situations, so that might kick in).

-Peff

  reply	other threads:[~2019-03-10 23:40 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-14  4:31 [PATCH 0/3] some prune optimizations Jeff King
2019-02-14  4:35 ` [PATCH 1/3] prune: lazily perform reachability traversal Jeff King
2019-02-14 10:54   ` Eric Sunshine
2019-02-14 11:07     ` Jeff King
2019-02-14  4:37 ` [PATCH 2/3] prune: use bitmaps for " Jeff King
2019-03-09  2:49   ` bitmaps by default? [was: prune: use bitmaps for reachability traversal] Eric Wong
2019-03-10 23:39     ` Jeff King [this message]
2019-03-12  3:13       ` [PATCH] repack: enable bitmaps by default on bare repos Eric Wong
2019-03-12  9:07         ` Ævar Arnfjörð Bjarmason
2019-03-12 10:49         ` Jeff King
2019-03-12 12:05           ` Jeff King
2019-03-13  1:51           ` Eric Wong
2019-03-13 14:54             ` Jeff King
2019-03-14  9:12               ` [PATCH v3] " Eric Wong
2019-03-14 16:02                 ` Jeff King
2019-03-15  6:21                   ` [PATCH 0/2] enable bitmap hash-cache by default Jeff King
2019-03-15  6:22                     ` [PATCH 1/2] t5310: correctly remove bitmaps for jgit test Jeff King
2019-03-15 13:25                       ` SZEDER Gábor
2019-03-15 18:36                         ` Jeff King
2019-03-15  6:25                     ` [PATCH 2/2] pack-objects: default to writing bitmap hash-cache Jeff King
2019-04-09 15:10                 ` [PATCH v3] repack: enable bitmaps by default on bare repos Ævar Arnfjörð Bjarmason
2019-04-10 22:57                   ` Jeff King
2019-04-25  7:16                     ` Junio C Hamano
2019-05-04  1:37                       ` Jeff King
2019-05-04  6:52                         ` Ævar Arnfjörð Bjarmason
2019-05-04 13:23                           ` SZEDER Gábor
2019-05-08 20:17                             ` Ævar Arnfjörð Bjarmason
2019-05-09  4:24                               ` Junio C Hamano
2019-05-07  7:45                           ` Jeff King
2019-05-07  8:12                             ` Ævar Arnfjörð Bjarmason
2019-05-08  7:11                               ` Jeff King
2019-05-08 14:20                                 ` Derrick Stolee
2019-05-08 16:13                                 ` Ævar Arnfjörð Bjarmason
2019-05-08 22:25                                   ` Jeff King
2019-05-23 11:30                     ` Jeff King
2019-05-23 12:53                       ` Derrick Stolee
2019-05-24  7:24                         ` Jeff King
2019-05-24 10:33                           ` Derrick Stolee
2019-05-23 19:26                       ` Ævar Arnfjörð Bjarmason
2019-05-24  7:27                         ` Jeff King
2019-05-24  7:55                           ` Ævar Arnfjörð Bjarmason
2019-05-24  8:26                             ` Jeff King
2019-05-24  9:01                               ` Ævar Arnfjörð Bjarmason
2019-05-24  9:29                                 ` SZEDER Gábor
2019-05-24 11:17                                   ` Ævar Arnfjörð Bjarmason
2019-05-24 11:41                                     ` SZEDER Gábor
2019-05-24 11:58                                       ` Ævar Arnfjörð Bjarmason
2019-05-24 12:34                                         ` SZEDER Gábor
2019-05-24 13:41                                           ` Ævar Arnfjörð Bjarmason
2019-05-24 11:31                       ` [PATCH] pack-bitmap: look for an uninteresting bitmap Derrick Stolee
2019-04-15 15:00   ` [PATCH 2/3] prune: use bitmaps for reachability traversal Derrick Stolee
2019-04-18 19:49     ` Jeff King
2019-04-18 20:08       ` [PATCH] t5304: add a test for pruning with bitmaps Jeff King
2019-04-20  1:01         ` Derrick Stolee
2019-04-20  3:24           ` Jeff King
2019-04-20 21:01             ` Derrick Stolee
2019-02-14  4:38 ` [PATCH 3/3] prune: check SEEN flag for reachability Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190310233956.GB3059@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=e@80x24.org \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).