git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Ævar Arnfjörð Bjarmason <avarab@gmail.com>,
	Junio C Hamano <gitster@pobox.com>, Eric Wong <e@80x24.org>,
	git@vger.kernel.org
Subject: Re: [PATCH v3] repack: enable bitmaps by default on bare repos
Date: Fri, 24 May 2019 06:33:00 -0400
Message-ID: <cbd7540b-0d36-e378-c5e7-871a70e47e18@gmail.com> (raw)
In-Reply-To: <20190524072426.GG25694@sigill.intra.peff.net>

On 5/24/2019 3:24 AM, Jeff King wrote:
> On Thu, May 23, 2019 at 08:53:56AM -0400, Derrick Stolee wrote:
> 
>>> I spent a while thinking and experimenting with this tonight. The result
>>> is the patch below. Ævar, do you still have a copy of the repo that
>>> misbehaved? I'd be curious to see how it fares.
>>
>> This patch caught my attention, and I think I understand some of the issues
>> at hand. I'm not well-versed specifically in Git's bitmap implementation.
>> The fundamental ideas are there, but my perspective is biased by my own
>> independent bitmap implementation for Azure Repos. What worked there may not
>> work at all here.
> 
> Thanks for looking at this. There are a lot of short-comings in the
> current bitmap implementation, so if there's a better way to do things,
> I'm not opposed to moving to a new format. :)
> 
>>> Finding the right trees to explore is a little tricky with bitmaps.  In
>>> a normal traversal, we consider the "edges" to be worth exploring.
>>> I.e., the places where an UNINTERESTING commit is the parent of an
>>> interesting one.
>>
>> This is the "commit frontier" which I bring up again below.
> 
> Right. I actually had trouble coming up with a succinct way of
> describing this, and stole the definition from your recent blog post. ;)
> 
>>> But with bitmaps, we don't have that information in the same way,
>>> because we're trying to avoid walking the commits in the first place. So
>>> e.g., in a history like this:
>>>
>>>   A--B--C
>>>       \
>>>        D
>>>
>>> Let's imagine we're computing "C..D", and that D has a bitmap on disk
>>> but C does not.
>>
>> (As I read your discussion below, I'm confused. For "C..D", C is a have
>> and D is a want. We should explore all the haves _first_, then walk the
>> wants, right?)
> 
> I think I may have confused things by starting my description with a
> hypothetical combined want/have walk. To take a step back: the problem
> we were discussing is that we spend a lot of time reading trees to fill
> in the "have" bitmap, even though most of those objects are unlikely to
> be in the "want" in the first place (only the frontier trees are really
> useful).

Thank you for resolving my confusion.

[snip]

> As I said earlier in the thread, I suspect our commit selection is not
> great. It's mostly some heuristics we threw together in 2013, and I
> don't think it was tested very thoroughly. So fixing that may be a
> simpler approach.

It's a hard problem! There are no _sure_ answers here, and what works in
some cases will probably not work in other cases.
 
> What I was wondering here was whether we could get an easy fix based on
> the same frontier ideas that the non-bitmap walk uses.

[snip]
 
> But doing that commit walk to find the frontier negates part of the
> purpose of using the bitmaps, which is avoiding even walking commits.
> Going back to a variant of our example:
> 
>   A -- B -- C_1 -- .. -- C_1000
>         \
> 	 D_1 -- .. -- D_1000
> 
> If we have a bitmap at C_1000 and D_1000, we don't have to walk any
> commits at all. But finding the frontier requires walking 2000 commits.

In my opinion, walking commits is easy (easier with the commit-graph)
and walking trees is hard. We probably _should_ do more commit walking if
it helps avoid walking some trees.
 
> Is there a way to find it with just bitmaps? You can certainly find the
> intersection, but you don't have any idea which of the many shared
> commits is the merge base. Of course in this example you don't actually
> care about the frontier (you have the full answer immediately). But how
> do you decide which way to optimize: try to avoid walking commits to
> get a quick answer from bitmaps, or prefer to walk some commits to find
> the frontier and avoid opening too many trees?

With a new perspective on the problem, I think perhaps trying to solve this
problem with bitmaps is incorrect. If you want to use bitmaps for C..D and
you don't have any bitmaps in the range D..C, then maybe we should just use
the old-fashioned method of walking trees? In your examples above, the
new method would walk trees for the commits in {B, C_i, D_j} while the
bitmap method would walk trees for the commits in {B, C_i, A_k}. I would
expect the list of {A_k} commits to be the largest in most cases.

The approach here would be to do the commit frontier walk, and check for
commits with bitmaps along the way. If we don't find an UNINTERESTING
bitmap, then use the non-bitmap way instead. Perhaps worth a shot.

I'll bring up this code snippet again:

        /*
         * if we have a HAVES list, but none of those haves is contained
         * in the packfile that has a bitmap, we don't have anything to
         * optimize here
         */
        if (haves && !in_bitmapped_pack(bitmap_git, haves))
                goto cleanup;

In addition to this, we can fill the "haves" set with the commits
in D..C (with boundary) and then check if any of those commits have a
precomputed bitmap. If not, goto cleanup.

Thanks,
-Stolee

  reply index

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-14  4:31 [PATCH 0/3] some prune optimizations Jeff King
2019-02-14  4:35 ` [PATCH 1/3] prune: lazily perform reachability traversal Jeff King
2019-02-14 10:54   ` Eric Sunshine
2019-02-14 11:07     ` Jeff King
2019-02-14  4:37 ` [PATCH 2/3] prune: use bitmaps for " Jeff King
2019-03-09  2:49   ` bitmaps by default? [was: prune: use bitmaps for reachability traversal] Eric Wong
2019-03-10 23:39     ` Jeff King
2019-03-12  3:13       ` [PATCH] repack: enable bitmaps by default on bare repos Eric Wong
2019-03-12  9:07         ` Ævar Arnfjörð Bjarmason
2019-03-12 10:49         ` Jeff King
2019-03-12 12:05           ` Jeff King
2019-03-13  1:51           ` Eric Wong
2019-03-13 14:54             ` Jeff King
2019-03-14  9:12               ` [PATCH v3] " Eric Wong
2019-03-14 16:02                 ` Jeff King
2019-03-15  6:21                   ` [PATCH 0/2] enable bitmap hash-cache by default Jeff King
2019-03-15  6:22                     ` [PATCH 1/2] t5310: correctly remove bitmaps for jgit test Jeff King
2019-03-15 13:25                       ` SZEDER Gábor
2019-03-15 18:36                         ` Jeff King
2019-03-15  6:25                     ` [PATCH 2/2] pack-objects: default to writing bitmap hash-cache Jeff King
2019-04-09 15:10                 ` [PATCH v3] repack: enable bitmaps by default on bare repos Ævar Arnfjörð Bjarmason
2019-04-10 22:57                   ` Jeff King
2019-04-25  7:16                     ` Junio C Hamano
2019-05-04  1:37                       ` Jeff King
2019-05-04  6:52                         ` Ævar Arnfjörð Bjarmason
2019-05-04 13:23                           ` SZEDER Gábor
2019-05-08 20:17                             ` Ævar Arnfjörð Bjarmason
2019-05-09  4:24                               ` Junio C Hamano
2019-05-07  7:45                           ` Jeff King
2019-05-07  8:12                             ` Ævar Arnfjörð Bjarmason
2019-05-08  7:11                               ` Jeff King
2019-05-08 14:20                                 ` Derrick Stolee
2019-05-08 16:13                                 ` Ævar Arnfjörð Bjarmason
2019-05-08 22:25                                   ` Jeff King
2019-05-23 11:30                     ` Jeff King
2019-05-23 12:53                       ` Derrick Stolee
2019-05-24  7:24                         ` Jeff King
2019-05-24 10:33                           ` Derrick Stolee [this message]
2019-05-23 19:26                       ` Ævar Arnfjörð Bjarmason
2019-05-24  7:27                         ` Jeff King
2019-05-24  7:55                           ` Ævar Arnfjörð Bjarmason
2019-05-24  8:26                             ` Jeff King
2019-05-24  9:01                               ` Ævar Arnfjörð Bjarmason
2019-05-24  9:29                                 ` SZEDER Gábor
2019-05-24 11:17                                   ` Ævar Arnfjörð Bjarmason
2019-05-24 11:41                                     ` SZEDER Gábor
2019-05-24 11:58                                       ` Ævar Arnfjörð Bjarmason
2019-05-24 12:34                                         ` SZEDER Gábor
2019-05-24 13:41                                           ` Ævar Arnfjörð Bjarmason
2019-05-24 11:31                       ` [PATCH] pack-bitmap: look for an uninteresting bitmap Derrick Stolee
2019-04-15 15:00   ` [PATCH 2/3] prune: use bitmaps for reachability traversal Derrick Stolee
2019-04-18 19:49     ` Jeff King
2019-04-18 20:08       ` [PATCH] t5304: add a test for pruning with bitmaps Jeff King
2019-04-20  1:01         ` Derrick Stolee
2019-04-20  3:24           ` Jeff King
2019-04-20 21:01             ` Derrick Stolee
2019-02-14  4:38 ` [PATCH 3/3] prune: check SEEN flag for reachability Jeff King

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cbd7540b-0d36-e378-c5e7-871a70e47e18@gmail.com \
    --to=stolee@gmail.com \
    --cc=avarab@gmail.com \
    --cc=e@80x24.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

git@vger.kernel.org list mirror (unofficial, one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox