From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>
Subject: [PATCH v2 0/15] combining object filters and bitmaps
Date: Fri, 14 Feb 2020 13:21:47 -0500 [thread overview]
Message-ID: <20200214182147.GA654525@coredump.intra.peff.net> (raw)
In-Reply-To: <20200213021506.GA1124607@coredump.intra.peff.net>
Here's a v2 based on the early feedback from Junio, and a few things I
noticed while addressing it:
- patch 4 now moves the check for revs.prune into the pack-bitmap code,
rather than rearranging it within rev-list.
- I noticed some more opportunities to reuse the "compare the results
of a bitmap and non-bitmap traversal" code, so I factored that out
into its own commit (patch 8)
- I realized another long-standing annoyance is simple to fix: before
we did not know how to do "rev-list --use-bitmap-index" without
"--objects", but it was pretty easy to support. That's patch 9.
The range diff below is a little noisy because of the ripple effects of
some of those changes. If you just want to review the substantive
changes, look at patches 4, 8, and 9.
[01/15]: pack-bitmap: factor out type iterator initialization
[02/15]: pack-bitmap: fix leak of haves/wants object lists
[03/15]: rev-list: fallback to non-bitmap traversal when filtering
[04/15]: pack-bitmap: refuse to do a bitmap traversal with pathspecs
[05/15]: rev-list: factor out bitmap-optimized routines
[06/15]: rev-list: make --count work with --objects
[07/15]: rev-list: allow bitmaps when counting objects
[08/15]: t5310: factor out bitmap traversal comparison
[09/15]: rev-list: allow commit-only bitmap traversals
[10/15]: pack-bitmap: basic noop bitmap filter infrastructure
[11/15]: rev-list: use bitmap filters for traversal
[12/15]: bitmap: add bitmap_unset() function
[13/15]: pack-bitmap: implement BLOB_NONE filtering
[14/15]: pack-bitmap: implement BLOB_LIMIT filtering
[15/15]: pack-objects: support filters with bitmaps
builtin/pack-objects.c | 6 +-
builtin/rev-list.c | 114 +++++++++---
ewah/bitmap.c | 8 +
ewah/ewok.h | 1 +
object.c | 9 +
object.h | 2 +
pack-bitmap.c | 272 +++++++++++++++++++++++++----
pack-bitmap.h | 5 +-
reachable.c | 4 +-
t/perf/p5310-pack-bitmaps.sh | 22 +++
t/t5310-pack-bitmaps.sh | 36 +++-
t/t6000-rev-list-misc.sh | 12 ++
t/t6113-rev-list-bitmap-filters.sh | 56 ++++++
t/test-lib-functions.sh | 27 +++
14 files changed, 504 insertions(+), 70 deletions(-)
create mode 100755 t/t6113-rev-list-bitmap-filters.sh
Range-diff from v1:
1: 283874f2ac = 1: 4424d745de pack-bitmap: factor out type iterator initialization
2: 68e7bed41c = 2: bc0576868f pack-bitmap: fix leak of haves/wants object lists
3: 1182593fb2 = 3: 9d2ef09b9b rev-list: fallback to non-bitmap traversal when filtering
4: 03b20e9b66 < -: ---------- rev-list: consolidate bitmap-disabling options
-: ---------- > 4: ce7969221a pack-bitmap: refuse to do a bitmap traversal with pathspecs
5: 91032b0d78 = 5: a9a5513bb0 rev-list: factor out bitmap-optimized routines
6: c7c3a31cf3 ! 6: eab5dd34bc rev-list: make --count work with --objects
@@ builtin/rev-list.c: int cmd_rev_list(int argc, const char **argv, const char *pr
+ (revs.left_right || revs.cherry_mark))
+ die(_("marked counting is incompatible with --objects"));
+
- if (filter_options.choice || revs.prune)
+ if (filter_options.choice)
use_bitmap_index = 0;
7: 0aabf6262b = 7: 04af6b9362 rev-list: allow bitmaps when counting objects
-: ---------- > 8: 4b93d064b1 t5310: factor out bitmap traversal comparison
-: ---------- > 9: d89509fa8f rev-list: allow commit-only bitmap traversals
8: 969238ec42 ! 10: c64d4db8e8 pack-bitmap: basic noop bitmap filter infrastructure
@@ builtin/rev-list.c: static int try_bitmap_count(struct rev_info *revs)
return -1;
@@ builtin/rev-list.c: static int try_bitmap_traversal(struct rev_info *revs)
- if (!revs->tag_objects || !revs->tree_objects || !revs->blob_objects)
+ if (revs->max_count >= 0)
return -1;
- bitmap_git = prepare_bitmap_walk(revs);
@@ pack-bitmap.c: static int in_bitmapped_pack(struct bitmap_index *bitmap_git,
unsigned int i;
@@ pack-bitmap.c: struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
- struct bitmap *wants_bitmap = NULL;
- struct bitmap *haves_bitmap = NULL;
+ if (revs->prune)
+ return NULL;
-- struct bitmap_index *bitmap_git = xcalloc(1, sizeof(*bitmap_git));
-+ struct bitmap_index *bitmap_git;
-+
+ if (!can_filter_bitmap(filter))
+ return NULL;
+
/* try to open a bitmapped pack, but don't parse it yet
* because we may not need to use it */
-+ bitmap_git = xcalloc(1, sizeof(*bitmap_git));
- if (open_pack_bitmap(revs->repo, bitmap_git) < 0)
- goto cleanup;
-
+ bitmap_git = xcalloc(1, sizeof(*bitmap_git));
@@ pack-bitmap.c: struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
if (haves_bitmap)
bitmap_and_not(wants_bitmap, haves_bitmap);
@@ pack-bitmap.h
static const char BITMAP_IDX_SIGNATURE[] = {'B', 'I', 'T', 'M'};
-@@ pack-bitmap.h: void count_bitmap_commit_list(struct bitmap_index *, uint32_t *commits,
- void traverse_bitmap_commit_list(struct bitmap_index *,
+@@ pack-bitmap.h: void traverse_bitmap_commit_list(struct bitmap_index *,
+ struct rev_info *revs,
show_reachable_fn show_reachable);
void test_bitmap_walk(struct rev_info *revs);
-struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs);
@@ reachable.c: void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
- bitmap_git = prepare_bitmap_walk(revs);
+ bitmap_git = prepare_bitmap_walk(revs, NULL);
if (bitmap_git) {
- traverse_bitmap_commit_list(bitmap_git, mark_object_seen);
+ traverse_bitmap_commit_list(bitmap_git, revs, mark_object_seen);
free_bitmap_index(bitmap_git);
9: b4eb8fcb9d ! 11: caa69ac535 rev-list: use bitmap filters for traversal
@@ builtin/rev-list.c: static int try_bitmap_count(struct rev_info *revs)
struct bitmap_index *bitmap_git;
@@ builtin/rev-list.c: static int try_bitmap_traversal(struct rev_info *revs)
- if (!revs->tag_objects || !revs->tree_objects || !revs->blob_objects)
+ if (revs->max_count >= 0)
return -1;
- bitmap_git = prepare_bitmap_walk(revs, NULL);
@@ builtin/rev-list.c: int cmd_rev_list(int argc, const char **argv, const char *pr
(revs.left_right || revs.cherry_mark))
die(_("marked counting is incompatible with --objects"));
-- if (filter_options.choice || revs.prune)
-+ if (revs.prune)
- use_bitmap_index = 0;
-
+- if (filter_options.choice)
+- use_bitmap_index = 0;
+-
save_commit_buffer = (revs.verbose_header ||
+ revs.grep_filter.pattern_list ||
+ revs.grep_filter.header_list);
@@ builtin/rev-list.c: int cmd_rev_list(int argc, const char **argv, const char *prefix)
progress = start_delayed_progress(show_progress, 0);
10: b518070c12 = 12: d3e7462138 bitmap: add bitmap_unset() function
11: 438c21082e ! 13: 07ee699520 pack-bitmap: implement BLOB_NONE filtering
@@ Commit message
bitmaps is about the same as our loop here, but it might make the
code a bit simpler).
- The regression tests just compare the bitmap output to the non-bitmap
- output. Since the code internally falls back to the non-bitmap path in
- some cases, this is at risk of being a trivial noop. To combat this,
- we'll check for small differences between the two outputs (see the
- comment in the test). This is a little fragile, as it's possible that we
- may teach the fallback path for --use-bitmap-index to produce
- bitmap-like output in the future. But the exact ordering of objects
- would likely be different in such a case, anyway.
+ Here are perf results for the new test on git.git:
- Plus we'd catch an unexpected fallback when running the perf suite, as
- things would get slower. Here's what it looks like now (on git.git):
-
- Test HEAD^ HEAD
- --------------------------------------------------------------------------------
- 5310.7: rev-list count with blob:none 1.67(1.62+0.05) 0.22(0.21+0.02) -86.8%
+ Test HEAD^ HEAD
+ --------------------------------------------------------------------------------
+ 5310.9: rev-list count with blob:none 1.67(1.62+0.05) 0.22(0.21+0.02) -86.8%
## pack-bitmap.c ##
@@ pack-bitmap.c: static int in_bitmapped_pack(struct bitmap_index *bitmap_git,
@@ pack-bitmap.c: static int filter_bitmap(struct bitmap_index *bitmap_git,
}
## t/perf/p5310-pack-bitmaps.sh ##
-@@ t/perf/p5310-pack-bitmaps.sh: test_perf 'pack to file (bitmap)' '
- git pack-objects --use-bitmap-index --all pack1b </dev/null >/dev/null
+@@ t/perf/p5310-pack-bitmaps.sh: test_perf 'rev-list (objects)' '
+ git rev-list --all --use-bitmap-index --objects >/dev/null
'
+test_perf 'rev-list count with blob:none' '
@@ t/t6113-rev-list-bitmap-filters.sh: test_expect_success 'filters fallback to non
test_cmp expect actual
'
-+# the bitmap and regular traversals produce subtly different output:
-+#
-+# - regular output is in traversal order, whereas bitmap is split by type,
-+# with non-packed objects at the end
-+#
-+# - regular output has a space and the pathname appended to non-commit
-+# objects; bitmap output omits this
-+#
-+# Normalize and compare the two. The second argument should always be the
-+# bitmap output.
-+cmp_bitmap_traversal () {
-+ if cmp "$1" "$2"
-+ then
-+ echo >&2 "identical raw outputs; are you sure bitmaps were used?"
-+ return 1
-+ fi &&
-+ cut -d' ' -f1 "$1" | sort >"$1.normalized" &&
-+ sort "$2" >"$2.normalized" &&
-+ test_cmp "$1.normalized" "$2.normalized"
-+}
-+
+test_expect_success 'blob:none filter' '
+ git rev-list --objects --filter=blob:none HEAD >expect &&
+ git rev-list --use-bitmap-index \
+ --objects --filter=blob:none HEAD >actual &&
-+ cmp_bitmap_traversal expect actual
++ test_bitmap_traversal expect actual
+'
+
+test_expect_success 'blob:none filter with specified blob' '
+ git rev-list --objects --filter=blob:none HEAD HEAD:two.t >expect &&
+ git rev-list --use-bitmap-index \
+ --objects --filter=blob:none HEAD HEAD:two.t >actual &&
-+ cmp_bitmap_traversal expect actual
++ test_bitmap_traversal expect actual
+'
+
test_done
12: 05cd57f0ba ! 14: f35fab09a0 pack-bitmap: implement BLOB_LIMIT filtering
@@ Commit message
than BLOB_NONE, but still produces a noticeable speedup (these results
are on git.git):
- Test HEAD~2 HEAD
+ Test HEAD~2 HEAD
------------------------------------------------------------------------------------
- 5310.7: rev-list count with blob:none 1.80(1.77+0.02) 0.22(0.20+0.02) -87.8%
- 5310.8: rev-list count with blob:limit=1k 1.99(1.96+0.03) 0.29(0.25+0.03) -85.4%
+ 5310.9: rev-list count with blob:none 1.80(1.77+0.02) 0.22(0.20+0.02) -87.8%
+ 5310.10: rev-list count with blob:limit=1k 1.99(1.96+0.03) 0.29(0.25+0.03) -85.4%
The implementation is similar to the BLOB_NONE one, with the exception
that we have to go object-by-object while walking the blob-type bitmap
@@ t/t6113-rev-list-bitmap-filters.sh: test_description='rev-list combining bitmaps
test_expect_success 'filters fallback to non-bitmap traversal' '
@@ t/t6113-rev-list-bitmap-filters.sh: test_expect_success 'blob:none filter with specified blob' '
- cmp_bitmap_traversal expect actual
+ test_bitmap_traversal expect actual
'
+test_expect_success 'blob:limit filter' '
+ git rev-list --objects --filter=blob:limit=5 HEAD >expect &&
+ git rev-list --use-bitmap-index \
+ --objects --filter=blob:limit=5 HEAD >actual &&
-+ cmp_bitmap_traversal expect actual
++ test_bitmap_traversal expect actual
+'
+
+test_expect_success 'blob:limit filter with specified blob' '
@@ t/t6113-rev-list-bitmap-filters.sh: test_expect_success 'blob:none filter with s
+ git rev-list --use-bitmap-index \
+ --objects --filter=blob:limit=5 \
+ HEAD HEAD:much-larger-blob-two.t >actual &&
-+ cmp_bitmap_traversal expect actual
++ test_bitmap_traversal expect actual
+'
+
test_done
13: 1e5befb73a ! 15: 797dbfe28f pack-objects: support filters with bitmaps
@@ Commit message
This unsurprisingly makes things faster for partial clones of large
repositories (here we're cloning linux.git):
- Test HEAD^ HEAD
+ Test HEAD^ HEAD
------------------------------------------------------------------------------
- 5310.9: simulated partial clone 38.94(37.28+5.87) 11.06(11.27+4.07) -71.6%
+ 5310.11: simulated partial clone 38.94(37.28+5.87) 11.06(11.27+4.07) -71.6%
## builtin/pack-objects.c ##
@@ builtin/pack-objects.c: static int pack_options_allow_reuse(void)
next prev parent reply other threads:[~2020-02-14 18:21 UTC|newest]
Thread overview: 73+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-02-13 2:15 [PATCH 0/13] combining object filters and bitmaps Jeff King
2020-02-13 2:16 ` [PATCH 01/13] pack-bitmap: factor out type iterator initialization Jeff King
2020-02-13 17:45 ` Junio C Hamano
2020-02-13 2:16 ` [PATCH 02/13] pack-bitmap: fix leak of haves/wants object lists Jeff King
2020-02-13 18:12 ` Junio C Hamano
2020-02-13 2:17 ` [PATCH 03/13] rev-list: fallback to non-bitmap traversal when filtering Jeff King
2020-02-13 18:19 ` Junio C Hamano
2020-02-13 18:40 ` Jeff King
2020-02-13 2:17 ` [PATCH 04/13] rev-list: consolidate bitmap-disabling options Jeff King
2020-02-13 2:18 ` [PATCH 05/13] rev-list: factor out bitmap-optimized routines Jeff King
2020-02-13 18:34 ` Junio C Hamano
2020-02-13 2:19 ` [PATCH 06/13] rev-list: make --count work with --objects Jeff King
2020-02-13 19:14 ` Junio C Hamano
2020-02-13 20:27 ` Jeff King
2020-02-13 2:20 ` [PATCH 07/13] rev-list: allow bitmaps when counting objects Jeff King
2020-02-13 21:47 ` Junio C Hamano
2020-02-13 22:27 ` Jeff King
2020-02-13 2:20 ` [PATCH 08/13] pack-bitmap: basic noop bitmap filter infrastructure Jeff King
2020-02-13 2:21 ` [PATCH 09/13] rev-list: use bitmap filters for traversal Jeff King
2020-02-13 22:22 ` Junio C Hamano
2020-02-13 22:34 ` Jeff King
2020-02-13 2:21 ` [PATCH 10/13] bitmap: add bitmap_unset() function Jeff King
2020-02-13 2:23 ` [PATCH 11/13] pack-bitmap: implement BLOB_NONE filtering Jeff King
2020-02-13 2:25 ` [PATCH 12/13] pack-bitmap: implement BLOB_LIMIT filtering Jeff King
2020-02-13 23:17 ` Junio C Hamano
2020-02-13 2:25 ` [PATCH 13/13] pack-objects: support filters with bitmaps Jeff King
2020-02-14 18:21 ` Jeff King [this message]
2020-02-14 18:22 ` [PATCH v2 01/15] pack-bitmap: factor out type iterator initialization Jeff King
2020-02-15 0:10 ` Taylor Blau
2020-02-14 18:22 ` [PATCH v2 02/15] pack-bitmap: fix leak of haves/wants object lists Jeff King
2020-02-15 0:15 ` Taylor Blau
2020-02-15 6:46 ` Jeff King
2020-02-18 17:58 ` Derrick Stolee
2020-02-18 20:02 ` Jeff King
2020-02-14 18:22 ` [PATCH v2 03/15] rev-list: fallback to non-bitmap traversal when filtering Jeff King
2020-02-15 0:22 ` Taylor Blau
2020-02-14 18:22 ` [PATCH v2 04/15] pack-bitmap: refuse to do a bitmap traversal with pathspecs Jeff King
2020-02-14 19:03 ` Junio C Hamano
2020-02-14 20:51 ` Jeff King
2020-02-14 18:22 ` [PATCH v2 05/15] rev-list: factor out bitmap-optimized routines Jeff King
2020-02-15 0:35 ` Taylor Blau
2020-02-14 18:22 ` [PATCH v2 06/15] rev-list: make --count work with --objects Jeff King
2020-02-15 0:42 ` Taylor Blau
2020-02-15 6:48 ` Jeff King
2020-02-16 23:34 ` Junio C Hamano
2020-02-18 5:24 ` Jeff King
2020-02-18 17:28 ` Junio C Hamano
2020-02-18 19:55 ` Jeff King
2020-02-18 21:19 ` Junio C Hamano
2020-02-18 21:23 ` Jeff King
2020-02-18 18:05 ` Derrick Stolee
2020-02-18 19:59 ` Jeff King
2020-02-14 18:22 ` [PATCH v2 07/15] rev-list: allow bitmaps when counting objects Jeff King
2020-02-15 0:45 ` Taylor Blau
2020-02-15 6:55 ` Jeff King
2020-02-16 23:36 ` Junio C Hamano
2020-02-14 18:22 ` [PATCH v2 08/15] t5310: factor out bitmap traversal comparison Jeff King
2020-02-15 2:14 ` Taylor Blau
2020-02-15 7:00 ` Jeff King
2020-02-14 18:22 ` [PATCH v2 09/15] rev-list: allow commit-only bitmap traversals Jeff King
2020-02-18 18:18 ` Derrick Stolee
2020-02-18 20:05 ` Jeff King
2020-02-18 20:11 ` Derrick Stolee
2020-02-14 18:22 ` [PATCH v2 10/15] pack-bitmap: basic noop bitmap filter infrastructure Jeff King
2020-02-14 18:22 ` [PATCH v2 11/15] rev-list: use bitmap filters for traversal Jeff King
2020-02-14 18:22 ` [PATCH v2 12/15] bitmap: add bitmap_unset() function Jeff King
2020-02-14 18:22 ` [PATCH v2 13/15] pack-bitmap: implement BLOB_NONE filtering Jeff King
2020-02-18 19:26 ` Derrick Stolee
2020-02-18 19:36 ` Derrick Stolee
2020-02-18 20:30 ` Jeff King
2020-02-18 20:24 ` Jeff King
2020-02-14 18:22 ` [PATCH v2 14/15] pack-bitmap: implement BLOB_LIMIT filtering Jeff King
2020-02-14 18:22 ` [PATCH v2 15/15] pack-objects: support filters with bitmaps Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200214182147.GA654525@coredump.intra.peff.net \
--to=peff@peff.net \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).