git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>
Subject: [PATCH v2 0/15] combining object filters and bitmaps
Date: Fri, 14 Feb 2020 13:21:47 -0500	[thread overview]
Message-ID: <20200214182147.GA654525@coredump.intra.peff.net> (raw)
In-Reply-To: <20200213021506.GA1124607@coredump.intra.peff.net>

Here's a v2 based on the early feedback from Junio, and a few things I
noticed while addressing it:

 - patch 4 now moves the check for revs.prune into the pack-bitmap code,
   rather than rearranging it within rev-list.

 - I noticed some more opportunities to reuse the "compare the results
   of a bitmap and non-bitmap traversal" code, so I factored that out
   into its own commit (patch 8)

 - I realized another long-standing annoyance is simple to fix: before
   we did not know how to do "rev-list --use-bitmap-index" without
   "--objects", but it was pretty easy to support. That's patch 9.

The range diff below is a little noisy because of the ripple effects of
some of those changes. If you just want to review the substantive
changes, look at patches 4, 8, and 9.

  [01/15]: pack-bitmap: factor out type iterator initialization
  [02/15]: pack-bitmap: fix leak of haves/wants object lists
  [03/15]: rev-list: fallback to non-bitmap traversal when filtering
  [04/15]: pack-bitmap: refuse to do a bitmap traversal with pathspecs
  [05/15]: rev-list: factor out bitmap-optimized routines
  [06/15]: rev-list: make --count work with --objects
  [07/15]: rev-list: allow bitmaps when counting objects
  [08/15]: t5310: factor out bitmap traversal comparison
  [09/15]: rev-list: allow commit-only bitmap traversals
  [10/15]: pack-bitmap: basic noop bitmap filter infrastructure
  [11/15]: rev-list: use bitmap filters for traversal
  [12/15]: bitmap: add bitmap_unset() function
  [13/15]: pack-bitmap: implement BLOB_NONE filtering
  [14/15]: pack-bitmap: implement BLOB_LIMIT filtering
  [15/15]: pack-objects: support filters with bitmaps

 builtin/pack-objects.c             |   6 +-
 builtin/rev-list.c                 | 114 +++++++++---
 ewah/bitmap.c                      |   8 +
 ewah/ewok.h                        |   1 +
 object.c                           |   9 +
 object.h                           |   2 +
 pack-bitmap.c                      | 272 +++++++++++++++++++++++++----
 pack-bitmap.h                      |   5 +-
 reachable.c                        |   4 +-
 t/perf/p5310-pack-bitmaps.sh       |  22 +++
 t/t5310-pack-bitmaps.sh            |  36 +++-
 t/t6000-rev-list-misc.sh           |  12 ++
 t/t6113-rev-list-bitmap-filters.sh |  56 ++++++
 t/test-lib-functions.sh            |  27 +++
 14 files changed, 504 insertions(+), 70 deletions(-)
 create mode 100755 t/t6113-rev-list-bitmap-filters.sh

Range-diff from v1:

 1:  283874f2ac =  1:  4424d745de pack-bitmap: factor out type iterator initialization
 2:  68e7bed41c =  2:  bc0576868f pack-bitmap: fix leak of haves/wants object lists
 3:  1182593fb2 =  3:  9d2ef09b9b rev-list: fallback to non-bitmap traversal when filtering
 4:  03b20e9b66 <  -:  ---------- rev-list: consolidate bitmap-disabling options
 -:  ---------- >  4:  ce7969221a pack-bitmap: refuse to do a bitmap traversal with pathspecs
 5:  91032b0d78 =  5:  a9a5513bb0 rev-list: factor out bitmap-optimized routines
 6:  c7c3a31cf3 !  6:  eab5dd34bc rev-list: make --count work with --objects
    @@ builtin/rev-list.c: int cmd_rev_list(int argc, const char **argv, const char *pr
     +	    (revs.left_right || revs.cherry_mark))
     +		die(_("marked counting is incompatible with --objects"));
     +
    - 	if (filter_options.choice || revs.prune)
    + 	if (filter_options.choice)
      		use_bitmap_index = 0;
      
     
 7:  0aabf6262b =  7:  04af6b9362 rev-list: allow bitmaps when counting objects
 -:  ---------- >  8:  4b93d064b1 t5310: factor out bitmap traversal comparison
 -:  ---------- >  9:  d89509fa8f rev-list: allow commit-only bitmap traversals
 8:  969238ec42 ! 10:  c64d4db8e8 pack-bitmap: basic noop bitmap filter infrastructure
    @@ builtin/rev-list.c: static int try_bitmap_count(struct rev_info *revs)
      		return -1;
      
     @@ builtin/rev-list.c: static int try_bitmap_traversal(struct rev_info *revs)
    - 	if (!revs->tag_objects || !revs->tree_objects || !revs->blob_objects)
    + 	if (revs->max_count >= 0)
      		return -1;
      
     -	bitmap_git = prepare_bitmap_walk(revs);
    @@ pack-bitmap.c: static int in_bitmapped_pack(struct bitmap_index *bitmap_git,
      	unsigned int i;
      
     @@ pack-bitmap.c: struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
    - 	struct bitmap *wants_bitmap = NULL;
    - 	struct bitmap *haves_bitmap = NULL;
    + 	if (revs->prune)
    + 		return NULL;
      
    --	struct bitmap_index *bitmap_git = xcalloc(1, sizeof(*bitmap_git));
    -+	struct bitmap_index *bitmap_git;
    -+
     +	if (!can_filter_bitmap(filter))
     +		return NULL;
     +
      	/* try to open a bitmapped pack, but don't parse it yet
      	 * because we may not need to use it */
    -+	bitmap_git = xcalloc(1, sizeof(*bitmap_git));
    - 	if (open_pack_bitmap(revs->repo, bitmap_git) < 0)
    - 		goto cleanup;
    - 
    + 	bitmap_git = xcalloc(1, sizeof(*bitmap_git));
     @@ pack-bitmap.c: struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
      	if (haves_bitmap)
      		bitmap_and_not(wants_bitmap, haves_bitmap);
    @@ pack-bitmap.h
      
      static const char BITMAP_IDX_SIGNATURE[] = {'B', 'I', 'T', 'M'};
      
    -@@ pack-bitmap.h: void count_bitmap_commit_list(struct bitmap_index *, uint32_t *commits,
    - void traverse_bitmap_commit_list(struct bitmap_index *,
    +@@ pack-bitmap.h: void traverse_bitmap_commit_list(struct bitmap_index *,
    + 				 struct rev_info *revs,
      				 show_reachable_fn show_reachable);
      void test_bitmap_walk(struct rev_info *revs);
     -struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs);
    @@ reachable.c: void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
     -	bitmap_git = prepare_bitmap_walk(revs);
     +	bitmap_git = prepare_bitmap_walk(revs, NULL);
      	if (bitmap_git) {
    - 		traverse_bitmap_commit_list(bitmap_git, mark_object_seen);
    + 		traverse_bitmap_commit_list(bitmap_git, revs, mark_object_seen);
      		free_bitmap_index(bitmap_git);
 9:  b4eb8fcb9d ! 11:  caa69ac535 rev-list: use bitmap filters for traversal
    @@ builtin/rev-list.c: static int try_bitmap_count(struct rev_info *revs)
      	struct bitmap_index *bitmap_git;
      
     @@ builtin/rev-list.c: static int try_bitmap_traversal(struct rev_info *revs)
    - 	if (!revs->tag_objects || !revs->tree_objects || !revs->blob_objects)
    + 	if (revs->max_count >= 0)
      		return -1;
      
     -	bitmap_git = prepare_bitmap_walk(revs, NULL);
    @@ builtin/rev-list.c: int cmd_rev_list(int argc, const char **argv, const char *pr
      	    (revs.left_right || revs.cherry_mark))
      		die(_("marked counting is incompatible with --objects"));
      
    --	if (filter_options.choice || revs.prune)
    -+	if (revs.prune)
    - 		use_bitmap_index = 0;
    - 
    +-	if (filter_options.choice)
    +-		use_bitmap_index = 0;
    +-
      	save_commit_buffer = (revs.verbose_header ||
    + 			      revs.grep_filter.pattern_list ||
    + 			      revs.grep_filter.header_list);
     @@ builtin/rev-list.c: int cmd_rev_list(int argc, const char **argv, const char *prefix)
      		progress = start_delayed_progress(show_progress, 0);
      
10:  b518070c12 = 12:  d3e7462138 bitmap: add bitmap_unset() function
11:  438c21082e ! 13:  07ee699520 pack-bitmap: implement BLOB_NONE filtering
    @@ Commit message
             bitmaps is about the same as our loop here, but it might make the
             code a bit simpler).
     
    -    The regression tests just compare the bitmap output to the non-bitmap
    -    output. Since the code internally falls back to the non-bitmap path in
    -    some cases, this is at risk of being a trivial noop. To combat this,
    -    we'll check for small differences between the two outputs (see the
    -    comment in the test). This is a little fragile, as it's possible that we
    -    may teach the fallback path for --use-bitmap-index to produce
    -    bitmap-like output in the future. But the exact ordering of objects
    -    would likely be different in such a case, anyway.
    +    Here are perf results for the new test on git.git:
     
    -    Plus we'd catch an unexpected fallback when running the perf suite, as
    -    things would get slower. Here's what it looks like now (on git.git):
    -
    -    Test                                    HEAD^             HEAD
    -    --------------------------------------------------------------------------------
    -    5310.7: rev-list count with blob:none   1.67(1.62+0.05)   0.22(0.21+0.02) -86.8%
    +      Test                                    HEAD^             HEAD
    +      --------------------------------------------------------------------------------
    +      5310.9: rev-list count with blob:none   1.67(1.62+0.05)   0.22(0.21+0.02) -86.8%
     
      ## pack-bitmap.c ##
     @@ pack-bitmap.c: static int in_bitmapped_pack(struct bitmap_index *bitmap_git,
    @@ pack-bitmap.c: static int filter_bitmap(struct bitmap_index *bitmap_git,
      }
     
      ## t/perf/p5310-pack-bitmaps.sh ##
    -@@ t/perf/p5310-pack-bitmaps.sh: test_perf 'pack to file (bitmap)' '
    - 	git pack-objects --use-bitmap-index --all pack1b </dev/null >/dev/null
    +@@ t/perf/p5310-pack-bitmaps.sh: test_perf 'rev-list (objects)' '
    + 	git rev-list --all --use-bitmap-index --objects >/dev/null
      '
      
     +test_perf 'rev-list count with blob:none' '
    @@ t/t6113-rev-list-bitmap-filters.sh: test_expect_success 'filters fallback to non
      	test_cmp expect actual
      '
      
    -+# the bitmap and regular traversals produce subtly different output:
    -+#
    -+#   - regular output is in traversal order, whereas bitmap is split by type,
    -+#     with non-packed objects at the end
    -+#
    -+#   - regular output has a space and the pathname appended to non-commit
    -+#     objects; bitmap output omits this
    -+#
    -+# Normalize and compare the two. The second argument should always be the
    -+# bitmap output.
    -+cmp_bitmap_traversal () {
    -+	if cmp "$1" "$2"
    -+	then
    -+		echo >&2 "identical raw outputs; are you sure bitmaps were used?"
    -+		return 1
    -+	fi &&
    -+	cut -d' ' -f1 "$1" | sort >"$1.normalized" &&
    -+	sort "$2" >"$2.normalized" &&
    -+	test_cmp "$1.normalized" "$2.normalized"
    -+}
    -+
     +test_expect_success 'blob:none filter' '
     +	git rev-list --objects --filter=blob:none HEAD >expect &&
     +	git rev-list --use-bitmap-index \
     +		     --objects --filter=blob:none HEAD >actual &&
    -+	cmp_bitmap_traversal expect actual
    ++	test_bitmap_traversal expect actual
     +'
     +
     +test_expect_success 'blob:none filter with specified blob' '
     +	git rev-list --objects --filter=blob:none HEAD HEAD:two.t >expect &&
     +	git rev-list --use-bitmap-index \
     +		     --objects --filter=blob:none HEAD HEAD:two.t >actual &&
    -+	cmp_bitmap_traversal expect actual
    ++	test_bitmap_traversal expect actual
     +'
     +
      test_done
12:  05cd57f0ba ! 14:  f35fab09a0 pack-bitmap: implement BLOB_LIMIT filtering
    @@ Commit message
         than BLOB_NONE, but still produces a noticeable speedup (these results
         are on git.git):
     
    -      Test                                        HEAD~2            HEAD
    +      Test                                         HEAD~2            HEAD
           ------------------------------------------------------------------------------------
    -      5310.7: rev-list count with blob:none       1.80(1.77+0.02)   0.22(0.20+0.02) -87.8%
    -      5310.8: rev-list count with blob:limit=1k   1.99(1.96+0.03)   0.29(0.25+0.03) -85.4%
    +      5310.9:  rev-list count with blob:none       1.80(1.77+0.02)   0.22(0.20+0.02) -87.8%
    +      5310.10: rev-list count with blob:limit=1k   1.99(1.96+0.03)   0.29(0.25+0.03) -85.4%
     
         The implementation is similar to the BLOB_NONE one, with the exception
         that we have to go object-by-object while walking the blob-type bitmap
    @@ t/t6113-rev-list-bitmap-filters.sh: test_description='rev-list combining bitmaps
      
      test_expect_success 'filters fallback to non-bitmap traversal' '
     @@ t/t6113-rev-list-bitmap-filters.sh: test_expect_success 'blob:none filter with specified blob' '
    - 	cmp_bitmap_traversal expect actual
    + 	test_bitmap_traversal expect actual
      '
      
     +test_expect_success 'blob:limit filter' '
     +	git rev-list --objects --filter=blob:limit=5 HEAD >expect &&
     +	git rev-list --use-bitmap-index \
     +		     --objects --filter=blob:limit=5 HEAD >actual &&
    -+	cmp_bitmap_traversal expect actual
    ++	test_bitmap_traversal expect actual
     +'
     +
     +test_expect_success 'blob:limit filter with specified blob' '
    @@ t/t6113-rev-list-bitmap-filters.sh: test_expect_success 'blob:none filter with s
     +	git rev-list --use-bitmap-index \
     +		     --objects --filter=blob:limit=5 \
     +		     HEAD HEAD:much-larger-blob-two.t >actual &&
    -+	cmp_bitmap_traversal expect actual
    ++	test_bitmap_traversal expect actual
     +'
     +
      test_done
13:  1e5befb73a ! 15:  797dbfe28f pack-objects: support filters with bitmaps
    @@ Commit message
         This unsurprisingly makes things faster for partial clones of large
         repositories (here we're cloning linux.git):
     
    -      Test                              HEAD^               HEAD
    +      Test                               HEAD^               HEAD
           ------------------------------------------------------------------------------
    -      5310.9: simulated partial clone   38.94(37.28+5.87)   11.06(11.27+4.07) -71.6%
    +      5310.11: simulated partial clone   38.94(37.28+5.87)   11.06(11.27+4.07) -71.6%
     
      ## builtin/pack-objects.c ##
     @@ builtin/pack-objects.c: static int pack_options_allow_reuse(void)

  parent reply	other threads:[~2020-02-14 18:21 UTC|newest]

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-13  2:15 [PATCH 0/13] combining object filters and bitmaps Jeff King
2020-02-13  2:16 ` [PATCH 01/13] pack-bitmap: factor out type iterator initialization Jeff King
2020-02-13 17:45   ` Junio C Hamano
2020-02-13  2:16 ` [PATCH 02/13] pack-bitmap: fix leak of haves/wants object lists Jeff King
2020-02-13 18:12   ` Junio C Hamano
2020-02-13  2:17 ` [PATCH 03/13] rev-list: fallback to non-bitmap traversal when filtering Jeff King
2020-02-13 18:19   ` Junio C Hamano
2020-02-13 18:40     ` Jeff King
2020-02-13  2:17 ` [PATCH 04/13] rev-list: consolidate bitmap-disabling options Jeff King
2020-02-13  2:18 ` [PATCH 05/13] rev-list: factor out bitmap-optimized routines Jeff King
2020-02-13 18:34   ` Junio C Hamano
2020-02-13  2:19 ` [PATCH 06/13] rev-list: make --count work with --objects Jeff King
2020-02-13 19:14   ` Junio C Hamano
2020-02-13 20:27     ` Jeff King
2020-02-13  2:20 ` [PATCH 07/13] rev-list: allow bitmaps when counting objects Jeff King
2020-02-13 21:47   ` Junio C Hamano
2020-02-13 22:27     ` Jeff King
2020-02-13  2:20 ` [PATCH 08/13] pack-bitmap: basic noop bitmap filter infrastructure Jeff King
2020-02-13  2:21 ` [PATCH 09/13] rev-list: use bitmap filters for traversal Jeff King
2020-02-13 22:22   ` Junio C Hamano
2020-02-13 22:34     ` Jeff King
2020-02-13  2:21 ` [PATCH 10/13] bitmap: add bitmap_unset() function Jeff King
2020-02-13  2:23 ` [PATCH 11/13] pack-bitmap: implement BLOB_NONE filtering Jeff King
2020-02-13  2:25 ` [PATCH 12/13] pack-bitmap: implement BLOB_LIMIT filtering Jeff King
2020-02-13 23:17   ` Junio C Hamano
2020-02-13  2:25 ` [PATCH 13/13] pack-objects: support filters with bitmaps Jeff King
2020-02-14 18:21 ` Jeff King [this message]
2020-02-14 18:22   ` [PATCH v2 01/15] pack-bitmap: factor out type iterator initialization Jeff King
2020-02-15  0:10     ` Taylor Blau
2020-02-14 18:22   ` [PATCH v2 02/15] pack-bitmap: fix leak of haves/wants object lists Jeff King
2020-02-15  0:15     ` Taylor Blau
2020-02-15  6:46       ` Jeff King
2020-02-18 17:58     ` Derrick Stolee
2020-02-18 20:02       ` Jeff King
2020-02-14 18:22   ` [PATCH v2 03/15] rev-list: fallback to non-bitmap traversal when filtering Jeff King
2020-02-15  0:22     ` Taylor Blau
2020-02-14 18:22   ` [PATCH v2 04/15] pack-bitmap: refuse to do a bitmap traversal with pathspecs Jeff King
2020-02-14 19:03     ` Junio C Hamano
2020-02-14 20:51       ` Jeff King
2020-02-14 18:22   ` [PATCH v2 05/15] rev-list: factor out bitmap-optimized routines Jeff King
2020-02-15  0:35     ` Taylor Blau
2020-02-14 18:22   ` [PATCH v2 06/15] rev-list: make --count work with --objects Jeff King
2020-02-15  0:42     ` Taylor Blau
2020-02-15  6:48       ` Jeff King
2020-02-16 23:34         ` Junio C Hamano
2020-02-18  5:24           ` Jeff King
2020-02-18 17:28             ` Junio C Hamano
2020-02-18 19:55               ` Jeff King
2020-02-18 21:19                 ` Junio C Hamano
2020-02-18 21:23                   ` Jeff King
2020-02-18 18:05     ` Derrick Stolee
2020-02-18 19:59       ` Jeff King
2020-02-14 18:22   ` [PATCH v2 07/15] rev-list: allow bitmaps when counting objects Jeff King
2020-02-15  0:45     ` Taylor Blau
2020-02-15  6:55       ` Jeff King
2020-02-16 23:36         ` Junio C Hamano
2020-02-14 18:22   ` [PATCH v2 08/15] t5310: factor out bitmap traversal comparison Jeff King
2020-02-15  2:14     ` Taylor Blau
2020-02-15  7:00       ` Jeff King
2020-02-14 18:22   ` [PATCH v2 09/15] rev-list: allow commit-only bitmap traversals Jeff King
2020-02-18 18:18     ` Derrick Stolee
2020-02-18 20:05       ` Jeff King
2020-02-18 20:11         ` Derrick Stolee
2020-02-14 18:22   ` [PATCH v2 10/15] pack-bitmap: basic noop bitmap filter infrastructure Jeff King
2020-02-14 18:22   ` [PATCH v2 11/15] rev-list: use bitmap filters for traversal Jeff King
2020-02-14 18:22   ` [PATCH v2 12/15] bitmap: add bitmap_unset() function Jeff King
2020-02-14 18:22   ` [PATCH v2 13/15] pack-bitmap: implement BLOB_NONE filtering Jeff King
2020-02-18 19:26     ` Derrick Stolee
2020-02-18 19:36       ` Derrick Stolee
2020-02-18 20:30         ` Jeff King
2020-02-18 20:24       ` Jeff King
2020-02-14 18:22   ` [PATCH v2 14/15] pack-bitmap: implement BLOB_LIMIT filtering Jeff King
2020-02-14 18:22   ` [PATCH v2 15/15] pack-objects: support filters with bitmaps Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200214182147.GA654525@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).