git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "SZEDER Gábor" <szeder.dev@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Garima Singh" <garima.singh@microsoft.com>,
	"Derrick Stolee" <stolee@gmail.com>,
	"Jakub Narebski" <jnareb@gmail.com>, "Jeff King" <peff@peff.net>,
	"Taylor Blau" <me@ttaylorr.com>,
	"SZEDER Gábor" <szeder.dev@gmail.com>
Subject: [PATCH 25/34] commit-graph: check embedded modified path Bloom filters with a mask
Date: Fri, 29 May 2020 10:50:29 +0200	[thread overview]
Message-ID: <20200529085038.26008-26-szeder.dev@gmail.com> (raw)
In-Reply-To: <20200529085038.26008-1-szeder.dev@gmail.com>

Since the previous patch we check the presence of all leading
directories of a pathspec in modified path Bloom filters to
significantly lower the false positive rate.  This means that we are
checking a lot of bit positions in the Bloom filters.  However, as
shown earlier in this series, a significant portion of commits have
embedded modified path Bloom filters, all with the same size of 63
bits.

Use a 64 bit mask to check all relevant bits in embedded modified path
Bloom filters in one step.

I don't have benchmark results at hand, but as far as I can recall
some old results this reduces the time spent loading and checking
modified path Bloom filters by up to 10%.  Checking Bloom filters is
of course only a small part of the whole runtime of a pathspec-limited
revision walk, so the overall improvement is only about 1-2%.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 commit-graph.c | 17 +++++++++++++++--
 pathspec.c     |  1 +
 pathspec.h     |  1 +
 3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 1fd1b4f8dd..56b8f33d10 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -918,10 +918,16 @@ enum bloom_result check_modified_path_bloom_filter(struct repository *r,
 	for (i = 0; i < pathspec->nr; i++) {
 		struct pathspec_item *pi = &pathspec->items[i];
 
-		if (bloom_filter_check_bits(&bf,
+		if (bf.nr_bits == GRAPH_MODIFIED_PATH_BLOOM_FILTER_EMBEDDED_NR_BITS) {
+			/* Gross!  And potential alignment issues?! */
+			if ((*((const uint64_t*)bf.bits) & pi->modified_path_bloom_mask) == pi->modified_path_bloom_mask)
+				return BLOOM_POSSIBLY_YES;
+		} else {
+			if (bloom_filter_check_bits(&bf,
 					pi->modified_path_bloom_hashes,
 					pi->modified_path_bloom_hashes_nr))
-			return BLOOM_POSSIBLY_YES;
+				return BLOOM_POSSIBLY_YES;
+		}
 	}
 
 	return BLOOM_DEFINITELY_NOT;
@@ -981,6 +987,7 @@ void init_pathspec_bloom_fields(struct repository *r,
 		size_t len = pi->len;
 		int path_component_nr = 0, j;
 		uint32_t *hashes;
+		struct bloom_filter embedded_bf;
 
 		/*
 		 * Pathspec parsing has normalized away any consecutive
@@ -1012,6 +1019,12 @@ void init_pathspec_bloom_fields(struct repository *r,
 					graph->num_modified_path_bloom_hashes,
 					hashes);
 		}
+
+		embedded_bf.nr_bits = GRAPH_MODIFIED_PATH_BLOOM_FILTER_EMBEDDED_NR_BITS;
+		embedded_bf.bits = (uint8_t*) &pi->modified_path_bloom_mask;
+		bloom_filter_set_bits(&embedded_bf,
+				      pi->modified_path_bloom_hashes,
+				      pi->modified_path_bloom_hashes_nr);
 	}
 
 	pathspec->can_use_modified_path_bloom_filters = 1;
diff --git a/pathspec.c b/pathspec.c
index 01e7ae6944..29cadce013 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -408,6 +408,7 @@ static void init_pathspec_item(struct pathspec_item *item, unsigned flags,
 	item->attr_match_nr = 0;
 	item->modified_path_bloom_hashes_nr = 0;
 	item->modified_path_bloom_hashes = NULL;
+	item->modified_path_bloom_mask = 0;
 
 	/* PATHSPEC_LITERAL_PATH ignores magic */
 	if (flags & PATHSPEC_LITERAL_PATH) {
diff --git a/pathspec.h b/pathspec.h
index 0c661e1b03..c9975aea7a 100644
--- a/pathspec.h
+++ b/pathspec.h
@@ -64,6 +64,7 @@ struct pathspec {
 		 */
 		uint32_t modified_path_bloom_hashes_nr;
 		uint32_t *modified_path_bloom_hashes;
+		uint64_t modified_path_bloom_mask;
 	} *items;
 };
 
-- 
2.27.0.rc1.431.g5c813f95dc


  parent reply	other threads:[~2020-05-29  8:52 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-29  8:50 [PoC PATCH 00/34] An alternative modified path Bloom filters implementation SZEDER Gábor
2020-05-28 22:09 ` Johannes Schindelin
2020-05-29  8:50 ` [PATCH 01/34] tree-walk.c: don't match submodule entries for 'submod/anything' SZEDER Gábor
2020-05-29  8:50 ` [PATCH 02/34] commit-graph: fix parsing the Chunk Lookup table SZEDER Gábor
2020-05-29  8:50 ` [PATCH 03/34] commit-graph-format.txt: all multi-byte numbers are in network byte order SZEDER Gábor
2020-05-29  8:50 ` [PATCH 04/34] commit-slab: add a function to deep free entries on the slab SZEDER Gábor
2020-06-04 16:43   ` Derrick Stolee
2020-06-27 15:56     ` SZEDER Gábor
2020-06-29 11:30       ` Derrick Stolee
2020-05-29  8:50 ` [PATCH 05/34] diff.h: drop diff_tree_oid() & friends' return value SZEDER Gábor
2020-05-29  8:50 ` [PATCH 06/34] commit-graph: clean up #includes SZEDER Gábor
2020-05-29  8:50 ` [PATCH 07/34] commit-graph: simplify parse_commit_graph() #1 SZEDER Gábor
2020-05-29  8:50 ` [PATCH 08/34] commit-graph: simplify parse_commit_graph() #2 SZEDER Gábor
2020-05-29  8:50 ` [PATCH 09/34] commit-graph: simplify write_commit_graph_file() #1 SZEDER Gábor
2020-05-29  8:50 ` [PATCH 10/34] commit-graph: simplify write_commit_graph_file() #2 SZEDER Gábor
2020-05-29  8:50 ` [PATCH 11/34] commit-graph: allocate the 'struct chunk_info' array dinamically SZEDER Gábor
2020-05-29  8:50 ` [PATCH 12/34] commit-graph: unify the signatures of all write_graph_chunk_*() functions SZEDER Gábor
2020-05-29  8:50 ` [PATCH 13/34] commit-graph: simplify write_commit_graph_file() #3 SZEDER Gábor
2020-05-29  8:50 ` [PATCH 14/34] commit-graph: check chunk sizes after writing SZEDER Gábor
2020-05-29  8:50 ` [PATCH 15/34] commit-graph-format.txt: document the modified path Bloom filter chunks SZEDER Gábor
2020-06-02 17:59   ` SZEDER Gábor
2020-05-29  8:50 ` [PATCH 16/34] Add a generic and minimal Bloom filter implementation SZEDER Gábor
2020-05-29  8:50 ` [PATCH 17/34] Import a streaming-capable Murmur3 hash function implementation SZEDER Gábor
2020-05-29  8:50 ` [PATCH 18/34] commit-graph: write "empty" Modified Path Bloom Filter Index chunk SZEDER Gábor
2020-05-29  8:50 ` [PATCH 19/34] commit-graph: add commit slab for modified path Bloom filters SZEDER Gábor
2020-05-29  8:50 ` [PATCH 20/34] commit-graph: fill the Modified Path Bloom Filter Index chunk SZEDER Gábor
2020-05-29  8:50 ` [PATCH 21/34] commit-graph: load and use " SZEDER Gábor
2020-05-29  8:50 ` [PATCH 22/34] commit-graph: write the Modified Path Bloom Filters chunk SZEDER Gábor
2020-05-29  8:50 ` [PATCH 23/34] commit-graph: load and use " SZEDER Gábor
2020-05-29  8:50 ` [PATCH 24/34] commit-graph: check all leading directories in modified path Bloom filters SZEDER Gábor
2020-05-29  8:50 ` SZEDER Gábor [this message]
2020-05-29  8:50 ` [PATCH 26/34] commit-graph: deduplicate " SZEDER Gábor
2020-05-29  8:50 ` [PATCH 27/34] commit-graph: load modified path Bloom filters for merge commits SZEDER Gábor
2020-05-29  8:50 ` [PATCH 28/34] commit-graph: write Modified Path Bloom Filter Merge Index chunk SZEDER Gábor
2020-05-29  8:50 ` [PATCH 29/34] commit-graph: extract init and free write_commit_graph_context SZEDER Gábor
2020-05-29  8:50 ` [PATCH 30/34] commit-graph: move write_commit_graph_reachable below write_commit_graph SZEDER Gábor
2020-05-29  8:50 ` [PATCH 31/34] t7007-show: make the first test compatible with the next patch SZEDER Gábor
2020-05-29  8:50 ` [PATCH 32/34] PoC commit-graph: use revision walk machinery for '--reachable' SZEDER Gábor
2020-05-29  8:50 ` [PATCH 33/34] commit-graph: write modified path Bloom filters in "history order" SZEDER Gábor
2020-05-29  8:50 ` [PATCH 34/34] commit-graph: use modified path Bloom filters with wildcards, if possible SZEDER Gábor
2020-05-29 13:59 ` [PoC PATCH 00/34] An alternative modified path Bloom filters implementation Derrick Stolee
2020-06-01 23:25 ` Taylor Blau
2020-06-02 17:08   ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200529085038.26008-26-szeder.dev@gmail.com \
    --to=szeder.dev@gmail.com \
    --cc=garima.singh@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jnareb@gmail.com \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).