Re: [PATCH 8/9] revision.c: use bloom filters to speed up path based revision walks

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

From: Jakub Narebski <jnareb@gmail.com>
To: "Garima Singh via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, "Derrick Stolee" <stolee@gmail.com>,
	"SZEDER Gábor" <szeder.dev@gmail.com>,
	"Jonathan Tan" <jonathantanmy@google.com>,
	"Jeff Hostetler" <jeffhost@microsoft.com>,
	"Taylor Blau" <me@ttaylorr.com>, "Jeff King" <peff@peff.net>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Garima Singh" <garima.singh@microsoft.com>
Subject: Re: [PATCH 8/9] revision.c: use bloom filters to speed up path based revision walks
Date: Sat, 11 Jan 2020 01:27:23 +0100	[thread overview]
Message-ID: <86d0bqsuqc.fsf@gmail.com> (raw)
In-Reply-To: <72a2bbf6765a1e99a3a23372f801099a07fe11a5.1576879520.git.gitgitgadget@gmail.com> (Garima Singh via GitGitGadget's message of "Fri, 20 Dec 2019 22:05:19 +0000")

"Garima Singh via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Garima Singh <garima.singh@microsoft.com>
>
> If bloom filters have been written to the commit-graph file, revision walk will
> use them to speed up revision walks for a particular path.

I'd propose the following change, to make structure of the above
sentence simpler:

  Revision walk will now use Bloom filters for commits to speed up
  revision walks for a particular path (for computing history of a
  path), if they are present in the commit-graph file.

> Note: The current implementation does this in the case of single pathspec
> case only.
>
> We load the bloom filters during the prepare_revision_walk step when dealing

I think it would flow better with the following change:

s/when dealing/, but only when dealing/

> with a single pathspec. While comparing trees in rev_compare_trees(), if the
> bloom filter says that the file is not different between the two trees, we
> don't need to compute the expensive diff. This is where we get our performance
> gains.

Maybe we should also add:

  The other answer we can get from the Bloom filter is "maybe".

>
> Performance Gains:
> We tested the performance of `git log --path` on the git repo, the linux and
> some internal large repos, with a variety of paths of varying depths.

I think you meant `git log <path>`, not `git log --path`.

>
> On the git and linux repos:
> we observed a 2x to 5x speed up.

It would be good, I think, to have some specific numbers: starting from
this version, for this path, with and without Bloom filters it takes so
long (and the improvement in %).

While at it, it might be good idea to provide _costs_: how much
additional space on disk Bloom filters take for those specific examples,
and how much extra time (and memory) it takes to compute Bloom filters.

With actual specific numbers we can estimate when it would start to be
worth it to create Bloom filter data...

>
> On a large internal repo with files seated 6-10 levels deep in the tree:
> we observed 10x to 20x speed ups, with some paths going up to 28 times
> faster.

It would be nice to see specific numbers, if showing pathnames is
possible.  In any case it would be good to have more information: what
paths give 10x, what give 20x, and what kinds give 28x speedup (what
is path depth, how many objects, etc.).

>
> RFC Notes:
> I plan to collect the folloowing statistics around this usage of bloom filters
> and trace them out using trace2.
> - number of bloom filter queries,
> - number of "No" responses (file hasn't changed)
> - number of "Maybe" responses (file may have changed)
> - number of "Commit not parsed" cases (commit had too many changes to have a
>   bloom filter written out, currently our limit is 512 diffs)

Perhaps also:
  - histogram of bloom filter sizes in 64-bit blocks
    (which is rough histogram of number of changes per commit)

Though I think all those statistics are a bit specific to the
repository, and how you use Git.

>
> Helped-by: Derrick Stolee <dstolee@microsoft.com
> Helped-by: SZEDER Gábor <szeder.dev@gmail.com>
> Helped-by: Jonathan Tan <jonathantanmy@google.com>
> Signed-off-by: Garima Singh <garima.singh@microsoft.com>
> ---
>  bloom.c              | 20 ++++++++++++
>  bloom.h              |  4 +++
>  revision.c           | 67 +++++++++++++++++++++++++++++++++++++--
>  revision.h           |  5 +++
>  t/t4216-log-bloom.sh | 74 ++++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 168 insertions(+), 2 deletions(-)
>  create mode 100755 t/t4216-log-bloom.sh
>
> diff --git a/bloom.c b/bloom.c
> index 86b1005802..0c7505d3d6 100644
> --- a/bloom.c
> +++ b/bloom.c
> @@ -235,3 +235,23 @@ struct bloom_filter *get_bloom_filter(struct repository *r,
>  
>  	return filter;
>  }
> +
> +int bloom_filter_contains(struct bloom_filter *filter,
> +			  struct bloom_key *key,
> +			  struct bloom_filter_settings *settings)
> +{
> +	int i;
> +	uint64_t mod = filter->len * BITS_PER_BLOCK;
> +
> +	if (!mod)
> +		return 1;

Ah, so filter->len equal to zero denotes too many changes for a Bloom
filter, and this conditional is a short-circuit test: always return
"maybe".

I wonder if we should explicitly check for filter->len = 1 and
filter->data[0] = 0, which should be empty Bloom filter -- for commit
with no changes with respect to first parent, and short-circuit
returning 0 (no file would ever belong).

> +
> +	for (i = 0; i < settings->num_hashes; i++) {
> +		uint64_t hash_mod = key->hashes[i] % mod;
> +		uint64_t block_pos = hash_mod / BITS_PER_BLOCK;

I have seen this code before... ;-)  The add_key_to_filter() function
includes almost identical code, but I am not sure if it is feasible to
eliminate this (slight) code duplication.  Probably not.

> +		if (!(filter->data[block_pos] & get_bitmask(hash_mod)))
> +			return 0;

All right, if at least one hash function does not match, return "no"...

> +	}
> +
> +	return 1;

... else return "maybe".

> +}

I am wondering however if this code should not be moved earlier in the
series, so that commit 2/9 in series actually adds fully functional
[semi-generic] Bloom filter implementation.

> diff --git a/bloom.h b/bloom.h
> index 101d689bbd..9bdacd0a8e 100644
> --- a/bloom.h
> +++ b/bloom.h
> @@ -44,4 +44,8 @@ void fill_bloom_key(const char *data,
>  		    struct bloom_key *key,
>  		    struct bloom_filter_settings *settings);
>  
> +int bloom_filter_contains(struct bloom_filter *filter,
> +			  struct bloom_key *key,
> +			  struct bloom_filter_settings *settings);
> +
>  #endif
> diff --git a/revision.c b/revision.c
> index 39a25e7a5d..01f5330740 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -29,6 +29,7 @@
>  #include "prio-queue.h"
>  #include "hashmap.h"
>  #include "utf8.h"
> +#include "bloom.h"
>  
>  volatile show_early_output_fn_t show_early_output;
>  
> @@ -624,11 +625,34 @@ static void file_change(struct diff_options *options,
>  	options->flags.has_changes = 1;
>  }
>  
> +static int check_maybe_different_in_bloom_filter(struct rev_info *revs,
> +						 struct commit *commit,
> +						 struct bloom_key *key,
> +						 struct bloom_filter_settings *settings)

All right, this function name is certainly descriptive.  I wonder if it
wouldn't be better to use a shorter name, like maybe_different(), or
maybe_not_treesame(), so that the conditional using this function would
read naturally:

  if (!maybe_different(revs, commit, revs->bloom_key,
  		       revs->bloom_filter_settings))
  	return REV_TREE_SAME;

But this might be just a matter of taste.

> +{
> +	struct bloom_filter *filter;
> +
> +	if (!revs->repo->objects->commit_graph)
> +		return -1;
> +	if (commit->generation == GENERATION_NUMBER_INFINITY)
> +		return -1;

O.K., so we check that there is loaded commit graph, and that given
commit is in a commit graph, otherwise we return "no data".

I agree with distinguishing between "no data" value (which is <0, a
convention for denoting errors), and "maybe" value.  Currently this
distinction is not utilized, but it would help in the case of more than
one path given -- in "no data" case there is no need to check other
paths against non-existing Bloom filter.

> +	if (!key || !settings)
> +		return -1;

Why the check for non-NULL 'key' and 'settings' is the last check?

BTW. when it is possible for key or settings to be NULL?  Or is it just
defensive programming?

> +
> +	filter = get_bloom_filter(revs->repo, commit, 0);

O.K., we won't be recomputing Bloom filter if it is not present either
on slab (as "inside-out" auxiliary data for a commit), or in the
commit-graph (in Bloom filter chunks).

> +
> +	if (!filter || !filter->len)
> +		return 1;

Sidenote: bloom_filter_contains() would also indirectly check for
filter->len being zero.  Though this doesn't cost much.

Shouldn't !filter case return value of -1 i.e. "no data", rather than
return value of 1 i.e. "maybe"?

> +
> +	return bloom_filter_contains(filter, key, settings);
> +}

All right.

> +
>  static int rev_compare_tree(struct rev_info *revs,
> -			    struct commit *parent, struct commit *commit)
> +			    struct commit *parent, struct commit *commit, int nth_parent)
>  {
>  	struct tree *t1 = get_commit_tree(parent);
>  	struct tree *t2 = get_commit_tree(commit);
> +	int bloom_ret = 1;
>  
>  	if (!t1)
>  		return REV_TREE_NEW;
> @@ -653,6 +677,16 @@ static int rev_compare_tree(struct rev_info *revs,
>  			return REV_TREE_SAME;
>  	}
>  
> +	if (revs->pruning.pathspec.nr == 1 && !nth_parent) {

All right, Bloom filter stores information about changed paths with
respect to first-parent changes only, so if we are asking about is not a
first parent (where nth_parent == 0), we cannot use Bloom filter.

Currently we limit the check to single pathspec only; that is a good
start and a good simplification.

> +		bloom_ret = check_maybe_different_in_bloom_filter(revs,
> +								  commit,
> +								  revs->bloom_key,
> +								  revs->bloom_filter_settings);
> +
> +		if (bloom_ret == 0)
> +			return REV_TREE_SAME;

Pretty straightforward.

> +	}
> +
>  	tree_difference = REV_TREE_SAME;
>  	revs->pruning.flags.has_changes = 0;
>  	if (diff_tree_oid(&t1->object.oid, &t2->object.oid, "",
> @@ -855,7 +889,7 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)
>  			die("cannot simplify commit %s (because of %s)",
>  			    oid_to_hex(&commit->object.oid),
>  			    oid_to_hex(&p->object.oid));
> -		switch (rev_compare_tree(revs, p, commit)) {
> +		switch (rev_compare_tree(revs, p, commit, nth_parent)) {

All right, we need to pass information about the index of the parent;
and we have just done that.  Good.

>  		case REV_TREE_SAME:
>  			if (!revs->simplify_history || !relevant_commit(p)) {
>  				/* Even if a merge with an uninteresting
> @@ -3342,6 +3376,33 @@ static void expand_topo_walk(struct rev_info *revs, struct commit *commit)
>  	}
>  }
>  
> +static void prepare_to_use_bloom_filter(struct rev_info *revs)

All right, I see that pointers to bloom_key and bloom_filter_settings
were added to the rev_info struct.  I understand why the former is here,
but the latter seems to be there just as a shortcut (to not owned data),
which is fine but a bit strange.

Or is the latter here to allow for Bloom filter settings to possibly
change from commit-graph file in the chain to commit-graph file, and
thus from commit to commit?

> +{
> +	struct pathspec_item *pi;

Maybe 'pathspec' (or the like) instead of short and cryptic 'pi' would
be better a better name... unless 'pi' is used in other places already.

> +	const char *path;
> +	size_t len;
> +
> +	if (!revs->commits)
> +	    return;

When revs->commits may be NULL?  I understand that we need to have this
check because we use revs->commits->item next (sidenote: can revs ever
be NULL?).

Would `git log --all -- <path>` use Bloom filters (as it theoretically
could)?

> +
> +	parse_commit(revs->commits->item);

Why parsing first commit on the list of starting commits is needed here?
Please help me understand this line.

And shouldn't we use repo_parse_commit() here?

> +
> +	if (!revs->repo->objects->commit_graph)
> +		return;
> +
> +	revs->bloom_filter_settings = revs->repo->objects->commit_graph->settings;
> +	if (!revs->bloom_filter_settings)
> +		return;

All right, so if there is no commit graph, or the commit graph does not
include Bloom filter data, there is nothing to do.

Though I worry that it would make Git do not use Bloom filter if the top
commit-graph in the chain does not include Bloom filter data, while
other commit-graph files do (and Git could have used that information to
speed up the file history query).

> +
> +	pi = &revs->pruning.pathspec.items[0];
> +	path = pi->match;
> +	len = strlen(path);

Why not the following, if we do not do any checks for 'pi' value:

  +	path = &revs->pruning.pathspec.items[0]->match;

A question: is the path in the `match` field in `struct pathspec`
normalized with respect to trailing slash (for directories)?  Bloom
filter stores pathnames for directories without trailing slash.

What I mean is if, for example, both of those would use Bloom filter
data:

  $ git log -- Documentation/
  $ git log -- Documentation

> +
> +	load_bloom_filters();
> +	revs->bloom_key = xmalloc(sizeof(struct bloom_key));
> +	fill_bloom_key(path, len, revs->bloom_key, revs->bloom_filter_settings);

All right, looks good.

Though... do we leak revs->bloom_key, as should we worry about it?

> +}
> +
>  int prepare_revision_walk(struct rev_info *revs)
>  {
>  	int i;
> @@ -3391,6 +3452,8 @@ int prepare_revision_walk(struct rev_info *revs)
>  		simplify_merges(revs);
>  	if (revs->children.name)
>  		set_children(revs);
> +	if (revs->pruning.pathspec.nr == 1)
> +	    prepare_to_use_bloom_filter(revs);

Looks good.

Minor nitpick: 4 spaces instead of tab are used for indentation (or, to
be more exact a tab followed by 4 space, instead of two tabs):

  +	if (revs->pruning.pathspec.nr == 1)
  +		prepare_to_use_bloom_filter(revs);

>  	return 0;
>  }
>  
> diff --git a/revision.h b/revision.h
> index a1a804bd3d..65dc11e8f1 100644
> --- a/revision.h
> +++ b/revision.h
> @@ -56,6 +56,8 @@ struct repository;
>  struct rev_info;
>  struct string_list;
>  struct saved_parents;
> +struct bloom_key;
> +struct bloom_filter_settings;
>  define_shared_commit_slab(revision_sources, char *);
>  
>  struct rev_cmdline_info {
> @@ -291,6 +293,9 @@ struct rev_info {
>  	struct revision_sources *sources;
>  
>  	struct topo_walk_info *topo_walk_info;
> +
> +	struct bloom_key *bloom_key;
> +	struct bloom_filter_settings *bloom_filter_settings;

It might be a good idea to add one-line comment above those newly
introduced fields.  The `struct rev_info` has many subsections of fields
described by such comments (or even block comments), like e.g.

  /* Starting list */
  /* Parents of shown commits */
  /* The end-points specified by the end user */
  /* excluding from --branches, --refs, etc. expansion */
  /* Traversal flags */
  /* diff info for patches and for paths limiting */

  /*
   * Whether the arguments parsed by setup_revisions() included any
   * "input" revisions that might still have yielded an empty pending
   * list (e.g., patterns like "--all" or "--glob").
   */

>  };
>  
>  int ref_excluded(struct string_list *, const char *path);
> diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
> new file mode 100755
> index 0000000000..d42f077998
> --- /dev/null
> +++ b/t/t4216-log-bloom.sh
> @@ -0,0 +1,74 @@
> +#!/bin/sh
> +
> +test_description='git log for a path with bloom filters'
> +. ./test-lib.sh
> +
> +test_expect_success 'setup repo' '
> +	git init &&
> +	git config core.commitGraph true &&
> +	git config gc.writeCommitGraph false &&
> +	infodir=".git/objects/info" &&
> +	graphdir="$infodir/commit-graphs" &&
> +	test_oid_init

Why do you use `test_oid_init`, when you are *not* using `test_oid`?
I guess it is because t5318-commit-graph.sh uses it, isn't it?

The `graphdir` shell variable is not used either.

> +'
> +
> +test_expect_success 'create 9 commits and repack' '
> +	test_commit c1 file1 &&
> +	test_commit c2 file2 &&
> +	test_commit c3 file3 &&
> +	test_commit c4 file1 &&
> +	test_commit c5 file2 &&
> +	test_commit c6 file3 &&
> +	test_commit c7 file1 &&
> +	test_commit c8 file2 &&
> +	test_commit c9 file3
> +'

Wouldn't it be better for this step to be a part of 'setup repo' step?
Anyway, the test name says '... and repack', but `git repack` is missing
(it should be done after last test_commit).

I think it would be good idea to test behavior of Bloom filters with
respect to directories, so at least one file should be in a subdirectory
(maybe even deeper in hierarchy).

We should also test the behavior with respect to merges, and when we are
not following the first parent.  But that might be a separate part of
this test.

> +
> +printf "c7\nc4\nc1" > expect_file1

Doing things outside test is discouraged.  We can create a separate test
that creates those expect_file* files, or it can be a part of 'create
commits' test.

Anyway, instead of doing test without Bloom filters (something that
should have been tested already by other parts of testsuite), and then
doing the same test with Bloom filter, why not compare that the result
without and with Bloom filter is the same.  The t5318-commit-graph.sh
test does this with help of graph_git_two_modes() function:

  graph_git_two_modes () {
  	git -c core.commitGraph=true  $1 >output
  	git -c core.commitGraph=false $1 >expect
  	test_cmp expect output
  }

Sidenote: I wonder if it is high time to create t/lib-commit-graph.sh
helper with, among others, this common function.

> +
> +test_expect_success 'log without bloom filters' '
> +	git log --pretty="format:%s"  -- file1 > actual &&

CodingGuidelines:57: Redirection operators should be written with space
before, but no space after them.  (Minor nitpick)

  +	git log --pretty="format:%s"  -- file1 >actual &&

> +	test_cmp expect_file1 actual
> +'
> +
> +printf "c8\nc7\nc5\nc4\nc2\nc1" > expect_file1_file2

  +printf "c8\nc7\nc5\nc4\nc2\nc1" >expect_file1_file2

> +
> +test_expect_success 'multi-path log without bloom filters' '
> +	git log --pretty="format:%s"  -- file1 file2 > actual &&

  +	git log --pretty="format:%s"  -- file1 file2 >actual &&

> +	test_cmp expect_file1_file2 actual
> +'
> +
> +graph_read_expect() {

CodingGuidelines:144: We prefer a space between the function name and
the parentheses, and no space inside the parentheses.  (Minor nitpick)

  +graph_read_expect () {

> +	OPTIONAL=""
> +	NUM_CHUNKS=5
> +	if test ! -z $2
> +	then
> +		OPTIONAL=" $2"
> +		NUM_CHUNKS=$((3 + $(echo "$2" | wc -w)))

This should be either

  +		NUM_CHUNKS=$((5 + $(echo "$2" | wc -w)))

or more future proof

  +		NUM_CHUNKS=$(($NUM_CHUNKS + $(echo "$2" | wc -w)))

We got away with this bug because we were not using octopus merges, and
there were no optional core chunks, that is we never call
graph_read_expect with second parameter in this test.

> +	fi
> +	cat >expect <<- EOF

Why there is space between "<<-" and "EOF"?

> +	header: 43475048 1 1 $NUM_CHUNKS 0
> +	num_commits: $1
> +	chunks: oid_fanout oid_lookup commit_metadata bloom_indexes bloom_data$OPTIONAL
> +	EOF
> +	test-tool read-graph >output &&
> +	test_cmp expect output
> +}
> +
> +test_expect_success 'write commit graph with bloom filters' '
> +	git commit-graph write --reachable --changed-paths &&
> +	test_path_is_file $infodir/commit-graph &&
> +	graph_read_expect "9"
> +'

All right, this is preparatory step for further tests.

> +
> +test_expect_success 'log using bloom filters' '
> +	git log --pretty="format:%s" -- file1 > actual &&
> +	test_cmp expect_file1 actual
> +'
> +
> +test_expect_success 'multi-path log using bloom filters' '
> +	git log --pretty="format:%s"  -- file1 file2 > actual &&
> +	test_cmp expect_file1_file2 actual
> +'

With graph_git_two_modes() it would be much simpler:

  +test_expect_success 'single-path log' '
  +	git_graph_two_modes "git log --pretty=format:%s -- file1"
  +'

  +test_expect_success 'multi-path log' '
  +	git_graph_two_modes "git log --pretty=format:%s -- file1 file2"
  +'

What is missing is:
1. checking that Git is actually _using_ Bloom filters
   (which might be difficult to do)
2. testing that Bloom filters work also for history of subdirectories
   e.g. "git log -- subdir/" and "git log -- subdir"; this would of
   course require adjusting setup step
3. testing specific behaviors, like "git log --all -- file1"
4. merges with history following second parent
5. commits with no changes and/or merges with no first-parent changes
6. commit with more than 512 changed files (marked as slow test,
   and perhaps created with fast-import interface, like bulk commit
   creation in test_commit_bulk)

> +
> +test_done

Best regards,
-- 
Jakub Narębski

next prev parent reply	other threads:[~2020-01-11  0:27 UTC|newest]

Thread overview: 159+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-20 22:05 [PATCH 0/9] [RFC] Changed Paths Bloom Filters Garima Singh via GitGitGadget
2019-12-20 22:05 ` [PATCH 1/9] commit-graph: add --changed-paths option to write Garima Singh via GitGitGadget
2020-01-01 20:20   ` Jakub Narebski
2019-12-20 22:05 ` [PATCH 2/9] commit-graph: write changed paths bloom filters Garima Singh via GitGitGadget
2019-12-21 16:48   ` Philip Oakley
2020-01-06 18:44   ` Jakub Narebski
2020-01-13 19:48     ` Garima Singh
2019-12-20 22:05 ` [PATCH 3/9] commit-graph: use MAX_NUM_CHUNKS Garima Singh via GitGitGadget
2020-01-07 12:19   ` Jakub Narebski
2019-12-20 22:05 ` [PATCH 4/9] commit-graph: document bloom filter format Garima Singh via GitGitGadget
2020-01-07 14:46   ` Jakub Narebski
2019-12-20 22:05 ` [PATCH 5/9] commit-graph: write changed path bloom filters to commit-graph file Garima Singh via GitGitGadget
2020-01-07 16:01   ` Jakub Narebski
2020-01-14 15:14     ` Garima Singh
2019-12-20 22:05 ` [PATCH 6/9] commit-graph: test commit-graph write --changed-paths Garima Singh via GitGitGadget
2020-01-08  0:32   ` Jakub Narebski
2019-12-20 22:05 ` [PATCH 7/9] commit-graph: reuse existing bloom filters during write Garima Singh via GitGitGadget
2020-01-09 19:12   ` Jakub Narebski
2019-12-20 22:05 ` [PATCH 8/9] revision.c: use bloom filters to speed up path based revision walks Garima Singh via GitGitGadget
2020-01-11  0:27   ` Jakub Narebski [this message]
2020-01-15  0:08     ` Garima Singh
2019-12-20 22:05 ` [PATCH 9/9] commit-graph: add GIT_TEST_COMMIT_GRAPH_BLOOM_FILTERS test flag Garima Singh via GitGitGadget
2020-01-11 19:56   ` Jakub Narebski
2020-01-15  0:55     ` Garima Singh
2019-12-20 22:14 ` [PATCH 0/9] [RFC] Changed Paths Bloom Filters Junio C Hamano
2019-12-22  9:26 ` Christian Couder
2019-12-22  9:38   ` Jeff King
2020-01-01 12:04     ` Jakub Narebski
2019-12-22  9:30 ` Jeff King
2019-12-22  9:32   ` [PATCH 1/3] commit-graph: examine changed-path objects in pack order Jeff King
2019-12-27 14:51     ` Derrick Stolee
2019-12-29  6:12       ` Jeff King
2019-12-29  6:28         ` Jeff King
2019-12-30 14:37         ` Derrick Stolee
2019-12-30 14:51           ` Derrick Stolee
2019-12-22  9:32   ` [PATCH 2/3] commit-graph: free large diffs, too Jeff King
2019-12-27 14:52     ` Derrick Stolee
2019-12-22  9:32   ` [PATCH 3/3] commit-graph: stop using full rev_info for diffs Jeff King
2019-12-27 14:53     ` Derrick Stolee
2019-12-26 14:21   ` [PATCH 0/9] [RFC] Changed Paths Bloom Filters Derrick Stolee
2019-12-29  6:03     ` Jeff King
2019-12-27 16:11   ` Derrick Stolee
2019-12-29  6:24     ` Jeff King
2019-12-30 16:04       ` Derrick Stolee
2019-12-30 17:02       ` Junio C Hamano
2019-12-31 16:45 ` Jakub Narebski
2020-01-13 16:54   ` Garima Singh
2020-01-20 13:48     ` Jakub Narebski
2020-01-21 16:14       ` Garima Singh
2020-02-02 18:43         ` Jakub Narebski
2020-01-21 23:40 ` Emily Shaffer
2020-01-27 18:24   ` Garima Singh
2020-02-01 23:32   ` Jakub Narebski
2020-02-05 22:56 ` [PATCH v2 00/11] " Garima Singh via GitGitGadget
2020-02-05 22:56   ` [PATCH v2 01/11] commit-graph: use MAX_NUM_CHUNKS Garima Singh via GitGitGadget
2020-02-09 12:39     ` Jakub Narebski
2020-02-05 22:56   ` [PATCH v2 02/11] bloom: core Bloom filter implementation for changed paths Garima Singh via GitGitGadget
2020-02-15 17:17     ` Jakub Narebski
2020-02-16 16:49     ` Jakub Narebski
2020-02-22  0:32       ` Garima Singh
2020-02-23 13:38         ` Jakub Narebski
2020-02-24 17:34           ` Garima Singh
2020-02-24 18:20             ` Jakub Narebski
2020-02-05 22:56   ` [PATCH v2 03/11] diff: halt tree-diff early after max_changes Derrick Stolee via GitGitGadget
2020-02-17  0:00     ` Jakub Narebski
2020-02-22  0:37       ` Garima Singh
2020-02-05 22:56   ` [PATCH v2 04/11] commit-graph: compute Bloom filters for changed paths Garima Singh via GitGitGadget
2020-02-17 21:56     ` Jakub Narebski
2020-02-22  0:55       ` Garima Singh
2020-02-23 17:34         ` Jakub Narebski
2020-02-05 22:56   ` [PATCH v2 05/11] commit-graph: examine changed-path objects in pack order Jeff King via GitGitGadget
2020-02-18 17:59     ` Jakub Narebski
2020-02-24 18:29       ` Garima Singh
2020-02-05 22:56   ` [PATCH v2 06/11] commit-graph: examine commits by generation number Derrick Stolee via GitGitGadget
2020-02-19  0:32     ` Jakub Narebski
2020-02-24 20:45       ` Garima Singh
2020-02-05 22:56   ` [PATCH v2 07/11] commit-graph: write Bloom filters to commit graph file Garima Singh via GitGitGadget
2020-02-19 15:13     ` Jakub Narebski
2020-02-24 21:14       ` Garima Singh
2020-02-25 11:40         ` Jakub Narebski
2020-02-25 15:58           ` Garima Singh
2020-02-05 22:56   ` [PATCH v2 08/11] commit-graph: reuse existing Bloom filters during write Garima Singh via GitGitGadget
2020-02-20 18:48     ` Jakub Narebski
2020-02-24 21:45       ` Garima Singh
2020-02-05 22:56   ` [PATCH v2 09/11] commit-graph: add --changed-paths option to write subcommand Garima Singh via GitGitGadget
2020-02-20 20:28     ` Jakub Narebski
2020-02-24 21:51       ` Garima Singh
2020-02-25 12:10         ` Jakub Narebski
2020-02-20 22:10     ` Bryan Turner
2020-02-22  1:44       ` Garima Singh
2020-02-05 22:56   ` [PATCH v2 10/11] revision.c: use Bloom filters to speed up path based revision walks Garima Singh via GitGitGadget
2020-02-21 17:31     ` Jakub Narebski
2020-02-21 22:45     ` Jakub Narebski
2020-02-05 22:56   ` [PATCH v2 11/11] commit-graph: add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag Garima Singh via GitGitGadget
2020-02-22  0:11     ` Jakub Narebski
2020-02-07 13:52   ` [PATCH v2 00/11] Changed Paths Bloom Filters SZEDER Gábor
2020-02-07 15:09     ` Garima Singh
2020-02-07 15:36       ` Derrick Stolee
2020-02-07 16:15         ` SZEDER Gábor
2020-02-07 16:33           ` Derrick Stolee
2020-02-11 19:08       ` Garima Singh
2020-02-08 23:04   ` Jakub Narebski
2020-02-21 17:41     ` Garima Singh
2020-03-29 18:36       ` Junio C Hamano
2020-03-30  0:31   ` [PATCH v3 00/16] " Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 01/16] commit-graph: define and use MAX_NUM_CHUNKS Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 02/16] bloom.c: add the murmur3 hash implementation Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 03/16] bloom.c: introduce core Bloom filter constructs Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 04/16] bloom.c: core Bloom filter implementation for changed paths Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 05/16] diff: halt tree-diff early after max_changes Derrick Stolee via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 06/16] commit-graph: compute Bloom filters for changed paths Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 07/16] commit-graph: examine changed-path objects in pack order Jeff King via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 08/16] commit-graph: examine commits by generation number Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 09/16] diff: skip batch object download when possible Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 10/16] commit-graph: write Bloom filters to commit graph file Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 11/16] commit-graph: reuse existing Bloom filters during write Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 12/16] commit-graph: add --changed-paths option to write subcommand Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 13/16] revision.c: use Bloom filters to speed up path based revision walks Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 14/16] revision.c: add trace2 stats around Bloom filter usage Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 15/16] t4216: add end to end tests for git log with Bloom filters Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 16/16] commit-graph: add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag Garima Singh via GitGitGadget
2020-04-06 16:59     ` [PATCH v4 00/15] Changed Paths Bloom Filters Garima Singh via GitGitGadget
2020-04-06 16:59       ` [PATCH v4 01/15] commit-graph: define and use MAX_NUM_CHUNKS Garima Singh via GitGitGadget
2020-04-06 16:59       ` [PATCH v4 02/15] bloom.c: add the murmur3 hash implementation Garima Singh via GitGitGadget
2020-04-06 16:59       ` [PATCH v4 03/15] bloom.c: introduce core Bloom filter constructs Garima Singh via GitGitGadget
2020-04-06 16:59       ` [PATCH v4 04/15] bloom.c: core Bloom filter implementation for changed paths Garima Singh via GitGitGadget
2020-06-27 15:53         ` SZEDER Gábor
2020-04-06 16:59       ` [PATCH v4 05/15] diff: halt tree-diff early after max_changes Derrick Stolee via GitGitGadget
2020-08-04 14:47         ` SZEDER Gábor
2020-08-04 16:25           ` Derrick Stolee
2020-08-04 17:00             ` SZEDER Gábor
2020-08-04 17:31               ` Derrick Stolee
2020-08-05 17:08                 ` Derrick Stolee
2020-04-06 16:59       ` [PATCH v4 06/15] commit-graph: compute Bloom filters for changed paths Garima Singh via GitGitGadget
2020-04-06 16:59       ` [PATCH v4 07/15] commit-graph: examine changed-path objects in pack order Jeff King via GitGitGadget
2020-04-06 16:59       ` [PATCH v4 08/15] commit-graph: examine commits by generation number Garima Singh via GitGitGadget
2020-04-06 16:59       ` [PATCH v4 09/15] commit-graph: write Bloom filters to commit graph file Garima Singh via GitGitGadget
2020-05-29  8:57         ` SZEDER Gábor
2020-05-29 13:35           ` Derrick Stolee
2020-05-31 17:23             ` SZEDER Gábor
2020-07-09 17:00         ` [PATCH] commit-graph: fix "Writing out commit graph" progress counter SZEDER Gábor
2020-07-09 18:01           ` Derrick Stolee
2020-07-09 18:20             ` Derrick Stolee
2020-04-06 16:59       ` [PATCH v4 10/15] commit-graph: reuse existing Bloom filters during write Garima Singh via GitGitGadget
2020-06-19 14:02         ` SZEDER Gábor
2020-06-19 19:28           ` Junio C Hamano
2020-07-27 21:33         ` SZEDER Gábor
2020-04-06 16:59       ` [PATCH v4 11/15] commit-graph: add --changed-paths option to write subcommand Garima Singh via GitGitGadget
2020-06-07 22:21         ` SZEDER Gábor
2020-04-06 16:59       ` [PATCH v4 12/15] revision.c: use Bloom filters to speed up path based revision walks Garima Singh via GitGitGadget
2020-06-26  6:34         ` SZEDER Gábor
2020-04-06 16:59       ` [PATCH v4 13/15] revision.c: add trace2 stats around Bloom filter usage Garima Singh via GitGitGadget
2020-04-06 16:59       ` [PATCH v4 14/15] t4216: add end to end tests for git log with Bloom filters Garima Singh via GitGitGadget
2020-04-06 16:59       ` [PATCH v4 15/15] commit-graph: add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag Garima Singh via GitGitGadget
2020-04-08 15:51       ` [PATCH v4 00/15] Changed Paths Bloom Filters Derrick Stolee
2020-04-08 19:21         ` Junio C Hamano
2020-04-08 20:05         ` Jakub Narębski
2020-04-12 20:34         ` Taylor Blau
2020-03-05 19:49 ` [PATCH 0/9] [RFC] " Garima Singh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=86d0bqsuqc.fsf@gmail.com \
    --to=jnareb@gmail.com \
    --cc=garima.singh@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=jeffhost@microsoft.com \
    --cc=jonathantanmy@google.com \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    --cc=stolee@gmail.com \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).