git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jakub Narebski <jnareb@gmail.com>
To: "Garima Singh via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, "Derrick Stolee" <stolee@gmail.com>,
	"SZEDER Gábor" <szeder.dev@gmail.com>,
	"Jonathan Tan" <jonathantanmy@google.com>,
	"Jeff Hostetler" <jeffhost@microsoft.com>,
	"Taylor Blau" <me@ttaylorr.com>, "Jeff King" <peff@peff.net>,
	"Garima Singh" <garimasigit@gmail.com>,
	"Christian Couder" <christian.couder@gmail.com>,
	"Emily Shaffer" <emilyshaffer@gmail.com>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Garima Singh" <garima.singh@microsoft.com>
Subject: Re: [PATCH v2 09/11] commit-graph: add --changed-paths option to write subcommand
Date: Thu, 20 Feb 2020 21:28:06 +0100	[thread overview]
Message-ID: <86y2sxdmw9.fsf@gmail.com> (raw)
In-Reply-To: <3d7ee0c96955dc15c87d04982d8cdec8b62750b2.1580943390.git.gitgitgadget@gmail.com> (Garima Singh via GitGitGadget's message of "Wed, 05 Feb 2020 22:56:28 +0000")

"Garima Singh via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Garima Singh <garima.singh@microsoft.com>
>
> Add --changed-paths option to git commit-graph write. This option will
> allow users to compute information about the paths that have changed
> between a commit and its first parent, and write it into the commit graph
> file. If the option is passed to the write subcommand we set the
> COMMIT_GRAPH_WRITE_BLOOM_FILTERS flag and pass it down to the
> commit-graph logic.

In the manpage you write that this operation (computing Bloom filters)
can take a while on large repositories.  Could you perhaps provide some
numbers: how much longer does it take to write commit-graph file with
and without '--changed-paths' for example for Linux kernel, or some
other large repository?  Thanks in advance.

>
> Helped-by: Derrick Stolee <dstolee@microsoft.com>
> Signed-off-by: Garima Singh <garima.singh@microsoft.com>
> ---
>  Documentation/git-commit-graph.txt | 5 +++++
>  builtin/commit-graph.c             | 9 +++++++--
>  2 files changed, 12 insertions(+), 2 deletions(-)

What is missing is some sanity tests: that bloom index and bloom data
chunks are not present without '--changed-paths', and that they are
added with '--changed-paths'.

If possible, maybe also check in a separate test that the size of
bloom_index chunk agrees with the number of commits in the commit graph.


Also, we can now add those tests I have wrote about in my review of
previous patch, that is:

1. If you write commit-graph with --changed-paths, and either add some
   commits later or exclude some commits from the commit graph, then:

   a.) commit(s) in commit-graph have Bloom filter
   b.) commit(s) not in commit-graph do not have Bloom filter

2. If you write commit-graph without --changed-paths as base layer,
   and then write next layer with --changed-paths and --split, then:

   a.) commit(s) in top layer have Bloom filter(s)
   b.) commit(s) in bottom layer don't have Bloom filter(s)

>
> diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
> index bcd85c1976..907d703b30 100644
> --- a/Documentation/git-commit-graph.txt
> +++ b/Documentation/git-commit-graph.txt
> @@ -54,6 +54,11 @@ or `--stdin-packs`.)
>  With the `--append` option, include all commits that are present in the
>  existing commit-graph file.
>  +
> +With the `--changed-paths` option, compute and write information about the
> +paths changed between a commit and it's first parent. This operation can
> +take a while on large repositories. It provides significant performance gains
> +for getting history of a directory or a file with `git log -- <path>`.
> ++

Should we write about limitation that the topmost layer in the split
commit graph needs to be written with '--changed-paths' for Git to use
this information?  Or perhaps we should try (in the future) to remove
this limitation??

>  With the `--split` option, write the commit-graph as a chain of multiple
>  commit-graph files stored in `<dir>/info/commit-graphs`. The new commits
>  not already in the commit-graph are added in a new "tip" file. This file
> diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
> index e0c6fc4bbf..261dcce091 100644
> --- a/builtin/commit-graph.c
> +++ b/builtin/commit-graph.c
> @@ -9,7 +9,7 @@
>  
>  static char const * const builtin_commit_graph_usage[] = {
>  	N_("git commit-graph verify [--object-dir <objdir>] [--shallow] [--[no-]progress]"),
> -	N_("git commit-graph write [--object-dir <objdir>] [--append|--split] [--reachable|--stdin-packs|--stdin-commits] [--[no-]progress] <split options>"),
> +	N_("git commit-graph write [--object-dir <objdir>] [--append|--split] [--reachable|--stdin-packs|--stdin-commits] [--changed-paths] [--[no-]progress] <split options>"),
>  	NULL
>  };
>  
> @@ -19,7 +19,7 @@ static const char * const builtin_commit_graph_verify_usage[] = {
>  };
>  
>  static const char * const builtin_commit_graph_write_usage[] = {
> -	N_("git commit-graph write [--object-dir <objdir>] [--append|--split] [--reachable|--stdin-packs|--stdin-commits] [--[no-]progress] <split options>"),
> +	N_("git commit-graph write [--object-dir <objdir>] [--append|--split] [--reachable|--stdin-packs|--stdin-commits] [--changed-paths] [--[no-]progress] <split options>"),
>  	NULL
>  };
>

All right.

> @@ -32,6 +32,7 @@ static struct opts_commit_graph {
>  	int split;
>  	int shallow;
>  	int progress;
> +	int enable_changed_paths;

Bikeshed painting: should this field be called enable_changed_paths or
simply changed_paths?

>  } opts;
>  
>  static int graph_verify(int argc, const char **argv)
> @@ -110,6 +111,8 @@ static int graph_write(int argc, const char **argv)
>  			N_("start walk at commits listed by stdin")),
>  		OPT_BOOL(0, "append", &opts.append,
>  			N_("include all commits already in the commit-graph file")),
> +		OPT_BOOL(0, "changed-paths", &opts.enable_changed_paths,
> +			N_("enable computation for changed paths")),
>  		OPT_BOOL(0, "progress", &opts.progress, N_("force progress reporting")),
>  		OPT_BOOL(0, "split", &opts.split,
>  			N_("allow writing an incremental commit-graph file")),

All right.

> @@ -143,6 +146,8 @@ static int graph_write(int argc, const char **argv)
>  		flags |= COMMIT_GRAPH_WRITE_SPLIT;
>  	if (opts.progress)
>  		flags |= COMMIT_GRAPH_WRITE_PROGRESS;
> +	if (opts.enable_changed_paths)
> +		flags |= COMMIT_GRAPH_WRITE_BLOOM_FILTERS;
>  
>  	read_replace_refs = 0;

All right.  This actually turns on calculation Bloom filters for changed
paths, thanks to

 	ctx->changed_paths = flags & COMMIT_GRAPH_WRITE_BLOOM_FILTERS ? 1 : 0;

that was added by the "[PATCH v2 04/11] commit-graph: compute Bloom
filters for changed paths" patch.

Though... should this enabling be split into two separate patches like
this?


Best,
-- 
Jakub Narębski

  reply	other threads:[~2020-02-20 20:28 UTC|newest]

Thread overview: 159+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-20 22:05 [PATCH 0/9] [RFC] Changed Paths Bloom Filters Garima Singh via GitGitGadget
2019-12-20 22:05 ` [PATCH 1/9] commit-graph: add --changed-paths option to write Garima Singh via GitGitGadget
2020-01-01 20:20   ` Jakub Narebski
2019-12-20 22:05 ` [PATCH 2/9] commit-graph: write changed paths bloom filters Garima Singh via GitGitGadget
2019-12-21 16:48   ` Philip Oakley
2020-01-06 18:44   ` Jakub Narebski
2020-01-13 19:48     ` Garima Singh
2019-12-20 22:05 ` [PATCH 3/9] commit-graph: use MAX_NUM_CHUNKS Garima Singh via GitGitGadget
2020-01-07 12:19   ` Jakub Narebski
2019-12-20 22:05 ` [PATCH 4/9] commit-graph: document bloom filter format Garima Singh via GitGitGadget
2020-01-07 14:46   ` Jakub Narebski
2019-12-20 22:05 ` [PATCH 5/9] commit-graph: write changed path bloom filters to commit-graph file Garima Singh via GitGitGadget
2020-01-07 16:01   ` Jakub Narebski
2020-01-14 15:14     ` Garima Singh
2019-12-20 22:05 ` [PATCH 6/9] commit-graph: test commit-graph write --changed-paths Garima Singh via GitGitGadget
2020-01-08  0:32   ` Jakub Narebski
2019-12-20 22:05 ` [PATCH 7/9] commit-graph: reuse existing bloom filters during write Garima Singh via GitGitGadget
2020-01-09 19:12   ` Jakub Narebski
2019-12-20 22:05 ` [PATCH 8/9] revision.c: use bloom filters to speed up path based revision walks Garima Singh via GitGitGadget
2020-01-11  0:27   ` Jakub Narebski
2020-01-15  0:08     ` Garima Singh
2019-12-20 22:05 ` [PATCH 9/9] commit-graph: add GIT_TEST_COMMIT_GRAPH_BLOOM_FILTERS test flag Garima Singh via GitGitGadget
2020-01-11 19:56   ` Jakub Narebski
2020-01-15  0:55     ` Garima Singh
2019-12-20 22:14 ` [PATCH 0/9] [RFC] Changed Paths Bloom Filters Junio C Hamano
2019-12-22  9:26 ` Christian Couder
2019-12-22  9:38   ` Jeff King
2020-01-01 12:04     ` Jakub Narebski
2019-12-22  9:30 ` Jeff King
2019-12-22  9:32   ` [PATCH 1/3] commit-graph: examine changed-path objects in pack order Jeff King
2019-12-27 14:51     ` Derrick Stolee
2019-12-29  6:12       ` Jeff King
2019-12-29  6:28         ` Jeff King
2019-12-30 14:37         ` Derrick Stolee
2019-12-30 14:51           ` Derrick Stolee
2019-12-22  9:32   ` [PATCH 2/3] commit-graph: free large diffs, too Jeff King
2019-12-27 14:52     ` Derrick Stolee
2019-12-22  9:32   ` [PATCH 3/3] commit-graph: stop using full rev_info for diffs Jeff King
2019-12-27 14:53     ` Derrick Stolee
2019-12-26 14:21   ` [PATCH 0/9] [RFC] Changed Paths Bloom Filters Derrick Stolee
2019-12-29  6:03     ` Jeff King
2019-12-27 16:11   ` Derrick Stolee
2019-12-29  6:24     ` Jeff King
2019-12-30 16:04       ` Derrick Stolee
2019-12-30 17:02       ` Junio C Hamano
2019-12-31 16:45 ` Jakub Narebski
2020-01-13 16:54   ` Garima Singh
2020-01-20 13:48     ` Jakub Narebski
2020-01-21 16:14       ` Garima Singh
2020-02-02 18:43         ` Jakub Narebski
2020-01-21 23:40 ` Emily Shaffer
2020-01-27 18:24   ` Garima Singh
2020-02-01 23:32   ` Jakub Narebski
2020-02-05 22:56 ` [PATCH v2 00/11] " Garima Singh via GitGitGadget
2020-02-05 22:56   ` [PATCH v2 01/11] commit-graph: use MAX_NUM_CHUNKS Garima Singh via GitGitGadget
2020-02-09 12:39     ` Jakub Narebski
2020-02-05 22:56   ` [PATCH v2 02/11] bloom: core Bloom filter implementation for changed paths Garima Singh via GitGitGadget
2020-02-15 17:17     ` Jakub Narebski
2020-02-16 16:49     ` Jakub Narebski
2020-02-22  0:32       ` Garima Singh
2020-02-23 13:38         ` Jakub Narebski
2020-02-24 17:34           ` Garima Singh
2020-02-24 18:20             ` Jakub Narebski
2020-02-05 22:56   ` [PATCH v2 03/11] diff: halt tree-diff early after max_changes Derrick Stolee via GitGitGadget
2020-02-17  0:00     ` Jakub Narebski
2020-02-22  0:37       ` Garima Singh
2020-02-05 22:56   ` [PATCH v2 04/11] commit-graph: compute Bloom filters for changed paths Garima Singh via GitGitGadget
2020-02-17 21:56     ` Jakub Narebski
2020-02-22  0:55       ` Garima Singh
2020-02-23 17:34         ` Jakub Narebski
2020-02-05 22:56   ` [PATCH v2 05/11] commit-graph: examine changed-path objects in pack order Jeff King via GitGitGadget
2020-02-18 17:59     ` Jakub Narebski
2020-02-24 18:29       ` Garima Singh
2020-02-05 22:56   ` [PATCH v2 06/11] commit-graph: examine commits by generation number Derrick Stolee via GitGitGadget
2020-02-19  0:32     ` Jakub Narebski
2020-02-24 20:45       ` Garima Singh
2020-02-05 22:56   ` [PATCH v2 07/11] commit-graph: write Bloom filters to commit graph file Garima Singh via GitGitGadget
2020-02-19 15:13     ` Jakub Narebski
2020-02-24 21:14       ` Garima Singh
2020-02-25 11:40         ` Jakub Narebski
2020-02-25 15:58           ` Garima Singh
2020-02-05 22:56   ` [PATCH v2 08/11] commit-graph: reuse existing Bloom filters during write Garima Singh via GitGitGadget
2020-02-20 18:48     ` Jakub Narebski
2020-02-24 21:45       ` Garima Singh
2020-02-05 22:56   ` [PATCH v2 09/11] commit-graph: add --changed-paths option to write subcommand Garima Singh via GitGitGadget
2020-02-20 20:28     ` Jakub Narebski [this message]
2020-02-24 21:51       ` Garima Singh
2020-02-25 12:10         ` Jakub Narebski
2020-02-20 22:10     ` Bryan Turner
2020-02-22  1:44       ` Garima Singh
2020-02-05 22:56   ` [PATCH v2 10/11] revision.c: use Bloom filters to speed up path based revision walks Garima Singh via GitGitGadget
2020-02-21 17:31     ` Jakub Narebski
2020-02-21 22:45     ` Jakub Narebski
2020-02-05 22:56   ` [PATCH v2 11/11] commit-graph: add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag Garima Singh via GitGitGadget
2020-02-22  0:11     ` Jakub Narebski
2020-02-07 13:52   ` [PATCH v2 00/11] Changed Paths Bloom Filters SZEDER Gábor
2020-02-07 15:09     ` Garima Singh
2020-02-07 15:36       ` Derrick Stolee
2020-02-07 16:15         ` SZEDER Gábor
2020-02-07 16:33           ` Derrick Stolee
2020-02-11 19:08       ` Garima Singh
2020-02-08 23:04   ` Jakub Narebski
2020-02-21 17:41     ` Garima Singh
2020-03-29 18:36       ` Junio C Hamano
2020-03-30  0:31   ` [PATCH v3 00/16] " Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 01/16] commit-graph: define and use MAX_NUM_CHUNKS Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 02/16] bloom.c: add the murmur3 hash implementation Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 03/16] bloom.c: introduce core Bloom filter constructs Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 04/16] bloom.c: core Bloom filter implementation for changed paths Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 05/16] diff: halt tree-diff early after max_changes Derrick Stolee via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 06/16] commit-graph: compute Bloom filters for changed paths Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 07/16] commit-graph: examine changed-path objects in pack order Jeff King via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 08/16] commit-graph: examine commits by generation number Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 09/16] diff: skip batch object download when possible Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 10/16] commit-graph: write Bloom filters to commit graph file Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 11/16] commit-graph: reuse existing Bloom filters during write Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 12/16] commit-graph: add --changed-paths option to write subcommand Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 13/16] revision.c: use Bloom filters to speed up path based revision walks Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 14/16] revision.c: add trace2 stats around Bloom filter usage Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 15/16] t4216: add end to end tests for git log with Bloom filters Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 16/16] commit-graph: add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag Garima Singh via GitGitGadget
2020-04-06 16:59     ` [PATCH v4 00/15] Changed Paths Bloom Filters Garima Singh via GitGitGadget
2020-04-06 16:59       ` [PATCH v4 01/15] commit-graph: define and use MAX_NUM_CHUNKS Garima Singh via GitGitGadget
2020-04-06 16:59       ` [PATCH v4 02/15] bloom.c: add the murmur3 hash implementation Garima Singh via GitGitGadget
2020-04-06 16:59       ` [PATCH v4 03/15] bloom.c: introduce core Bloom filter constructs Garima Singh via GitGitGadget
2020-04-06 16:59       ` [PATCH v4 04/15] bloom.c: core Bloom filter implementation for changed paths Garima Singh via GitGitGadget
2020-06-27 15:53         ` SZEDER Gábor
2020-04-06 16:59       ` [PATCH v4 05/15] diff: halt tree-diff early after max_changes Derrick Stolee via GitGitGadget
2020-08-04 14:47         ` SZEDER Gábor
2020-08-04 16:25           ` Derrick Stolee
2020-08-04 17:00             ` SZEDER Gábor
2020-08-04 17:31               ` Derrick Stolee
2020-08-05 17:08                 ` Derrick Stolee
2020-04-06 16:59       ` [PATCH v4 06/15] commit-graph: compute Bloom filters for changed paths Garima Singh via GitGitGadget
2020-04-06 16:59       ` [PATCH v4 07/15] commit-graph: examine changed-path objects in pack order Jeff King via GitGitGadget
2020-04-06 16:59       ` [PATCH v4 08/15] commit-graph: examine commits by generation number Garima Singh via GitGitGadget
2020-04-06 16:59       ` [PATCH v4 09/15] commit-graph: write Bloom filters to commit graph file Garima Singh via GitGitGadget
2020-05-29  8:57         ` SZEDER Gábor
2020-05-29 13:35           ` Derrick Stolee
2020-05-31 17:23             ` SZEDER Gábor
2020-07-09 17:00         ` [PATCH] commit-graph: fix "Writing out commit graph" progress counter SZEDER Gábor
2020-07-09 18:01           ` Derrick Stolee
2020-07-09 18:20             ` Derrick Stolee
2020-04-06 16:59       ` [PATCH v4 10/15] commit-graph: reuse existing Bloom filters during write Garima Singh via GitGitGadget
2020-06-19 14:02         ` SZEDER Gábor
2020-06-19 19:28           ` Junio C Hamano
2020-07-27 21:33         ` SZEDER Gábor
2020-04-06 16:59       ` [PATCH v4 11/15] commit-graph: add --changed-paths option to write subcommand Garima Singh via GitGitGadget
2020-06-07 22:21         ` SZEDER Gábor
2020-04-06 16:59       ` [PATCH v4 12/15] revision.c: use Bloom filters to speed up path based revision walks Garima Singh via GitGitGadget
2020-06-26  6:34         ` SZEDER Gábor
2020-04-06 16:59       ` [PATCH v4 13/15] revision.c: add trace2 stats around Bloom filter usage Garima Singh via GitGitGadget
2020-04-06 16:59       ` [PATCH v4 14/15] t4216: add end to end tests for git log with Bloom filters Garima Singh via GitGitGadget
2020-04-06 16:59       ` [PATCH v4 15/15] commit-graph: add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag Garima Singh via GitGitGadget
2020-04-08 15:51       ` [PATCH v4 00/15] Changed Paths Bloom Filters Derrick Stolee
2020-04-08 19:21         ` Junio C Hamano
2020-04-08 20:05         ` Jakub Narębski
2020-04-12 20:34         ` Taylor Blau
2020-03-05 19:49 ` [PATCH 0/9] [RFC] " Garima Singh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=86y2sxdmw9.fsf@gmail.com \
    --to=jnareb@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=emilyshaffer@gmail.com \
    --cc=garima.singh@microsoft.com \
    --cc=garimasigit@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=jeffhost@microsoft.com \
    --cc=jonathantanmy@google.com \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    --cc=stolee@gmail.com \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).