git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <derrickstolee@github.com>
To: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>, git@vger.kernel.org
Cc: vdye@github.com
Subject: Re: [PATCH v1 2/2] builtin/grep.c: integrate with sparse index
Date: Wed, 17 Aug 2022 10:23:15 -0400	[thread overview]
Message-ID: <19dea639-389a-7258-e424-4912bde226df@github.com> (raw)
In-Reply-To: <20220817075633.217934-3-shaoxuan.yuan02@gmail.com>

On 8/17/2022 3:56 AM, Shaoxuan Yuan wrote:
> Turn on sparse index and remove ensure_full_index().
> 
> Change it to only expands the index when using --sparse.
> 
> The p2000 tests demonstrate a ~99.4% execution time reduction for
> `git grep` using a sparse index.
> 
> Test                                           HEAD~1       HEAD
> -----------------------------------------------------------------------------
> 2000.78: git grep --cached bogus (full-v3)     0.019        0.018  (-5.2%)
> 2000.79: git grep --cached bogus (full-v4)     0.017        0.016  (-5.8%)
> 2000.80: git grep --cached bogus (sparse-v3)   0.29         0.0015 (-99.4%)
> 2000.81: git grep --cached bogus (sparse-v4)   0.30         0.0018 (-99.4%)

Good results.

I think we could get interesting results even with the --sparse
option if you go another step further (perhaps as a patch after
this one).

> 
> Optional reading about performance test results
> -----------------------------------------------
> Notice that because `git-grep` needs to parse blobs in the index, the
> index reading time is minuscule comparing to the object parsing time.
> And because of this, the p2000 test results cannot clearly reflect the
> speedup for index reading: combining with the object parsing time,
> the aggregated time difference is extremely close between HEAD~1 and
> HEAD.
> 
> Hence, the results presenting here are not directly extracted from the
> p2000 test results. Instead, to make the performance difference more
> visible, the test command is manually ran with GIT_TRACE2_PERF in the
> four repos (full-v3, sparse-v3, full-v4, sparse-v4). The numbers here
> are then extracted from the time difference between "region_enter" and
> "region_leave" of label "do_read_index".

This is a good point, but I don't recommend displaying them as if they
were the output of a "./run HEAD~1 HEAD -- p2000-sparse-operations.sh"
command. Instead, point out that the performance test does not show a
major improvement and instead you have these "Before" and "After" results
from testing manually and extracting trace2 regions.

> @@ -519,11 +519,15 @@ static int grep_cache(struct grep_opt *opt,
>  		strbuf_addstr(&name, repo->submodule_prefix);
>  	}
>  
> +	prepare_repo_settings(repo);
> +	repo->settings.command_requires_full_index = 0;
> +

The best pattern is to put this in cmd_grep() immediately after parsing
options. This guarantees that we don't parse and expand the index in any
other code path.

>  	if (repo_read_index(repo) < 0)
>  		die(_("index file corrupt"));
>  
> -	/* TODO: audit for interaction with sparse-index. */
> -	ensure_full_index(repo->index);
> +	if (grep_sparse)
> +		ensure_full_index(repo->index);
> +

As mentioned before, this approach is the simplest way to make the case
without --sparse faster, but the case _with_ --sparse will still be slow.
The way to fix this would be to modify this portion of the loop:

	if (S_ISREG(ce->ce_mode) &&
	    match_pathspec(repo->index, pathspec, name.buf, name.len, 0, NULL,
			   S_ISDIR(ce->ce_mode) ||
			   S_ISGITLINK(ce->ce_mode))) {

by adding an initial case

	if (S_ISSPARSEDIR(ce->ce_mode)) {
		hit |= grep_tree(opt, &ce->oid, name.buf, 0, name.buf);
	} else if (S_ISREG(ce->ce_mode) &&
		   match_pathspec(repo->index, pathspec, name.buf, name.len, 0, NULL,
				  S_ISDIR(ce->ce_mode) ||
				  S_ISGITLINK(ce->ce_mode))) {

and appropriately implement "grep_tree()" to walk the tree at ce->oid to
find all matching files within, then call grep_oid() for each of those
paths.

Bonus points if you recognize that the pathspec uses prefix checks that
allow pruning the search space and not parsing all of the trees
recursively. But that can definitely be delayed for a future enhancement.

> +test_expect_success 'grep expands index using --sparse' '
> +	init_repos &&
> +
> +	# With --sparse and --cached, do not ignore sparse entries and
> +	# expand the index.
> +	test_all_match git grep --sparse --cached a
> +'

Here, you're testing that the behavior matches, but not testing that the
index expands. (It does describe why you didn't include it in the later
ensure_not_expanded tests.)

> +
> +test_expect_success 'grep is not expanded' '
> +	init_repos &&
> +
> +	ensure_not_expanded grep a &&
> +	ensure_not_expanded grep a -- deep/* &&
> +	# grep does not match anything per se, so ! is used

It can be helpful to say why:

	# All files within the folder1/* pathspec are sparse,
	# so this command does not find any matches.

> +	ensure_not_expanded ! grep a -- folder1/*
> +'

Thanks,
-Stolee

  reply	other threads:[~2022-08-17 14:24 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-17  7:56 [PATCH v1 0/2] grep: integrate with sparse index Shaoxuan Yuan
2022-08-17  7:56 ` [PATCH v1 1/2] builtin/grep.c: add --sparse option Shaoxuan Yuan
2022-08-17 14:12   ` Derrick Stolee
2022-08-17 17:13     ` Junio C Hamano
2022-08-17 17:34       ` Victoria Dye
2022-08-17 17:43         ` Derrick Stolee
2022-08-17 18:47           ` Junio C Hamano
2022-08-17 17:37     ` Elijah Newren
2022-08-24 18:20     ` Shaoxuan Yuan
2022-08-24 19:08       ` Derrick Stolee
2022-08-17  7:56 ` [PATCH v1 2/2] builtin/grep.c: integrate with sparse index Shaoxuan Yuan
2022-08-17 14:23   ` Derrick Stolee [this message]
2022-08-24 21:06     ` Shaoxuan Yuan
2022-08-25  0:39       ` Derrick Stolee
2022-08-17 13:46 ` [PATCH v1 0/2] grep: " Derrick Stolee
2022-08-29 23:28 ` [PATCH v2 " Shaoxuan Yuan
2022-08-29 23:28   ` [PATCH v2 1/2] builtin/grep.c: add --sparse option Shaoxuan Yuan
2022-08-29 23:28   ` [PATCH v2 2/2] builtin/grep.c: integrate with sparse index Shaoxuan Yuan
2022-08-30 13:45     ` Derrick Stolee
2022-09-01  4:57 ` [PATCH v3 0/3] grep: " Shaoxuan Yuan
2022-09-01  4:57   ` [PATCH v3 1/3] builtin/grep.c: add --sparse option Shaoxuan Yuan
2022-09-01  4:57   ` [PATCH v3 2/3] builtin/grep.c: integrate with sparse index Shaoxuan Yuan
2022-09-01  4:57   ` [PATCH v3 3/3] builtin/grep.c: walking tree instead of expanding index with --sparse Shaoxuan Yuan
2022-09-01 17:03     ` Derrick Stolee
2022-09-01 18:31       ` Shaoxuan Yuan
2022-09-01 17:17     ` Junio C Hamano
2022-09-01 17:27       ` Junio C Hamano
2022-09-01 22:49         ` Shaoxuan Yuan
2022-09-01 22:36       ` Shaoxuan Yuan
2022-09-02  3:28     ` Victoria Dye
2022-09-02 18:47       ` Shaoxuan Yuan
2022-09-03  0:36 ` [PATCH v4 0/3] grep: integrate with sparse index Shaoxuan Yuan
2022-09-03  0:36   ` [PATCH v4 1/3] builtin/grep.c: add --sparse option Shaoxuan Yuan
2022-09-03  0:36   ` [PATCH v4 2/3] builtin/grep.c: integrate with sparse index Shaoxuan Yuan
2022-09-03  0:36   ` [PATCH v4 3/3] builtin/grep.c: walking tree instead of expanding index with --sparse Shaoxuan Yuan
2022-09-03  4:39     ` Junio C Hamano
2022-09-08  0:24       ` Shaoxuan Yuan
2022-09-08  0:18 ` [PATCH v5 0/3] grep: integrate with sparse index Shaoxuan Yuan
2022-09-08  0:18   ` [PATCH v5 1/3] builtin/grep.c: add --sparse option Shaoxuan Yuan
2022-09-10  1:07     ` Victoria Dye
2022-09-14  6:08     ` Elijah Newren
2022-09-15  2:57       ` Junio C Hamano
2022-09-18  2:14         ` Elijah Newren
2022-09-18 19:52           ` Victoria Dye
2022-09-19  1:23             ` Junio C Hamano
2022-09-19  4:27             ` Shaoxuan Yuan
2022-09-19 11:03             ` Ævar Arnfjörð Bjarmason
2022-09-20  7:13             ` Elijah Newren
2022-09-17  3:34       ` Shaoxuan Yuan
2022-09-18  4:24         ` Elijah Newren
2022-09-19  4:13           ` Shaoxuan Yuan
2022-09-17  3:45       ` Shaoxuan Yuan
2022-09-08  0:18   ` [PATCH v5 2/3] builtin/grep.c: integrate with sparse index Shaoxuan Yuan
2022-09-08  0:18   ` [PATCH v5 3/3] builtin/grep.c: walking tree instead of expanding index with --sparse Shaoxuan Yuan
2022-09-08 17:59     ` Junio C Hamano
2022-09-08 20:46       ` Derrick Stolee
2022-09-08 20:56         ` Junio C Hamano
2022-09-08 21:06           ` Shaoxuan Yuan
2022-09-09 12:49           ` Derrick Stolee
2022-09-13 17:23         ` Junio C Hamano
2022-09-10  2:04     ` Victoria Dye
2022-09-23  4:18 ` [PATCH v6 0/1] grep: integrate with sparse index Shaoxuan Yuan
2022-09-23  4:18   ` [PATCH v6 1/1] builtin/grep.c: " Shaoxuan Yuan
2022-09-23 16:40     ` Junio C Hamano
2022-09-23 16:58     ` Junio C Hamano
2022-09-26 17:28       ` Junio C Hamano
2022-09-23 14:13   ` [PATCH v6 0/1] grep: " Derrick Stolee
2022-09-23 16:01   ` Victoria Dye
2022-09-23 17:08     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=19dea639-389a-7258-e424-4912bde226df@github.com \
    --to=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=shaoxuan.yuan02@gmail.com \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).