git@vger.kernel.org mailing list mirror (one of many)
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Martin Langhoff <martin.langhoff@gmail.com>,
	Git Mailing List <git@vger.kernel.org>,
	Taylor Blau <me@ttaylorr.com>
Subject: Re: git log exclude pathspec from file - supported? plans?
Date: Wed, 30 Jun 2021 20:22:35 +0200
Message-ID: <87sg0zdx7z.fsf@evledraar.gmail.com>
In-Reply-To: <YNywsEbFcrQFeH91@coredump.intra.peff.net>


On Wed, Jun 30 2021, Jeff King wrote:

> On Wed, Jun 30, 2021 at 12:59:43PM -0400, Martin Langhoff wrote:
>
>> long time no see! I'm doing some complex git repo spelunking and
>> pushing the boundaries of the pathspec magic for excludes.
>> 
>> Is there a reasonable way to provide a (potentially large) set of
>> excludes? something like
>> 
>>      git log --exclude-pathspec-file paths-to-exclude.txt .
>> 
>> Has there been discussion / patches / plans related to this? I may
>> have some cycles (hopefully!)
>
> You can feed pathspecs via --stdin. So:
>
>   {
> 	echo "--"
> 	sed s/^/:^/ paths-to-exclude.txt
>   } | git log --stdin
>
> works. Obviously it's not as turn-key if you really do have a list of
> paths in a file already, but it's much more flexible.
>
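
For anyone who would rather script this than use the shell pipeline,
here is a rough Python sketch of the same idea. The function names are
made up for illustration, and it uses the long ":(exclude)" spelling of
the ":^" magic. The entries are still parsed as pathspecs, so glob
characters in the file keep their meaning.

```python
import subprocess
from typing import Iterable, List


def exclude_payload(paths: Iterable[str]) -> str:
    """Build the text to feed to `git log --stdin`.

    The leading "--" line ends the revision arguments, so every later
    line is read as a pathspec, as in the shell version above.
    """
    lines: List[str] = ["--"]
    for raw in paths:
        path = raw.rstrip("\n")
        if path:  # skip blank lines in the input
            lines.append(":(exclude)" + path)
    return "\n".join(lines) + "\n"


def log_excluding(path_file: str) -> str:
    """Run `git log --stdin` with the excludes (needs git and a repo)."""
    with open(path_file, encoding="utf-8") as fh:
        payload = exclude_payload(fh)
    result = subprocess.run(
        ["git", "log", "--stdin"],
        input=payload, capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Only exclude_payload() is pure; log_excluding("paths-to-exclude.txt")
has to be run from inside a repository.
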
> I'll caution you that the pathspec code is not well-optimized to handle
> a large number of pathspecs. E.g.:
>
>   [no pathspecs]
>   $ time git rev-list HEAD >/dev/null
>   real	0m0.033s
>   user	0m0.017s
>   sys	0m0.017s
>
>   [trivial pathspec; now we have to actually open up trees]
>   $ { echo --; echo .; } >input
>   $ time git rev-list HEAD --stdin <input >/dev/null
>   real	0m1.338s
>   user	0m1.294s
>   sys	0m0.045s
>
>   [lots of pathspecs; now we spend loads of time actually matching
>    strings; the ^C is when I got bored and killed it]
>   $ { echo --; git ls-files; } >input
>   $ time git rev-list HEAD --stdin <input >/dev/null
>   ^C
>   real	1m24.406s
>   user	1m24.369s
>   sys	0m0.036s
>
> The problem is that we try to linearly match every pathspec against
> every path we consider, so it's quadratic-ish in the number of files in
> the repo. I played a long time ago with storing non-wildcard pathspecs
> in a trie that we could traverse as we walked the individual trees we
> were matching. It performed well, but IIRC the interface was hacky (I
> had to bolt it specifically onto the way the tree-walker uses
> pathspecs, and the other pathspec matchers didn't benefit at all).
>
> I can probably dig it up if anybody's interested in looking at it.

If it's not too much trouble I'd find it interesting, but I likely won't
do anything with it any time soon.
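
To make sure I understand the idea, here is a toy Python sketch of a
trie over literal (non-wildcard) pathspecs. This is only my reading of
the description above, not the actual patch; the class and method names
are invented. A lookup costs one step per path component instead of one
string comparison per pathspec.

```python
from typing import Dict, Iterable


class PathspecTrie:
    """Trie keyed on path components; literal pathspecs only."""

    def __init__(self, pathspecs: Iterable[str]) -> None:
        self.root: Dict = {}
        for spec in pathspecs:
            node = self.root
            for part in spec.strip("/").split("/"):
                node = node.setdefault(part, {})
            node[None] = True  # marker: a pathspec ends at this node

    def matches(self, path: str) -> bool:
        """True if `path` equals a pathspec or lies underneath one."""
        node = self.root
        for part in path.split("/"):
            if None in node:  # an ancestor directory is a pathspec
                return True
            node = node.get(part)
            if node is None:  # no pathspec shares this prefix
                return False
        return None in node
```

A real tree-walker would presumably carry the current trie node down as
it descends into each subtree rather than restart from the root for
every path, which is where the interface gets awkward.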

One of the PCREv2 experiments I had very early WIP work towards was to
create a search index for commit messages, contents etc. and stick it in
something similar to the --changed-paths part of the commit-graph.

The PCREv2 codebase actually has a (supposedly) bug-for-bug compatible
implementation of our wildmatch function, written as a translator to a
PCREv2 regex. I have a branch somewhere where we run all our wildmatch
tests against it successfully.

So couple that with regex introspection, and a search index that
e.g. creates a trie bloom filter, then as long as your --grep=<RX>,
-G<RX> or pathspec has at least 3 fixed strings among its wildcards we
can ask the bloom filter "is this commit a candidate for this regex
searching this path/commit message/diff/whatever".

So you can have indexed matches for things like '*/test-lib.sh', not
just prefixes or fixed strings.
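
A very rough Python sketch of the candidate check, under several
assumptions: it indexes 3-character substrings (trigrams) of the literal
parts of a glob, it uses a plain set where the real thing would be a
bloom filter stored per commit, and it gives up on globs containing
brackets or backslashes. All names here are hypothetical, and none of
this is existing git or PCREv2 code.

```python
import re
from typing import Iterable, Set


def trigrams(text: str) -> Set[str]:
    """All 3-character substrings of `text`."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def required_trigrams(glob: str) -> Set[str]:
    """Trigrams that every path matching `glob` must contain."""
    if "[" in glob or "\\" in glob:
        return set()  # not handled in this sketch: nothing can be ruled out
    needed: Set[str] = set()
    for literal in re.split(r"[*?]", glob):  # keep only the fixed runs
        needed |= trigrams(literal)
    return needed


def build_index(paths: Iterable[str]) -> Set[str]:
    """Stand-in for the per-commit filter: trigrams of its changed paths."""
    index: Set[str] = set()
    for path in paths:
        index |= trigrams(path)
    return index


def is_candidate(index: Set[str], glob: str) -> bool:
    """False: the commit cannot match. True: it still has to be checked."""
    return required_trigrams(glob) <= index
```

The check only ever rules commits out. A "True" answer still needs the
real wildmatch run against the commit's paths, which is also what you
would get from a bloom filter with its false positives.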


Thread overview: 8+ messages
     [not found] <CACPiFCLtj5QF6_Goc5UYh9KHWgkrKtjApL-cCH04S5gdTFyk7Q@mail.gmail.com>
2021-06-30 16:59 ` git log exclude pathspec from file - supported? plans? Martin Langhoff
2021-06-30 17:58   ` Jeff King
2021-06-30 18:22     ` Ævar Arnfjörð Bjarmason [this message]
2021-07-01 21:27       ` Jeff King
2021-07-01 21:30         ` [PATCH 1/3] pathspec: add optional trie index Jeff King
2021-07-01 21:30         ` [PATCH 2/3] pathspec: turn on tries when appropriate Jeff King
2021-07-01 21:36         ` [PATCH 3/3] tree-diff: use pathspec tries Jeff King
2021-07-01 21:43         ` git log exclude pathspec from file - supported? plans? Jeff King
