From: "Ævar Arnfjörð Bjarmason" <>
To: Jeff King <>
Cc: Martin Langhoff <>,
	Git Mailing List <>,
	Taylor Blau <>
Subject: Re: git log exclude pathspec from file - supported? plans?
Date: Wed, 30 Jun 2021 20:22:35 +0200

On Wed, Jun 30 2021, Jeff King wrote:

> On Wed, Jun 30, 2021 at 12:59:43PM -0400, Martin Langhoff wrote:
>> long time no see! I'm doing some complex git repo spelunking and
>> pushing the boundaries of the pathspec magic for excludes.
>> Is there a reasonable way to provide a (potentially large) set of
>> excludes? something like
>>      git log --exclude-pathspec-file paths-to-exclude.txt .
>> Has there been discussion / patches / plans related to this? I may
>> have some cycles (hopefully!)
> You can feed pathspecs via --stdin. So:
>   {
> 	echo "--"
> 	sed s/^/:^/ paths-to-exclude.txt
>   } | git log --stdin
> works. Obviously it's not as turn-key if you really do have a list of
> paths in a file already, but it's much more flexible.
> I'll caution you that the pathspec code is not well-optimized to handle
> a large number of pathspecs. E.g.:
>   [no pathspecs]
>   $ time git rev-list HEAD >/dev/null
>   real	0m0.033s
>   user	0m0.017s
>   sys	0m0.017s
>   [trivial pathspec; now we have to actually open up trees]
>   $ { echo --; echo .; } >input
>   $ time git rev-list HEAD --stdin <input >/dev/null
>   real	0m1.338s
>   user	0m1.294s
>   sys	0m0.045s
>   [lots of pathspecs; now we spend loads of time actually matching
>    strings; the ^C is when I got bored and killed it]
>   $ { echo --; git ls-files; } >input
>   $ time git rev-list HEAD --stdin <input >/dev/null
>   ^C
>   real	1m24.406s
>   user	1m24.369s
>   sys	0m0.036s
> The problem is that we try to linearly match every pathspec against
> every path we consider, so it's quadratic-ish in the number of files in
> the repo. I played a long time ago with storing non-wildcard pathspecs
> in a trie that we could traverse as we walked the individual trees we
> were matching. It performed well, but IIRC the interface was hacky (I
> had to bolt it specifically onto the way the tree-walker uses
> pathspecs, and the other pathspec matchers didn't benefit at all).
> I can probably dig it up if anybody's interested in looking at it.
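
The --stdin recipe quoted above can be wrapped up as a small helper. This
is only a sketch: exclude_pathspecs is a hypothetical function name, not a
git command, and it assumes the file holds one path per line with no
pathspec magic of its own.

```shell
#!/bin/sh
# Hypothetical helper (not part of git): turn a file of paths into input
# for "git log --stdin" or "git rev-list --stdin".
exclude_pathspecs() {
	# Everything after "--" is read as a pathspec, not as a revision.
	printf '%s\n' '--'
	# Drop empty lines, then prefix each path with the ":^" exclude magic.
	# "|" is the sed delimiter so that "/" in paths needs no escaping.
	sed -e '/^$/d' -e 's|^|:^|' "$1"
}
```

With the file from the original question, usage would look like:

	exclude_pathspecs paths-to-exclude.txt | git log --stdin

Paths containing glob characters are still treated as patterns by the
pathspec code, so this is only as literal as the input is.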

If it's not too much trouble I'd find it interesting, but I likely won't
do anything with it any time soon.
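
To illustrate why the linear matching quoted above hurts, and what an
index over non-wildcard pathspecs buys, here is a rough sketch outside of
git. It is not Jeff's implementation and not a trie: it swaps the trie for
an awk hash keyed by whole path prefixes, which gives the same "cost
depends on path depth, not on the number of pathspecs" property for
literal pathspecs. The function name match_literal_pathspecs is made up
for this example.

```shell
#!/bin/sh
# match_literal_pathspecs SPECS
# Reads candidate paths on stdin and prints those that equal, or live
# under, one of the literal pathspecs listed one per line in SPECS.
# Illustration only: no wildcards, no pathspec magic.
match_literal_pathspecs() {
	SPECS="$1" awk '
		BEGIN {
			# Load every pathspec into a hash, trailing slashes dropped.
			specs = ENVIRON["SPECS"]
			while ((getline line < specs) > 0) {
				sub(/\/+$/, "", line)
				if (line != "") want[line] = 1
			}
			close(specs)
		}
		{
			# Probe each leading prefix of the path ("a", "a/b", ...):
			# one hash lookup per path component.
			n = split($0, part, "/")
			prefix = ""
			for (i = 1; i <= n; i++) {
				prefix = (i == 1) ? part[1] : (prefix "/" part[i])
				if (prefix in want) { print; break }
			}
		}
	'
}
```

Something like "git ls-files | match_literal_pathspecs specs.txt" would
exercise it on a real repository. A real trie would additionally let a
tree walk skip whole subtrees that no pathspec can reach, which this
flat hash does not attempt.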

One of the PCREv2 experiments I had very early WIP work towards was to
create a search index for commit messages, contents etc. and stick it in
something similar to the --changed-paths part of the commit-graph.

The PCREv2 codebase actually has a (supposedly) bug-for-bug compatible
implementation of our wildmatch function as a translator to a PCREv2
regex. I have a branch somewhere where we run all our wildmatch tests
against it successfully.

So couple that with regex introspection, and a search index that
e.g. creates a trie bloom filter, then as long as your --grep=<RX>,
-G<RX> or pathspec has at least 3 fixed strings among its wildcards we
can ask the bloom filter "is this commit a candidate for this regex
searching this path/commit message/diff/whatever".
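
The candidate check described above can be sketched in a few lines. This
rests on several assumptions: "at least 3 fixed strings" is read here as a
run of at least three literal characters taken from the pattern, an exact
set of trigrams stands in for the Bloom filter, and the name is_candidate
is invented for the example. None of this is existing git or PCREv2 code.

```shell
#!/bin/sh
# is_candidate LITERAL
# Reads one text on stdin (for example a single commit message) and
# succeeds if every 3-character window of LITERAL occurs somewhere in it.
# A "yes" only means the text is worth running the real regex on.
is_candidate() {
	LIT="$1" awk '
		# Index step: record every trigram of each input line.
		{
			for (i = 1; i + 2 <= length($0); i++)
				seen[substr($0, i, 3)] = 1
		}
		# Query step: every trigram of the literal must be in the index.
		END {
			lit = ENVIRON["LIT"]
			# Too short to filter on: keep the text as a candidate.
			if (length(lit) < 3) exit 0
			for (i = 1; i + 2 <= length(lit); i++)
				if (!(substr(lit, i, 3) in seen)) exit 1
			exit 0
		}
	'
}
```

For instance, "git log -1 --format=%B | is_candidate pathspec" would say
whether the tip commit's message can possibly match a regex that needs
the literal "pathspec". The check can report false positives (the
trigrams may all be present without the literal itself occurring), but it
never rules out a text that really contains the literal, which is the
property a prefilter needs.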

So you can have indexed matches for things like '*/", not
just prefixes or fixed-strings.
