From: Jeff King <peff@peff.net>
To: Martin Langhoff <martin.langhoff@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: git log exclude pathspec from file - supported? plans?
Date: Wed, 30 Jun 2021 13:58:08 -0400
Message-ID: <YNywsEbFcrQFeH91@coredump.intra.peff.net>
In-Reply-To: <CACPiFCLXxwaWOuR6sy8H4hCG-H0ZFvVYma7COfDq3zxoUt=VtA@mail.gmail.com>
On Wed, Jun 30, 2021 at 12:59:43PM -0400, Martin Langhoff wrote:
> long time no see! I'm doing some complex git repo spelunking and
> pushing the boundaries of the pathspec magic for excludes.
>
> Is there a reasonable way to provide a (potentially large) set of
> excludes? something like
>
> git log --exclude-pathspec-file paths-to-exclude.txt .
>
> Has there been discussion / patches / plans related to this? I may
> have some cycles (hopefully!)
You can feed pathspecs via --stdin. So:

  {
    echo "--"
    sed s/^/:^/ paths-to-exclude.txt
  } | git log --stdin

works. Obviously it's not as turn-key if you really do have a list of
paths in a file already, but it's much more flexible.
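
If the list already lives in a file, the same idea can be wrapped in a
small shell function. This is only a sketch: exclude_pathspecs is a
made-up name, not a git command, and it assumes one path per line. It
uses the long-form ":(exclude)" magic, which is equivalent to ":^", so
that a path starting with a magic character is not misread as more
short-form magic. Wildcard characters in a path are still treated as
pathspec wildcards.

```shell
# Hypothetical helper (not part of git): print the --stdin input that
# excludes every path listed in the file given as $1.
exclude_pathspecs() {
	# "--" ends the revision arguments; what follows is read as pathspecs
	echo "--"
	# drop empty lines, then prefix each path with the exclude magic
	sed -e '/^$/d' -e 's/^/:(exclude)/' "$1"
}

# usage (run inside a repository):
#   exclude_pathspecs paths-to-exclude.txt | git log --stdin
```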
I'll caution you that the pathspec code is not well-optimized to handle
a large number of pathspecs. E.g.:

  [no pathspecs]
  $ time git rev-list HEAD >/dev/null
  real 0m0.033s
  user 0m0.017s
  sys 0m0.017s

  [trivial pathspec; now we have to actually open up trees]
  $ { echo --; echo .; } >input
  $ time git rev-list HEAD --stdin <input >/dev/null
  real 0m1.338s
  user 0m1.294s
  sys 0m0.045s

  [lots of pathspecs; now we spend loads of time actually matching
  strings; the ^C is when I got bored and killed it]
  $ { echo --; git ls-files; } >input
  $ time git rev-list HEAD --stdin <input >/dev/null
  ^C
  real 1m24.406s
  user 1m24.369s
  sys 0m0.036s
The problem is that we try to linearly match every pathspec against
every path we consider, so it's quadratic-ish in the number of files in
the repo. I played a long time ago with storing non-wildcard pathspecs
in a trie that we could traverse as we walked the individual trees we
were matching. It performed well, but IIRC the interface was hacky (I
had to bolt it specifically onto the way the tree-walker uses
pathspecs, and the other pathspec matchers didn't benefit at all).
I can probably dig it up if anybody's interested in looking at it.
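
To make the trie idea concrete, here is a rough bash sketch. It is not
the code referred to above, and none of the names come from git:
index_add and index_match are hypothetical. Instead of linked nodes it
uses one associative array keyed by path prefix, which behaves like a
trie keyed on path components: a lookup costs one hash probe per
directory level, however many pathspecs are loaded, and a walker can
stop as soon as a prefix is unknown. It assumes bash 4 or later and
handles only literal, non-exclude pathspecs.

```shell
#!/usr/bin/env bash
# Sketch only: a prefix index standing in for a pathspec trie.
# Key: a path prefix. Value: "leaf" if the prefix is itself a pathspec,
# "dir" if it is only an interior node on the way to one.
declare -A node

# Register one literal pathspec together with all of its parent prefixes.
index_add() {
	[[ -n $1 ]] || return 1
	local rest=$1 prefix=
	while [[ -n $rest ]]; do
		prefix=${prefix:+$prefix/}${rest%%/*}
		# never downgrade an existing pathspec to an interior node
		[[ ${node["$prefix"]:-} == leaf ]] || node["$prefix"]=dir
		if [[ $rest == */* ]]; then rest=${rest#*/}; else rest=; fi
	done
	node["$prefix"]=leaf
}

# Succeed if some registered pathspec equals the path or is a leading
# directory of it. Walks one component at a time, like a tree walker.
index_match() {
	local rest=$1 prefix=
	while [[ -n $rest ]]; do
		prefix=${prefix:+$prefix/}${rest%%/*}
		case ${node["$prefix"]:-} in
		leaf) return 0 ;; # a pathspec covers this path
		'')   return 1 ;; # no pathspec starts here: prune the subtree
		esac
		if [[ $rest == */* ]]; then rest=${rest#*/}; else rest=; fi
	done
	return 1 # path is only a parent directory of some pathspec
}
```

The early "return 1" is where the saving would come from: a subtree that
no pathspec can reach is rejected after a single lookup instead of a
comparison against every pathspec.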
-Peff
Thread overview: 8+ messages
[not found] <CACPiFCLtj5QF6_Goc5UYh9KHWgkrKtjApL-cCH04S5gdTFyk7Q@mail.gmail.com>
2021-06-30 16:59 ` git log exclude pathspec from file - supported? plans? Martin Langhoff
2021-06-30 17:58 ` Jeff King [this message]
2021-06-30 18:22 ` Ævar Arnfjörð Bjarmason
2021-07-01 21:27 ` Jeff King
2021-07-01 21:30 ` [PATCH 1/3] pathspec: add optional trie index Jeff King
2021-07-01 21:30 ` [PATCH 2/3] pathspec: turn on tries when appropriate Jeff King
2021-07-01 21:36 ` [PATCH 3/3] tree-diff: use pathspec tries Jeff King
2021-07-01 21:43 ` git log exclude pathspec from file - supported? plans? Jeff King