From: Jeff King <peff@peff.net>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org, Christian Couder <christian.couder@gmail.com>
Subject: Re: [PATCH 0/7] rev-parse: implement object type filter
Date: Wed, 10 Mar 2021 16:39:22 -0500 [thread overview]
Message-ID: <YEk8iiDf/FMxzhIF@coredump.intra.peff.net> (raw)
In-Reply-To: <cover.1614600555.git.ps@pks.im>
On Mon, Mar 01, 2021 at 01:20:26PM +0100, Patrick Steinhardt wrote:
> Altogether, this ends up with the following queries, both of which have
> been executed in a well-packed linux.git repository:
>
> # Previous query which uses object names as a heuristic to filter
> # non-blob objects, which bars us from using bitmap indices because
> # they cannot print paths.
> $ time git rev-list --objects --filter=blob:limit=200 \
> --object-names --all | sed -r '/^.{,41}$/d' | wc -l
> 4502300
>
> real 1m23.872s
> user 1m30.076s
> sys 0m6.002s
>
> # New query.
> $ time git rev-list --objects --filter-provided \
> --filter=object:type=blob --filter=blob:limit=200 \
> --use-bitmap-index --all | wc -l
> 22585
>
> real 0m19.216s
> user 0m16.768s
> sys 0m2.450s
Those produce very different answers. I guess because in the first one,
you still have a bunch of tree objects, too. You'd do much better to get
the actual types from cat-file, and filter on that. That also lets you
use bitmaps for the traversal portion. E.g.:
$ time git rev-list --use-bitmap-index --objects --filter=blob:limit=200 --all |
git cat-file --buffer --batch-check='%(objecttype) %(objectname)' |
perl -lne 'print $1 if /^blob (.*)/' | wc -l
14966
real 0m6.248s
user 0m7.810s
sys 0m0.440s
which is faster than what you showed above (this is on linux.git, but my
result is different; maybe you have more refs than me?). But we should
be able to do better purely internally, so I suspect my computer is just
faster (or maybe your extra refs just aren't well-covered by bitmaps).
Running with your patches I get:
$ time git rev-list --objects --use-bitmap-index --all \
--filter-provided --filter=object:type=blob \
--filter=blob:limit=200 | wc -l
16339
real 0m1.309s
user 0m1.234s
sys 0m0.079s
which is indeed faster. It's quite curious that the answer is not the
same, though! I think yours has some bugs. If I sort and diff the
results, I see some commits mentioned in the output. Perhaps this is
--filter-provided not working, as they all seem to be ref tips.
> To be able to more efficiently answer this query, I've implemented
> multiple things:
>
> - A new object type filter `--filter=object:type=<type>` for
> git-rev-list(1), which is implemented both for normal graph walks and
> for the packfile bitmap index.
>
> - Given that above usecase requires two filters (the object type
> and blob size filters), bitmap filters were extended to support
> combined filters.
That's probably reasonable, especially because it lets us use bitmaps. I
do have a dream that we'll eventually be able to support more extensive
formatting via log/rev-list, which would allow:
git rev-list --use-bitmap-index --objects --all \
--format=%(objecttype) %(objectname) |
perl -ne 'print $1 if /^blob (.*)/'
That should be faster than the separate cat-file (which has to re-lookup
each object, in addition to the extra pipe overhead), but I expect the
--filter solution should always be faster still, as it can very quickly
eliminate the majority of the objects at the bitmap level.
> - git-rev-list(1) doesn't filter user-provided objects and always prints
> them. I don't want the listed commits though and only their referenced
> potential LFS blobs. So I've added a new flag `--filter-provided`
> which marks all provided objects as not-user-provided such that they
> get filtered the same as all the other objects.
Yeah, this "user-provided" behavior was quite a surprise to me when I
started implementing the bitmap versions of the existing filters. It's
nice to have the option to specify which you want.
-Peff
next prev parent reply other threads:[~2021-03-10 21:40 UTC|newest]
Thread overview: 97+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-03-01 12:20 [PATCH 0/7] rev-parse: implement object type filter Patrick Steinhardt
2021-03-01 12:20 ` [PATCH 1/7] revision: mark commit parents as NOT_USER_GIVEN Patrick Steinhardt
2021-03-01 12:20 ` [PATCH 2/7] list-objects: move tag processing into its own function Patrick Steinhardt
2021-03-01 12:20 ` [PATCH 3/7] list-objects: support filtering by tag and commit Patrick Steinhardt
2021-03-01 12:20 ` [PATCH 4/7] list-objects: implement object type filter Patrick Steinhardt
2021-03-01 12:20 ` [PATCH 5/7] pack-bitmap: " Patrick Steinhardt
2021-03-01 12:20 ` [PATCH 6/7] pack-bitmap: implement combined filter Patrick Steinhardt
2021-03-01 12:21 ` [PATCH 7/7] rev-list: allow filtering of provided items Patrick Steinhardt
2021-03-10 21:39 ` Jeff King [this message]
2021-03-11 14:38 ` [PATCH 0/7] rev-parse: implement object type filter Patrick Steinhardt
2021-03-11 17:54 ` Jeff King
2021-03-15 11:25 ` Patrick Steinhardt
2021-03-10 21:58 ` Taylor Blau
2021-03-10 22:19 ` Jeff King
2021-03-11 14:43 ` Patrick Steinhardt
2021-03-11 17:56 ` Jeff King
2021-03-15 13:14 ` [PATCH v2 0/8] " Patrick Steinhardt
2021-03-15 13:14 ` [PATCH v2 1/8] uploadpack.txt: document implication of `uploadpackfilter.allow` Patrick Steinhardt
2021-04-06 17:17 ` Jeff King
2021-03-15 13:14 ` [PATCH v2 2/8] revision: mark commit parents as NOT_USER_GIVEN Patrick Steinhardt
2021-04-06 17:30 ` Jeff King
2021-04-09 10:19 ` Patrick Steinhardt
2021-03-15 13:14 ` [PATCH v2 3/8] list-objects: move tag processing into its own function Patrick Steinhardt
2021-04-06 17:39 ` Jeff King
2021-03-15 13:14 ` [PATCH v2 4/8] list-objects: support filtering by tag and commit Patrick Steinhardt
2021-03-15 13:14 ` [PATCH v2 5/8] list-objects: implement object type filter Patrick Steinhardt
2021-04-06 17:42 ` Jeff King
2021-03-15 13:14 ` [PATCH v2 6/8] pack-bitmap: " Patrick Steinhardt
2021-04-06 17:48 ` Jeff King
2021-03-15 13:14 ` [PATCH v2 7/8] pack-bitmap: implement combined filter Patrick Steinhardt
2021-04-06 17:54 ` Jeff King
2021-04-09 10:31 ` Patrick Steinhardt
2021-04-09 15:53 ` Jeff King
2021-04-09 11:17 ` Patrick Steinhardt
2021-04-09 15:55 ` Jeff King
2021-03-15 13:15 ` [PATCH v2 8/8] rev-list: allow filtering of provided items Patrick Steinhardt
2021-04-06 18:04 ` Jeff King
2021-04-09 10:59 ` Patrick Steinhardt
2021-04-09 15:58 ` Jeff King
2021-03-20 21:10 ` [PATCH v2 0/8] rev-parse: implement object type filter Junio C Hamano
2021-04-06 18:08 ` Jeff King
2021-04-09 11:14 ` Patrick Steinhardt
2021-04-09 16:05 ` Jeff King
2021-04-09 11:27 ` [PATCH v3 " Patrick Steinhardt
2021-04-09 11:27 ` [PATCH v3 1/8] uploadpack.txt: document implication of `uploadpackfilter.allow` Patrick Steinhardt
2021-04-09 11:27 ` [PATCH v3 2/8] revision: mark commit parents as NOT_USER_GIVEN Patrick Steinhardt
2021-04-09 11:28 ` [PATCH v3 3/8] list-objects: move tag processing into its own function Patrick Steinhardt
2021-04-09 11:28 ` [PATCH v3 4/8] list-objects: support filtering by tag and commit Patrick Steinhardt
2021-04-11 6:49 ` Junio C Hamano
2021-04-09 11:28 ` [PATCH v3 5/8] list-objects: implement object type filter Patrick Steinhardt
2021-04-09 11:28 ` [PATCH v3 6/8] pack-bitmap: " Patrick Steinhardt
2021-04-09 11:28 ` [PATCH v3 7/8] pack-bitmap: implement combined filter Patrick Steinhardt
2021-04-09 11:28 ` [PATCH v3 8/8] rev-list: allow filtering of provided items Patrick Steinhardt
2021-04-09 11:32 ` [RESEND PATCH " Patrick Steinhardt
2021-04-09 15:00 ` [PATCH " Philip Oakley
2021-04-12 13:15 ` Patrick Steinhardt
2021-04-11 6:02 ` [PATCH v3 0/8] rev-parse: implement object type filter Junio C Hamano
2021-04-12 13:12 ` Patrick Steinhardt
2021-04-12 13:37 ` [PATCH v4 0/8] rev-list: " Patrick Steinhardt
2021-04-12 13:37 ` [PATCH v4 1/8] uploadpack.txt: document implication of `uploadpackfilter.allow` Patrick Steinhardt
2021-04-12 13:37 ` [PATCH v4 2/8] revision: mark commit parents as NOT_USER_GIVEN Patrick Steinhardt
2021-04-12 13:37 ` [PATCH v4 3/8] list-objects: move tag processing into its own function Patrick Steinhardt
2021-04-12 13:37 ` [PATCH v4 4/8] list-objects: support filtering by tag and commit Patrick Steinhardt
2021-04-12 13:37 ` [PATCH v4 5/8] list-objects: implement object type filter Patrick Steinhardt
2021-04-13 9:57 ` Ævar Arnfjörð Bjarmason
2021-04-13 10:43 ` Andreas Schwab
2021-04-14 11:32 ` Patrick Steinhardt
2021-04-12 13:37 ` [PATCH v4 6/8] pack-bitmap: " Patrick Steinhardt
2021-04-12 13:37 ` [PATCH v4 7/8] pack-bitmap: implement combined filter Patrick Steinhardt
2021-04-12 13:37 ` [PATCH v4 8/8] rev-list: allow filtering of provided items Patrick Steinhardt
2021-04-13 7:45 ` [PATCH v4 0/8] rev-list: implement object type filter Jeff King
2021-04-13 8:06 ` Patrick Steinhardt
2021-04-15 9:42 ` Jeff King
2021-04-16 22:06 ` Junio C Hamano
2021-04-16 23:15 ` Junio C Hamano
2021-04-17 1:17 ` Ramsay Jones
2021-04-17 9:01 ` Jeff King
2021-04-17 21:45 ` Junio C Hamano
2021-04-13 21:03 ` Junio C Hamano
2021-04-14 11:59 ` Patrick Steinhardt
2021-04-14 21:07 ` Junio C Hamano
2021-04-15 9:57 ` Jeff King
2021-04-15 17:53 ` Junio C Hamano
2021-04-15 17:57 ` Junio C Hamano
2021-04-17 8:58 ` Jeff King
2021-04-19 11:46 ` [PATCH v5 " Patrick Steinhardt
2021-04-19 11:46 ` [PATCH v5 1/8] uploadpack.txt: document implication of `uploadpackfilter.allow` Patrick Steinhardt
2021-04-19 11:46 ` [PATCH v5 2/8] revision: mark commit parents as NOT_USER_GIVEN Patrick Steinhardt
2021-04-19 11:46 ` [PATCH v5 3/8] list-objects: move tag processing into its own function Patrick Steinhardt
2021-04-19 11:46 ` [PATCH v5 4/8] list-objects: support filtering by tag and commit Patrick Steinhardt
2021-04-19 11:46 ` [PATCH v5 5/8] list-objects: implement object type filter Patrick Steinhardt
2021-04-19 11:46 ` [PATCH v5 6/8] pack-bitmap: " Patrick Steinhardt
2021-04-19 11:47 ` [PATCH v5 7/8] pack-bitmap: implement combined filter Patrick Steinhardt
2021-04-19 11:47 ` [PATCH v5 8/8] rev-list: allow filtering of provided items Patrick Steinhardt
2021-04-19 23:16 ` [PATCH v5 0/8] rev-list: implement object type filter Junio C Hamano
2021-04-23 9:13 ` Jeff King
2021-04-28 2:18 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YEk8iiDf/FMxzhIF@coredump.intra.peff.net \
--to=peff@peff.net \
--cc=christian.couder@gmail.com \
--cc=git@vger.kernel.org \
--cc=ps@pks.im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).