git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org, Christian Couder <christian.couder@gmail.com>
Subject: Re: [PATCH 0/7] rev-parse: implement object type filter
Date: Wed, 10 Mar 2021 16:39:22 -0500	[thread overview]
Message-ID: <YEk8iiDf/FMxzhIF@coredump.intra.peff.net> (raw)
In-Reply-To: <cover.1614600555.git.ps@pks.im>

On Mon, Mar 01, 2021 at 01:20:26PM +0100, Patrick Steinhardt wrote:

> Altogether, this ends up with the following queries, both of which have
> been executed in a well-packed linux.git repository:
> 
>     # Previous query which uses object names as a heuristic to filter
>     # non-blob objects, which bars us from using bitmap indices because
>     # they cannot print paths.
>     $ time git rev-list --objects --filter=blob:limit=200 \
>         --object-names --all | sed -r '/^.{,41}$/d' | wc -l
>     4502300
> 
>     real 1m23.872s
>     user 1m30.076s
>     sys  0m6.002s
> 
>     # New query.
>     $ time git rev-list --objects --filter-provided \
>         --filter=object:type=blob --filter=blob:limit=200 \
>         --use-bitmap-index --all | wc -l
>     22585
> 
>     real 0m19.216s
>     user 0m16.768s
>     sys  0m2.450s

Those produce very different answers. I guess because in the first one,
you still have a bunch of tree objects, too. You'd do much better to get
the actual types from cat-file, and filter on that. That also lets you
use bitmaps for the traversal portion. E.g.:

  $ time git rev-list --use-bitmap-index --objects --filter=blob:limit=200 --all |
         git cat-file --buffer --batch-check='%(objecttype) %(objectname)' |
	 perl -lne 'print $1 if /^blob (.*)/' | wc -l
  14966
  
  real	0m6.248s
  user	0m7.810s
  sys	0m0.440s

which is faster than what you showed above (this is on linux.git, but my
result is different; maybe you have more refs than me?). But we should
be able to do better purely internally, so I suspect my computer is just
faster (or maybe your extra refs just aren't well-covered by bitmaps).
Running with your patches I get:

  $ time git rev-list --objects --use-bitmap-index --all \
             --filter-provided --filter=object:type=blob \
	     --filter=blob:limit=200 | wc -l
  16339

  real	0m1.309s
  user	0m1.234s
  sys	0m0.079s

which is indeed faster. It's quite curious that the answer is not the
same, though! I think yours has some bugs. If I sort and diff the
results, I see some commits mentioned in the output. Perhaps this is
--filter-provided not working, as they all seem to be ref tips.

> To be able to more efficiently answer this query, I've implemented
> multiple things:
> 
> - A new object type filter `--filter=object:type=<type>` for
>   git-rev-list(1), which is implemented both for normal graph walks and
>   for the packfile bitmap index.
> 
> - Given that above usecase requires two filters (the object type
>   and blob size filters), bitmap filters were extended to support
>   combined filters.

That's probably reasonable, especially because it lets us use bitmaps. I
do have a dream that we'll eventually be able to support more extensive
formatting via log/rev-list, which would allow:

  git rev-list --use-bitmap-index --objects --all \
               --format=%(objecttype) %(objectname) |
  perl -ne 'print $1 if /^blob (.*)/'

That should be faster than the separate cat-file (which has to re-lookup
each object, in addition to the extra pipe overhead), but I expect the
--filter solution should always be faster still, as it can very quickly
eliminate the majority of the objects at the bitmap level.

> - git-rev-list(1) doesn't filter user-provided objects and always prints
>   them. I don't want the listed commits though and only their referenced
>   potential LFS blobs. So I've added a new flag `--filter-provided`
>   which marks all provided objects as not-user-provided such that they
>   get filtered the same as all the other objects.

Yeah, this "user-provided" behavior was quite a surprise to me when I
started implementing the bitmap versions of the existing filters. It's
nice to have the option to specify which you want.

-Peff

  parent reply	other threads:[~2021-03-10 21:40 UTC|newest]

Thread overview: 97+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-01 12:20 [PATCH 0/7] rev-parse: implement object type filter Patrick Steinhardt
2021-03-01 12:20 ` [PATCH 1/7] revision: mark commit parents as NOT_USER_GIVEN Patrick Steinhardt
2021-03-01 12:20 ` [PATCH 2/7] list-objects: move tag processing into its own function Patrick Steinhardt
2021-03-01 12:20 ` [PATCH 3/7] list-objects: support filtering by tag and commit Patrick Steinhardt
2021-03-01 12:20 ` [PATCH 4/7] list-objects: implement object type filter Patrick Steinhardt
2021-03-01 12:20 ` [PATCH 5/7] pack-bitmap: " Patrick Steinhardt
2021-03-01 12:20 ` [PATCH 6/7] pack-bitmap: implement combined filter Patrick Steinhardt
2021-03-01 12:21 ` [PATCH 7/7] rev-list: allow filtering of provided items Patrick Steinhardt
2021-03-10 21:39 ` Jeff King [this message]
2021-03-11 14:38   ` [PATCH 0/7] rev-parse: implement object type filter Patrick Steinhardt
2021-03-11 17:54     ` Jeff King
2021-03-15 11:25   ` Patrick Steinhardt
2021-03-10 21:58 ` Taylor Blau
2021-03-10 22:19   ` Jeff King
2021-03-11 14:43     ` Patrick Steinhardt
2021-03-11 17:56       ` Jeff King
2021-03-15 13:14 ` [PATCH v2 0/8] " Patrick Steinhardt
2021-03-15 13:14   ` [PATCH v2 1/8] uploadpack.txt: document implication of `uploadpackfilter.allow` Patrick Steinhardt
2021-04-06 17:17     ` Jeff King
2021-03-15 13:14   ` [PATCH v2 2/8] revision: mark commit parents as NOT_USER_GIVEN Patrick Steinhardt
2021-04-06 17:30     ` Jeff King
2021-04-09 10:19       ` Patrick Steinhardt
2021-03-15 13:14   ` [PATCH v2 3/8] list-objects: move tag processing into its own function Patrick Steinhardt
2021-04-06 17:39     ` Jeff King
2021-03-15 13:14   ` [PATCH v2 4/8] list-objects: support filtering by tag and commit Patrick Steinhardt
2021-03-15 13:14   ` [PATCH v2 5/8] list-objects: implement object type filter Patrick Steinhardt
2021-04-06 17:42     ` Jeff King
2021-03-15 13:14   ` [PATCH v2 6/8] pack-bitmap: " Patrick Steinhardt
2021-04-06 17:48     ` Jeff King
2021-03-15 13:14   ` [PATCH v2 7/8] pack-bitmap: implement combined filter Patrick Steinhardt
2021-04-06 17:54     ` Jeff King
2021-04-09 10:31       ` Patrick Steinhardt
2021-04-09 15:53         ` Jeff King
2021-04-09 11:17       ` Patrick Steinhardt
2021-04-09 15:55         ` Jeff King
2021-03-15 13:15   ` [PATCH v2 8/8] rev-list: allow filtering of provided items Patrick Steinhardt
2021-04-06 18:04     ` Jeff King
2021-04-09 10:59       ` Patrick Steinhardt
2021-04-09 15:58         ` Jeff King
2021-03-20 21:10   ` [PATCH v2 0/8] rev-parse: implement object type filter Junio C Hamano
2021-04-06 18:08     ` Jeff King
2021-04-09 11:14       ` Patrick Steinhardt
2021-04-09 16:05         ` Jeff King
2021-04-09 11:27   ` [PATCH v3 " Patrick Steinhardt
2021-04-09 11:27     ` [PATCH v3 1/8] uploadpack.txt: document implication of `uploadpackfilter.allow` Patrick Steinhardt
2021-04-09 11:27     ` [PATCH v3 2/8] revision: mark commit parents as NOT_USER_GIVEN Patrick Steinhardt
2021-04-09 11:28     ` [PATCH v3 3/8] list-objects: move tag processing into its own function Patrick Steinhardt
2021-04-09 11:28     ` [PATCH v3 4/8] list-objects: support filtering by tag and commit Patrick Steinhardt
2021-04-11  6:49       ` Junio C Hamano
2021-04-09 11:28     ` [PATCH v3 5/8] list-objects: implement object type filter Patrick Steinhardt
2021-04-09 11:28     ` [PATCH v3 6/8] pack-bitmap: " Patrick Steinhardt
2021-04-09 11:28     ` [PATCH v3 7/8] pack-bitmap: implement combined filter Patrick Steinhardt
2021-04-09 11:28     ` [PATCH v3 8/8] rev-list: allow filtering of provided items Patrick Steinhardt
2021-04-09 11:32       ` [RESEND PATCH " Patrick Steinhardt
2021-04-09 15:00       ` [PATCH " Philip Oakley
2021-04-12 13:15         ` Patrick Steinhardt
2021-04-11  6:02     ` [PATCH v3 0/8] rev-parse: implement object type filter Junio C Hamano
2021-04-12 13:12       ` Patrick Steinhardt
2021-04-12 13:37     ` [PATCH v4 0/8] rev-list: " Patrick Steinhardt
2021-04-12 13:37       ` [PATCH v4 1/8] uploadpack.txt: document implication of `uploadpackfilter.allow` Patrick Steinhardt
2021-04-12 13:37       ` [PATCH v4 2/8] revision: mark commit parents as NOT_USER_GIVEN Patrick Steinhardt
2021-04-12 13:37       ` [PATCH v4 3/8] list-objects: move tag processing into its own function Patrick Steinhardt
2021-04-12 13:37       ` [PATCH v4 4/8] list-objects: support filtering by tag and commit Patrick Steinhardt
2021-04-12 13:37       ` [PATCH v4 5/8] list-objects: implement object type filter Patrick Steinhardt
2021-04-13  9:57         ` Ævar Arnfjörð Bjarmason
2021-04-13 10:43           ` Andreas Schwab
2021-04-14 11:32           ` Patrick Steinhardt
2021-04-12 13:37       ` [PATCH v4 6/8] pack-bitmap: " Patrick Steinhardt
2021-04-12 13:37       ` [PATCH v4 7/8] pack-bitmap: implement combined filter Patrick Steinhardt
2021-04-12 13:37       ` [PATCH v4 8/8] rev-list: allow filtering of provided items Patrick Steinhardt
2021-04-13  7:45       ` [PATCH v4 0/8] rev-list: implement object type filter Jeff King
2021-04-13  8:06         ` Patrick Steinhardt
2021-04-15  9:42           ` Jeff King
2021-04-16 22:06             ` Junio C Hamano
2021-04-16 23:15               ` Junio C Hamano
2021-04-17  1:17                 ` Ramsay Jones
2021-04-17  9:01                   ` Jeff King
2021-04-17 21:45                     ` Junio C Hamano
2021-04-13 21:03         ` Junio C Hamano
2021-04-14 11:59           ` Patrick Steinhardt
2021-04-14 21:07             ` Junio C Hamano
2021-04-15  9:57               ` Jeff King
2021-04-15 17:53                 ` Junio C Hamano
2021-04-15 17:57                   ` Junio C Hamano
2021-04-17  8:58                     ` Jeff King
2021-04-19 11:46       ` [PATCH v5 " Patrick Steinhardt
2021-04-19 11:46         ` [PATCH v5 1/8] uploadpack.txt: document implication of `uploadpackfilter.allow` Patrick Steinhardt
2021-04-19 11:46         ` [PATCH v5 2/8] revision: mark commit parents as NOT_USER_GIVEN Patrick Steinhardt
2021-04-19 11:46         ` [PATCH v5 3/8] list-objects: move tag processing into its own function Patrick Steinhardt
2021-04-19 11:46         ` [PATCH v5 4/8] list-objects: support filtering by tag and commit Patrick Steinhardt
2021-04-19 11:46         ` [PATCH v5 5/8] list-objects: implement object type filter Patrick Steinhardt
2021-04-19 11:46         ` [PATCH v5 6/8] pack-bitmap: " Patrick Steinhardt
2021-04-19 11:47         ` [PATCH v5 7/8] pack-bitmap: implement combined filter Patrick Steinhardt
2021-04-19 11:47         ` [PATCH v5 8/8] rev-list: allow filtering of provided items Patrick Steinhardt
2021-04-19 23:16         ` [PATCH v5 0/8] rev-list: implement object type filter Junio C Hamano
2021-04-23  9:13           ` Jeff King
2021-04-28  2:18             ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YEk8iiDf/FMxzhIF@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).