From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Teng Long <dyroneteng@gmail.com>
Cc: git@vger.kernel.org, tenglong.tl@alibaba-inc.com, me@ttaylorr.com
Subject: Re: [RFC PATCH 0/6] ls-tree: introduce '--pattern' option
Date: Thu, 17 Nov 2022 14:22:20 +0100
Message-ID: <221117.86k03tiudl.gmgdl@evledraar.gmail.com>
In-Reply-To: <20221117113023.65865-1-tenglong.tl@alibaba-inc.com>


On Thu, Nov 17 2022, Teng Long wrote:

> From: Teng Long <dyroneteng@gmail.com>
>
> This RFC patch introduce a new "ls-tree" option "--pattern", aim to match the
> entries by regex then filter the output which we may want to achieve. It also
> contains some commit for preparation or cleanup.
>
> The idea may be not comprehensive and the tests for it might be insufficient
> too, but I'd like to listen the suggestion from the community to decide if it's
> worth going forward with.

I applied this series, compiled with CFLAGS=-O3, and:
	
	$ hyperfine './git ls-tree --pattern=[ab]c.*d -r HEAD' './git ls-tree -r HEAD | grep [ab]c.*d' -w 10 -r 20
	Benchmark 1: ./git ls-tree --pattern=[ab]c.*d -r HEAD
	  Time (mean ± σ):      14.8 ms ±   0.1 ms    [User: 12.0 ms, System: 2.8 ms]
	  Range (min … max):    14.6 ms …  15.0 ms    20 runs
	
	Benchmark 2: ./git ls-tree -r HEAD | grep [ab]c.*d
	  Time (mean ± σ):      12.5 ms ±   0.1 ms    [User: 10.0 ms, System: 4.0 ms]
	  Range (min … max):    12.4 ms …  12.8 ms    20 runs
	
	Summary
	  './git ls-tree -r HEAD | grep [ab]c.*d' ran
	    1.18 ± 0.01 times faster than './git ls-tree --pattern=[ab]c.*d -r HEAD'

So the value proposition isn't really clear to me, and the included
docs, commit messages & this cover letter don't answer the "why not
just 'grep'?" question.

For me the "grep" pipeline is faster even though it spawns another
process, but that's likely because you're doing the regex matching
really inefficiently (e.g. malloc-ing again for each line), which
could be "fixed".

But in any setup which cares about the performance you're likely piping
to another process anyway (the thing using the data), which could do
that filtering itself, without the "grep" process.
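
To illustrate what I mean by the consumer doing the filtering itself,
here's a rough sketch in Python. It's not from the series; the helper
names (parse_ls_tree, filter_entries) are made up for illustration, and
it assumes the default "<mode> <type> <object>TAB<path>" output and
that the regex should be matched against the path only. The regex is
compiled once and reused for every entry.

```python
import re
import subprocess
from typing import Iterable, Iterator, List, Tuple

# (mode, type, object id, path); hypothetical alias for this sketch
Entry = Tuple[str, str, str, str]


def parse_ls_tree(lines: Iterable[str]) -> Iterator[Entry]:
    """Parse default `git ls-tree` lines: '<mode> <type> <object>\\t<path>'."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # The path is separated from the metadata by a single TAB.
        meta, path = line.split("\t", 1)
        mode, objtype, oid = meta.split(" ")
        yield mode, objtype, oid, path


def filter_entries(lines: Iterable[str], pattern: str) -> List[Entry]:
    """Keep entries whose path matches `pattern` (assumption: path only)."""
    regex = re.compile(pattern)  # compiled once, reused for every entry
    return [entry for entry in parse_ls_tree(lines) if regex.search(entry[3])]


if __name__ == "__main__":
    # The consumer reads ls-tree's output directly; no "grep" in between.
    proc = subprocess.run(
        ["git", "ls-tree", "-r", "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    for mode, objtype, oid, path in filter_entries(
        proc.stdout.splitlines(), r"[ab]c.*d"
    ):
        print(path)
```

Note that "grep" in the benchmark above matches against the whole
line, so the two aren't strictly equivalent; a real consumer would
probably also want "-z" to avoid dealing with quoted paths.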

So I don't see the value in doing this, but maybe I'm just missing
something.

And in terms of git's implementation, it would be really good to avoid
the complexity of a "--pattern", "--sort-lines" etc., if those
use-cases can be satisfied by piping into "grep" or "sort" instead.

Some of the pre-cleanup here looks good, but it's unrelated to the rest
of the series. I think in any case that it would be nice to see that as
another topic.
