git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: ZheNing Hu <adlternative@gmail.com>
To: Christian Couder <christian.couder@gmail.com>
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com>,
	git <git@vger.kernel.org>, "Hariom Verma" <hariom18599@gmail.com>,
	"Bagas Sanjaya" <bagasdotme@gmail.com>,
	"Jeff King" <peff@peff.net>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Eric Sunshine" <sunshine@sunshineco.com>
Subject: Re: [PATCH 14/19] [GSOC] cat-file: reuse ref-filter logic
Date: Thu, 15 Jul 2021 21:53:04 +0800	[thread overview]
Message-ID: <CAOLTT8Qj_zAhPEzKxJ2dGg8O3R_b8Sn05G29VE5WJhymR4EQSg@mail.gmail.com> (raw)
In-Reply-To: <CAP8UFD24X7UjXGKsRWr+f_xmX0x4EVDJHLBs2c1KhECb8-BnBw@mail.gmail.com>

Christian Couder <christian.couder@gmail.com> 于2021年7月15日周四 下午5:45写道:
>
> On Thu, Jul 15, 2021 at 3:53 AM ZheNing Hu <adlternative@gmail.com> wrote:
> >
> > ZheNing Hu <adlternative@gmail.com> 于2021年7月15日周四 上午12:24写道:
> > >
> > > Junio C Hamano <gitster@pobox.com> 于2021年7月13日周二 上午4:38写道:
>
> > > > I find it somewhat alarming if we are talking about "fast-path"
> > > > workaround before understanding why we are seeing slowdown in the
> > > > first place.
> > >
> > > There is no complete conclusion yet, but I try to use time and hyperfine test
> > > for these commits (t/perf/* is not accurate enough):
> > >
> > > ----------------------------------------------------------------------------------------------------------------------------
> > > |                        subject                                  |
> > > --batch-check (using hyperfine) |   --batch(using time) |
> > > ----------------------------------------------------------------------------------------------------------------------------
> > > |[GSOC] cat-file: use fast path when using default_format         |
> > >         700ms                |          25.450s      |
> > > ----------------------------------------------------------------------------------------------------------------------------
> > > |[GSOC] cat-file: re-implement --textconv, --filters options      |
> > >         790ms                |          29.933s      |
> > > ----------------------------------------------------------------------------------------------------------------------------
> > > |[GSOC] cat-file: reuse err buf in batch_object_write()           |
> > >         770ms                |          29.153s      |
> > > ----------------------------------------------------------------------------------------------------------------------------
> > > |[GSOC] cat-file: reuse ref-filter logic                          |
> > >         780ms                |          29.412s      |
> > > ----------------------------------------------------------------------------------------------------------------------------
> > > |The third batch (upstream/master)                                |
> > >         640ms                |          26.025s      |
> > > ----------------------------------------------------------------------------------------------------------------------------
> > >
> > > I think we their cost is indeed from "[GSOC] cat-file: reuse ref-filter logic".
> > > But what causes the loss of performance needs further analysis.
> >
> > Now I think:
> > There are three main reasons why the performance of cat-file --batch
> > deteriorates after refactor.
> >
> > 1. Too many copies are used in ref-filter and we cannot avoid these copies
> > easily because ref-filter needs these copied data to implement atoms %(if),
> > %(else), %(end)... and the --sort option. The original cat-file
> > --batch only needs
> > to output the data to the final string. Its copy times are relatively small.
>
> Is it possible to check early if any of the atoms that needs these
> copied data is specified, and if none of them is specified then to
> avoid the copies?
>

Well, The copy I'm talking about here refers to something like "v->s =
xstrdup(xxx)";
but v->s is need by --sort, so it is very difficult to remove. At the
moment I think the
only solution is the fast path mentioned by Ævar Arnfjörð Bjarmason.

> > 2. More complex data structure and parsing process are used in ref-filter.
> > This is why it can provide more and more useful atoms. Therefore, I think the
> > performance degradation that occurs here is normal.
>
> Are there way the more complex parsing could be avoided if it's not
> needed by the atoms that are actually used?

No. For example, we can only support "objectsize" before and now we can
support "objectsize:short", so we need to pay more parsing process here.
(It's necessary)

>
> > 3. As Ævar Arnfjörð Bjarmason mentioned, oid_object_info_extend() was used
> > twice in get_object() before. oid_object_info_extend() is the hot
> > path, we should
> > try to avoid calling it, So in last version of  "[GSOC] cat-file:
> > re-implement --textconv,
> > --filters options", I make the unified processing of --textconv and
> > --filter avoid calling
> > oid_object_info_extend() twice.
>
> Ok, thanks for the details and your work on this performance issue!
>
> I wonder if your patch series could be split, so that the early parts
> that add new atoms to ref-filter could be merged sooner?
>

Should this part of the work be handed over to Junio?
The implementation of %(rest) and %(raw)  may be worth merging,
they are truly "zh/ref-filter-raw-data".
The other part may be called "cat-file-reuse-ref-filter-logic".

> Best,
> Christian.

Thanks.
--
ZheNing Hu

  reply	other threads:[~2021-07-15 13:52 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-12 11:46 [PATCH 00/19] [GSOC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 01/19] cat-file: handle trivial --batch format with --batch-all-objects ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 02/19] cat-file: merge two block into one ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 03/19] [GSOC] ref-filter: add obj-type check in grab contents ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 04/19] [GSOC] ref-filter: add %(raw) atom ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 05/19] [GSOC] ref-filter: --format=%(raw) re-support --perl ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 06/19] [GSOC] ref-filter: use non-const ref_format in *_atom_parser() ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 07/19] [GSOC] ref-filter: add %(rest) atom ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 08/19] [GSOC] ref-filter: pass get_object() return value to their callers ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 09/19] [GSOC] ref-filter: introduce free_ref_array_item_value() function ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 10/19] [GSOC] ref-filter: introduce reject_atom() ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 11/19] [GSOC] ref-filter: modify the error message and value in get_object ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 12/19] [GSOC] cat-file: add has_object_file() check ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 13/19] [GSOC] cat-file: change batch_objects parameter name ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 14/19] [GSOC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
2021-07-12 13:17   ` Christian Couder
2021-07-12 13:26     ` Christian Couder
2021-07-12 13:51       ` ZheNing Hu
2021-07-12 13:49     ` ZheNing Hu
2021-07-12 20:38     ` Junio C Hamano
2021-07-14 16:24       ` ZheNing Hu
2021-07-15  1:53         ` ZheNing Hu
2021-07-15  9:45           ` Christian Couder
2021-07-15 13:53             ` ZheNing Hu [this message]
2021-07-15 14:55     ` ZheNing Hu
2021-07-12 11:46 ` [PATCH 15/19] [GSOC] cat-file: reuse err buf in batch_object_write() ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 16/19] [GSOC] cat-file: re-implement --textconv, --filters options ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 17/19] [GSOC] ref-filter: remove grab_oid() function ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 18/19] [GSOC] cat-file: create p1006-cat-file.sh ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 19/19] [GSOC] cat-file: use fast path when using default_format ZheNing Hu via GitGitGadget
2021-07-12 12:36 ` [PATCH 00/19] [GSOC] cat-file: reuse ref-filter logic Christian Couder
2021-07-12 13:01   ` ZheNing Hu
2021-07-12 13:02 ` Philip Oakley
2021-07-12 13:27   ` ZheNing Hu
2021-07-15 15:40 ` [PATCH v2 00/17] " ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 01/17] [GSOC] ref-filter: add obj-type check in grab contents ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 02/17] [GSOC] ref-filter: add %(raw) atom ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 03/17] [GSOC] ref-filter: --format=%(raw) re-support --perl ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 04/17] [GSOC] ref-filter: use non-const ref_format in *_atom_parser() ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 05/17] [GSOC] ref-filter: add %(rest) atom ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 06/17] [GSOC] ref-filter: pass get_object() return value to their callers ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 07/17] [GSOC] ref-filter: introduce free_ref_array_item_value() function ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 08/17] [GSOC] ref-filter: add cat_file_mode to ref_format ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 09/17] [GSOC] ref-filter: modify the error message and value in get_object ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 10/17] [GSOC] cat-file: add has_object_file() check ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 11/17] [GSOC] cat-file: change batch_objects parameter name ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 12/17] [GSOC] cat-file: create p1006-cat-file.sh ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 13/17] [GSOC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 14/17] [GSOC] cat-file: reuse err buf in batch_object_write() ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 15/17] [GSOC] cat-file: re-implement --textconv, --filters options ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 16/17] [GSOC] ref-filter: remove grab_oid() function ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 17/17] [GSOC] cat-file: use fast path when using default_format ZheNing Hu via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAOLTT8Qj_zAhPEzKxJ2dGg8O3R_b8Sn05G29VE5WJhymR4EQSg@mail.gmail.com \
    --to=adlternative@gmail.com \
    --cc=avarab@gmail.com \
    --cc=bagasdotme@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=hariom18599@gmail.com \
    --cc=peff@peff.net \
    --cc=sunshine@sunshineco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).