git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Christian Couder <christian.couder@gmail.com>
To: ZheNing Hu <adlternative@gmail.com>
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com>,
	git <git@vger.kernel.org>, "Hariom Verma" <hariom18599@gmail.com>,
	"Bagas Sanjaya" <bagasdotme@gmail.com>,
	"Jeff King" <peff@peff.net>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Eric Sunshine" <sunshine@sunshineco.com>
Subject: Re: [PATCH 14/19] [GSOC] cat-file: reuse ref-filter logic
Date: Thu, 15 Jul 2021 11:45:42 +0200	[thread overview]
Message-ID: <CAP8UFD24X7UjXGKsRWr+f_xmX0x4EVDJHLBs2c1KhECb8-BnBw@mail.gmail.com> (raw)
In-Reply-To: <CAOLTT8Sa984Eo18QMBeGnMCX3_7sr+9qUYoAR4FS3UF6+CDtGw@mail.gmail.com>

On Thu, Jul 15, 2021 at 3:53 AM ZheNing Hu <adlternative@gmail.com> wrote:
>
> ZheNing Hu <adlternative@gmail.com> 于2021年7月15日周四 上午12:24写道:
> >
> > Junio C Hamano <gitster@pobox.com> 于2021年7月13日周二 上午4:38写道:

> > > I find it somewhat alarming if we are talking about "fast-path"
> > > workaround before understanding why we are seeing slowdown in the
> > > first place.
> >
> > There is no complete conclusion yet, but I try to use time and hyperfine test
> > for these commits (t/perf/* is not accurate enough):
> >
> > ----------------------------------------------------------------------------------------------------------------------------
> > |                        subject                                  |
> > --batch-check (using hyperfine) |   --batch(using time) |
> > ----------------------------------------------------------------------------------------------------------------------------
> > |[GSOC] cat-file: use fast path when using default_format         |
> >         700ms                |          25.450s      |
> > ----------------------------------------------------------------------------------------------------------------------------
> > |[GSOC] cat-file: re-implement --textconv, --filters options      |
> >         790ms                |          29.933s      |
> > ----------------------------------------------------------------------------------------------------------------------------
> > |[GSOC] cat-file: reuse err buf in batch_object_write()           |
> >         770ms                |          29.153s      |
> > ----------------------------------------------------------------------------------------------------------------------------
> > |[GSOC] cat-file: reuse ref-filter logic                          |
> >         780ms                |          29.412s      |
> > ----------------------------------------------------------------------------------------------------------------------------
> > |The third batch (upstream/master)                                |
> >         640ms                |          26.025s      |
> > ----------------------------------------------------------------------------------------------------------------------------
> >
> > I think we their cost is indeed from "[GSOC] cat-file: reuse ref-filter logic".
> > But what causes the loss of performance needs further analysis.
>
> Now I think:
> There are three main reasons why the performance of cat-file --batch
> deteriorates after refactor.
>
> 1. Too many copies are used in ref-filter and we cannot avoid these copies
> easily because ref-filter needs these copied data to implement atoms %(if),
> %(else), %(end)... and the --sort option. The original cat-file
> --batch only needs
> to output the data to the final string. Its copy times are relatively small.

Is it possible to check early if any of the atoms that needs these
copied data is specified, and if none of them is specified then to
avoid the copies?

> 2. More complex data structure and parsing process are used in ref-filter.
> This is why it can provide more and more useful atoms. Therefore, I think the
> performance degradation that occurs here is normal.

Are there way the more complex parsing could be avoided if it's not
needed by the atoms that are actually used?

> 3. As Ævar Arnfjörð Bjarmason mentioned, oid_object_info_extend() was used
> twice in get_object() before. oid_object_info_extend() is the hot
> path, we should
> try to avoid calling it, So in last version of  "[GSOC] cat-file:
> re-implement --textconv,
> --filters options", I make the unified processing of --textconv and
> --filter avoid calling
> oid_object_info_extend() twice.

Ok, thanks for the details and your work on this performance issue!

I wonder if your patch series could be split, so that the early parts
that add new atoms to ref-filter could be merged sooner?

Best,
Christian.

  reply	other threads:[~2021-07-15  9:45 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-12 11:46 [PATCH 00/19] [GSOC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 01/19] cat-file: handle trivial --batch format with --batch-all-objects ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 02/19] cat-file: merge two block into one ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 03/19] [GSOC] ref-filter: add obj-type check in grab contents ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 04/19] [GSOC] ref-filter: add %(raw) atom ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 05/19] [GSOC] ref-filter: --format=%(raw) re-support --perl ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 06/19] [GSOC] ref-filter: use non-const ref_format in *_atom_parser() ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 07/19] [GSOC] ref-filter: add %(rest) atom ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 08/19] [GSOC] ref-filter: pass get_object() return value to their callers ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 09/19] [GSOC] ref-filter: introduce free_ref_array_item_value() function ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 10/19] [GSOC] ref-filter: introduce reject_atom() ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 11/19] [GSOC] ref-filter: modify the error message and value in get_object ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 12/19] [GSOC] cat-file: add has_object_file() check ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 13/19] [GSOC] cat-file: change batch_objects parameter name ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 14/19] [GSOC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
2021-07-12 13:17   ` Christian Couder
2021-07-12 13:26     ` Christian Couder
2021-07-12 13:51       ` ZheNing Hu
2021-07-12 13:49     ` ZheNing Hu
2021-07-12 20:38     ` Junio C Hamano
2021-07-14 16:24       ` ZheNing Hu
2021-07-15  1:53         ` ZheNing Hu
2021-07-15  9:45           ` Christian Couder [this message]
2021-07-15 13:53             ` ZheNing Hu
2021-07-15 14:55     ` ZheNing Hu
2021-07-12 11:46 ` [PATCH 15/19] [GSOC] cat-file: reuse err buf in batch_object_write() ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 16/19] [GSOC] cat-file: re-implement --textconv, --filters options ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 17/19] [GSOC] ref-filter: remove grab_oid() function ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 18/19] [GSOC] cat-file: create p1006-cat-file.sh ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 19/19] [GSOC] cat-file: use fast path when using default_format ZheNing Hu via GitGitGadget
2021-07-12 12:36 ` [PATCH 00/19] [GSOC] cat-file: reuse ref-filter logic Christian Couder
2021-07-12 13:01   ` ZheNing Hu
2021-07-12 13:02 ` Philip Oakley
2021-07-12 13:27   ` ZheNing Hu
2021-07-15 15:40 ` [PATCH v2 00/17] " ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 01/17] [GSOC] ref-filter: add obj-type check in grab contents ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 02/17] [GSOC] ref-filter: add %(raw) atom ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 03/17] [GSOC] ref-filter: --format=%(raw) re-support --perl ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 04/17] [GSOC] ref-filter: use non-const ref_format in *_atom_parser() ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 05/17] [GSOC] ref-filter: add %(rest) atom ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 06/17] [GSOC] ref-filter: pass get_object() return value to their callers ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 07/17] [GSOC] ref-filter: introduce free_ref_array_item_value() function ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 08/17] [GSOC] ref-filter: add cat_file_mode to ref_format ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 09/17] [GSOC] ref-filter: modify the error message and value in get_object ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 10/17] [GSOC] cat-file: add has_object_file() check ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 11/17] [GSOC] cat-file: change batch_objects parameter name ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 12/17] [GSOC] cat-file: create p1006-cat-file.sh ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 13/17] [GSOC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 14/17] [GSOC] cat-file: reuse err buf in batch_object_write() ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 15/17] [GSOC] cat-file: re-implement --textconv, --filters options ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 16/17] [GSOC] ref-filter: remove grab_oid() function ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 17/17] [GSOC] cat-file: use fast path when using default_format ZheNing Hu via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAP8UFD24X7UjXGKsRWr+f_xmX0x4EVDJHLBs2c1KhECb8-BnBw@mail.gmail.com \
    --to=christian.couder@gmail.com \
    --cc=adlternative@gmail.com \
    --cc=avarab@gmail.com \
    --cc=bagasdotme@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=hariom18599@gmail.com \
    --cc=peff@peff.net \
    --cc=sunshine@sunshineco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).