From: ZheNing Hu <adlternative@gmail.com>
To: Christian Couder <christian.couder@gmail.com>
Cc: "Junio C Hamano" <gitster@pobox.com>,
"ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com>,
git <git@vger.kernel.org>, "Hariom Verma" <hariom18599@gmail.com>,
"Bagas Sanjaya" <bagasdotme@gmail.com>,
"Jeff King" <peff@peff.net>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
"Eric Sunshine" <sunshine@sunshineco.com>
Subject: Re: [PATCH 14/19] [GSOC] cat-file: reuse ref-filter logic
Date: Thu, 15 Jul 2021 21:53:04 +0800
Message-ID: <CAOLTT8Qj_zAhPEzKxJ2dGg8O3R_b8Sn05G29VE5WJhymR4EQSg@mail.gmail.com>
In-Reply-To: <CAP8UFD24X7UjXGKsRWr+f_xmX0x4EVDJHLBs2c1KhECb8-BnBw@mail.gmail.com>
Christian Couder <christian.couder@gmail.com> wrote on Thu, Jul 15, 2021 at 5:45 PM:
>
> On Thu, Jul 15, 2021 at 3:53 AM ZheNing Hu <adlternative@gmail.com> wrote:
> >
> > ZheNing Hu <adlternative@gmail.com> wrote on Thu, Jul 15, 2021 at 12:24 AM:
> > >
> > > Junio C Hamano <gitster@pobox.com> wrote on Tue, Jul 13, 2021 at 4:38 AM:
>
> > > > I find it somewhat alarming if we are talking about "fast-path"
> > > > workaround before understanding why we are seeing slowdown in the
> > > > first place.
> > >
> > > There is no complete conclusion yet, but I tried using time and hyperfine
> > > to test these commits (t/perf/* is not accurate enough):
> > >
> > > ------------------------------------------------------------------------------------------------------------------------
> > > | subject                                                     | --batch-check (using hyperfine) | --batch (using time) |
> > > ------------------------------------------------------------------------------------------------------------------------
> > > | [GSOC] cat-file: use fast path when using default_format    | 700ms                           | 25.450s              |
> > > | [GSOC] cat-file: re-implement --textconv, --filters options | 790ms                           | 29.933s              |
> > > | [GSOC] cat-file: reuse err buf in batch_object_write()      | 770ms                           | 29.153s              |
> > > | [GSOC] cat-file: reuse ref-filter logic                     | 780ms                           | 29.412s              |
> > > | The third batch (upstream/master)                           | 640ms                           | 26.025s              |
> > > ------------------------------------------------------------------------------------------------------------------------
> > >
> > > I think the cost does indeed come from "[GSOC] cat-file: reuse ref-filter logic",
> > > but what causes the loss of performance needs further analysis.
> >
> > Now I think:
> > There are three main reasons why the performance of cat-file --batch
> > deteriorates after the refactor.
> >
> > 1. Too many copies are made in ref-filter, and we cannot easily avoid them
> > because ref-filter needs the copied data to implement the atoms %(if),
> > %(else), %(end)... and the --sort option. The original cat-file --batch
> > only needs to write the data to the final output string, so it makes
> > relatively few copies.
>
> Is it possible to check early whether any of the atoms that need this
> copied data is specified, and if none of them is specified, to
> avoid the copies?
>
Well, the copy I'm talking about here refers to something like
"v->s = xstrdup(xxx)", but v->s is needed by --sort, so it is very difficult
to remove. At the moment I think the only solution is the fast path
mentioned by Ævar Arnfjörð Bjarmason.
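To make the cost concrete, here is a minimal sketch (not git's actual code)
contrasting the two styles. All names ending in _sketch, as well as
dup_string(), format_with_copies(), format_direct() and can_use_fast_path(),
are hypothetical; dup_string() stands in for xstrdup(), and the format string
is assumed to be the default one that --batch-check uses.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for ref-filter's per-atom value. */
struct atom_value_sketch {
	char *s;	/* heap copy, kept so that a later sort could compare it */
};

/* Stand-in for xstrdup(): duplicate a string, abort on allocation failure. */
static char *dup_string(const char *src)
{
	size_t len;
	char *copy;

	assert(src);
	len = strlen(src) + 1;
	copy = malloc(len);
	if (!copy)
		abort();
	memcpy(copy, src, len);
	return copy;
}

/* ref-filter style: copy every field first, then format from the copies. */
static int format_with_copies(char *out, size_t outlen, const char *oid,
			      const char *type, const char *size)
{
	struct atom_value_sketch v[3];
	int n, i;

	v[0].s = dup_string(oid);
	v[1].s = dup_string(type);
	v[2].s = dup_string(size);
	n = snprintf(out, outlen, "%s %s %s", v[0].s, v[1].s, v[2].s);
	for (i = 0; i < 3; i++)
		free(v[i].s);
	return n;
}

/* Old cat-file style: write the fields straight into the output buffer. */
static int format_direct(char *out, size_t outlen, const char *oid,
			 const char *type, const char *size)
{
	return snprintf(out, outlen, "%s %s %s", oid, type, size);
}

/* Fast path check: only the (assumed) default format skips the copies. */
static int can_use_fast_path(const char *format)
{
	return !strcmp(format, "%(objectname) %(objecttype) %(objectsize)");
}
```

Both functions produce the same line; the first one simply pays for three
allocations and frees per object, which adds up when every object in the
repository is printed.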
> > 2. More complex data structures and a more complex parsing process are
> > used in ref-filter. This is why it can provide more atoms, and more useful
> > ones. Therefore, I think the performance degradation that occurs here is
> > to be expected.
>
> Is there a way the more complex parsing could be avoided if it's not
> needed by the atoms that are actually used?
No. For example, previously we could only support "objectsize", and now we can
support "objectsize:short", so we have to pay for more parsing here.
(It's necessary.)
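As a rough illustration of the extra work, the sketch below splits an atom
into a name and an optional ":arg" part. It is not ref-filter's parser;
struct parsed_atom_sketch and parse_atom_sketch() are hypothetical names, and
the fixed 32-byte buffers are an assumption made only to keep the example
short.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical result of parsing one atom such as "objectsize:short". */
struct parsed_atom_sketch {
	char name[32];
	char arg[32];	/* empty string when the atom has no ":arg" part */
};

/*
 * Split "name" or "name:arg" into its two parts. Returns 0 on success and
 * -1 if the name is empty or a part does not fit into the fixed buffers.
 */
static int parse_atom_sketch(const char *atom, struct parsed_atom_sketch *out)
{
	const char *colon;
	const char *arg;
	size_t name_len;

	assert(atom && out);
	colon = strchr(atom, ':');
	name_len = colon ? (size_t)(colon - atom) : strlen(atom);
	arg = colon ? colon + 1 : "";

	if (!name_len || name_len >= sizeof(out->name) ||
	    strlen(arg) >= sizeof(out->arg))
		return -1;
	memcpy(out->name, atom, name_len);
	out->name[name_len] = '\0';
	strcpy(out->arg, arg);	/* safe: length was checked above */
	return 0;
}
```

A parser that only knows the bare name can compare the whole string once;
supporting arguments means every atom has to be scanned for the separator and
its argument validated.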
>
> > 3. As Ævar Arnfjörð Bjarmason mentioned, oid_object_info_extended() was
> > called twice in get_object() before. oid_object_info_extended() is on the
> > hot path, so we should try to avoid calling it. So in the latest version of
> > "[GSOC] cat-file: re-implement --textconv, --filters options", I unified
> > the processing of --textconv and --filters so that
> > oid_object_info_extended() is not called twice.
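The idea in point 3 can be sketched as follows: do the expensive lookup once
and hand its result to every step that needs it. This is only an
illustration; expensive_lookup_sketch() is a made-up stand-in for the real
lookup, and the struct, the counter and the returned values are all
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object info; the real lookup fills a larger structure. */
struct info_sketch {
	int type;
	unsigned long size;
};

static int lookup_calls;	/* counts how often the hot path runs */

/* Stand-in for the expensive lookup; the values are made up. */
static int expensive_lookup_sketch(const char *oid, struct info_sketch *info)
{
	assert(oid && info);
	lookup_calls++;
	info->type = 3;
	info->size = 42;
	return 0;
}

/* Both steps take the already-fetched info instead of looking it up again. */
static unsigned long header_step_sketch(const struct info_sketch *info)
{
	return info->size;
}

static unsigned long filter_step_sketch(const struct info_sketch *info)
{
	return info->size;
}

/* One lookup per object, shared by the header and the filter step. */
static unsigned long process_object_sketch(const char *oid, int use_filters)
{
	struct info_sketch info;
	unsigned long total;

	if (expensive_lookup_sketch(oid, &info) < 0)
		return 0;
	total = header_step_sketch(&info);
	if (use_filters)
		total += filter_step_sketch(&info);
	return total;
}
```

With this shape, enabling the filter step does not change how many times the
lookup runs for an object.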
>
> Ok, thanks for the details and your work on this performance issue!
>
> I wonder if your patch series could be split, so that the early parts
> that add new atoms to ref-filter could be merged sooner?
>
Should this part of the work be handed over to Junio?
The implementation of %(rest) and %(raw) may be worth merging;
those patches truly are "zh/ref-filter-raw-data".
The other part could be called "cat-file-reuse-ref-filter-logic".
> Best,
> Christian.
Thanks.
--
ZheNing Hu