git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: ZheNing Hu <adlternative@gmail.com>
Cc: ZheNing Hu via GitGitGadget <gitgitgadget@gmail.com>,
	Git List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>,
	Christian Couder <christian.couder@gmail.com>,
	Hariom Verma <hariom18599@gmail.com>,
	Bagas Sanjaya <bagasdotme@gmail.com>, Jeff King <peff@peff.net>
Subject: Re: [PATCH 12/15] [GSOC] cat-file: reuse ref-filter logic
Date: Mon, 05 Jul 2021 09:18:43 +0200	[thread overview]
Message-ID: <87im1p6x34.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <CAOLTT8TM8b2o535S5Xvd+=__xPH+oLOTeooXSwQJjj3O1SCVSA@mail.gmail.com>


On Sun, Jul 04 2021, ZheNing Hu wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> 于2021年7月3日周六 下午10:18写道:
>
>> Most of the problem, although this may not entirely fix the performance
>> regression, is that you're either looking up everything twice now, or
>> taking a much more expensive path.
>>
>
> Yeah, In get_object(), oid_object_info_extended() is called twice when we
> want print object's contents. The original intention is to reduce the call of
> oid_object_info_extended() once when using --textconv. Should we make
> --textconv and --filters have the same logic? In this way, without using
> --textconv and --filters, we can call oid_object_info_extended() only once.
>
>> I think using gprof is probably much more handy here. See [1. I did a
>> `git rev-list --all >rla` and ran that piped into 'git cat-file --batch'
>> with/without your pathces. Results:
>>
>>     $ gprof ./git-master ./gmon-master.out | head -n 10
>>     Flat profile:
>>
>>     Each sample counts as 0.01 seconds.
>>       %   cumulative   self              self     total
>>      time   seconds   seconds    calls  ms/call  ms/call  name
>>      14.29      0.02     0.02   475186     0.00     0.00  nth_packed_object_offset
>>      14.29      0.04     0.02   237835     0.00     0.00  hash_to_hex_algop_r
>>       7.14      0.05     0.01  5220425     0.00     0.00  hashcmp_algop
>>       7.14      0.06     0.01  4757120     0.00     0.00  hex2chr
>>       7.14      0.07     0.01  1732023     0.00     0.00  find_entry_ptr
>>
>> And:
>>
>>     $ gprof ./git-new ./gmon-new.out |head -n 10
>>     Flat profile:
>>
>>     Each sample counts as 0.01 seconds.
>>       %   cumulative   self              self     total
>>      time   seconds   seconds    calls  ms/call  ms/call  name
>>       7.32      0.06     0.06   764570     0.00     0.00  lookup_object
>>       7.32      0.12     0.06   237835     0.00     0.00  parse_commit_date
>>       4.88      0.16     0.04   712779     0.00     0.00  nth_packed_object_offset
>>       3.66      0.19     0.03   964574     0.00     0.00  bsearch_hash
>>       3.66      0.22     0.03   237835     0.00     0.00  grab_sub_body_contents
>>
>
> It seems that lookup_object() took a lot of time with the patch of my version .
>
>> If you e.g. make lookup_object() simply die when it's called you'll see
>> that before we don't call it at all, after your patch it's our #1
>> function.
>>
>> Before when we have the simplest case of writing out an object this is
>> our callstack:
>>
>>     (gdb) bt
>>     #0  batch_write (opt=0x7fffffffde50, data=0x555555ab9470, len=52) at builtin/cat-file.c:298
>>     #1  0x000055555558b160 in batch_object_write (obj_name=0x55555597cef0 "504fe6b39f7747be6427f28d9ca97decf5e6cecf", scratch=0x7fffffffd8c0,
>>         opt=0x7fffffffde50, data=0x7fffffffd7f0) at builtin/cat-file.c:375
>>     #2  0x000055555558b36e in batch_one_object (obj_name=0x55555597cef0 "504fe6b39f7747be6427f28d9ca97decf5e6cecf", scratch=0x7fffffffd8c0, opt=0x7fffffffde50,
>>         data=0x7fffffffd7f0) at builtin/cat-file.c:431
>>     #3  0x000055555558b8ed in batch_objects (opt=0x7fffffffde50) at builtin/cat-file.c:588
>>     #4  0x000055555558c0d3 in cmd_cat_file (argc=0, argv=0x7fffffffe1e0, prefix=0x0) at builtin/cat-file.c:716
>>     #5  0x0000555555573adb in run_builtin (p=0x555555941870 <commands+240>, argc=2, argv=0x7fffffffe1e0) at git.c:461
>>     #6  0x0000555555573f00 in handle_builtin (argc=2, argv=0x7fffffffe1e0) at git.c:714
>>     #7  0x0000555555574182 in run_argv (argcp=0x7fffffffe08c, argv=0x7fffffffe080) at git.c:781
>>     #8  0x000055555557460f in cmd_main (argc=2, argv=0x7fffffffe1e0) at git.c:912
>>     #9  0x000055555565b508 in main (argc=3, argv=0x7fffffffe1d8) at common-main.c:52
>>
>> After (well, here we're not even to writing it, just looking it up), the
>> BUG() is my addition:
>>
>>     (gdb) bt
>>     #0  BUG_fl (file=0x5555558ade71 "object.c", line=91, fmt=0x5555558ade6e "yo") at usage.c:290
>>     #1  0x00005555557441ca in lookup_object (r=0x5555559755c0 <the_repo>, oid=0x555555975160 <oi>) at object.c:91
>>     #2  0x000055555569dfc8 in lookup_commit (r=0x5555559755c0 <the_repo>, oid=0x555555975160 <oi>) at commit.c:62
>>     #3  0x00005555557445f5 in parse_object_buffer (r=0x5555559755c0 <the_repo>, oid=0x555555975160 <oi>, type=OBJ_COMMIT, size=342, buffer=0x555555ab48e0,
>>         eaten_p=0x7fffffffd36c) at object.c:215
>>     #4  0x0000555555785094 in get_object (ref=0x7fffffffd6b0, deref=0, obj=0x7fffffffd520, oi=0x555555975160 <oi>, err=0x7fffffffd860) at ref-filter.c:1803
>>     #5  0x0000555555785c99 in populate_value (ref=0x7fffffffd6b0, err=0x7fffffffd860) at ref-filter.c:2030
>>     #6  0x0000555555785d7b in get_ref_atom_value (ref=0x7fffffffd6b0, atom=0, v=0x7fffffffd628, err=0x7fffffffd860) at ref-filter.c:2064
>>     #7  0x000055555578742f in format_ref_array_item (info=0x7fffffffd6b0, format=0x7fffffffde30, final_buf=0x7fffffffd880, error_buf=0x7fffffffd860)
>>         at ref-filter.c:2659
>>     #8  0x000055555558ab1c in batch_object_write (obj_name=0x55555597e3f0 "504fe6b39f7747be6427f28d9ca97decf5e6cecf", scratch=0x7fffffffd880,
>>         err=0x7fffffffd860, opt=0x7fffffffde10, data=0x7fffffffd800) at builtin/cat-file.c:225
>>     #9  0x000055555558ade5 in batch_one_object (obj_name=0x55555597e3f0 "504fe6b39f7747be6427f28d9ca97decf5e6cecf", scratch=0x7fffffffd880, err=0x7fffffffd860,
>>         opt=0x7fffffffde10, data=0x7fffffffd800) at builtin/cat-file.c:298
>>     #10 0x000055555558b394 in batch_objects (batch=0x7fffffffde10, options=0x7fffffffd900) at builtin/cat-file.c:458
>>     #11 0x000055555558bbd5 in cmd_cat_file (argc=0, argv=0x7fffffffe1d0, prefix=0x0) at builtin/cat-file.c:585
>>     #12 0x0000555555573adb in run_builtin (p=0x555555942850 <commands+240>, argc=2, argv=0x7fffffffe1d0) at git.c:461
>>     #13 0x0000555555573f00 in handle_builtin (argc=2, argv=0x7fffffffe1d0) at git.c:714
>>     #14 0x0000555555574182 in run_argv (argcp=0x7fffffffe07c, argv=0x7fffffffe070) at git.c:781
>>     #15 0x000055555557460f in cmd_main (argc=2, argv=0x7fffffffe1d0) at git.c:912
>>     #16 0x000055555565afc1 in main (argc=3, argv=0x7fffffffe1c8) at common-main.c:52
>>
>
> It seems that the call stack is very deep after using my version.
>
>> I.e. before in batch_object_write() we could use a cheap path of doing
>> oid_object_info_extended() and directly emitting the content. With your
>> version we're all the way down to parse_object_buffer(). Meaning that
>> we're going to be creating a "struct commit" or whatever if we're
>> looking at a commit, just to print out the raw contents.
>>
>
> I agree that the logic in ref-filter is too complicated in order to be
> able to print
> the structured object data.

Well, maybe it isn't. I think the main problem with your performance is
just looking things up again from the object store needlessly, and
possibly redundantly copying things around. E.g. if you have a buffer
with a commit object and you need to xstrdup() that, and xstrdup() the
answer etc, it adds up.

Maybe if all that's gone the formatting will still slow things down, but
I bet it's going to be to a much smaller extent.

>> I think the best next step here is to add a t/perf/t1006-cat-file.sh
>> test to stress these various cases, i.e. a plain --batch without a
>> format, with format, with --batch-all-objects etc. Try to then run that
>> on each of your commits against the preceding one and see if/when you
>> have regressions.
>>
>
> Make sence.
>
>> Aside from any double-lookups etc, the problem is also that you're
>> trying to handle a really general case (e.g. with textconv) in a
>
> Well, "--textconv" is a common situation? Here I may not be sure which
> scenarios where the upper application calls "git cat-file --batch" are the
> most common.

I think --textconv is much rarer. I think it's used for API purposes
e.g. for people who are doing \n translation, or maybe for the likes of
git-lfs (or was that --filters)? I'm not very familiar with it.

For what it's worth 'git cat-file --batch' is very performance sensitive
in backend use-cases like e.g. gitlab.com's gitaly (which executes all
git commands for you on that site). Anything that looks up .. any object
pretty much .. is prining that object line to a persistent "cat-file
--batch" process.

There was a case where some very small (unrelated to cat-file per-se)
slowdown in that codepath ended up slowing down overall API requests for
some calls by >100%, because it's executed in a tight loop. E.g. if
you're showing N commits per page that might be N cat-file --batch
calls.

  reply	other threads:[~2021-07-05  7:25 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-01 16:07 [PATCH 00/15] [GSOC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
2021-07-01 16:07 ` [PATCH 01/15] [GSOC] ref-filter: add obj-type check in grab contents ZheNing Hu via GitGitGadget
2021-07-01 16:08 ` [PATCH 02/15] [GSOC] ref-filter: add %(raw) atom ZheNing Hu via GitGitGadget
2021-07-02 13:22   ` Ævar Arnfjörð Bjarmason
2021-07-03  5:14     ` ZheNing Hu
2021-07-01 16:08 ` [PATCH 03/15] [GSOC] ref-filter: --format=%(raw) re-support --perl ZheNing Hu via GitGitGadget
2021-07-02 13:29   ` Ævar Arnfjörð Bjarmason
2021-07-03  5:14     ` ZheNing Hu
2021-07-01 16:08 ` [PATCH 04/15] [GSOC] ref-filter: use non-const ref_format in *_atom_parser() ZheNing Hu via GitGitGadget
2021-07-01 16:08 ` [PATCH 05/15] [GSOC] ref-filter: add %(rest) atom ZheNing Hu via GitGitGadget
2021-07-01 16:08 ` [PATCH 06/15] [GSOC] ref-filter: pass get_object() return value to their callers ZheNing Hu via GitGitGadget
2021-07-01 16:08 ` [PATCH 07/15] [GSOC] ref-filter: introduce free_ref_array_item_value() function ZheNing Hu via GitGitGadget
2021-07-01 16:08 ` [PATCH 08/15] [GSOC] ref-filter: add cat_file_mode in struct ref_format ZheNing Hu via GitGitGadget
2021-07-02 13:32   ` Ævar Arnfjörð Bjarmason
2021-07-02 19:28     ` Eric Sunshine
2021-07-02 22:11       ` Christian Couder
2021-07-03  5:55         ` ZheNing Hu
2021-07-05  7:17           ` Ævar Arnfjörð Bjarmason
2021-07-01 16:08 ` [PATCH 09/15] [GSOC] ref-filter: modify the error message and value in get_object ZheNing Hu via GitGitGadget
2021-07-01 16:08 ` [PATCH 10/15] [GSOC] cat-file: add has_object_file() check ZheNing Hu via GitGitGadget
2021-07-02 13:34   ` Ævar Arnfjörð Bjarmason
2021-07-03  5:50     ` ZheNing Hu
2021-07-01 16:08 ` [PATCH 11/15] [GSOC] cat-file: change batch_objects parameter name ZheNing Hu via GitGitGadget
2021-07-01 16:08 ` [PATCH 12/15] [GSOC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
2021-07-02 13:36   ` Ævar Arnfjörð Bjarmason
2021-07-02 13:45     ` Christian Couder
2021-07-03 11:45     ` ZheNing Hu
2021-07-03 13:37       ` Ævar Arnfjörð Bjarmason
2021-07-04 11:10         ` ZheNing Hu
2021-07-05  7:18           ` Ævar Arnfjörð Bjarmason [this message]
2021-07-03 14:17       ` ZheNing Hu
2021-07-01 16:08 ` [PATCH 13/15] [GSOC] cat-file: reuse err buf in batch_object_write() ZheNing Hu via GitGitGadget
2021-07-01 16:08 ` [PATCH 14/15] [GSOC] cat-file: re-implement --textconv, --filters options ZheNing Hu via GitGitGadget
2021-07-01 19:55   ` Junio C Hamano
2021-07-01 20:11   ` Junio C Hamano
2021-07-02 12:46     ` ZheNing Hu
2021-07-02 15:27       ` Junio C Hamano
2021-07-03  6:17         ` ZheNing Hu
2021-07-01 16:08 ` [PATCH 15/15] [GSOC] ref-filter: remove grab_oid() function ZheNing Hu via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87im1p6x34.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=adlternative@gmail.com \
    --cc=bagasdotme@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=hariom18599@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).