From: ZheNing Hu <adlternative@gmail.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: ZheNing Hu via GitGitGadget <gitgitgadget@gmail.com>,
Git List <git@vger.kernel.org>,
Junio C Hamano <gitster@pobox.com>,
Christian Couder <christian.couder@gmail.com>,
Hariom Verma <hariom18599@gmail.com>,
Bagas Sanjaya <bagasdotme@gmail.com>, Jeff King <peff@peff.net>
Subject: Re: [PATCH 12/15] [GSOC] cat-file: reuse ref-filter logic
Date: Sun, 4 Jul 2021 19:10:14 +0800 [thread overview]
Message-ID: <CAOLTT8TM8b2o535S5Xvd+=__xPH+oLOTeooXSwQJjj3O1SCVSA@mail.gmail.com> (raw)
In-Reply-To: <87eecf8ork.fsf@evledraar.gmail.com>
Ævar Arnfjörð Bjarmason <avarab@gmail.com> 于2021年7月3日周六 下午10:18写道:
> Most of the problem, although this may not entirely fix the performance
> regression, is that you're either looking up everything twice now, or
> taking a much more expensive path.
>
Yeah, In get_object(), oid_object_info_extended() is called twice when we
want print object's contents. The original intention is to reduce the call of
oid_object_info_extended() once when using --textconv. Should we make
--textconv and --filters have the same logic? In this way, without using
--textconv and --filters, we can call oid_object_info_extended() only once.
> I think using gprof is probably much more handy here. See [1. I did a
> `git rev-list --all >rla` and ran that piped into 'git cat-file --batch'
> with/without your pathces. Results:
>
> $ gprof ./git-master ./gmon-master.out | head -n 10
> Flat profile:
>
> Each sample counts as 0.01 seconds.
> % cumulative self self total
> time seconds seconds calls ms/call ms/call name
> 14.29 0.02 0.02 475186 0.00 0.00 nth_packed_object_offset
> 14.29 0.04 0.02 237835 0.00 0.00 hash_to_hex_algop_r
> 7.14 0.05 0.01 5220425 0.00 0.00 hashcmp_algop
> 7.14 0.06 0.01 4757120 0.00 0.00 hex2chr
> 7.14 0.07 0.01 1732023 0.00 0.00 find_entry_ptr
>
> And:
>
> $ gprof ./git-new ./gmon-new.out |head -n 10
> Flat profile:
>
> Each sample counts as 0.01 seconds.
> % cumulative self self total
> time seconds seconds calls ms/call ms/call name
> 7.32 0.06 0.06 764570 0.00 0.00 lookup_object
> 7.32 0.12 0.06 237835 0.00 0.00 parse_commit_date
> 4.88 0.16 0.04 712779 0.00 0.00 nth_packed_object_offset
> 3.66 0.19 0.03 964574 0.00 0.00 bsearch_hash
> 3.66 0.22 0.03 237835 0.00 0.00 grab_sub_body_contents
>
It seems that lookup_object() took a lot of time with the patch of my version .
> If you e.g. make lookup_object() simply die when it's called you'll see
> that before we don't call it at all, after your patch it's our #1
> function.
>
> Before when we have the simplest case of writing out an object this is
> our callstack:
>
> (gdb) bt
> #0 batch_write (opt=0x7fffffffde50, data=0x555555ab9470, len=52) at builtin/cat-file.c:298
> #1 0x000055555558b160 in batch_object_write (obj_name=0x55555597cef0 "504fe6b39f7747be6427f28d9ca97decf5e6cecf", scratch=0x7fffffffd8c0,
> opt=0x7fffffffde50, data=0x7fffffffd7f0) at builtin/cat-file.c:375
> #2 0x000055555558b36e in batch_one_object (obj_name=0x55555597cef0 "504fe6b39f7747be6427f28d9ca97decf5e6cecf", scratch=0x7fffffffd8c0, opt=0x7fffffffde50,
> data=0x7fffffffd7f0) at builtin/cat-file.c:431
> #3 0x000055555558b8ed in batch_objects (opt=0x7fffffffde50) at builtin/cat-file.c:588
> #4 0x000055555558c0d3 in cmd_cat_file (argc=0, argv=0x7fffffffe1e0, prefix=0x0) at builtin/cat-file.c:716
> #5 0x0000555555573adb in run_builtin (p=0x555555941870 <commands+240>, argc=2, argv=0x7fffffffe1e0) at git.c:461
> #6 0x0000555555573f00 in handle_builtin (argc=2, argv=0x7fffffffe1e0) at git.c:714
> #7 0x0000555555574182 in run_argv (argcp=0x7fffffffe08c, argv=0x7fffffffe080) at git.c:781
> #8 0x000055555557460f in cmd_main (argc=2, argv=0x7fffffffe1e0) at git.c:912
> #9 0x000055555565b508 in main (argc=3, argv=0x7fffffffe1d8) at common-main.c:52
>
> After (well, here we're not even to writing it, just looking it up), the
> BUG() is my addition:
>
> (gdb) bt
> #0 BUG_fl (file=0x5555558ade71 "object.c", line=91, fmt=0x5555558ade6e "yo") at usage.c:290
> #1 0x00005555557441ca in lookup_object (r=0x5555559755c0 <the_repo>, oid=0x555555975160 <oi>) at object.c:91
> #2 0x000055555569dfc8 in lookup_commit (r=0x5555559755c0 <the_repo>, oid=0x555555975160 <oi>) at commit.c:62
> #3 0x00005555557445f5 in parse_object_buffer (r=0x5555559755c0 <the_repo>, oid=0x555555975160 <oi>, type=OBJ_COMMIT, size=342, buffer=0x555555ab48e0,
> eaten_p=0x7fffffffd36c) at object.c:215
> #4 0x0000555555785094 in get_object (ref=0x7fffffffd6b0, deref=0, obj=0x7fffffffd520, oi=0x555555975160 <oi>, err=0x7fffffffd860) at ref-filter.c:1803
> #5 0x0000555555785c99 in populate_value (ref=0x7fffffffd6b0, err=0x7fffffffd860) at ref-filter.c:2030
> #6 0x0000555555785d7b in get_ref_atom_value (ref=0x7fffffffd6b0, atom=0, v=0x7fffffffd628, err=0x7fffffffd860) at ref-filter.c:2064
> #7 0x000055555578742f in format_ref_array_item (info=0x7fffffffd6b0, format=0x7fffffffde30, final_buf=0x7fffffffd880, error_buf=0x7fffffffd860)
> at ref-filter.c:2659
> #8 0x000055555558ab1c in batch_object_write (obj_name=0x55555597e3f0 "504fe6b39f7747be6427f28d9ca97decf5e6cecf", scratch=0x7fffffffd880,
> err=0x7fffffffd860, opt=0x7fffffffde10, data=0x7fffffffd800) at builtin/cat-file.c:225
> #9 0x000055555558ade5 in batch_one_object (obj_name=0x55555597e3f0 "504fe6b39f7747be6427f28d9ca97decf5e6cecf", scratch=0x7fffffffd880, err=0x7fffffffd860,
> opt=0x7fffffffde10, data=0x7fffffffd800) at builtin/cat-file.c:298
> #10 0x000055555558b394 in batch_objects (batch=0x7fffffffde10, options=0x7fffffffd900) at builtin/cat-file.c:458
> #11 0x000055555558bbd5 in cmd_cat_file (argc=0, argv=0x7fffffffe1d0, prefix=0x0) at builtin/cat-file.c:585
> #12 0x0000555555573adb in run_builtin (p=0x555555942850 <commands+240>, argc=2, argv=0x7fffffffe1d0) at git.c:461
> #13 0x0000555555573f00 in handle_builtin (argc=2, argv=0x7fffffffe1d0) at git.c:714
> #14 0x0000555555574182 in run_argv (argcp=0x7fffffffe07c, argv=0x7fffffffe070) at git.c:781
> #15 0x000055555557460f in cmd_main (argc=2, argv=0x7fffffffe1d0) at git.c:912
> #16 0x000055555565afc1 in main (argc=3, argv=0x7fffffffe1c8) at common-main.c:52
>
It seems that the call stack is very deep after using my version.
> I.e. before in batch_object_write() we could use a cheap path of doing
> oid_object_info_extended() and directly emitting the content. With your
> version we're all the way down to parse_object_buffer(). Meaning that
> we're going to be creating a "struct commit" or whatever if we're
> looking at a commit, just to print out the raw contents.
>
I agree that the logic in ref-filter is too complicated in order to be
able to print
the structured object data.
> I think the best next step here is to add a t/perf/t1006-cat-file.sh
> test to stress these various cases, i.e. a plain --batch without a
> format, with format, with --batch-all-objects etc. Try to then run that
> on each of your commits against the preceding one and see if/when you
> have regressions.
>
Make sence.
> Aside from any double-lookups etc, the problem is also that you're
> trying to handle a really general case (e.g. with textconv) in a
Well, "--textconv" is a common situation? Here I may not be sure which
scenarios where the upper application calls "git cat-file --batch" are the
most common.
> codepath that needs to be really fast. If anything we should be
> inserting some more more optimization shortcuts for common cases into
> it. E.g. I was able to trivially speed up 'cat-file --batch-check' on
> "master" by hardcoding a path for our default format (patch at the end
> of this mail):
>
> # passed all 2 test(s)
> 1..2
> Test origin/master HEAD
> -------------------------------------------------------------------------
> 1006.2: cat-file --batch-check 0.60(0.37+0.23) 0.35(0.33+0.02) -41.7%
>
> Anything that needs to handle general format patching is going to be
> slower. I think /some/ performance regression if we're using something
> that's not just the current light strbuf_expand() probably can't be
> avoided, but we could/should try to make up the difference at least for
> the common case of --batch or --batch-check without --textconv and
> perhaps hardcode (and document that it's faster) a path for the default
> formats).
>
Yeah, Like the hardcode in your patch may be a solution to the performance
degradation. This will indeed help those upper-level applications that use the
common case.
> 1. https://sourceware.org/binutils/docs/gprof/Output.html
>
> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
> index 5ebf13359e8..775b7dd1b01 100644
> --- a/builtin/cat-file.c
> +++ b/builtin/cat-file.c
> @@ -360,6 +360,11 @@ static void batch_object_write(const char *obj_name,
> struct batch_options *opt,
> struct expand_data *data)
> {
> + int default_format = !strcmp(opt->format, "%(objectname) %(objecttype) %(objectsize)");
> + struct strbuf type_name = STRBUF_INIT;
> + if (default_format)
> + data->info.type_name = &type_name;
> +
> if (!data->skip_object_info &&
> oid_object_info_extended(the_repository, &data->oid, &data->info,
> OBJECT_INFO_LOOKUP_REPLACE) < 0) {
> @@ -369,14 +374,20 @@ static void batch_object_write(const char *obj_name,
> return;
> }
>
> - strbuf_reset(scratch);
> - strbuf_expand(scratch, opt->format, expand_format, data);
> - strbuf_addch(scratch, '\n');
> - batch_write(opt, scratch->buf, scratch->len);
> -
> - if (opt->print_contents) {
> - print_object_or_die(opt, data);
> - batch_write(opt, "\n", 1);
> + if (default_format && !opt->print_contents) {
> + fprintf(stdout, "%s %s %"PRIuMAX"\n", oid_to_hex(&data->oid),
> + data->info.type_name->buf,
> + (uintmax_t)*data->info.sizep);
> + } else {
> + strbuf_reset(scratch);
> + strbuf_expand(scratch, opt->format, expand_format, data);
> + strbuf_addch(scratch, '\n');
> + batch_write(opt, scratch->buf, scratch->len);
> +
> + if (opt->print_contents) {
> + print_object_or_die(opt, data);
> + batch_write(opt, "\n", 1);
> + }
> }
> }
>
> diff --git a/t/perf/p1006-cat-file.sh b/t/perf/p1006-cat-file.sh
> new file mode 100755
> index 00000000000..a295d334715
> --- /dev/null
> +++ b/t/perf/p1006-cat-file.sh
> @@ -0,0 +1,16 @@
> +#!/bin/sh
> +
> +test_description='Basic sort performance tests'
> +. ./perf-lib.sh
> +
> +test_perf_default_repo
> +
> +test_expect_success 'setup' '
> + git rev-list --all >rla
> +'
> +
> +test_perf 'cat-file --batch-check' '
> + git cat-file --batch-check <rla
> +'
> +
> +test_done
Thanks.
--
ZheNing Hu
next prev parent reply other threads:[~2021-07-04 11:10 UTC|newest]
Thread overview: 39+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-07-01 16:07 [PATCH 00/15] [GSOC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
2021-07-01 16:07 ` [PATCH 01/15] [GSOC] ref-filter: add obj-type check in grab contents ZheNing Hu via GitGitGadget
2021-07-01 16:08 ` [PATCH 02/15] [GSOC] ref-filter: add %(raw) atom ZheNing Hu via GitGitGadget
2021-07-02 13:22 ` Ævar Arnfjörð Bjarmason
2021-07-03 5:14 ` ZheNing Hu
2021-07-01 16:08 ` [PATCH 03/15] [GSOC] ref-filter: --format=%(raw) re-support --perl ZheNing Hu via GitGitGadget
2021-07-02 13:29 ` Ævar Arnfjörð Bjarmason
2021-07-03 5:14 ` ZheNing Hu
2021-07-01 16:08 ` [PATCH 04/15] [GSOC] ref-filter: use non-const ref_format in *_atom_parser() ZheNing Hu via GitGitGadget
2021-07-01 16:08 ` [PATCH 05/15] [GSOC] ref-filter: add %(rest) atom ZheNing Hu via GitGitGadget
2021-07-01 16:08 ` [PATCH 06/15] [GSOC] ref-filter: pass get_object() return value to their callers ZheNing Hu via GitGitGadget
2021-07-01 16:08 ` [PATCH 07/15] [GSOC] ref-filter: introduce free_ref_array_item_value() function ZheNing Hu via GitGitGadget
2021-07-01 16:08 ` [PATCH 08/15] [GSOC] ref-filter: add cat_file_mode in struct ref_format ZheNing Hu via GitGitGadget
2021-07-02 13:32 ` Ævar Arnfjörð Bjarmason
2021-07-02 19:28 ` Eric Sunshine
2021-07-02 22:11 ` Christian Couder
2021-07-03 5:55 ` ZheNing Hu
2021-07-05 7:17 ` Ævar Arnfjörð Bjarmason
2021-07-01 16:08 ` [PATCH 09/15] [GSOC] ref-filter: modify the error message and value in get_object ZheNing Hu via GitGitGadget
2021-07-01 16:08 ` [PATCH 10/15] [GSOC] cat-file: add has_object_file() check ZheNing Hu via GitGitGadget
2021-07-02 13:34 ` Ævar Arnfjörð Bjarmason
2021-07-03 5:50 ` ZheNing Hu
2021-07-01 16:08 ` [PATCH 11/15] [GSOC] cat-file: change batch_objects parameter name ZheNing Hu via GitGitGadget
2021-07-01 16:08 ` [PATCH 12/15] [GSOC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
2021-07-02 13:36 ` Ævar Arnfjörð Bjarmason
2021-07-02 13:45 ` Christian Couder
2021-07-03 11:45 ` ZheNing Hu
2021-07-03 13:37 ` Ævar Arnfjörð Bjarmason
2021-07-04 11:10 ` ZheNing Hu [this message]
2021-07-05 7:18 ` Ævar Arnfjörð Bjarmason
2021-07-03 14:17 ` ZheNing Hu
2021-07-01 16:08 ` [PATCH 13/15] [GSOC] cat-file: reuse err buf in batch_object_write() ZheNing Hu via GitGitGadget
2021-07-01 16:08 ` [PATCH 14/15] [GSOC] cat-file: re-implement --textconv, --filters options ZheNing Hu via GitGitGadget
2021-07-01 19:55 ` Junio C Hamano
2021-07-01 20:11 ` Junio C Hamano
2021-07-02 12:46 ` ZheNing Hu
2021-07-02 15:27 ` Junio C Hamano
2021-07-03 6:17 ` ZheNing Hu
2021-07-01 16:08 ` [PATCH 15/15] [GSOC] ref-filter: remove grab_oid() function ZheNing Hu via GitGitGadget
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAOLTT8TM8b2o535S5Xvd+=__xPH+oLOTeooXSwQJjj3O1SCVSA@mail.gmail.com' \
--to=adlternative@gmail.com \
--cc=avarab@gmail.com \
--cc=bagasdotme@gmail.com \
--cc=christian.couder@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=gitster@pobox.com \
--cc=hariom18599@gmail.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).