git@vger.kernel.org mailing list mirror (one of many)
From: Christian Couder <christian.couder@gmail.com>
To: ZheNing Hu <adlternative@gmail.com>
Cc: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Git List" <git@vger.kernel.org>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Hariom verma" <hariom18599@gmail.com>
Subject: Re: [GSOC] How to improve the performance of git cat-file --batch
Date: Wed, 28 Jul 2021 09:34:39 +0200
Message-ID: <CAP8UFD1WtSX59AqfG=d0Ge2BcK+8LdyZk0mQuftpu=FKX-877Q@mail.gmail.com>
In-Reply-To: <CAOLTT8TdL7UhfVSOzbpmo-WFNrcKwmy=E720tNt4KM9o_p=keg@mail.gmail.com>

On Tue, Jul 27, 2021 at 3:37 AM ZheNing Hu <adlternative@gmail.com> wrote:
>
> Christian Couder <christian.couder@gmail.com> wrote on Mon, Jul 26, 2021 at 5:38 PM:
> >
> > On Sun, Jul 25, 2021 at 2:04 PM ZheNing Hu <adlternative@gmail.com> wrote:
> > > Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote on Sun, Jul 25, 2021 at 5:23 AM:
> >
> > > > Having skimmed it I'm a bit confused about this in reference to
> > > > performance generally. I haven't looked into the case you're discussing,
> > > > but as I noted in
> > > > https://lore.kernel.org/git/87im1p6x34.fsf@evledraar.gmail.com/ the
> > > > profiling clearly shows that the main problem is that you've added
> > > > object lookups we skipped before.
> > >
> > > Yeah, you showed me last time that lookup_object() took up a lot of time.
> >
> > Could the document explain with some details why there are more calls
> > to lookup_object()?

Please note that what we are looking for here is the number of times
the lookup_object() function is called. To measure that properly, it
might actually be better to have some way to count the calls directly,
rather than to measure the time spent in the function.

For example you could add a trace_printf(...) call in the
lookup_object() function, set GIT_TRACE=/tmp/git_trace.log, and then
just run `git cat-file --batch ...` and count the number of times the
new trace from lookup_object() appears in the log file.
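
The counting step could be scripted along these lines. This is only a
minimal sketch: the marker string is an assumption (it stands for
whatever text the added trace_printf(...) call would print), and
count_marker() is a hypothetical helper, not something that exists in
git.

```python
import sys

# Assumption: the trace line added to lookup_object() contains this text.
DEFAULT_MARKER = "lookup_object"


def count_marker(lines, marker=DEFAULT_MARKER):
    """Return how many of the given lines contain the marker string."""
    return sum(1 for line in lines if marker in line)


if __name__ == "__main__":
    # Usage: python count_trace.py /tmp/git_trace.log
    log_path = sys.argv[1]
    with open(log_path, encoding="utf-8", errors="replace") as log:
        print(count_marker(log))
```

Running it once on a log from the old code and once on a log from the
patched code gives the two numbers to compare.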

> > For example it could take an example `git cat-file
> > --batch ...` command (if possible a simple one), and say which
> > functions like lookup_object() it was using (and how many times) to
> > get the data it needs before using the ref-filter logic, and then the
> > same information after using the ref-filter logic.
>
> Sorry but this time I use gprof but can’t observe the same effect as before.
> lookup_object() is indeed a part of the time overhead, but its proportion is
> not very large this time.

I am not sure gprof is a good tool for this. It looks like it tries to
attribute the time spent to functions by splitting it between many low
level functions, and that doesn't seem like the right approach to me.
For example if lookup_object() is called 5% more often, it could be
that the excess time is attributed to some low level functions and not
to lookup_object() itself.

That's why we might get a more accurate view of what happens by just
counting the number of times the function is called.
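
A before/after comparison of call counts could look like the sketch
below. It assumes, purely for illustration, that each instrumented
function prints a trace line containing "count:<function name>"; that
tag format and the tally() and compare() helpers are hypothetical.

```python
import re
import sys
from collections import Counter

# Assumption: each added trace line contains "count:<function name>".
TAG = re.compile(r"count:(\w+)")


def tally(lines):
    """Count tagged trace lines per function name."""
    return Counter(m.group(1) for line in lines for m in TAG.finditer(line))


def compare(before, after):
    """Return (name, calls before, calls after, difference) rows, sorted by name."""
    names = sorted(set(before) | set(after))
    return [(n, before[n], after[n], after[n] - before[n]) for n in names]


if __name__ == "__main__":
    # Usage: python compare_trace.py before.log after.log
    counts = []
    for path in sys.argv[1:3]:
        with open(path, encoding="utf-8", errors="replace") as log:
            counts.append(tally(log))
    for name, old, new, diff in compare(counts[0], counts[1]):
        print(f"{name}: {old} -> {new} ({diff:+d})")
```

A function that only shows up after the change gets a "before" count of
zero, so newly introduced calls stand out directly.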

> > It could be nice if there were also some data about how much time used
> > to be spent in lookup_object() and how much time is now spent there,
> > and how this compares with the whole slowdown we are seeing. If Ævar
> > already showed that, you can of course reuse what he already did.

Now I regret having written the above, sorry, as it might not be the
best way to look at this.

> This is my test for git cat-file --batch --batch-all-objects >/dev/null:

[...]

> Because we called parse_object_buffer() in get_object(), lookup_object()
> is called indirectly...

It would be nice if you could add a bit more details about how
lookup_object() is called (both before and after the changes that
degrade performance).

> We can see that some functions are called the same times:

When you say "the same times" I guess you mean that the same amount of
time is spent in these functions.

> patch_delta(),
> unpack_entry(), hashmap_remove()... But after using my patch,
> format_ref_array_item(), grab_sub_body_contents(), get_object(), lookup_object()
> begin to occupy a certain proportion.

Thanks!

