git@vger.kernel.org mailing list mirror (one of many)
From: ZheNing Hu <adlternative@gmail.com>
To: Christian Couder <christian.couder@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	Hariom verma <hariom18599@gmail.com>,
	Git List <git@vger.kernel.org>
Subject: Re: [GSoC] Git Blog 11
Date: Thu, 5 Aug 2021 12:50:21 +0800
Message-ID: <CAOLTT8Sd6OCU_Ufrhqstz-Mw0Ej=9F2Y20BjPOpkgsuB5D-4Nw@mail.gmail.com>
In-Reply-To: <CAP8UFD3E9oR9E4S=f8iReKOnvVO_WrXVziyztHZJCiScUAxDRg@mail.gmail.com>

Christian Couder <christian.couder@gmail.com> wrote on Wed, Aug 4, 2021 at 4:57 PM:
>
> On Tue, Aug 3, 2021 at 4:48 AM ZheNing Hu <adlternative@gmail.com> wrote:
> >
> > ZheNing Hu <adlternative@gmail.com> wrote on Tue, Aug 3, 2021 at 10:37 AM:
> > >
> > > Christian Couder <christian.couder@gmail.com> wrote on Mon, Aug 2, 2021 at 2:25 PM:
> > > >
> > > > On Sun, Aug 1, 2021 at 8:45 AM ZheNing Hu <adlternative@gmail.com> wrote:
> > > >
> > > > > in some cases, this is the result of the performance test of
> > > > > `t/perf/p1006-cat-file.sh`:
> > > > >
> > > > > ```
> > > > > Test                                        HEAD~             HEAD
> > > > > ------------------------------------------------------------------------------------
> > > > > 1006.2: cat-file --batch-check              0.10(0.09+0.00)   0.11(0.10+0.00) +10.0%
> > > > > 1006.3: cat-file --batch-check with atoms   0.09(0.08+0.01)   0.09(0.06+0.03) +0.0%
> > > > > 1006.4: cat-file --batch                    0.62(0.58+0.04)   0.57(0.54+0.03) -8.1%
> > > > > 1006.5: cat-file --batch with atoms         0.63(0.60+0.02)   0.52(0.49+0.02) -17.5%
> > > > > ```
> > > > >
> > > > > We can see that the performance of `git cat-file --batch` has
> > > > > improved somewhat!
> > > >
> > > > Yeah, sure -8.1% or -17.5% is really nice! But why +10.0% for
> > > > `cat-file --batch-check`?
> > >
> > > I think it's not very important. Our optimization works by skipping
> > > parse_object_buffer(), but git cat-file --batch-check does not set
> > > oi->contentp by default, so parse_object_buffer() is never executed there.
>
> Do you think that if git cat-file --batch-check would set
> oi->contentp, there would be no performance regression for `cat-file
> --batch-check`?
> Could you test that?
>

Oh, I mean that git cat-file --batch-check with its default format
"%(objectname) %(objecttype) %(objectsize)" does not benefit from any
optimization. But when git cat-file --batch is given "%(contents)" or
some other atoms, it is indeed optimized. See 1006.4:

Test                                                 this tree         HEAD~
---------------------------------------------------------------------------------------------
1006.2: cat-file --batch-check                       0.15(0.12+0.02)   0.15(0.13+0.01) +0.0%
1006.3: cat-file --batch-check with basic atoms      0.12(0.10+0.01)   0.12(0.10+0.02) +0.0%
1006.4: cat-file --batch-check with contents atoms   0.66(0.63+0.02)   0.75(0.72+0.02) +13.6%
1006.5: cat-file --batch                             0.61(0.57+0.04)   0.70(0.65+0.05) +14.8%
1006.6: cat-file --batch with atoms                  0.58(0.57+0.01)   0.67(0.63+0.03) +15.5%
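
As a reading aid (my own sketch, not code taken from t/perf): the last
column in these tables is consistent with the relative change of the second
column's elapsed time against the first, rounded to one decimal. The
function name below is made up for illustration.

```c
/*
 * Relative change of "other" against "base", in percent.
 * Hypothetical helper; assumes base is non-zero.
 */
static double relative_change_percent(double base, double other)
{
	return 100.0 * (other - base) / base;
}
```

For example, 0.66 against 0.75 gives roughly +13.6. Since the times are
reported with two decimals, a single 0.01s tick on a 0.10s baseline is
already 10%, which is why small swings on the --batch-check rows are hard
to tell apart from noise.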

> > > Therefore, we did
> > > not optimize `git cat-file --batch-check` at all. A 10% difference is small
> > > for git cat-file --batch-check; noise in the environment can easily cover it...
> >
> > By the way, its performance may still be worse than "upstream/master", but it
> > will be better than before optimization.
>
> Nice that there is some improvement, but it would be better if it was
> similar to "upstream/master".
>

Agree.

> > Test                                        HEAD~             this tree
> > ------------------------------------------------------------------------------------
> > 1006.2: cat-file --batch-check              0.10(0.09+0.01)   0.09(0.08+0.01) -10.0%
> > 1006.3: cat-file --batch-check with atoms   0.09(0.07+0.02)   0.08(0.05+0.03) -11.1%
> > 1006.4: cat-file --batch                    0.61(0.59+0.02)   0.53(0.51+0.02) -13.1%
> > 1006.5: cat-file --batch with atoms         0.60(0.57+0.02)   0.52(0.49+0.03) -13.3%
>
> Yeah, your patch seems to be an overall improvement when the
> ref-filter code is used.
>
> > Test                                        upstream/master   this tree
> > ------------------------------------------------------------------------------------
> > 1006.2: cat-file --batch-check              0.08(0.07+0.01)   0.10(0.07+0.02) +25.0%
> > 1006.3: cat-file --batch-check with atoms   0.06(0.05+0.01)   0.08(0.08+0.00) +33.3%
> > 1006.4: cat-file --batch                    0.49(0.46+0.03)   0.53(0.50+0.03) +8.2%
> > 1006.5: cat-file --batch with atoms         0.48(0.45+0.03)   0.51(0.48+0.02) +6.3%
>
> This means that some further performance improvements are still needed
> both for --batch and --batch-check though.
>
> Have you tried to see, using gprof or something else, what is still
> degrading the performance compared to when the ref-filter code isn't
> used?

Yeah, gprof shows that the number of calls to strbuf_add() and xstrdup() has
increased after switching to the ref-filter logic. At the same time, I noticed
that grab_person() seems to be an area worth optimizing. grab_person() uses its
parameter "const char *who" for type comparison, but now that we have added
`enum atom_type` to ref-filter, we can use that for some of the comparisons.
There are also two for() loops in grab_person(), which we can merge into one.
With this patch [1], there is a slight improvement in performance.

Test                                                this tree         HEAD~
-------------------------------------------------------------------------------------------
1006.2: cat-file --batch-check                      0.14(0.13+0.01)   0.15(0.14+0.01) +7.1%
1006.3: cat-file --batch-check with atoms           0.12(0.10+0.01)   0.12(0.09+0.02) +0.0%
1006.4: cat-file --batch-check with contents atom   0.66(0.65+0.01)   0.66(0.64+0.02) +0.0%
1006.5: cat-file --batch                            0.60(0.57+0.02)   0.60(0.57+0.03) +0.0%
1006.6: cat-file --batch with atoms                 0.58(0.53+0.04)   0.58(0.56+0.02) +0.0%
1006.7: cat-file --batch with person atoms          0.59(0.57+0.02)   0.60(0.56+0.04) +1.7%
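
The idea behind the grab_person() change can be sketched roughly like this.
It is a simplified, standalone model and not the actual ref-filter code:
`struct atom`, `enum atom_kind` and both function names are hypothetical
stand-ins, and "filling a value" is reduced to counting matches.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for ref-filter's used-atom bookkeeping. */
enum atom_kind {
	ATOM_AUTHOR,
	ATOM_COMMITTER,
	ATOM_TAGGER,
	ATOM_CREATOR,
	ATOM_OTHER
};

struct atom {
	const char *name;     /* e.g. "authorname", "committerdate" */
	enum atom_kind kind;  /* decided once, when the format is parsed */
};

/*
 * Before: two passes, each comparing atom names as strings.
 * Returns how many atoms would be filled for "who".
 */
static size_t grab_two_loops(const struct atom *atoms, size_t n,
			     const char *who, int fills_creator)
{
	size_t i, filled = 0, wholen = strlen(who);

	for (i = 0; i < n; i++)
		if (!strncmp(atoms[i].name, who, wholen))
			filled++;
	if (!fills_creator)
		return filled;
	for (i = 0; i < n; i++)
		if (!strncmp(atoms[i].name, "creator", 7))
			filled++;
	return filled;
}

/* After: one pass, comparing the precomputed enum instead of strings. */
static size_t grab_one_loop(const struct atom *atoms, size_t n,
			    enum atom_kind who, int fills_creator)
{
	size_t i, filled = 0;

	for (i = 0; i < n; i++)
		if (atoms[i].kind == who ||
		    (fills_creator && atoms[i].kind == ATOM_CREATOR))
			filled++;
	return filled;
}
```

Both versions select the same atoms; the second one just avoids the
per-atom strncmp() calls and the second walk over the array.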

It's also worth mentioning that grab_person() seems to repeat parsing that
parse_object_buffer() may already have done. parse_commit_buffer() and
parse_tag_buffer() have already parsed part of the object's content, which is
used by grab_tag_values() and grab_commit_values(). For the time being, I think
this is a kind of shallow parsing; if we could let parse_object_buffer() do
in-depth parsing, it would be great. We could save a lot of work in
grab_person()... Of course, this may be a little difficult.
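
To make the "in-depth parsing" idea a bit more concrete, here is a rough
sketch of splitting one "Name <email> timestamp tz" line a single time and
keeping the pieces around, so that name, email and date atoms would not each
have to search the buffer again. `struct person_span` and
`parse_person_line()` are hypothetical names for illustration only; this is
not what parse_object_buffer() does today.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cache of one "who" line; spans point into the caller's buffer. */
struct person_span {
	const char *name;  size_t name_len;
	const char *email; size_t email_len;  /* without the angle brackets */
	const char *date;  size_t date_len;   /* "<timestamp> <tz>" */
};

/*
 * Split "Name <email> timestamp tz" (the text after "author " etc., up to
 * the end of the line) into spans.
 * Returns 0 on success, -1 if the angle brackets are missing.
 */
static int parse_person_line(const char *line, size_t len,
			     struct person_span *out)
{
	const char *end = line + len;
	const char *lt = memchr(line, '<', len);
	const char *gt;

	if (!lt)
		return -1;
	gt = memchr(lt, '>', (size_t)(end - lt));
	if (!gt)
		return -1;

	out->name = line;
	out->name_len = (size_t)(lt - line);
	/* drop the single space separating the name from '<' */
	if (out->name_len && line[out->name_len - 1] == ' ')
		out->name_len--;

	out->email = lt + 1;
	out->email_len = (size_t)(gt - lt - 1);

	out->date = gt + 1;
	if (out->date < end && *out->date == ' ')
		out->date++;
	out->date_len = (size_t)(end - out->date);
	return 0;
}
```

With something like this filled in once per object, the person atoms could
copy from the stored spans instead of re-scanning the object buffer.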

Thanks.
--
ZheNing Hu

[1]: https://github.com/adlternative/git/commit/cec0ee72e64d651c01d7a2a7fe17a4adab1ef0de

