git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "SZEDER Gábor" <szeder.dev@gmail.com>
To: Derrick Stolee <stolee@gmail.com>
Cc: Abhishek Kumar <abhishekkumar8222@gmail.com>,
	jnareb@gmail.com, git@vger.kernel.org
Subject: Re: [GSoC Patch 0/3] Move generation, graph_pos to a slab
Date: Mon, 8 Jun 2020 18:46:37 +0200	[thread overview]
Message-ID: <20200608164637.GE8232@szeder.dev> (raw)
In-Reply-To: <13db757a-9412-7f1e-805c-8a028c4ab2b1@gmail.com>

On Mon, Jun 08, 2020 at 09:45:12AM -0400, Derrick Stolee wrote:
> On 6/8/2020 4:36 AM, SZEDER Gábor wrote:
> > On Mon, Jun 08, 2020 at 11:18:27AM +0530, Abhishek Kumar wrote:
> >> On Sun, Jun 07, 2020 at 09:53:47PM +0200, SZEDER Gábor wrote:
> >>> On Thu, Jun 04, 2020 at 10:22:27AM -0400, Derrick Stolee wrote:
> >>>> On 6/4/2020 3:27 AM, Abhishek Kumar wrote:
> >>>>> The struct commit is used in many contexts. However, members generation
> >>>>> and graph_pos are only used for commit-graph related operations and
> >>>>> otherwise waste memory.
> >>>>>
> >>>>> This wastage would have been more pronounced as transistion to
> >>>>> generation number v2, which uses 64-bit generation number instead of
> >>>>> current 32-bits.
> >>>>
> >>>> Thanks! This is an important step, and will already improve
> >>>> performance in subtle ways.
> >>>
> >>> While the reduced memory footprint of each commit object might improve
> >>> performance, accessing graph position and generation numbers in a
> >>> commit-slab is more expensive than direct field accesses in 'struct
> >>> commit' instances.  Consequently, these patches increase the runtime
> >>> of 'git merge-base --is-ancestor HEAD~50000 HEAD' in the linux
> >>> repository from 0.630s to 0.940s.
> >>>
> >>
> >> Thank you for checking performance. Performance penalty was something we
> >> had discussed here [1]. 
> >>
> >> Caching the commit slab results in local variables helped wonderfully in v2 [2].
> >> For example, the runtime of 'git merge-base --is-ancestor HEAD~50000 HEAD'
> >> in the linux repository increased from 0.762 to 0.767s. Since this is a
> >> change of <1%, it is *no longer* a performance regression in my opinion.
> > 
> > Interesting, I measured 0.870s with v2, still a notable increase from
> > 0.630s.
> 
> This is an interesting point. The --is-ancestor is critical to the
> performance issue (as measured on my machine).
> 
> For "git merge-base HEAD~50000 HEAD" on the Linux repo, I get
> 
> v2.27.0:
> real    0m0.515s
> user    0m0.467s
> sys     0m0.048s
> 
> v2 series:
> real    0m0.534s
> user    0m0.481s
> sys     0m0.053s

I, too, see similarly small differences in this case.

> With "--is-ancestor" I see the following:
> 
> v2.27.0:
> real    0m0.591s
> user    0m0.539s
> sys     0m0.052s
> 
> v2 series:
> real    0m0.773s
> user    0m0.733s
> sys     0m0.040s
> 
> The --is-ancestor option [1] says
> 
>     Check if the first <commit> is an ancestor of the second
>     <commit>, and exit with status 0 if true, or with status
>     1 if not. Errors are signaled by a non-zero status that
>     is not 1.
> 
> [1] https://git-scm.com/docs/git-merge-base#Documentation/git-merge-base.txt---is-ancestor
> 
> This _should_ be faster than "git branch --contains HEAD~50000",
> but it is much much slower:
> 
> $ time git branch --contains HEAD~50000
> real    0m0.068s
> user    0m0.061s
> sys     0m0.008s
> 
> So, there is definitely something going on that slows the
> "--is-ancestor" path in this case. But, the solution is not
> to halt the current patch (which likely has memory footprint
> benefits when dealing with a lot of tree and blob objects)
> and instead fix the underlying algorithm.

Other, more common cases are affected as well, notably the simple 'git
rev-list --topo-order':

  performance: 1.226479734 s: git command: /home/szeder/src/git/BUILDS/v2.27.0/bin/git rev-list --topo-order HEAD
  max RSS: 162400k
  
  performance: 1.741309536 s: git command: /home/szeder/src/git/git rev-list --topo-order HEAD
  max RSS: 169556k

Is the supposed memory footprint reduction that large to justify this
runtime increase?

> Let's add that to the list of things to do.

And to the commit messages.

> >>>  create mode 100644 contrib/coccinelle/generation.cocci
> >>>  create mode 100644 contrib/coccinelle/graph_pos.cocci
> >>
> >> I appreciate the Coccinelle scripts to help identify
> >> automatic fixes for other topics in-flight. However,
> >> I wonder if they would be better placed inside the
> >> existing commit.cocci file?
> >
> > We add Coccinelle scripts to avoid undesirable code patterns entering
> > our code base.  That, however, is not the case here: this is a
> > one-time conversion, and at the end of this series 'struct commit'
> > won't have a 'generation' field anymore, so once it's merged the
> > compiler will catch any new 'commit->generation' accesses.  Therefore
> > I don't think that these Coccinelle scripts should be added at all.
> 
> I disagree. We _also_ add Coccinelle scripts when doing one-time
> refactors to avoid logical merge conflicts with other topics in
> flight. If someone else is working on a parallel topic that adds
> references to graph_pos or generation member, then the scripts provide
> an easy way for the maintainer to update those references in the merge
> commit. Alternatively, the contributor could rebase on top of this
> series and run the scripts themselves to fix their patches before
> submission.
> 
> For example, this was done carefully in the sha->object_id
> conversion using contrib/coccinelle/object_id.cocci.

'object_id.cocci' is not about sha->object_id conversions, but about
avoiding undesirable code patterns, e.g. we prefer oideq() over
!oidcmp(), and the compiler, of course, can't help to catch that.
Coccinelle scripts used for actual sha->object_id transformations were
not added to 'object_id.cocci', but were recorded only in the commit
messages for reference, see e.g.  9b56149996 (merge-recursive: convert
struct merge_file_info to object_id, 2016-06-24) and a couple of its
ancestors.


  reply	other threads:[~2020-06-08 16:46 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-04  7:27 [GSoC Patch 0/3] Move generation, graph_pos to a slab Abhishek Kumar
2020-06-04  7:27 ` [GSoC Patch 1/3] commit: introduce helpers for generation slab Abhishek Kumar
2020-06-04 14:36   ` Derrick Stolee
2020-06-04 17:35   ` Junio C Hamano
2020-06-05 23:23   ` Jakub Narębski
2020-06-04  7:27 ` [GSoC Patch 2/3] commit: convert commit->generation to a slab Abhishek Kumar
2020-06-04 14:27   ` Derrick Stolee
2020-06-04 17:49   ` Junio C Hamano
2020-06-06 22:03   ` Jakub Narębski
2020-06-04  7:27 ` [GSoC Patch 3/3] commit: convert commit->graph_pos " Abhishek Kumar
2020-06-07 12:12   ` Jakub Narębski
2020-06-04 14:22 ` [GSoC Patch 0/3] Move generation, graph_pos " Derrick Stolee
2020-06-04 17:55   ` Junio C Hamano
2020-06-07 19:53   ` SZEDER Gábor
2020-06-08  5:48     ` Abhishek Kumar
2020-06-08  8:36       ` SZEDER Gábor
2020-06-08 13:45         ` Derrick Stolee
2020-06-08 16:46           ` SZEDER Gábor [this message]
2020-06-08 15:21         ` Jakub Narębski
2020-06-05 19:00 ` Jakub Narębski
2020-06-07 19:32 ` [GSOC Patch v2 0/4] " Abhishek Kumar
2020-06-07 19:32   ` [GSOC Patch v2 1/4] commit-graph: introduce commit_graph_data_slab Abhishek Kumar
2020-06-15 16:27     ` Taylor Blau
2020-06-07 19:32   ` [GSOC Patch v2 2/4] commit: move members graph_pos, generation to a slab Abhishek Kumar
2020-06-08  8:26     ` SZEDER Gábor
2020-06-08 12:35       ` Derrick Stolee
2020-06-07 19:32   ` [GSOC Patch v2 3/4] commit-graph: use generation directly when writing commit-graph Abhishek Kumar
2020-06-08 16:31     ` Jakub Narębski
2020-06-15 16:31       ` Taylor Blau
2020-06-07 19:32   ` [GSOC Patch v2 4/4] commit-graph: minimize commit_graph_data_slab access Abhishek Kumar
2020-06-08 16:22   ` [GSOC Patch v2 0/4] Move generation, graph_pos to a slab Jakub Narębski
2020-06-15 16:24   ` Taylor Blau
2020-06-17  9:14 ` [GSOC Patch v4 " Abhishek Kumar
2020-06-17  9:14   ` [GSOC Patch v4 1/4] object: drop parsed_object_pool->commit_count Abhishek Kumar
2020-06-17  9:14   ` [GSOC Patch v4 2/4] commit-graph: introduce commit_graph_data_slab Abhishek Kumar
2020-06-17  9:14   ` [GSOC Patch v4 3/4] commit: move members graph_pos, generation to a slab Abhishek Kumar
2020-06-17  9:14   ` [GSOC Patch v4 4/4] commit-graph: minimize commit_graph_data_slab access Abhishek Kumar
2020-06-19 13:59   ` [GSOC Patch v4 0/4] Move generation, graph_pos to a slab Derrick Stolee
2020-06-19 17:44     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200608164637.GE8232@szeder.dev \
    --to=szeder.dev@gmail.com \
    --cc=abhishekkumar8222@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jnareb@gmail.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).