git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "SZEDER Gábor" <szeder.dev@gmail.com>
To: "René Scharfe" <l.s.r@web.de>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: [PATCH 08/15] name-rev: pull out deref handling from the recursion
Date: Sun, 22 Sep 2019 21:05:11 +0200	[thread overview]
Message-ID: <20190922190511.GB10866@szeder.dev> (raw)
In-Reply-To: <a507bbd1-88cf-6668-908e-92978fb77930@web.de>

On Sat, Sep 21, 2019 at 02:37:18PM +0200, René Scharfe wrote:
> Am 21.09.19 um 11:57 schrieb SZEDER Gábor:
> > On Fri, Sep 20, 2019 at 08:14:07PM +0200, SZEDER Gábor wrote:
> >> On Fri, Sep 20, 2019 at 08:13:02PM +0200, SZEDER Gábor wrote:
> >>>> If the (re)introduced leak doesn't impact performance and memory
> >>>> usage too much then duplicating tip_name again in name_rev() or
> >>>> rather your new create_or_update_name() would likely make the
> >>>> lifetimes of those string buffers easier to manage.
> >>>
> >>> Yeah, the easiest would be when each 'struct rev_name' in the commit
> >>> slab would have its own 'tip_name' string, but that would result in
> >>> a lot of duplicated strings and increased memory usage.
> >>
> >> I didn't measure how much more memory would be used, though.
> >
> > So, I tried the patch below to give each 'struct rev_name' instance
> > its own copy of 'tip_name', and the memory usage of 'git name-rev
> > --all' usually increased.
> >
> > The increase depends on how many merges and how many refs there are
> > compared to the number of commits: the fewer merges and refs, the
> > higher the more the memory usage increased:
> >
> >   linux:         +4.8%
> >   gcc:           +7.2%
> >   gecko-dev:     +9.2%
> >   webkit:       +12.4%
> >   llvm-project: +19.0%
> 
> Is that the overall memory usage or just for struct rev_name instances
> and tip_name strings?

It's overall memory usage, the avarage of five runs of:

  /usr/bin/time --format='%M' ~/src/git/git name-rev --all

> And how much is that in absolute terms?  

git:     29801 ->  28514
linux:  317018 -> 332218
gcc:    106462 -> 114140
gecko:  315448 -> 344486
webkit:  55847 ->  62780
llvm:   112867 -> 134384

> (Perhaps
> it's worth it to get the memory ownership question off the table at
> least during the transformation to iterative processing.)

I looked into it only because I got curious, but other than that I
will definitely play the "beyond the scope of this patch series" card
:)

> > git.git is the exception with its unusually high number of merge
> > commits (about 25%), and the memory usage decresed by 4.4%.
> 
> Interesting.
> 
> I wonder why regular commits even need a struct name_rev.  Shouldn't
> only tips and roots need ones?  And perhaps merges and occasional
> regular "checkpoint" commits, to avoid too many duplicate traversals.

The 'struct rev_name' holds all info that's necessary to determine the
commit's name.  It seems to be much simpler to just attach one to each
commit and then retrieve it from the commit slab when printing the
name of the commit than to come up with an algorithm where only a
sleect set of commits get a 'struct rev_name', including how to access
those when naming a commit that doesn't have one.

> That's not exactly on-topic, though, and I didn't think all that
> deeply about it, but perhaps switching to a different marking
> strategy could get rid of recursion as a side-effect?  *waves hands
> vaguely*

I suppose a topo-order-based history walk should be able to name all
commits in a single traversal, and, consequently, be faster.  However,
'git rev-list --all --topo-order' doesn't seem to be that much faster
than 'git name-rev --all', so it might not be worth the effort.

> >  --- >8 ---
> >
> > diff --git a/builtin/name-rev.c b/builtin/name-rev.c
> > index 6969af76c4..62ab78242b 100644
> > --- a/builtin/name-rev.c
> > +++ b/builtin/name-rev.c
> > @@ -88,6 +88,7 @@ static struct rev_name *create_or_update_name(struct commit *commit,
> >  		set_commit_rev_name(commit, name);
> >  		goto copy_data;
> >  	} else if (is_better_name(name, taggerdate, distance, from_tag)) {
> > +		free((char*) name->tip_name);
> >  copy_data:
> >  		name->tip_name = tip_name;
> 
> I would have expected a xstrdup() call here.

But then we'd needed to release the results of all those xstrfmt()
calls at the callsites of create_or_update_name(), so instead of those
strdup() calls that you deem unnecessary we would need additional
free() calls.

> >  		name->taggerdate = taggerdate;
> > @@ -106,21 +107,19 @@ static void name_rev(struct commit *start_commit,
> >  {
> >  	struct commit_list *list = NULL;
> >  	const char *tip_name;
> > -	char *to_free = NULL;
> >
> >  	parse_commit(start_commit);
> >  	if (start_commit->date < cutoff)
> >  		return;
> >
> >  	if (deref) {
> > -		tip_name = to_free = xstrfmt("%s^0", start_tip_name);
> > -		free((char*) start_tip_name);
> > +		tip_name = xstrfmt("%s^0", start_tip_name);
> >  	} else
> > -		tip_name = start_tip_name;
> > +		tip_name = strdup(start_tip_name);
> 
> This would not be needed with the central xstrdup() call mentioned above.
> 
> >
> >  	if (!create_or_update_name(start_commit, tip_name, taggerdate, 0, 0,
> >  				   from_tag)) {
> > -		free(to_free);
> > +		free((char*) tip_name);
> >  		return;
> >  	}
> >
> > @@ -139,7 +138,6 @@ static void name_rev(struct commit *start_commit,
> >  			struct commit *parent = parents->item;
> >  			const char *new_name;
> >  			int generation, distance;
> > -			const char *new_name_to_free = NULL;
> >
> >  			parse_commit(parent);
> >  			if (parent->date < cutoff)
> > @@ -159,11 +157,10 @@ static void name_rev(struct commit *start_commit,
> >  					new_name = xstrfmt("%.*s^%d", (int)len,
> >  							   name->tip_name,
> >  							   parent_number);
> > -				new_name_to_free = new_name;
> >  				generation = 0;
> >  				distance = name->distance + MERGE_TRAVERSAL_WEIGHT;
> >  			} else {
> > -				new_name = name->tip_name;
> > +				new_name = strdup(name->tip_name);
> 
> ... and neither would this.
> 
> Sure the xstrfmt() result would be duplicated instead of being reused, but
> that doesn't increase memory usage overall.
> 
> >  				generation = name->generation + 1;
> >  				distance = name->distance + 1;
> >  			}
> > @@ -174,7 +171,7 @@ static void name_rev(struct commit *start_commit,
> >  				last_new_parent = commit_list_append(parent,
> >  						  last_new_parent);
> >  			else
> > -				free((char*) new_name_to_free);
> > +				free((char*) new_name);
> >  		}
> >
> >  		*last_new_parent = list;
> > @@ -313,7 +310,7 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
> >  		if (taggerdate == TIME_MAX)
> >  			taggerdate = commit->date;
> >  		path = name_ref_abbrev(path, can_abbreviate_output);
> > -		name_rev(commit, xstrdup(path), taggerdate, from_tag, deref);
> > +		name_rev(commit, path, taggerdate, from_tag, deref);
> >  	}
> >  	return 0;
> >  }
> >

  reply	other threads:[~2019-09-22 19:05 UTC|newest]

Thread overview: 98+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-19 21:46 [PATCH 00/15] name-rev: eliminate recursion SZEDER Gábor
2019-09-19 21:46 ` [PATCH 01/15] t6120-describe: correct test repo history graph in comment SZEDER Gábor
2019-09-20 21:47   ` Junio C Hamano
2019-09-20 22:29     ` SZEDER Gábor
2019-09-28  4:06       ` Junio C Hamano
2019-09-19 21:46 ` [PATCH 02/15] t6120-describe: modernize the 'check_describe' helper SZEDER Gábor
2019-09-20 21:49   ` Junio C Hamano
2019-09-19 21:46 ` [PATCH 03/15] name-rev: use strip_suffix() in get_rev_name() SZEDER Gábor
2019-09-20 16:36   ` René Scharfe
2019-09-20 17:10     ` SZEDER Gábor
2019-09-19 21:46 ` [PATCH 04/15] name-rev: avoid unnecessary cast in name_ref() SZEDER Gábor
2019-09-20 16:37   ` René Scharfe
2019-09-19 21:47 ` [PATCH 05/15] name-rev: use sizeof(*ptr) instead of sizeof(type) in allocation SZEDER Gábor
2019-09-20 15:11   ` Derrick Stolee
2019-09-20 15:40     ` SZEDER Gábor
2019-09-20 16:37   ` René Scharfe
2019-09-19 21:47 ` [PATCH 06/15] t6120: add a test to cover inner conditions in 'git name-rev's name_rev() SZEDER Gábor
2019-09-20 15:14   ` Derrick Stolee
2019-09-20 15:44     ` SZEDER Gábor
2019-09-19 21:47 ` [PATCH 07/15] name-rev: extract creating/updating a 'struct name_rev' into a helper SZEDER Gábor
2019-09-20 15:18   ` Derrick Stolee
2019-09-22  8:18   ` [PATCH] name-rev: rewrite create_or_update_name() Martin Ågren
2019-12-09 12:43     ` SZEDER Gábor
2019-09-19 21:47 ` [PATCH 08/15] name-rev: pull out deref handling from the recursion SZEDER Gábor
2019-09-20 15:21   ` Derrick Stolee
2019-09-20 17:42     ` SZEDER Gábor
2019-09-20 16:37   ` René Scharfe
2019-09-20 18:13     ` SZEDER Gábor
2019-09-20 18:14       ` SZEDER Gábor
2019-09-21  9:57         ` SZEDER Gábor
2019-09-21 12:37           ` René Scharfe
2019-09-22 19:05             ` SZEDER Gábor [this message]
2019-09-23 18:43               ` René Scharfe
2019-09-23 18:59                 ` SZEDER Gábor
2019-09-23 19:55                   ` René Scharfe
2019-09-23 20:47                     ` SZEDER Gábor
2019-09-24 17:03                       ` René Scharfe
2019-09-26 17:33                         ` SZEDER Gábor
2019-09-21 12:37       ` René Scharfe
2019-09-21 14:21         ` SZEDER Gábor
2019-09-21 15:52           ` René Scharfe
2019-09-19 21:47 ` [PATCH 09/15] name-rev: restructure parsing commits and applying date cutoff SZEDER Gábor
2019-09-21 12:37   ` René Scharfe
2019-09-19 21:47 ` [PATCH 10/15] name-rev: restructure creating/updating 'struct rev_name' instances SZEDER Gábor
2019-09-20 15:27   ` Derrick Stolee
2019-09-20 17:09     ` SZEDER Gábor
2019-09-19 21:47 ` [PATCH 11/15] name-rev: drop name_rev()'s 'generation' and 'distance' parameters SZEDER Gábor
2019-09-19 21:47 ` [PATCH 12/15] name-rev: eliminate recursion in name_rev() SZEDER Gábor
2019-09-19 21:47 ` [PATCH 13/15] name-rev: cleanup name_ref() SZEDER Gábor
2019-09-19 21:47 ` [PATCH 14/15] name-rev: plug a memory leak in name_rev() SZEDER Gábor
2019-09-19 21:47 ` [PATCH 14/15] name-rev: plug memory leak in name_rev() in the deref case SZEDER Gábor
2019-09-19 22:47   ` SZEDER Gábor
2019-09-19 21:47 ` [PATCH 15/15] name-rev: plug a " SZEDER Gábor
2019-09-20 15:35   ` Derrick Stolee
2019-09-19 21:47 ` [PATCH 15/15] name-rev: plug memory leak in name_rev() SZEDER Gábor
2019-09-19 22:48   ` SZEDER Gábor
2019-09-20 15:37 ` [PATCH 00/15] name-rev: eliminate recursion Derrick Stolee
2019-09-20 17:37   ` SZEDER Gábor
2019-11-12 10:38 ` [PATCH v2 00/13] " SZEDER Gábor
2019-11-12 10:38   ` [PATCH v2 01/13] t6120-describe: correct test repo history graph in comment SZEDER Gábor
2019-11-12 10:38   ` [PATCH v2 02/13] t6120-describe: modernize the 'check_describe' helper SZEDER Gábor
2019-11-27 18:02     ` Jonathan Tan
2019-11-12 10:38   ` [PATCH v2 03/13] name-rev: use strbuf_strip_suffix() in get_rev_name() SZEDER Gábor
2019-11-12 19:02     ` René Scharfe
2019-11-12 10:38   ` [PATCH v2 04/13] name-rev: avoid unnecessary cast in name_ref() SZEDER Gábor
2019-11-12 10:38   ` [PATCH v2 05/13] name-rev: use sizeof(*ptr) instead of sizeof(type) in allocation SZEDER Gábor
2019-11-12 10:38   ` [PATCH v2 06/13] t6120: add a test to cover inner conditions in 'git name-rev's name_rev() SZEDER Gábor
2019-11-12 10:38   ` [PATCH v2 07/13] name-rev: extract creating/updating a 'struct name_rev' into a helper SZEDER Gábor
2019-11-12 10:38   ` [PATCH v2 08/13] name-rev: pull out deref handling from the recursion SZEDER Gábor
2019-11-12 10:38   ` [PATCH v2 09/13] name-rev: restructure parsing commits and applying date cutoff SZEDER Gábor
2019-11-12 10:38   ` [PATCH v2 10/13] name-rev: restructure creating/updating 'struct rev_name' instances SZEDER Gábor
2019-11-12 10:38   ` [PATCH v2 11/13] name-rev: drop name_rev()'s 'generation' and 'distance' parameters SZEDER Gábor
2019-11-27 18:13     ` Jonathan Tan
2019-11-12 10:38   ` [PATCH v2 12/13] name-rev: eliminate recursion in name_rev() SZEDER Gábor
2019-11-27 17:57     ` Jonathan Tan
2019-12-09 12:22       ` SZEDER Gábor
2019-11-12 10:38   ` [PATCH v2 13/13] name-rev: cleanup name_ref() SZEDER Gábor
2019-11-27 18:01     ` Jonathan Tan
2019-12-09 12:32       ` SZEDER Gábor
2019-11-12 19:17   ` [PATCH v2 00/13] name-rev: eliminate recursion Johannes Schindelin
2019-11-13 19:25     ` Sebastiaan Dammann
2019-12-09 11:52   ` [PATCH v3 00/14] " SZEDER Gábor
2019-12-09 11:52     ` [PATCH v3 01/14] t6120-describe: correct test repo history graph in comment SZEDER Gábor
2019-12-09 11:52     ` [PATCH v3 02/14] t6120-describe: modernize the 'check_describe' helper SZEDER Gábor
2019-12-09 11:52     ` [PATCH v3 03/14] name-rev: use strbuf_strip_suffix() in get_rev_name() SZEDER Gábor
2019-12-09 11:52     ` [PATCH v3 04/14] name-rev: avoid unnecessary cast in name_ref() SZEDER Gábor
2019-12-09 11:52     ` [PATCH v3 05/14] name-rev: use sizeof(*ptr) instead of sizeof(type) in allocation SZEDER Gábor
2019-12-09 11:52     ` [PATCH v3 06/14] t6120: add a test to cover inner conditions in 'git name-rev's name_rev() SZEDER Gábor
2019-12-09 11:52     ` [PATCH v3 07/14] name-rev: extract creating/updating a 'struct name_rev' into a helper SZEDER Gábor
2019-12-09 11:52     ` [PATCH v3 08/14] name-rev: pull out deref handling from the recursion SZEDER Gábor
2019-12-09 11:52     ` [PATCH v3 09/14] name-rev: restructure parsing commits and applying date cutoff SZEDER Gábor
2019-12-09 11:52     ` [PATCH v3 10/14] name-rev: restructure creating/updating 'struct rev_name' instances SZEDER Gábor
2019-12-09 11:52     ` [PATCH v3 11/14] name-rev: drop name_rev()'s 'generation' and 'distance' parameters SZEDER Gábor
2019-12-09 11:52     ` [PATCH v3 12/14] name-rev: use 'name->tip_name' instead of 'tip_name' SZEDER Gábor
2019-12-09 11:52     ` [PATCH v3 13/14] name-rev: eliminate recursion in name_rev() SZEDER Gábor
2019-12-09 11:52     ` [PATCH v3 14/14] name-rev: cleanup name_ref() SZEDER Gábor
2019-12-09 15:08     ` [PATCH v3 00/14] name-rev: eliminate recursion Derrick Stolee
2019-12-11 17:33       ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190922190511.GB10866@szeder.dev \
    --to=szeder.dev@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=l.s.r@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).