git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Eric Wong <e@80x24.org>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>
Subject: Re: [PATCH v2] delta-islands: free island-related data after use
Date: Fri, 18 Nov 2022 02:51:34 +0100	[thread overview]
Message-ID: <221118.86zgcog2lp.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <20221117230658.M516129@dcvr>


On Thu, Nov 17 2022, Eric Wong wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>> On Wed, Nov 16 2022, Eric Wong wrote:
>> >   Memory savings were measured using the following patch which
>> >   relies on a patched LD_PRELOAD-based malloc debugger:
>> >   https://80x24.org/spew/20221116095404.3974691-1-e@80x24.org/raw
>> 
>> FWIW SANITIZE=leak will find this if you stick a "remote_islands = NULL"
>> and run e.g. t5320-delta-islands.sh, but maybe you needed this closer to
>> production.
>> 
>> Valgrind will also work, but of course be *much* slower.
>
> Yeah, I run that LD_PRELOAD thing in production since it's
> cheap compared to valgrind.
>
>> Perfect shouldn't be the enemy of the good & all that, but in this case
>> it's not too much more effort to just give this data an appropriate
>> lifetime instead of the global, I tried that out for just the "regex"
>> part of this below.
>> 
>> The free_remote_islands() seems to be similarly alive between
>> "find_island_for_ref" and "deduplicate_islands".
>> 
>> Your version also works, but the root cause of this sort of thing is
>> these global lifetimes, which sometimes we do for a good reason, but in
>> this case we don't.
>
> Agreed on all points.  Overall, the amount of globals in git has
> long seemed excessive and offputting to me (and likely other
> drive-by hackers).
>
>> diff --git a/delta-islands.c b/delta-islands.c
>> index 26f9e99e1a9..ef86a91059c 100644
>> --- a/delta-islands.c
>> +++ b/delta-islands.c
>> @@ -312,29 +312,41 @@ void resolve_tree_islands(struct repository *r,
>>  	free(todo);
>>  }
>>  
>> -static regex_t *island_regexes;
>> -static unsigned int island_regexes_alloc, island_regexes_nr;
>> +struct island_config_data {
>> +	regex_t *rx;
>> +	size_t nr;
>> +	size_t alloc;
>> +};
>
> I've added kh_str_t *remote_islands and renamed
> s/island_config_data/island_load_data/ in the below version
> to reflect the slightly different scope of remote_islands.
>
>>  static const char *core_island_name;
>>  
>> -static int island_config_callback(const char *k, const char *v, void *cb UNUSED)
>> +static void island_config_data_release(struct island_config_data *icd)
>> +{
>> +	for (size_t i = 0; i < icd->nr; i++)
>> +		regfree(&icd->rx[i]);
>> +	free(icd->rx);
>> +}
>
> icd => ild since config => load
>
>> +static int island_config_callback(const char *k, const char *v, void *cb)
>>  {
>> +	struct island_config_data *data = cb;
>> +
>
> data => ild
>
> I don't like the name `data' for a typed variable.

Hah! My thought process when deciding on it was "hrm, what *do* we call
the two variables when we have a void * and turn it into a 'util'?
data? cb? util? ... and which one was which?"

I started grepping, then decided I was wasting too much time on that for
a one-off reply, and just went with the first thing I found. The names
you picked are a lot better :)

> Aside from that, v2 below still frees the regex memory early on
> in the hopes deduplicate_islands() can reuse some of the freed
> regexp memory.
>
> Anyways, here's v2, which seems to work.  I'm still trying to
> figure out SATA errors+resets after replacing a CMOS battery,
> but I really hope this patch isn't the cause.
>
> -----8<-----
> From: Eric Wong <e@80x24.org>
> Subject: [PATCH] delta-islands: free island-related data after use
>
> On my use case involving 771 islands of Linux on kernel.org,
> this reduces memory usage by around 25MB.  The bulk of that
> comes from free_remote_islands, since free_config_regexes only
> saves around 40k.
>
> This memory is saved early in the memory-intensive pack process,
> making it available for the remainder of the long process.
>
> Signed-off-by: Eric Wong <e@80x24.org>
> Co-authored-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>

This all looks good to me, thanks a lot for the follow-up.

  reply	other threads:[~2022-11-18  7:10 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-16 10:50 [PATCH] delta-islands: free island-related data after use Eric Wong
2022-11-16 15:44 ` Ævar Arnfjörð Bjarmason
2022-11-17 23:06   ` [PATCH v2] " Eric Wong
2022-11-18  1:51     ` Ævar Arnfjörð Bjarmason [this message]
2022-11-22 20:22     ` Jeff King
2022-11-16 18:44 ` [PATCH] " Jeff King
2023-02-01  9:20   ` pack-objects memory use observations [was: [PATCH] delta-islands: free island-related data after use] Eric Wong
2023-02-01 22:09     ` Eric Wong
2023-02-02  0:11       ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=221118.86zgcog2lp.gmgdl@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=e@80x24.org \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).