git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jameson Miller <jamill@microsoft.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>,
	"pclouds@gmail.com" <pclouds@gmail.com>,
	"jonathantanmy@google.com" <jonathantanmy@google.com>,
	"sbeller@google.com" <sbeller@google.com>,
	"peartben@gmail.com" <peartben@gmail.com>
Subject: RE: [PATCH v3 2/7] block alloc: add lifecycle APIs for cache_entry structs
Date: Thu, 24 May 2018 14:47:58 +0000	[thread overview]
Message-ID: <BL0PR2101MB11060377AB5095E79EA9EA06CE6A0@BL0PR2101MB1106.namprd21.prod.outlook.com> (raw)
In-Reply-To: <xmqqefi1sliw.fsf@gitster-ct.c.googlers.com>



> -----Original Message-----
> From: Junio C Hamano <jch2355@gmail.com> On Behalf Of Junio C Hamano
> Sent: Thursday, May 24, 2018 12:52 AM
> To: Jameson Miller <jamill@microsoft.com>
> Cc: git@vger.kernel.org; pclouds@gmail.com; jonathantanmy@google.com;
> sbeller@google.com; peartben@gmail.com
> Subject: Re: [PATCH v3 2/7] block alloc: add lifecycle APIs for cache_entry
> structs
> 
> Jameson Miller <jamill@microsoft.com> writes:
> 
> > Add an API around managing the lifetime of cache_entry structs.
> > Abstracting memory management details behind an API will allow for
> > alternative memory management strategies without affecting all the
> > call sites.  This commit does not change how memory is allocated /
> > freed. A later commit in this series will allocate cache entries from
> > memory pools as appropriate.
> >
> > Motivation:
> > It has been observed that the time spent loading an index with a large
> > number of entries is partly dominated by malloc() calls. This change
> > is in preparation for using memory pools to reduce the number of
> > malloc() calls made when loading an index.
> >
> > This API makes a distinction between cache entries that are intended
> > for use with a particular index and cache entries that are not. This
> > enables us to use the knowledge about how a cache entry will be used
> > to make informed decisions about how to handle the corresponding
> > memory.
> 
> Yuck.  make_index_cache_entry()?
> 
> Generally we use "cache" when working on the_index without passing istate,
> and otherwise "index", which means that readers can assume that
> distim_cache_entry(...)" is a shorter and more limited way to say
> "distim_index_entry(&the_index, ...)".  Having both index and cache in the same
> name smells crazy.
> 
> If most of the alocations are for permanent kind, give it a shorter name call it
> make_cache_entry(&the_index, ...), and call the other non-permanent one with
> a longer and more cumbersome name, perhaps
> make_transient_cache_entry(...).  Avoid saying "index" in the former name, as
> the design decision this series is making to allocate memory for a cache-entry
> from a pool associated to an index_state is already seen by what its first
> parameter is.

I like this suggestion - I will make this change in the next version of this series.

> 
> > diff --git a/cache.h b/cache.h
> > index f0a407602c..204f788438 100644
> > --- a/cache.h
> > +++ b/cache.h
> > @@ -339,6 +339,29 @@ extern void remove_name_hash(struct index_state
> > *istate, struct cache_entry *ce)  extern void free_name_hash(struct
> > index_state *istate);
> >
> >
> > +/* Cache entry creation and freeing */
> > +
> > +/*
> > + * Create cache_entry intended for use in the specified index. Caller
> > + * is responsible for discarding the cache_entry with
> > + * `discard_cache_entry`.
> > + */
> > +extern struct cache_entry *make_index_cache_entry(struct index_state
> > +*istate, unsigned int mode, const unsigned char *sha1, const char
> > +*path, int stage, unsigned int refresh_options); extern struct
> > +cache_entry *make_empty_index_cache_entry(struct index_state *istate,
> > +size_t name_len);
> > +
> > +/*
> > + * Create a cache_entry that is not intended to be added to an index.
> > + * Caller is responsible for discarding the cache_entry
> > + * with `discard_cache_entry`.
> > + */
> > +extern struct cache_entry *make_transient_cache_entry(unsigned int
> > +mode, const unsigned char *sha1, const char *path, int stage); extern
> > +struct cache_entry *make_empty_transient_cache_entry(size_t
> > +name_len);
> > +
> > +/*
> > + * Discard cache entry.
> > + */
> > +void discard_cache_entry(struct cache_entry *ce);
> 
> I am not yet convinced that it is a good idea to require each istate to hold a
> separate pool.  Anything that uses unpack_trees() can do "starting from this
> src_index, perform various mergy operations and deposit the result in
> dst_index".  Sometimes the two logical indices point at the same istate,
> sometimes different.  When src and dst are different istates, the code that used
> to simply add another pointer to the same ce to the dst index now needs to
> duplicate it out of the pool associated with dst?

I did not see any instances in unpack_trees() where it copied
just the cache_entry pointer from src to dst, but I will check
again.

You are correct, all the cache_entries need to be duplicated
before being added to the destination index, which is what I
think the code already does.  We tried to make this more
explicity by converting the inline xcalloc/memcpy instances to an
actual function.

In the existing code (before this patch series), the index
implicitly "owns" its cache_entry instances. This can be seen in
the discard_index function, where the index will discard any
cache entries that it has a reference to (that are not contained
in the split index). If there is code that just copies the
pointer to an unrelated index, then this cache entry would be
freed when the source index is freed (unless the cache entry is
removed from the src index).

With memory pools, the same pattern is followed, but this
relationship is a bit more explicit.

> 
> In any case, perhaps it will become clearer why it is a good idea as we read on,
> so let's do so.

Thank you for your review so far. I look forward to your thoughts on
the overall change and if it is a change you are interested in.

  reply	other threads:[~2018-05-24 14:48 UTC|newest]

Thread overview: 100+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-17 16:34 [PATCH v1 0/5] Allocate cache entries from memory pool Jameson Miller
2018-04-17 16:34 ` Jameson Miller
2018-04-17 16:34 ` [PATCH v1 1/5] read-cache: teach refresh_cache_entry to take istate Jameson Miller
2018-04-17 19:00   ` Ben Peart
2018-04-17 16:34 ` [PATCH v1 2/5] Add an API creating / discarding cache_entry structs Jameson Miller
2018-04-17 23:11   ` Ben Peart
2018-04-17 16:34 ` [PATCH v1 3/5] mem-pool: fill out functionality Jameson Miller
2018-04-20 23:21   ` Jonathan Tan
2018-04-23 17:27     ` Jameson Miller
2018-04-23 17:49       ` Jonathan Tan
2018-04-23 18:20         ` Jameson Miller
2018-04-17 16:34 ` [PATCH v1 4/5] Allocate cache entries from memory pools Jameson Miller
2018-04-17 16:34 ` [PATCH v1 5/5] Add optional memory validations around cache_entry lifecyle Jameson Miller
2018-04-17 18:39 ` [PATCH v1 0/5] Allocate cache entries from memory pool Ben Peart
2018-04-23 14:09   ` Jameson Miller
2018-04-18  4:49 ` Junio C Hamano
2018-04-20 17:49   ` Stefan Beller
2018-04-23 16:44     ` Jameson Miller
2018-04-23 17:18       ` Stefan Beller
2018-04-23 16:19   ` Jameson Miller
2018-04-20 23:34 ` Jonathan Tan
2018-04-23 17:14   ` Jameson Miller
2018-04-30 15:31 ` [PATCH v2 " Jameson Miller
2018-04-30 15:31   ` [PATCH v2 1/5] read-cache: teach refresh_cache_entry() to take istate Jameson Miller
2018-04-30 15:31   ` [PATCH v2 2/5] block alloc: add lifecycle APIs for cache_entry structs Jameson Miller
2018-04-30 15:31   ` [PATCH v2 3/5] mem-pool: fill out functionality Jameson Miller
2018-04-30 21:42     ` Stefan Beller
2018-05-01 15:43       ` Jameson Miller
2018-05-03 16:18     ` Duy Nguyen
2018-04-30 15:31   ` [PATCH v2 4/5] block alloc: allocate cache entries from mem_pool Jameson Miller
2018-04-30 15:31   ` [PATCH v2 5/5] block alloc: add validations around cache_entry lifecyle Jameson Miller
2018-05-03 16:28     ` Duy Nguyen
2018-05-03 16:35   ` [PATCH v2 0/5] Allocate cache entries from memory pool Duy Nguyen
2018-05-03 17:21     ` Stefan Beller
2018-05-03 19:17       ` Duy Nguyen
2018-05-03 20:58         ` Stefan Beller
2018-05-03 21:13           ` Jameson Miller
2018-05-03 22:18             ` [PATCH] alloc.c: replace alloc by mempool Stefan Beller
2018-05-04 16:33               ` Duy Nguyen
2018-05-08  0:37                 ` Junio C Hamano
2018-05-08  0:44                   ` Stefan Beller
2018-05-08  1:07                     ` Junio C Hamano
2018-05-23 14:47 ` [PATCH v3 0/7] allocate cache entries from memory pool Jameson Miller
2018-05-23 14:47   ` [PATCH v3 1/7] read-cache: teach refresh_cache_entry() to take istate Jameson Miller
2018-05-25 22:54     ` Stefan Beller
2018-05-23 14:47   ` [PATCH v3 2/7] block alloc: add lifecycle APIs for cache_entry structs Jameson Miller
2018-05-24  4:52     ` Junio C Hamano
2018-05-24 14:47       ` Jameson Miller [this message]
2018-05-23 14:47   ` [PATCH v3 3/7] mem-pool: only search head block for available space Jameson Miller
2018-05-23 14:47   ` [PATCH v3 4/7] mem-pool: add lifecycle management functions Jameson Miller
2018-05-23 14:47   ` [PATCH v3 5/7] mem-pool: fill out functionality Jameson Miller
2018-06-01 19:28     ` Stefan Beller
2018-05-23 14:47   ` [PATCH v3 6/7] block alloc: allocate cache entries from mem_pool Jameson Miller
2018-05-23 14:47   ` [PATCH v3 7/7] block alloc: add validations around cache_entry lifecyle Jameson Miller
2018-05-24  4:55   ` [PATCH v3 0/7] allocate cache entries from memory pool Junio C Hamano
2018-05-24 14:44     ` Jameson Miller
2018-05-25 22:53       ` Stefan Beller
2018-06-20 20:41         ` Jameson Miller
2018-05-25 22:41   ` Stefan Beller
2018-06-20 20:17 ` [PATCH v4 0/8] Allocate cache entries from mem_pool Jameson Miller
2018-06-20 20:17   ` [PATCH v4 1/8] read-cache: teach refresh_cache_entry() to take istate Jameson Miller
2018-06-20 20:17   ` [PATCH v4 2/8] block alloc: add lifecycle APIs for cache_entry structs Jameson Miller
2018-06-21 21:14     ` Stefan Beller
2018-06-28 14:07       ` Jameson Miller
2018-06-20 20:17   ` [PATCH v4 3/8] mem-pool: only search head block for available space Jameson Miller
2018-06-21 21:33     ` Stefan Beller
2018-06-28 14:12       ` Jameson Miller
2018-06-20 20:17   ` [PATCH v4 4/8] mem-pool: tweak math on mp_block allocation size Jameson Miller
2018-06-20 20:17   ` [PATCH v4 5/8] mem-pool: add lifecycle management functions Jameson Miller
2018-06-20 20:17   ` [PATCH v4 6/8] mem-pool: fill out functionality Jameson Miller
2018-06-20 20:17   ` [PATCH v4 7/8] block alloc: allocate cache entries from mem_pool Jameson Miller
2018-06-20 20:17   ` [PATCH v4 8/8] block alloc: add validations around cache_entry lifecyle Jameson Miller
2018-06-28 14:00   ` [PATCH v5 0/8] Allocate cache entries from mem_pool Jameson Miller
2018-06-28 14:00     ` [PATCH v5 1/8] read-cache: teach refresh_cache_entry() to take istate Jameson Miller
2018-06-28 14:00     ` [PATCH v5 2/8] read-cache: make_cache_entry should take object_id struct Jameson Miller
2018-06-28 17:14       ` Junio C Hamano
2018-06-28 22:27       ` SZEDER Gábor
2018-06-28 14:00     ` [PATCH v5 3/8] block alloc: add lifecycle APIs for cache_entry structs Jameson Miller
2018-06-28 18:43       ` Junio C Hamano
2018-06-28 22:28       ` SZEDER Gábor
2018-06-28 14:00     ` [PATCH v5 4/8] mem-pool: only search head block for available space Jameson Miller
2018-06-28 14:00     ` [PATCH v5 5/8] mem-pool: add life cycle management functions Jameson Miller
2018-06-28 17:15       ` Junio C Hamano
2018-06-28 14:00     ` [PATCH v5 6/8] mem-pool: fill out functionality Jameson Miller
2018-06-28 19:09       ` Junio C Hamano
2018-07-02 18:28         ` Jameson Miller
2018-06-28 14:00     ` [PATCH v5 7/8] block alloc: allocate cache entries from mem-pool Jameson Miller
2018-06-28 14:00     ` [PATCH v5 8/8] block alloc: add validations around cache_entry lifecyle Jameson Miller
2018-07-02 19:49     ` [PATCH v6 0/8] Allocate cache entries from mem_pool Jameson Miller
2018-07-02 19:49       ` [PATCH v6 1/8] read-cache: teach refresh_cache_entry to take istate Jameson Miller
2018-07-02 19:49       ` [PATCH v6 2/8] read-cache: teach make_cache_entry to take object_id Jameson Miller
2018-07-02 21:23         ` Stefan Beller
2018-07-05 15:20           ` Jameson Miller
2018-07-02 19:49       ` [PATCH v6 3/8] block alloc: add lifecycle APIs for cache_entry structs Jameson Miller
2018-07-22  9:23         ` Duy Nguyen
2018-07-02 19:49       ` [PATCH v6 4/8] mem-pool: only search head block for available space Jameson Miller
2018-07-02 19:49       ` [PATCH v6 5/8] mem-pool: add life cycle management functions Jameson Miller
2018-07-02 19:49       ` [PATCH v6 6/8] mem-pool: fill out functionality Jameson Miller
2018-07-02 19:49       ` [PATCH v6 7/8] block alloc: allocate cache entries from mem_pool Jameson Miller
2018-07-02 19:49       ` [PATCH v6 8/8] block alloc: add validations around cache_entry lifecyle Jameson Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=BL0PR2101MB11060377AB5095E79EA9EA06CE6A0@BL0PR2101MB1106.namprd21.prod.outlook.com \
    --to=jamill@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    --cc=pclouds@gmail.com \
    --cc=peartben@gmail.com \
    --cc=sbeller@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).