git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Ben Peart <peartben@gmail.com>
To: Jameson Miller <jamill@microsoft.com>,
	"git@vger.kernel.org" <git@vger.kernel.org>
Cc: "gitster@pobox.com" <gitster@pobox.com>,
	"pclouds@gmail.com" <pclouds@gmail.com>,
	"jonathantanmy@google.com" <jonathantanmy@google.com>
Subject: Re: [PATCH v1 2/5] Add an API creating / discarding cache_entry structs
Date: Tue, 17 Apr 2018 19:11:59 -0400	[thread overview]
Message-ID: <921a8d5c-b118-c5b1-35dd-459130faeed9@gmail.com> (raw)
In-Reply-To: <20180417163400.3875-4-jamill@microsoft.com>



On 4/17/2018 12:34 PM, Jameson Miller wrote:
> Add an API around managing the lifetime of cache_entry structs. Abstracting
> memory management details behind an API will allow for alternative memory
> management strategies without affecting all the call sites.  This commit does
> not change how memory is allocated / freed. A later commit in this series will
> allocate cache entries from memory pools as appropriate.
> 
> Motivation:
> It has been observed that the time spent loading an index with a large
> number of entries is partly dominated by malloc() calls. This
> change is in preparation for using memory pools to reduce the number
> of malloc() calls made when loading an index.
> 
> This API makes a distinction between cache entries that are intended for use
> with a particular to an index and cache entries that are not. 

The wording here is awkward.  Did you mean "intended for use with a 
particular index?"

This enables us
> to use the knowledge about how a cache entry will be used to make informed
> decisions about how to handle the corresponding memory.
> ---
>   apply.c                |  26 ++++++------
>   blame.c                |   5 +--
>   builtin/checkout.c     |   8 ++--
>   builtin/difftool.c     |   8 ++--
>   builtin/reset.c        |   6 +--
>   builtin/update-index.c |  26 ++++++------
>   cache.h                |  29 +++++++++++++-
>   merge-recursive.c      |   2 +-
>   read-cache.c           | 105 +++++++++++++++++++++++++++++++++++--------------
>   resolve-undo.c         |   6 ++-
>   split-index.c          |   8 ++--
>   tree.c                 |   4 +-
>   unpack-trees.c         |  27 ++++++++-----
>   13 files changed, 166 insertions(+), 94 deletions(-)
> 
> diff --git a/apply.c b/apply.c
> index 7e5792c996..47903f427b 100644
> --- a/apply.c
> +++ b/apply.c
> @@ -4090,12 +4090,12 @@ static int build_fake_ancestor(struct apply_state *state, struct patch *list)
>   			return error(_("sha1 information is lacking or useless "
>   				       "(%s)."), name);
>   
> -		ce = make_cache_entry(patch->old_mode, oid.hash, name, 0, 0);
> +		ce = make_index_cache_entry(&result, patch->old_mode, oid.hash, name, 0, 0);
>   		if (!ce)
> -			return error(_("make_cache_entry failed for path '%s'"),
> +			return error(_("make_index_cache_entry failed for path '%s'"),
>   				     name);
>   		if (add_index_entry(&result, ce, ADD_CACHE_OK_TO_ADD)) {
> -			free(ce);
> +			index_cache_entry_discard(ce);

I personally prefer name symmetry.  To me, make_index_cache_entry() 
should be paired with discard_index_cache_entry().

The rest of this patch looks like a fairly mechanical refactoring with 
the biggest exception being the difference between the the 
*_index_cache_entry() APIs and the *_transient_cache_entry() APIs.

There are quite a few changes but I didn't see any instances that were 
missed or any errors.  I see that later patches will put verification 
code in place to detect if any were done incorrectly and to prevent 
regressions moving forward.

Overall, it looks correct and reasonable.

<snip>

>   
> -struct cache_entry *make_cache_entry(unsigned int mode,
> -		const unsigned char *sha1, const char *path, int stage,
> -		unsigned int refresh_options)
> +struct cache_entry *make_empty_index_cache_entry(struct index_state *istate, size_t len)
> +{
> +	return xcalloc(1, cache_entry_size(len));
> +}
> +
> +struct cache_entry *make_empty_transient_cache_entry(size_t len)
> +{
> +	return xcalloc(1, cache_entry_size(len));
> +}
> +
> +struct cache_entry *make_index_cache_entry(struct index_state *istate, unsigned int mode,
> +			    const unsigned char *sha1, const char *path,
> +			    int stage, unsigned int refresh_options)
>   {
> -	int size, len;
>   	struct cache_entry *ce, *ret;
> +	int len;
>   
>   	if (!verify_path(path)) {
>   		error("Invalid path '%s'", path);
> @@ -758,8 +767,7 @@ struct cache_entry *make_cache_entry(unsigned int mode,
>   	}
>   
>   	len = strlen(path);
> -	size = cache_entry_size(len);
> -	ce = xcalloc(1, size);
> +	ce = make_empty_index_cache_entry(istate, len);
>   
>   	hashcpy(ce->oid.hash, sha1);
>   	memcpy(ce->name, path, len);
> @@ -769,10 +777,34 @@ struct cache_entry *make_cache_entry(unsigned int mode,
>   
>   	ret = refresh_cache_entry(&the_index, ce, refresh_options);
>   	if (ret != ce)
> -		free(ce);
> +		index_cache_entry_discard(ce);
> +
>   	return ret;
>   }
>   
> +struct cache_entry *make_transient_cache_entry(unsigned int mode, const unsigned char *sha1,
> +			   const char *path, int stage)
> +{
> +	struct cache_entry *ce;
> +	int len;
> +
> +	if (!verify_path(path)) {
> +		error("Invalid path '%s'", path);
> +		return NULL;
> +	}
> +
> +	len = strlen(path);
> +	ce = make_empty_transient_cache_entry(len);
> +
> +	hashcpy(ce->oid.hash, sha1);
> +	memcpy(ce->name, path, len);
> +	ce->ce_flags = create_ce_flags(stage);
> +	ce->ce_namelen = len;
> +	ce->ce_mode = create_ce_mode(mode);
> +

Nit, feel free to ignore. There isn't a lot of initialization here but I 
wonder if it makes sense to have an internal helper function to ensure 
these stay the same.

> +	return ce;
> +}
> +
>   /*
>    * Chmod an index entry with either +x or -x.
>    *
> @@ -1243,7 +1275,7 @@ static struct cache_entry *refresh_cache_ent(struct index_state *istate,
>   {
>   	struct stat st;
>   	struct cache_entry *updated;
> -	int changed, size;
> +	int changed;
>   	int refresh = options & CE_MATCH_REFRESH;
>   	int ignore_valid = options & CE_MATCH_IGNORE_VALID;
>   	int ignore_skip_worktree = options & CE_MATCH_IGNORE_SKIP_WORKTREE;
> @@ -1323,8 +1355,7 @@ static struct cache_entry *refresh_cache_ent(struct index_state *istate,
>   		return NULL;
>   	}
>   
> -	size = ce_size(ce);
> -	updated = xmalloc(size);
> +	updated = make_empty_index_cache_entry(istate, ce_namelen(ce));
>   	copy_cache_entry(updated, ce);
>   	memcpy(updated->name, ce->name, ce->ce_namelen + 1);
>   	fill_stat_cache_info(updated, &st);
> @@ -1610,12 +1641,13 @@ int read_index(struct index_state *istate)
>   	return read_index_from(istate, get_index_file(), get_git_dir());
>   }
>   
> -static struct cache_entry *cache_entry_from_ondisk(struct ondisk_cache_entry *ondisk,
> +static struct cache_entry *cache_entry_from_ondisk(struct index_state *istate,
> +						   struct ondisk_cache_entry *ondisk,
>   						   unsigned int flags,
>   						   const char *name,
>   						   size_t len)
>   {
> -	struct cache_entry *ce = xmalloc(cache_entry_size(len));
> +	struct cache_entry *ce = make_empty_index_cache_entry(istate, len);
>   
>   	ce->ce_stat_data.sd_ctime.sec = get_be32(&ondisk->ctime.sec);
>   	ce->ce_stat_data.sd_mtime.sec = get_be32(&ondisk->mtime.sec);
> @@ -1657,7 +1689,8 @@ static unsigned long expand_name_field(struct strbuf *name, const char *cp_)
>   	return (const char *)ep + 1 - cp_;
>   }
>   
> -static struct cache_entry *create_from_disk(struct ondisk_cache_entry *ondisk,
> +static struct cache_entry *create_from_disk(struct index_state *istate,
> +					    struct ondisk_cache_entry *ondisk,
>   					    unsigned long *ent_size,
>   					    struct strbuf *previous_name)

Nit. Just wondering why you pulled refresh_cache_entry() out into an 
earlier/separate commit but then did create_from_disk() as part of the 
large refactoring focused on adding the create/discard APIs?

<snip>

> diff --git a/unpack-trees.c b/unpack-trees.c
> index e73745051e..232cdecc72 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -192,10 +192,10 @@ static int do_add_entry(struct unpack_trees_options *o, struct cache_entry *ce,
>   			       ADD_CACHE_OK_TO_ADD | ADD_CACHE_OK_TO_REPLACE);
>   }
>   
> -static struct cache_entry *dup_entry(const struct cache_entry *ce)
> +static struct cache_entry *dup_entry(const struct cache_entry *ce, struct index_state *istate)

Ditto with the dup_entry() and create_ce_entry() functions refactoring.

> -static struct cache_entry *create_ce_entry(const struct traverse_info *info, const struct name_entry *n, int stage)
> +static struct cache_entry *create_ce_entry(const struct traverse_info *info,
> +	const struct name_entry *n,
> +	int stage,
> +	struct index_state *istate,
> +	int is_transient)

  reply	other threads:[~2018-04-17 23:12 UTC|newest]

Thread overview: 100+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-17 16:34 [PATCH v1 0/5] Allocate cache entries from memory pool Jameson Miller
2018-04-17 16:34 ` Jameson Miller
2018-04-17 16:34 ` [PATCH v1 1/5] read-cache: teach refresh_cache_entry to take istate Jameson Miller
2018-04-17 19:00   ` Ben Peart
2018-04-17 16:34 ` [PATCH v1 2/5] Add an API creating / discarding cache_entry structs Jameson Miller
2018-04-17 23:11   ` Ben Peart [this message]
2018-04-17 16:34 ` [PATCH v1 3/5] mem-pool: fill out functionality Jameson Miller
2018-04-20 23:21   ` Jonathan Tan
2018-04-23 17:27     ` Jameson Miller
2018-04-23 17:49       ` Jonathan Tan
2018-04-23 18:20         ` Jameson Miller
2018-04-17 16:34 ` [PATCH v1 4/5] Allocate cache entries from memory pools Jameson Miller
2018-04-17 16:34 ` [PATCH v1 5/5] Add optional memory validations around cache_entry lifecyle Jameson Miller
2018-04-17 18:39 ` [PATCH v1 0/5] Allocate cache entries from memory pool Ben Peart
2018-04-23 14:09   ` Jameson Miller
2018-04-18  4:49 ` Junio C Hamano
2018-04-20 17:49   ` Stefan Beller
2018-04-23 16:44     ` Jameson Miller
2018-04-23 17:18       ` Stefan Beller
2018-04-23 16:19   ` Jameson Miller
2018-04-20 23:34 ` Jonathan Tan
2018-04-23 17:14   ` Jameson Miller
2018-04-30 15:31 ` [PATCH v2 " Jameson Miller
2018-04-30 15:31   ` [PATCH v2 1/5] read-cache: teach refresh_cache_entry() to take istate Jameson Miller
2018-04-30 15:31   ` [PATCH v2 2/5] block alloc: add lifecycle APIs for cache_entry structs Jameson Miller
2018-04-30 15:31   ` [PATCH v2 3/5] mem-pool: fill out functionality Jameson Miller
2018-04-30 21:42     ` Stefan Beller
2018-05-01 15:43       ` Jameson Miller
2018-05-03 16:18     ` Duy Nguyen
2018-04-30 15:31   ` [PATCH v2 4/5] block alloc: allocate cache entries from mem_pool Jameson Miller
2018-04-30 15:31   ` [PATCH v2 5/5] block alloc: add validations around cache_entry lifecyle Jameson Miller
2018-05-03 16:28     ` Duy Nguyen
2018-05-03 16:35   ` [PATCH v2 0/5] Allocate cache entries from memory pool Duy Nguyen
2018-05-03 17:21     ` Stefan Beller
2018-05-03 19:17       ` Duy Nguyen
2018-05-03 20:58         ` Stefan Beller
2018-05-03 21:13           ` Jameson Miller
2018-05-03 22:18             ` [PATCH] alloc.c: replace alloc by mempool Stefan Beller
2018-05-04 16:33               ` Duy Nguyen
2018-05-08  0:37                 ` Junio C Hamano
2018-05-08  0:44                   ` Stefan Beller
2018-05-08  1:07                     ` Junio C Hamano
2018-05-23 14:47 ` [PATCH v3 0/7] allocate cache entries from memory pool Jameson Miller
2018-05-23 14:47   ` [PATCH v3 1/7] read-cache: teach refresh_cache_entry() to take istate Jameson Miller
2018-05-25 22:54     ` Stefan Beller
2018-05-23 14:47   ` [PATCH v3 2/7] block alloc: add lifecycle APIs for cache_entry structs Jameson Miller
2018-05-24  4:52     ` Junio C Hamano
2018-05-24 14:47       ` Jameson Miller
2018-05-23 14:47   ` [PATCH v3 3/7] mem-pool: only search head block for available space Jameson Miller
2018-05-23 14:47   ` [PATCH v3 4/7] mem-pool: add lifecycle management functions Jameson Miller
2018-05-23 14:47   ` [PATCH v3 5/7] mem-pool: fill out functionality Jameson Miller
2018-06-01 19:28     ` Stefan Beller
2018-05-23 14:47   ` [PATCH v3 6/7] block alloc: allocate cache entries from mem_pool Jameson Miller
2018-05-23 14:47   ` [PATCH v3 7/7] block alloc: add validations around cache_entry lifecyle Jameson Miller
2018-05-24  4:55   ` [PATCH v3 0/7] allocate cache entries from memory pool Junio C Hamano
2018-05-24 14:44     ` Jameson Miller
2018-05-25 22:53       ` Stefan Beller
2018-06-20 20:41         ` Jameson Miller
2018-05-25 22:41   ` Stefan Beller
2018-06-20 20:17 ` [PATCH v4 0/8] Allocate cache entries from mem_pool Jameson Miller
2018-06-20 20:17   ` [PATCH v4 1/8] read-cache: teach refresh_cache_entry() to take istate Jameson Miller
2018-06-20 20:17   ` [PATCH v4 2/8] block alloc: add lifecycle APIs for cache_entry structs Jameson Miller
2018-06-21 21:14     ` Stefan Beller
2018-06-28 14:07       ` Jameson Miller
2018-06-20 20:17   ` [PATCH v4 3/8] mem-pool: only search head block for available space Jameson Miller
2018-06-21 21:33     ` Stefan Beller
2018-06-28 14:12       ` Jameson Miller
2018-06-20 20:17   ` [PATCH v4 4/8] mem-pool: tweak math on mp_block allocation size Jameson Miller
2018-06-20 20:17   ` [PATCH v4 5/8] mem-pool: add lifecycle management functions Jameson Miller
2018-06-20 20:17   ` [PATCH v4 6/8] mem-pool: fill out functionality Jameson Miller
2018-06-20 20:17   ` [PATCH v4 7/8] block alloc: allocate cache entries from mem_pool Jameson Miller
2018-06-20 20:17   ` [PATCH v4 8/8] block alloc: add validations around cache_entry lifecyle Jameson Miller
2018-06-28 14:00   ` [PATCH v5 0/8] Allocate cache entries from mem_pool Jameson Miller
2018-06-28 14:00     ` [PATCH v5 1/8] read-cache: teach refresh_cache_entry() to take istate Jameson Miller
2018-06-28 14:00     ` [PATCH v5 2/8] read-cache: make_cache_entry should take object_id struct Jameson Miller
2018-06-28 17:14       ` Junio C Hamano
2018-06-28 22:27       ` SZEDER Gábor
2018-06-28 14:00     ` [PATCH v5 3/8] block alloc: add lifecycle APIs for cache_entry structs Jameson Miller
2018-06-28 18:43       ` Junio C Hamano
2018-06-28 22:28       ` SZEDER Gábor
2018-06-28 14:00     ` [PATCH v5 4/8] mem-pool: only search head block for available space Jameson Miller
2018-06-28 14:00     ` [PATCH v5 5/8] mem-pool: add life cycle management functions Jameson Miller
2018-06-28 17:15       ` Junio C Hamano
2018-06-28 14:00     ` [PATCH v5 6/8] mem-pool: fill out functionality Jameson Miller
2018-06-28 19:09       ` Junio C Hamano
2018-07-02 18:28         ` Jameson Miller
2018-06-28 14:00     ` [PATCH v5 7/8] block alloc: allocate cache entries from mem-pool Jameson Miller
2018-06-28 14:00     ` [PATCH v5 8/8] block alloc: add validations around cache_entry lifecyle Jameson Miller
2018-07-02 19:49     ` [PATCH v6 0/8] Allocate cache entries from mem_pool Jameson Miller
2018-07-02 19:49       ` [PATCH v6 1/8] read-cache: teach refresh_cache_entry to take istate Jameson Miller
2018-07-02 19:49       ` [PATCH v6 2/8] read-cache: teach make_cache_entry to take object_id Jameson Miller
2018-07-02 21:23         ` Stefan Beller
2018-07-05 15:20           ` Jameson Miller
2018-07-02 19:49       ` [PATCH v6 3/8] block alloc: add lifecycle APIs for cache_entry structs Jameson Miller
2018-07-22  9:23         ` Duy Nguyen
2018-07-02 19:49       ` [PATCH v6 4/8] mem-pool: only search head block for available space Jameson Miller
2018-07-02 19:49       ` [PATCH v6 5/8] mem-pool: add life cycle management functions Jameson Miller
2018-07-02 19:49       ` [PATCH v6 6/8] mem-pool: fill out functionality Jameson Miller
2018-07-02 19:49       ` [PATCH v6 7/8] block alloc: allocate cache entries from mem_pool Jameson Miller
2018-07-02 19:49       ` [PATCH v6 8/8] block alloc: add validations around cache_entry lifecyle Jameson Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=921a8d5c-b118-c5b1-35dd-459130faeed9@gmail.com \
    --to=peartben@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jamill@microsoft.com \
    --cc=jonathantanmy@google.com \
    --cc=pclouds@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).