From: Junio C Hamano <gitster@pobox.com>
To: Jameson Miller <jamill@microsoft.com>
Cc: git@vger.kernel.org, peff@peff.net
Subject: Re: [PATCH v2 2/5] fast-import: introduce mem_pool type
Date: Fri, 23 Mar 2018 10:15:34 -0700
Message-ID: <xmqqk1u2k91l.fsf@gitster-ct.c.googlers.com>
In-Reply-To: <20180323144408.213145-3-jamill@microsoft.com> (Jameson Miller's message of "Fri, 23 Mar 2018 10:44:05 -0400")

Jameson Miller <jamill@microsoft.com> writes:

> Introduce the mem_pool type and wrap the existing mp_block in this new
> type. The new mem_pool type will allow for the memory pool logic to be
> reused outside of fast-import. This type will be moved into its own file
> in a future commit.
>
> Signed-off-by: Jameson Miller <jamill@microsoft.com>
> ---
>  fast-import.c | 108 +++++++++++++++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 89 insertions(+), 19 deletions(-)
>
> diff --git a/fast-import.c b/fast-import.c
> index 6c3215d7c3..1262d9e6be 100644
> --- a/fast-import.c
> +++ b/fast-import.c
> @@ -216,6 +216,19 @@ struct mp_block {
>  	uintmax_t space[FLEX_ARRAY]; /* more */
>  };
>  
> +struct mem_pool {
> +	struct mp_block *mp_block;
> +
> +	/*
> +	 * The amount of available memory to grow the pool by.
> +	 * This size does not include the overhead for the mp_block.
> +	 */
> +	size_t block_alloc;
> +
> +	/* The total amount of memory allocated by the pool. */
> +	size_t pool_alloc;
> +};

OK, so the existing fast-import one knows that it only needs a
single instance, but by introducing this structure, we made it
possible to have more than one instance of it, and more importantly,
all the necessary pieces of info to manage it are available to those
who have a pointer to this instance.

>  struct atom_str {
>  	struct atom_str *next_atom;
>  	unsigned short str_len;
> @@ -304,9 +317,7 @@ static int global_argc;
>  static const char **global_argv;
>  
>  /* Memory pools */
> -static size_t mem_pool_alloc = 2*1024*1024 - sizeof(struct mp_block);
> -static size_t total_allocd;
> -static struct mp_block *mp_block_head;
> +static struct mem_pool fi_mem_pool =  {0, 2*1024*1024 - sizeof(struct mp_block), 0 };

And this struct instance represents the <list of blocks, block_alloc
and pool_alloc> for fast-import's use, which used to be separate
global variables that are file-scope static here.

Good.

>  /* Atom management */
>  static unsigned int atom_table_sz = 4451;
> @@ -324,6 +335,7 @@ static off_t pack_size;
>  /* Table of objects we've written. */
>  static unsigned int object_entry_alloc = 5000;
>  static struct object_entry_pool *blocks;
> +static size_t total_allocd = 0;

This one is now used as a total across all the mempool instances and
is kept outside the struct, which is OK.

Do not initialize a var in BSS with an explicit "= 0".

> @@ -634,6 +646,60 @@ static unsigned int hc_str(const char *s, size_t len)
>  	return r;
>  }
>  
> +static struct mp_block *pool_alloc_block()

Should spell this as:

	static struct mp_block *pool_alloc_block(void)

But it is somewhat curious; if we are moving to make this reusable,
shouldn't this be accepting "struct mem_pool *pool" from the caller
and using pool->pool_alloc, pool->block_alloc, etc. instead of
always using the global instance, which fi_mem_pool is?
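
To illustrate the question above, here is a minimal sketch of a block
allocator that takes the pool from its caller.  It is not code from
the patch: the name mem_pool_alloc_block() is hypothetical, the
structs are simplified copies of the quoted ones, and plain malloc()
with abort() stands in for xmalloc().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-ins for the structures quoted in the patch. */
struct mp_block {
	struct mp_block *next_block;
	char *next_free;
	char *end;
	uintmax_t space[]; /* C99 flexible array member */
};

struct mem_pool {
	struct mp_block *mp_block;
	size_t block_alloc;
	size_t pool_alloc;
};

/* Hypothetical signature: the pool is a parameter, not a global. */
static struct mp_block *mem_pool_alloc_block(struct mem_pool *pool)
{
	struct mp_block *p;

	assert(pool != NULL);

	/* Assumption: malloc() + abort() replaces xmalloc() here. */
	p = malloc(sizeof(*p) + pool->block_alloc);
	if (!p)
		abort();

	pool->pool_alloc += sizeof(*p) + pool->block_alloc;
	p->next_block = pool->mp_block;
	p->next_free = (char *)p->space;
	p->end = p->next_free + pool->block_alloc;
	pool->mp_block = p;

	return p;
}
```

With this shape, two pools with different block_alloc values can
coexist, and each call only touches the pool it was handed.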

> +{
> +	struct mp_block *p;
> +
> +	fi_mem_pool.pool_alloc += sizeof(struct mp_block) + fi_mem_pool.block_alloc;

This addition, just like the addition on the next line, potentially
overflows and wraps around, so it is probably a candidate for
st_add(), but IIUC, pool_alloc is only for usage stats, so it may
not be too bad an offence.  In any case, the above is done as close
to the original and is good for a refactoring patch.
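
For readers unfamiliar with the idea behind st_add(), the following
is a minimal sketch of an overflow-checked size_t addition.  The
helper name checked_size_add() is hypothetical, and unlike st_add(),
which dies when the sum would overflow, this sketch reports the
failure through its return value.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: store a + b in *sum and return 0, or
 * return -1 and leave *sum untouched if the sum would wrap.
 */
static int checked_size_add(size_t a, size_t b, size_t *sum)
{
	if (a > SIZE_MAX - b)
		return -1;
	*sum = a + b;
	return 0;
}
```

The check is done by subtraction before the addition, because an
unsigned sum that has already wrapped around cannot be told apart
from a legitimate small value.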

> +	p = xmalloc(st_add(sizeof(struct mp_block), fi_mem_pool.block_alloc));
> +	p->next_block = fi_mem_pool.mp_block;
> +	p->next_free = (char *)p->space;
> +	p->end = p->next_free + fi_mem_pool.block_alloc;
> +	fi_mem_pool.mp_block = p;
> +
> +	return p;
> +}

So this is one half of the original pool_alloc(), that allocated a
new block and queued it to the pool when a not-too-large allocation
is requested (the other half that dealt with large allocations
bypassed the pool mechanism altogether).

> +/*
> + * Allocates a block of memory with a specific size and
> + * appends it to the memory pool's list of blocks.
> + *
> + * This function is used to allocate blocks with sizes
> + * different than the default "block_alloc" size for the mem_pool.
> + *
> + * There are two use cases:
> + *  1) The initial block allocation for a memory pool.
> + *
> + *  2) large" blocks of a specific size, where the entire memory block
> + *     is going to be used. This means the block will not have any
> + *     available memory for future allocations. The block is placed at
> + *     the end of the list so that it will be the last block searched
> + *     for available space.

If we know the block won't have any leftover space, what's the point
of "searching for available space" in it in the first place, even if
you make it the last?

> + */
> +static struct mp_block *pool_alloc_block_with_size(size_t block_alloc)
> +{

Same comment on passing a "pool" instance as parameter applies to
this function.

> +	struct mp_block *p, *block;
> +
> +	fi_mem_pool.pool_alloc += sizeof(struct mp_block) + block_alloc;
> +	p = xmalloc(st_add(sizeof(struct mp_block), block_alloc));
> +
> +	block = fi_mem_pool.mp_block;
> +	if (block) {
> +		while (block->next_block)
> +			block = block->next_block;
> +
> +		block->next_block = p;

If there were a need to append to the end (and also a need to
iterate from the beginning), keep an extra pointer in the "pool"
instance to make it an O(1) operation.  Or perhaps this list can be
managed with list.h?
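
The extra-pointer idea can be sketched as below.  This is only an
illustration under assumed names (struct pool, struct block and
pool_append() are all hypothetical), not a proposal for the actual
layout of mem_pool.

```c
#include <stddef.h>

struct block {
	struct block *next;
};

/* Hypothetical pool that remembers both ends of its block list. */
struct pool {
	struct block *head;
	struct block *tail;
};

/* Append in O(1): no walk from the head is needed. */
static void pool_append(struct pool *pool, struct block *b)
{
	b->next = NULL;
	if (pool->tail)
		pool->tail->next = b;
	else
		pool->head = b;
	pool->tail = b;
}
```

Iteration from the beginning still starts at head, so the search
loop in pool_alloc() would not need to change.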

It is unclear why this and pool_alloc_block() need to exist as
separate functions.  80% of the code is shared between them.

> +	} else {
> +		fi_mem_pool.mp_block = p;
> +	}
> +
> +	p->next_block = NULL;
> +	p->next_free = (char *)p->space;
> +	p->end = p->next_free + block_alloc;
> +
> +	return p;
> +}
> +
>  static void *pool_alloc(size_t len)
>  {
>  	struct mp_block *p;
> @@ -643,21 +709,25 @@ static void *pool_alloc(size_t len)
>  	if (len & (sizeof(uintmax_t) - 1))
>  		len += sizeof(uintmax_t) - (len & (sizeof(uintmax_t) - 1));
>  
> -	for (p = mp_block_head; p; p = p->next_block)
> -		if ((p->end - p->next_free >= len))
> -			break;
> +	p = fi_mem_pool.mp_block;
> +
> +	/*
> +	 * In performance profiling, there was a minor perf benefit to
> +	 * check for available memory in the head block via the if
> +	 * statement, and only going through the loop when needed.
> +	 */

Don't mix premature optimization in a refactoring patch, please.

> +	if (p &&
> +	   (p->end - p->next_free < len)) {
> +		for (p = p->next_block; p; p = p->next_block)
> +			if (p->end - p->next_free >= len)
> +				break;
> +	}
>  
>  	if (!p) {
> -		if (len >= (mem_pool_alloc/2)) {
> -			total_allocd += len;
> -			return xmalloc(len);
> -		}
> -		total_allocd += sizeof(struct mp_block) + mem_pool_alloc;
> -		p = xmalloc(st_add(sizeof(struct mp_block), mem_pool_alloc));
> -		p->next_block = mp_block_head;
> -		p->next_free = (char *) p->space;
> -		p->end = p->next_free + mem_pool_alloc;
> -		mp_block_head = p;
> +		if (len >= (fi_mem_pool.block_alloc / 2))
> +			p = pool_alloc_block_with_size(len);
> +		else
> +			p = pool_alloc_block();

So we used to fulfill a large request directly from xmalloc() but now
we create a new block, which is large enough to fulfill the request
but otherwise unusable for helping later requests, and queue that to
the mem_pool structure.

It _might_ be a justifiable change in behaviour, but is entirely
unexpected from a refactoring patch that wants to "wrap the existing
mp_block in this new type", especially without justifying why in the
proposed log message.
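
For reference, the pre-patch behaviour described above can be
sketched as follows.  This is a simplified reconstruction from the
removed lines of the quoted diff, not the actual fast-import.c code:
the function name pool_alloc_prepatch() and the pool parameter are
hypothetical, and malloc() with abort() stands in for xmalloc().

```c
#include <stdint.h>
#include <stdlib.h>

struct mp_block {
	struct mp_block *next_block;
	char *next_free;
	char *end;
	uintmax_t space[]; /* C99 flexible array member */
};

struct mem_pool {
	struct mp_block *mp_block;
	size_t block_alloc;
	size_t pool_alloc;
};

static size_t total_allocd;

/* Assumption: stand-in for xmalloc(), which dies on failure. */
static void *sketch_xmalloc(size_t n)
{
	void *p = malloc(n);
	if (!p)
		abort();
	return p;
}

static void *pool_alloc_prepatch(struct mem_pool *pool, size_t len)
{
	struct mp_block *p;
	void *r;

	/* Round up to a multiple of sizeof(uintmax_t). */
	if (len & (sizeof(uintmax_t) - 1))
		len += sizeof(uintmax_t) - (len & (sizeof(uintmax_t) - 1));

	for (p = pool->mp_block; p; p = p->next_block)
		if ((size_t)(p->end - p->next_free) >= len)
			break;

	if (!p) {
		if (len >= pool->block_alloc / 2) {
			/* Large request: never enters the block list. */
			total_allocd += len;
			return sketch_xmalloc(len);
		}
		total_allocd += sizeof(*p) + pool->block_alloc;
		p = sketch_xmalloc(sizeof(*p) + pool->block_alloc);
		p->next_block = pool->mp_block;
		p->next_free = (char *)p->space;
		p->end = p->next_free + pool->block_alloc;
		pool->mp_block = p;
	}

	r = p->next_free;
	p->next_free += len;
	return r;
}
```

In this version a large allocation leaves the block list untouched,
which is the behaviour the patch changes by queueing a dedicated
block instead.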
