Re: [PATCH 1/2] fast-import: use struct hash_table for atom strings

git@vger.kernel.org mailing list mirror (one of many)

From: Jonathan Nieder <jrnieder@gmail.com>
To: David Barr <david.barr@cordelta.com>
Cc: Git List <git@vger.kernel.org>,
	"Shawn O. Pearce" <spearce@spearce.org>,
	Stephen Boyd <bebarino@gmail.com>
Subject: Re: [PATCH 1/2] fast-import: use struct hash_table for atom strings
Date: Fri, 1 Apr 2011 21:42:09 -0500
Message-ID: <20110402024209.GA6039@elie>
In-Reply-To: <1301572798-9973-2-git-send-email-david.barr@cordelta.com>

Hi,

David Barr wrote:

> Signed-off-by: David Barr <david.barr@cordelta.com>

Thanks, this is a welcome change.  But perhaps it would be nice to
explain why, here? :)

E.g., what is stored in the atom table?  Does it tend to get big?  Does
the existing code allow it to grow?  This change will allow it to grow,
right?  What is the downside to this change (if any)?

In particular, numbers (timings) illustrating the effect on typical
use and on scalability would be interesting.

> ---
>  fast-import.c |   17 ++++++++++-------
>  1 files changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/fast-import.c b/fast-import.c
> index 65d65bf..0592b21 100644
> --- a/fast-import.c
> +++ b/fast-import.c
> @@ -300,9 +300,8 @@ static size_t total_allocd;
>  static struct mem_pool *mem_pool;
>  
>  /* Atom management */
> -static unsigned int atom_table_sz = 4451;
>  static unsigned int atom_cnt;
> -static struct atom_str **atom_table;
> +static struct hash_table atom_table;
>  
>  /* The .pack file being generated */
>  static unsigned int pack_id;
> @@ -680,10 +679,11 @@ static struct object_entry *find_mark(uintmax_t idnum)
>  
>  static struct atom_str *to_atom(const char *s, unsigned short len)
>  {
> -	unsigned int hc = hc_str(s, len) % atom_table_sz;
> +	unsigned int hc = hc_str(s, len);
>  	struct atom_str *c;
> +	void **pos;
>  
> -	for (c = atom_table[hc]; c; c = c->next_atom)
> +	for (c = lookup_hash(hc, &atom_table); c; c = c->next_atom)
>  		if (c->str_len == len && !strncmp(s, c->str_dat, len))
>  			return c;
>  
> @@ -691,8 +691,12 @@ static struct atom_str *to_atom(const char *s, unsigned short len)
>  	c->str_len = len;
>  	strncpy(c->str_dat, s, len);
>  	c->str_dat[len] = 0;
> -	c->next_atom = atom_table[hc];
> -	atom_table[hc] = c;
> +	c->next_atom = NULL;
> +	pos = insert_hash(hc, c, &atom_table);
> +	if (pos) {
> +		c->next_atom = *pos;
> +		*pos = c;
> +	}

If I understand correctly, this puts new atoms at the start of the
chain, just like v1.7.4-rc0~40^2 (fast-import: insert new object
entries at start of hash bucket, 2010-11-23) did for objects.  Did you
measure and find this faster, or is it just for simplicity or
consistency?  (I'd personally be fine with it either way, but it seems
prudent to ask.)

>  	atom_cnt++;
>  	return c;
>  }
> @@ -3263,7 +3267,6 @@ int main(int argc, const char **argv)
>  
>  	alloc_objects(object_entry_alloc);
>  	strbuf_init(&command_buf, 0);
> -	atom_table = xcalloc(atom_table_sz, sizeof(struct atom_str*));
>  	branch_table = xcalloc(branch_table_sz, sizeof(struct branch*));
>  	avail_tree_table = xcalloc(avail_tree_table_sz, sizeof(struct avail_tree_content*));
>  	marks = pool_calloc(1, sizeof(struct mark_set));

We never call init_hash.  That's technically safe: atom_table has
static storage duration, so it already starts out zeroed, and zeroing
the table is all init_hash does.  But I think I'd rather see us call it
anyway, or document in api-hash.txt that it's safe not to.

Looks good.  Will queue to give it some testing.

Thread overview: 13+ messages
2011-03-31 11:59 fast-import: use struct hash_table David Barr
2011-03-31 11:59 ` [PATCH 1/2] fast-import: use struct hash_table for atom strings David Barr
2011-04-02  2:42   ` Jonathan Nieder [this message]
2011-04-02  3:33     ` Jonathan Nieder
2011-03-31 11:59 ` [PATCH 2/2] fast-import: use struct hash_table for objects David Barr
2011-04-02  2:46   ` Jonathan Nieder
2011-04-02  2:48 ` fast-import: use struct hash_table Jonathan Nieder
2012-04-11 12:11 ` [PATCH/RFC v2 0/4] " Jonathan Nieder
2012-04-11 12:12 ` [PATCH/RFC v2 0/4 resend] " Jonathan Nieder
2012-04-11 12:13   ` [PATCH 1/4] fast-import: allow object_table to grow dynamically Jonathan Nieder
2012-04-11 12:14   ` [PATCH 2/4] fast-import: allow atom_table " Jonathan Nieder
2012-04-11 12:15   ` [PATCH 3/4] fast-import: allow branch_table " Jonathan Nieder
2012-04-11 12:15   ` [PATCH 4/4] fast-import: use DIV_ROUND_UP Jonathan Nieder
