git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "René Scharfe" <l.s.r@web.de>
To: Jeff King <peff@peff.net>, git@vger.kernel.org
Subject: Re: fast-import's hash table is slow
Date: Wed, 1 Apr 2020 01:21:08 +0200	[thread overview]
Message-ID: <c0398484-15f4-e5c2-d229-82037094417c@web.de> (raw)
In-Reply-To: <fcf422e4-08f6-634a-39ba-18d40d1c25ca@web.de>

Am 31.03.20 um 21:14 schrieb René Scharfe:
> Am 31.03.20 um 11:45 schrieb Jeff King:
>> diff --git a/fast-import.c b/fast-import.c
>> index 202dda11a6..6ebac665a0 100644
>> --- a/fast-import.c
>> +++ b/fast-import.c
>> @@ -39,12 +39,25 @@
>>
>>  struct object_entry {
>>  	struct pack_idx_entry idx;
>> -	struct object_entry *next;
>>  	uint32_t type : TYPE_BITS,
>>  		pack_id : PACK_ID_BITS,
>>  		depth : DEPTH_BITS;
>>  };
>>
>> +static inline unsigned int object_entry_hash(struct object_entry *oe)
>> +{
>> +	return oidhash(&oe->idx.oid);
>> +}
>> +
>> +static inline int object_entry_equal(struct object_entry *a,
>> +				     struct object_entry *b)
>> +{
>> +	return oideq(&a->idx.oid, &b->idx.oid);
>> +}
>> +
>> +KHASH_INIT(object_entry_set, struct object_entry *, int, 0,
>> +	   object_entry_hash, object_entry_equal);
>> +
>>  struct object_entry_pool {
>>  	struct object_entry_pool *next_pool;
>>  	struct object_entry *next_free;
>> @@ -178,7 +191,7 @@ static off_t pack_size;
>>  /* Table of objects we've written. */
>>  static unsigned int object_entry_alloc = 5000;
>>  static struct object_entry_pool *blocks;
>> -static struct object_entry *object_table[1 << 16];
>> +static kh_object_entry_set_t object_table;
>>  static struct mark_set *marks;
>>  static const char *export_marks_file;
>>  static const char *import_marks_file;
>> @@ -455,44 +468,45 @@ static struct object_entry *new_object(struct object_id *oid)
>>
>>  static struct object_entry *find_object(struct object_id *oid)
>>  {
>> -	unsigned int h = oid->hash[0] << 8 | oid->hash[1];
>> -	struct object_entry *e;
>> -	for (e = object_table[h]; e; e = e->next)
>> -		if (oideq(oid, &e->idx.oid))
>> -			return e;
>> +	/*
>> +	 * this cast works because we only look at the oid part of the entry,
>> +	 * and it comes first in the struct
>> +	 */
>> +	khiter_t pos = kh_get_object_entry_set(&object_table,
>> +					       (struct object_entry *)oid);
>
> Dirty, but I can believe the comment.
>
>
>> +	if (pos != kh_end(&object_table))
>> +		return kh_key(&object_table, pos);
>>  	return NULL;
>>  }
>>
>>  static struct object_entry *insert_object(struct object_id *oid)
>>  {
>> -	unsigned int h = oid->hash[0] << 8 | oid->hash[1];
>> -	struct object_entry *e = object_table[h];
>> +	struct object_entry *e;
>> +	int was_empty;
>> +	khiter_t pos;
>>
>> -	while (e) {
>> -		if (oideq(oid, &e->idx.oid))
>> -			return e;
>> -		e = e->next;
>> -	}
>> +	pos = kh_put_object_entry_set(&object_table, (struct object_entry *)oid, &was_empty);
>
> Now this looks illegal.  khash is surely reading a full object_entry from oid,
> which only is a mere object_id, no?

No, it's a set of pointers, and khash only accesses the referenced objects
via the hash and comparison functions.

Storing the objects directly in the set and getting rid of new_object()
could improve performance due to reduced dereferencing overhead -- or
make it slower because more data has to be copied when the hashmap needs
to grow.  Worth a shot.  Later.

René

  reply	other threads:[~2020-03-31 23:21 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-31  9:45 fast-import's hash table is slow Jeff King
2020-03-31 19:14 ` René Scharfe
2020-03-31 23:21   ` René Scharfe [this message]
2020-04-01 10:24     ` Jeff King
2020-04-02 18:40       ` René Scharfe
2020-04-03 12:14         ` Jeff King
2020-04-01 10:35   ` Jeff King
2020-04-01 11:16     ` Jeff King
2020-04-02 18:40       ` René Scharfe
2020-04-03 12:12         ` Jeff King
2020-04-03 18:53           ` René Scharfe
2020-04-03 19:01             ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c0398484-15f4-e5c2-d229-82037094417c@web.de \
    --to=l.s.r@web.de \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).