From: "René Scharfe" <l.s.r@web.de>
To: Jeff King <peff@peff.net>, git@vger.kernel.org
Subject: Re: fast-import's hash table is slow
Date: Wed, 1 Apr 2020 01:21:08 +0200 [thread overview]
Message-ID: <c0398484-15f4-e5c2-d229-82037094417c@web.de> (raw)
In-Reply-To: <fcf422e4-08f6-634a-39ba-18d40d1c25ca@web.de>
Am 31.03.20 um 21:14 schrieb René Scharfe:
> Am 31.03.20 um 11:45 schrieb Jeff King:
>> diff --git a/fast-import.c b/fast-import.c
>> index 202dda11a6..6ebac665a0 100644
>> --- a/fast-import.c
>> +++ b/fast-import.c
>> @@ -39,12 +39,25 @@
>>
>> struct object_entry {
>> struct pack_idx_entry idx;
>> - struct object_entry *next;
>> uint32_t type : TYPE_BITS,
>> pack_id : PACK_ID_BITS,
>> depth : DEPTH_BITS;
>> };
>>
>> +static inline unsigned int object_entry_hash(struct object_entry *oe)
>> +{
>> + return oidhash(&oe->idx.oid);
>> +}
>> +
>> +static inline int object_entry_equal(struct object_entry *a,
>> + struct object_entry *b)
>> +{
>> + return oideq(&a->idx.oid, &b->idx.oid);
>> +}
>> +
>> +KHASH_INIT(object_entry_set, struct object_entry *, int, 0,
>> + object_entry_hash, object_entry_equal);
>> +
>> struct object_entry_pool {
>> struct object_entry_pool *next_pool;
>> struct object_entry *next_free;
>> @@ -178,7 +191,7 @@ static off_t pack_size;
>> /* Table of objects we've written. */
>> static unsigned int object_entry_alloc = 5000;
>> static struct object_entry_pool *blocks;
>> -static struct object_entry *object_table[1 << 16];
>> +static kh_object_entry_set_t object_table;
>> static struct mark_set *marks;
>> static const char *export_marks_file;
>> static const char *import_marks_file;
>> @@ -455,44 +468,45 @@ static struct object_entry *new_object(struct object_id *oid)
>>
>> static struct object_entry *find_object(struct object_id *oid)
>> {
>> - unsigned int h = oid->hash[0] << 8 | oid->hash[1];
>> - struct object_entry *e;
>> - for (e = object_table[h]; e; e = e->next)
>> - if (oideq(oid, &e->idx.oid))
>> - return e;
>> + /*
>> + * this cast works because we only look at the oid part of the entry,
>> + * and it comes first in the struct
>> + */
>> + khiter_t pos = kh_get_object_entry_set(&object_table,
>> + (struct object_entry *)oid);
>
> Dirty, but I can believe the comment.
>
>
>> + if (pos != kh_end(&object_table))
>> + return kh_key(&object_table, pos);
>> return NULL;
>> }
>>
>> static struct object_entry *insert_object(struct object_id *oid)
>> {
>> - unsigned int h = oid->hash[0] << 8 | oid->hash[1];
>> - struct object_entry *e = object_table[h];
>> + struct object_entry *e;
>> + int was_empty;
>> + khiter_t pos;
>>
>> - while (e) {
>> - if (oideq(oid, &e->idx.oid))
>> - return e;
>> - e = e->next;
>> - }
>> + pos = kh_put_object_entry_set(&object_table, (struct object_entry *)oid, &was_empty);
>
> Now this looks illegal. khash is surely reading a full object_entry from oid,
> which only is a mere object_id, no?
No, it's a set of pointers, and khash only accesses the referenced objects
via the hash and comparison functions.
Storing the objects directly in the set and getting rid of new_object()
could improve performance due to reduced dereferencing overhead -- or
make it slower because more data has to be copied when the hashmap needs
to grow. Worth a shot. Later.
René
next prev parent reply other threads:[~2020-03-31 23:21 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-03-31 9:45 fast-import's hash table is slow Jeff King
2020-03-31 19:14 ` René Scharfe
2020-03-31 23:21 ` René Scharfe [this message]
2020-04-01 10:24 ` Jeff King
2020-04-02 18:40 ` René Scharfe
2020-04-03 12:14 ` Jeff King
2020-04-01 10:35 ` Jeff King
2020-04-01 11:16 ` Jeff King
2020-04-02 18:40 ` René Scharfe
2020-04-03 12:12 ` Jeff King
2020-04-03 18:53 ` René Scharfe
2020-04-03 19:01 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=c0398484-15f4-e5c2-d229-82037094417c@web.de \
--to=l.s.r@web.de \
--cc=git@vger.kernel.org \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).