git@vger.kernel.org mailing list mirror (one of many)
From: Jeff Hostetler <git@jeffhostetler.com>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: martin.agren@gmail.com, git@vger.kernel.org,
	jeffhost@microsoft.com, gitster@pobox.com, peff@peff.net
Subject: Re: [PATCH] hashmap: add API to disable item counting when threaded
Date: Tue, 5 Sep 2017 12:33:21 -0400
Message-ID: <9ec32edc-5aeb-53c0-7888-541f7a9db8bf@jeffhostetler.com>
In-Reply-To: <alpine.DEB.2.21.1.1709020109520.4132@virtualbox>



On 9/1/2017 7:31 PM, Johannes Schindelin wrote:
> Hi Jeff,
> 
> On Wed, 30 Aug 2017, Jeff Hostetler wrote:
> 
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> This is to address concerns raised by ThreadSanitizer on the mailing
>> list about threaded unprotected R/W access to map.size with my previous
>> "disallow rehash" change (0607e10009ee4e37cb49b4cec8d28a9dda1656a4).
>> See:
>> https://public-inbox.org/git/adb37b70139fd1e2bac18bfd22c8b96683ae18eb.1502780344.git.martin.agren@gmail.com/
>>
>> Add API to hashmap to disable item counting and to disable automatic
>> rehashing.  Also include APIs to re-enable item counting and automatic
>> rehashing.
>>
>> When item counting is disabled, the map.size field is invalid.  So to
>> prevent accidents, the field has been renamed and an accessor function
>> hashmap_get_size() has been added.  All direct references to this field
>> have been updated.  And the name of the field changed to
>> map.private_size to communicate this.
>>
>> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
>> ---
> 
> The Git contribution process forces me to point out lines longer than 80
> columns. I wish there was already an automated tool to fix that, but we
> (as in "the core Git developers") have not yet managed to agree on one. So
> I'll have to ask you to identify and fix them manually.

I'm not sure which lines you're talking about, but I'll
give it another scan and double check.

There's not much I can do about the public-inbox.org URL.

> 
>> @@ -253,6 +253,19 @@ static inline void hashmap_entry_init(void *entry, unsigned int hash)
>>   }
>>   
>>   /*
>> + * Return the number of items in the map.
>> + */
>> +inline unsigned int hashmap_get_size(struct hashmap *map)
>> +{
>> +	if (map->do_count_items)
>> +		return map->private_size;
>> +
>> +	/* TODO Consider counting them and returning that. */
> 
> I'd rather not. If counting is disabled, it is disabled.
> 
>> +	die("hashmap_get_size: size not set");
> 
> Before anybody can ask for this message to be wrapped in _(...) to be
> translatable, let me suggest instead adding the prefix "BUG: ".

Good point.  Thanks.

> 
>> +static inline void hashmap_enable_item_counting(struct hashmap *map)
>> +{
>> +	void *item;
>> +	unsigned int n = 0;
>> +	struct hashmap_iter iter;
>> +
>> +	hashmap_iter_init(map, &iter);
>> +	while ((item = hashmap_iter_next(&iter)))
>> +		n++;
>> +
>> +	map->do_count_items = 1;
>> +	map->private_size = n;
>> +}
> 
> BTW this made me think that we may have a problem in our code since
> switching from my original hashmap implementation to the bucket one added
> in 6a364ced497 (add a hashtable implementation that supports O(1) removal,
> 2013-11-14): while it is not expected that there are many collisions, the
> "grow_at" logic still essentially assumes the number of buckets to be
> equal to the number of hashmap entries.
> 
> Your code simply reiterates that assumption, so I do not blame you for
> anything here, nor ask you to change your patch.

I'm not sure what you're saying here.  The iterator iterates over
all entries (and handles walking collision chains), so my newly
computed count should be correct and all of this is independent of
the "grow-at" and table-size logic.

I'm not forcing a rehash when counting is enabled.  I'm just
reestablishing the expected state.  The next insert may cause
a rehash, but I'm not forcing it.

However, there is an assumption that the caller pre-allocated sufficient
table-size space to avoid poor performance for the duration of the
non-counting period.
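
To illustrate the point about the recount being independent of the
table size, here is a minimal sketch using a toy chained table.  It is
not the code in hashmap.c; every toy_* name is hypothetical, and only
the field names do_count_items and private_size are borrowed from the
patch.  Ten entries are forced into one of four buckets while counting
is off, and the recount walks every collision chain.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy chained hash table.  NOT git's hashmap.c; all toy_* names are hypothetical. */
struct toy_entry {
	struct toy_entry *next;
	unsigned int hash;
};

struct toy_map {
	struct toy_entry **table;
	unsigned int tablesize;      /* number of buckets */
	unsigned int private_size;   /* number of entries; valid only while counting */
	unsigned int do_count_items;
};

static void toy_init(struct toy_map *map, unsigned int tablesize)
{
	assert(tablesize > 0);
	map->table = calloc(tablesize, sizeof(*map->table));
	if (!map->table)
		abort();
	map->tablesize = tablesize;
	map->private_size = 0;
	map->do_count_items = 1;
}

/* Prepend to the bucket's chain; this toy never rehashes. */
static void toy_add(struct toy_map *map, struct toy_entry *e, unsigned int hash)
{
	unsigned int b = hash % map->tablesize;

	e->hash = hash;
	e->next = map->table[b];
	map->table[b] = e;
	if (map->do_count_items)
		map->private_size++;
}

static unsigned int toy_get_size(const struct toy_map *map)
{
	if (map->do_count_items)
		return map->private_size;
	fprintf(stderr, "BUG: toy_get_size: size not set\n");
	abort();
}

static void toy_disable_item_counting(struct toy_map *map)
{
	map->do_count_items = 0;
}

/* Recount by walking every bucket and every collision chain. */
static void toy_enable_item_counting(struct toy_map *map)
{
	unsigned int n = 0, b;

	for (b = 0; b < map->tablesize; b++) {
		const struct toy_entry *e;

		for (e = map->table[b]; e; e = e->next)
			n++;
	}
	map->do_count_items = 1;
	map->private_size = n;
}

/* Demo: 10 entries, 4 buckets, all colliding in bucket 0. */
static unsigned int toy_demo(void)
{
	struct toy_map map;
	struct toy_entry entries[10];
	unsigned int i, size;

	toy_init(&map, 4);
	toy_disable_item_counting(&map);
	for (i = 0; i < 10; i++)
		toy_add(&map, &entries[i], i * 4); /* (i * 4) % 4 == 0 */
	toy_enable_item_counting(&map);
	size = toy_get_size(&map);
	free(map.table);
	return size;
}
```

Calling toy_demo() from a main() returns 10 even though the table has
only 4 buckets, which is the sense in which the count does not depend
on the bucket count.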

> 
> But it does look a bit weird to assume so much about the nature of our
> data, without having any real-life numbers. I wish I had more time so that
> I could afford to run a couple of tests on this hashmap, such as: what is
> the typical difference between bucket count and entry count, or the median
> of the bucket sizes when the map is 80% full (i.e. *just* below the grow
> threshold).

Personally, I think the 80% threshold is too aggressive (and the
default size is too small), but that's a different question.
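
For reference, a rough sketch of how a load-factor threshold and a
default size interact when pre-sizing a table.  The 80% figure is the
one discussed above; the initial size of 64 buckets and the grow-by-4
step are assumptions made for this illustration, and the toy_* names
are hypothetical.

```c
#include <assert.h>

#define TOY_INITIAL_SIZE 64 /* assumed default bucket count */
#define TOY_RESIZE_BITS 2   /* assumed: the table grows by a factor of 4 */
#define TOY_LOAD_FACTOR 80  /* percent; the threshold discussed in the thread */

/* Smallest table size in this scheme with room for the expected entries. */
static unsigned int toy_presize(unsigned int expected_entries)
{
	unsigned int size = TOY_INITIAL_SIZE;
	/* Buckets needed so that expected_entries is about 80% of the table. */
	unsigned long long needed =
		(unsigned long long)expected_entries * 100 / TOY_LOAD_FACTOR;

	while (needed > size) {
		assert(size <= (~0u >> TOY_RESIZE_BITS)); /* guard the shift */
		size <<= TOY_RESIZE_BITS;
	}
	return size;
}

/* Entry count at which a table of this size would be asked to grow. */
static unsigned int toy_grow_at(unsigned int tablesize)
{
	return (unsigned int)((unsigned long long)tablesize *
			      TOY_LOAD_FACTOR / 100);
}
```

Under these assumptions a caller expecting 1000 entries gets 4096
buckets, and a 64-bucket table would be asked to grow at 51 entries.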

The hashmap in question contains directory pathnames, so the
distribution will be completely dependent on the shape of the
data.

FWIW, I created a tool to dump some of this data.  See:
     t/helper/test-lazy-init-name-hash.c
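
Separately from that helper (this is not what it does, just a
hypothetical standalone illustration), the kind of numbers asked about
above, entry count versus occupied buckets and the longest collision
chain, can be computed from a list of hash values like this.  The
struct and function names are made up for the sketch, and bucket
selection by plain modulo is an assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical summary of how entries spread over the buckets. */
struct bucket_stats {
	unsigned int entries;
	unsigned int used_buckets;
	unsigned int longest_chain;
};

/* Returns 0 on success, -1 if the scratch array cannot be allocated. */
static int bucket_stats_compute(const unsigned int *hashes, unsigned int nr,
				unsigned int tablesize,
				struct bucket_stats *out)
{
	unsigned int *chain_len;
	unsigned int i;

	assert(tablesize > 0);
	chain_len = calloc(tablesize, sizeof(*chain_len));
	if (!chain_len)
		return -1;

	out->entries = nr;
	out->used_buckets = 0;
	out->longest_chain = 0;
	for (i = 0; i < nr; i++) {
		unsigned int b = hashes[i] % tablesize; /* assumed bucket choice */

		if (!chain_len[b]++)
			out->used_buckets++; /* first entry in this bucket */
		if (chain_len[b] > out->longest_chain)
			out->longest_chain = chain_len[b];
	}
	free(chain_len);
	return 0;
}
```

For example, the hashes {0, 1, 8, 16, 3} in an 8-bucket table give 5
entries in 3 used buckets, with a longest chain of 3.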

> 
>> diff --git a/name-hash.c b/name-hash.c
>> index 0e10f3e..829ff59 100644
>> --- a/name-hash.c
>> +++ b/name-hash.c
>> @@ -580,9 +580,11 @@ static void lazy_init_name_hash(struct index_state *istate)
>>   			NULL, istate->cache_nr);
>>   
>>   	if (lookup_lazy_params(istate)) {
>> -		hashmap_disallow_rehash(&istate->dir_hash, 1);
>> +		hashmap_disable_item_counting(&istate->dir_hash);
>> +		hashmap_disable_auto_rehash(&istate->dir_hash);
>>   		threaded_lazy_init_name_hash(istate);
>> -		hashmap_disallow_rehash(&istate->dir_hash, 0);
>> +		hashmap_enable_auto_rehash(&istate->dir_hash);
>> +		hashmap_enable_item_counting(&istate->dir_hash);
> 
> By your rationale, it would be enough to simply disable and re-enable
> counting...
> 
> The rest of the patch looks just dandy to me.
> 
> Thanks,
> Dscho
> 

thanks
Jeff


