git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] oidmap.h: strongly discourage using OIDMAP_INIT directly
@ 2017-12-22 10:59 Johannes Schindelin
  2017-12-22 17:16 ` Brandon Williams
  0 siblings, 1 reply; 8+ messages in thread
From: Johannes Schindelin @ 2017-12-22 10:59 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

In Git's source code, we have this convention that quite a few data
structures can be initialized using a macro *_INIT while defining an
instance (instead of having to call a function to initialize the data
structure). You will see that idiom quite a bit, e.g. `struct strbuf buf
= STRBUF_INIT;`

This works for oidsets, too: `struct oidset oids = OIDSET_INIT;` is
perfectly legal and you can use that data structure right away, without
having to call `oidset_init()` first.

That pattern is violated by OIDMAP_INIT, though. The first call to
oidmap_put() or oidmap_get() will succeed, but by mistake rather than by
design: The underlying hashmap is not initialized correctly, as the
cmpfn function pointer still points to NULL, but since there are no
entries to be compared, cmpfn will not be called. Things break down,
though, as soon as there is even one entry.

Rather than causing confusion, frustration and needless loss of time due
to pointless debugging, let's *rename* OIDMAP_INIT so that developers
who have gotten used to the pattern `struct xyz a = XYZ_INIT;` won't do
that with oidmaps.

An alternative would be to introduce a HASHMAP_INIT(cmp_fn) and use that
in oidmap.h. However, there are *no* call sites in Git's source code
that would benefit from that macro, i.e. this would be premature
optimization (and cost a lot more time than this here trivial patch).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
Published-As: https://github.com/dscho/git/releases/tag/discourage-OIDMAP_INIT-v1
Fetch-It-Via: git fetch https://github.com/dscho/git discourage-OIDMAP_INIT-v1
 oidmap.h | 6 +++++-
 oidset.h | 7 ++++++-
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/oidmap.h b/oidmap.h
index 18f54cde143..1a73c392b79 100644
--- a/oidmap.h
+++ b/oidmap.h
@@ -21,7 +21,11 @@ struct oidmap {
 	struct hashmap map;
 };
 
-#define OIDMAP_INIT { { NULL } }
+/*
+ * This macro initializes the data structure only incompletely, just enough
+ * to call oidmap_get() on an empty map. Use oidmap_init() instead.
+ */
+#define OIDMAP_INIT_INCOMPLETELY { { NULL } }
 
 /*
  * Initializes an oidmap structure.
diff --git a/oidset.h b/oidset.h
index f4c9e0f9c04..a11d88edc1d 100644
--- a/oidset.h
+++ b/oidset.h
@@ -22,7 +22,12 @@ struct oidset {
 	struct oidmap map;
 };
 
-#define OIDSET_INIT { OIDMAP_INIT }
+/*
+ * It is okay to initialize the map incompletely here because oidset_insert()
+ * will call oidset_init() (which will call oidmap_init()), and
+ * oidset_contains() works as intended even before oidset_init() was called.
+ */
+#define OIDSET_INIT { OIDMAP_INIT_INCOMPLETELY }
 
 /**
  * Returns true iff `set` contains `oid`.

base-commit: 936d1b989416a95f593bf81ccae8ac62cd83f279
-- 
2.15.1.windows.2

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] oidmap.h: strongly discourage using OIDMAP_INIT directly
  2017-12-22 10:59 [PATCH] oidmap.h: strongly discourage using OIDMAP_INIT directly Johannes Schindelin
@ 2017-12-22 17:16 ` Brandon Williams
  2017-12-22 20:12   ` Junio C Hamano
  2018-01-02 18:13   ` Jeff Hostetler
  0 siblings, 2 replies; 8+ messages in thread
From: Brandon Williams @ 2017-12-22 17:16 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, Junio C Hamano

On 12/22, Johannes Schindelin wrote:
> In Git's source code, we have this convention that quite a few data
> structures can be initialized using a macro *_INIT while defining an
> instance (instead of having to call a function to initialize the data
> structure). You will see that idiom quite a bit, e.g. `struct strbuf buf
> = STRBUF_INIT;`
> 
> This works for oidsets, too: `struct oidset oids = OIDSET_INIT;` is
> perfectly legal and you can use that data structure right away, without
> having to call `oidset_init()` first.
> 
> That pattern is violated by OIDMAP_INIT, though. The first call to
> oidmap_put() or oidmap_get() will succeed, but by mistake rather than by
> design: The underlying hashmap is not initialized correctly, as the
> cmpfn function pointer still points to NULL, but since there are no
> entries to be compared, cmpfn will not be called. Things break down,
> though, as soon as there is even one entry.
> 
> Rather than causing confusion, frustration and needless loss of time due
> to pointless debugging, let's *rename* OIDMAP_INIT so that developers
> who have gotten used to the pattern `struct xyz a = XYZ_INIT;` won't do
> that with oidmaps.
> 
> An alternative would be to introduce a HASHMAP_INIT(cmp_fn) and use that
> in oidmap.h. However, there are *no* call sites in Git's source code
> that would benefit from that macro, i.e. this would be premature
> optimization (and cost a lot more time than this here trivial patch).
> 
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
> Published-As: https://github.com/dscho/git/releases/tag/discourage-OIDMAP_INIT-v1
> Fetch-It-Via: git fetch https://github.com/dscho/git discourage-OIDMAP_INIT-v1
>  oidmap.h | 6 +++++-
>  oidset.h | 7 ++++++-
>  2 files changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/oidmap.h b/oidmap.h
> index 18f54cde143..1a73c392b79 100644
> --- a/oidmap.h
> +++ b/oidmap.h
> @@ -21,7 +21,11 @@ struct oidmap {
>  	struct hashmap map;
>  };
>  
> -#define OIDMAP_INIT { { NULL } }
> +/*
> + * This macro initializes the data structure only incompletely, just enough
> + * to call oidmap_get() on an empty map. Use oidmap_init() instead.
> + */
> +#define OIDMAP_INIT_INCOMPLETELY { { NULL } }

This is one way of approaching the problem.  Couldn't we also take the
approach like we have with oidset and ensure that when oidmap_get() or
_put() are called that if the oidmap isn't initialized, we simply do
that then?

>  
>  /*
>   * Initializes an oidmap structure.
> diff --git a/oidset.h b/oidset.h
> index f4c9e0f9c04..a11d88edc1d 100644
> --- a/oidset.h
> +++ b/oidset.h
> @@ -22,7 +22,12 @@ struct oidset {
>  	struct oidmap map;
>  };
>  
> -#define OIDSET_INIT { OIDMAP_INIT }
> +/*
> + * It is okay to initialize the map incompletely here because oidset_insert()
> + * will call oidset_init() (which will call oidmap_init()), and
> + * oidset_contains() works as intended even before oidset_init() was called.
> + */
> +#define OIDSET_INIT { OIDMAP_INIT_INCOMPLETELY }
>  
>  /**
>   * Returns true iff `set` contains `oid`.
> 
> base-commit: 936d1b989416a95f593bf81ccae8ac62cd83f279
> -- 
> 2.15.1.windows.2

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] oidmap.h: strongly discourage using OIDMAP_INIT directly
  2017-12-22 17:16 ` Brandon Williams
@ 2017-12-22 20:12   ` Junio C Hamano
  2017-12-22 23:27     ` [PATCH] oidmap: ensure map is initialized Brandon Williams
  2017-12-23  0:15     ` [PATCH] oidmap.h: strongly discourage using OIDMAP_INIT directly Johannes Schindelin
  2018-01-02 18:13   ` Jeff Hostetler
  1 sibling, 2 replies; 8+ messages in thread
From: Junio C Hamano @ 2017-12-22 20:12 UTC (permalink / raw)
  To: Brandon Williams; +Cc: Johannes Schindelin, git

Brandon Williams <bmwill@google.com> writes:

>> -#define OIDMAP_INIT { { NULL } }
>> +/*
>> + * This macro initializes the data structure only incompletely, just enough
>> + * to call oidmap_get() on an empty map. Use oidmap_init() instead.
>> + */
>> +#define OIDMAP_INIT_INCOMPLETELY { { NULL } }
>
> This is one way of approaching the problem.  Couldn't we also take the
> approach like we have with oidset and ensure that when oidmap_get() or
> _put() are called that if the oidmap isn't initialized, we simply do
> that then?

Hmph.  Can you show how the alternative code would look like?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] oidmap: ensure map is initialized
  2017-12-22 20:12   ` Junio C Hamano
@ 2017-12-22 23:27     ` Brandon Williams
  2017-12-23  0:23       ` Johannes Schindelin
  2017-12-27 18:01       ` Junio C Hamano
  2017-12-23  0:15     ` [PATCH] oidmap.h: strongly discourage using OIDMAP_INIT directly Johannes Schindelin
  1 sibling, 2 replies; 8+ messages in thread
From: Brandon Williams @ 2017-12-22 23:27 UTC (permalink / raw)
  To: git; +Cc: Johannes.Schindelin, gitster, Brandon Williams

Ensure that an oidmap is initialized before attempting to add, remove,
or retrieve an entry by simply performing the initialization step
before accessing the underlying hashmap.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 oidmap.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/oidmap.c b/oidmap.c
index 6db4fffcd..d9fb19ba6 100644
--- a/oidmap.c
+++ b/oidmap.c
@@ -33,12 +33,19 @@ void oidmap_free(struct oidmap *map, int free_entries)
 
 void *oidmap_get(const struct oidmap *map, const struct object_id *key)
 {
+	if (!map->map.cmpfn)
+		return NULL;
+
 	return hashmap_get_from_hash(&map->map, hash(key), key);
 }
 
 void *oidmap_remove(struct oidmap *map, const struct object_id *key)
 {
 	struct hashmap_entry entry;
+
+	if (!map->map.cmpfn)
+		oidmap_init(map, 0);
+
 	hashmap_entry_init(&entry, hash(key));
 	return hashmap_remove(&map->map, &entry, key);
 }
@@ -46,6 +53,10 @@ void *oidmap_remove(struct oidmap *map, const struct object_id *key)
 void *oidmap_put(struct oidmap *map, void *entry)
 {
 	struct oidmap_entry *to_put = entry;
+
+	if (!map->map.cmpfn)
+		oidmap_init(map, 0);
+
 	hashmap_entry_init(&to_put->internal_entry, hash(&to_put->oid));
 	return hashmap_put(&map->map, to_put);
 }
-- 
2.15.1.620.gb9897f4670-goog


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] oidmap.h: strongly discourage using OIDMAP_INIT directly
  2017-12-22 20:12   ` Junio C Hamano
  2017-12-22 23:27     ` [PATCH] oidmap: ensure map is initialized Brandon Williams
@ 2017-12-23  0:15     ` Johannes Schindelin
  1 sibling, 0 replies; 8+ messages in thread
From: Johannes Schindelin @ 2017-12-23  0:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Brandon Williams, git

Hi Junio,

On Fri, 22 Dec 2017, Junio C Hamano wrote:

> Brandon Williams <bmwill@google.com> writes:
> 
> >> -#define OIDMAP_INIT { { NULL } }
> >> +/*
> >> + * This macro initializes the data structure only incompletely, just enough
> >> + * to call oidmap_get() on an empty map. Use oidmap_init() instead.
> >> + */
> >> +#define OIDMAP_INIT_INCOMPLETELY { { NULL } }
> >
> > This is one way of approaching the problem.  Couldn't we also take the
> > approach like we have with oidset and ensure that when oidmap_get() or
> > _put() are called that if the oidmap isn't initialized, we simply do
> > that then?
> 
> Hmph.  Can you show how the alternative code would look like?

No, because I refuse to perform pointless work, in particular when I am
already pretty booked with work.

But you know how it would look like, right? The cmpfn() function would be
exported via oidmap.h, and a HASHMAP_INIT(cmpfn) would be introduced in
hashmap.h that would initialize everything zeroed out except for the
cmpfn.

But then you would review it and ask if there would be any use in adding
cmp_cb_data to the signature of the HASHMAP_INIT() macro, and I would have
to implement that, too.

And then nobody would use it, and the macro would very likely get
stale/incorrect. And then I would offer another patch reverting that
change (because there is no user) and replace it with this here patch.

As I said: pointless,
Dscho

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] oidmap: ensure map is initialized
  2017-12-22 23:27     ` [PATCH] oidmap: ensure map is initialized Brandon Williams
@ 2017-12-23  0:23       ` Johannes Schindelin
  2017-12-27 18:01       ` Junio C Hamano
  1 sibling, 0 replies; 8+ messages in thread
From: Johannes Schindelin @ 2017-12-23  0:23 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git, gitster

Hi Brandon,

On Fri, 22 Dec 2017, Brandon Williams wrote:

> Ensure that an oidmap is initialized before attempting to add, remove,
> or retrieve an entry by simply performing the initialization step
> before accessing the underlying hashmap.
> 
> Signed-off-by: Brandon Williams <bmwill@google.com>
> ---
>  oidmap.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/oidmap.c b/oidmap.c
> index 6db4fffcd..d9fb19ba6 100644
> --- a/oidmap.c
> +++ b/oidmap.c
> @@ -33,12 +33,19 @@ void oidmap_free(struct oidmap *map, int free_entries)
>  
>  void *oidmap_get(const struct oidmap *map, const struct object_id *key)
>  {
> +	if (!map->map.cmpfn)
> +		return NULL;
> +

This one is unnecessary: if we always _init() before we add, then
hashmap_get_from_hash() will not have anything to compare, and therefore
not call cmpfn.

You could argue that it is only a teeny-tiny runtime cost, so it'd be
better to be safe than to be sorry.

However, it is a death by a thousand cuts. My colleagues and I try to
shave off a few percent here and a few percent there, in the hope to get
some performance improvements worth writing home about. And then patches
like this one come in that simply add runtime without much in the way of
performance considerations.

And that makes me quite a bit sad.

>  	return hashmap_get_from_hash(&map->map, hash(key), key);
>  }
>  
>  void *oidmap_remove(struct oidmap *map, const struct object_id *key)
>  {
>  	struct hashmap_entry entry;
> +
> +	if (!map->map.cmpfn)
> +		oidmap_init(map, 0);
> +
>  	hashmap_entry_init(&entry, hash(key));
>  	return hashmap_remove(&map->map, &entry, key);
>  }
> @@ -46,6 +53,10 @@ void *oidmap_remove(struct oidmap *map, const struct object_id *key)
>  void *oidmap_put(struct oidmap *map, void *entry)
>  {
>  	struct oidmap_entry *to_put = entry;
> +
> +	if (!map->map.cmpfn)
> +		oidmap_init(map, 0);
> +
>  	hashmap_entry_init(&to_put->internal_entry, hash(&to_put->oid));
>  	return hashmap_put(&map->map, to_put);
>  }

"But it does not add a lot, it's only a couple of microseconds"

Sure. But we could (and do) simply initialize the hashmaps once, and avoid
having to spend unnecessary cycles for every single access.

I *much* prefer my original patch that essentially does not change *any*
code path. Everything stays the same, except that there is now a strong
hint explaining why you need to call oidmap_init() manually instead of
using the OIDMAP_INIT macro and then wonder why your code crashes.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] oidmap: ensure map is initialized
  2017-12-22 23:27     ` [PATCH] oidmap: ensure map is initialized Brandon Williams
  2017-12-23  0:23       ` Johannes Schindelin
@ 2017-12-27 18:01       ` Junio C Hamano
  1 sibling, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2017-12-27 18:01 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git, Johannes.Schindelin

Brandon Williams <bmwill@google.com> writes:

> Ensure that an oidmap is initialized before attempting to add, remove,
> or retrieve an entry by simply performing the initialization step
> before accessing the underlying hashmap.
>
> Signed-off-by: Brandon Williams <bmwill@google.com>
> ---

Looks sane.  Thanks for illustrating the idea with actual code.

Essentially, you are using map->map.cmpfn as a boolean to see "have
we initialized this thing?  if not, we need to initialize it on
demand".  

By the way, I am somewhat more sympathetic than usual to Dscho's
"make oidmap_get() very aware of the internal implementation detail
of hashmap_get_from_hash() to micro-optimize by removing the check
from _get()".  Such a layering violation is disgusting, and making
it deliberately shows an even worse design taste, but in this
particular case, because the oidmap API is too thin a layer on top
of hashmap, it is understandably a very tempting approach.

>  oidmap.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/oidmap.c b/oidmap.c
> index 6db4fffcd..d9fb19ba6 100644
> --- a/oidmap.c
> +++ b/oidmap.c
> @@ -33,12 +33,19 @@ void oidmap_free(struct oidmap *map, int free_entries)
>  
>  void *oidmap_get(const struct oidmap *map, const struct object_id *key)
>  {
> +	if (!map->map.cmpfn)
> +		return NULL;
> +
>  	return hashmap_get_from_hash(&map->map, hash(key), key);
>  }
>  
>  void *oidmap_remove(struct oidmap *map, const struct object_id *key)
>  {
>  	struct hashmap_entry entry;
> +
> +	if (!map->map.cmpfn)
> +		oidmap_init(map, 0);
> +
>  	hashmap_entry_init(&entry, hash(key));
>  	return hashmap_remove(&map->map, &entry, key);
>  }
> @@ -46,6 +53,10 @@ void *oidmap_remove(struct oidmap *map, const struct object_id *key)
>  void *oidmap_put(struct oidmap *map, void *entry)
>  {
>  	struct oidmap_entry *to_put = entry;
> +
> +	if (!map->map.cmpfn)
> +		oidmap_init(map, 0);
> +
>  	hashmap_entry_init(&to_put->internal_entry, hash(&to_put->oid));
>  	return hashmap_put(&map->map, to_put);
>  }

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] oidmap.h: strongly discourage using OIDMAP_INIT directly
  2017-12-22 17:16 ` Brandon Williams
  2017-12-22 20:12   ` Junio C Hamano
@ 2018-01-02 18:13   ` Jeff Hostetler
  1 sibling, 0 replies; 8+ messages in thread
From: Jeff Hostetler @ 2018-01-02 18:13 UTC (permalink / raw)
  To: Brandon Williams, Johannes Schindelin; +Cc: git, Junio C Hamano



On 12/22/2017 12:16 PM, Brandon Williams wrote:
> On 12/22, Johannes Schindelin wrote:
[...]
>> That pattern is violated by OIDMAP_INIT, though. The first call to
>> oidmap_put() or oidmap_get() will succeed, but by mistake rather than by
>> design: The underlying hashmap is not initialized correctly, as the
>> cmpfn function pointer still points to NULL, but since there are no
>> entries to be compared, cmpfn will not be called. Things break down,
>> though, as soon as there is even one entry.
>>
>> Rather than causing confusion, frustration and needless loss of time due
>> to pointless debugging, let's *rename* OIDMAP_INIT so that developers
>> who have gotten used to the pattern `struct xyz a = XYZ_INIT;` won't do
>> that with oidmaps.
>>   
>> -#define OIDMAP_INIT { { NULL } }
>> +/*
>> + * This macro initializes the data structure only incompletely, just enough
>> + * to call oidmap_get() on an empty map. Use oidmap_init() instead.
>> + */
>> +#define OIDMAP_INIT_INCOMPLETELY { { NULL } }

Alternatively, could we define the macro to an expression
that would cause a compiler error?  Then any new code written
would fail to compile.  And we document the issue in a comment
above the macro so no one changes the macro to "fix" it.

(I suggest this as opposed to simply removing the macro
to prevent someone from re-adding it later, since it is the
standard pattern.)

Just a thought,
Jeff

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-01-02 18:13 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-22 10:59 [PATCH] oidmap.h: strongly discourage using OIDMAP_INIT directly Johannes Schindelin
2017-12-22 17:16 ` Brandon Williams
2017-12-22 20:12   ` Junio C Hamano
2017-12-22 23:27     ` [PATCH] oidmap: ensure map is initialized Brandon Williams
2017-12-23  0:23       ` Johannes Schindelin
2017-12-27 18:01       ` Junio C Hamano
2017-12-23  0:15     ` [PATCH] oidmap.h: strongly discourage using OIDMAP_INIT directly Johannes Schindelin
2018-01-02 18:13   ` Jeff Hostetler

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).