git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] pack-bitmap-write: use hashwrite_be32() in write_hash_cache()
@ 2020-09-06  8:59 René Scharfe
  2020-09-06 19:02 ` Taylor Blau
  0 siblings, 1 reply; 4+ messages in thread
From: René Scharfe @ 2020-09-06  8:59 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano, Taylor Blau

Call hashwrite_be32() instead of open-coding it.  This is shorter and
easier to read.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 pack-bitmap-write.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index a7a4964b50..5e998bdaa7 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -503,8 +503,7 @@ static void write_hash_cache(struct hashfile *f,

 	for (i = 0; i < index_nr; ++i) {
 		struct object_entry *entry = (struct object_entry *)index[i];
-		uint32_t hash_value = htonl(entry->hash);
-		hashwrite(f, &hash_value, sizeof(hash_value));
+		hashwrite_be32(f, entry->hash);
 	}
 }

--
2.28.0

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] pack-bitmap-write: use hashwrite_be32() in write_hash_cache()
  2020-09-06  8:59 [PATCH] pack-bitmap-write: use hashwrite_be32() in write_hash_cache() René Scharfe
@ 2020-09-06 19:02 ` Taylor Blau
  2020-09-07  2:23   ` Jeff King
  0 siblings, 1 reply; 4+ messages in thread
From: Taylor Blau @ 2020-09-06 19:02 UTC (permalink / raw)
  To: René Scharfe
  Cc: Git Mailing List, Junio C Hamano, Taylor Blau, Jeff King

Hi René,

On Sun, Sep 06, 2020 at 10:59:06AM +0200, René Scharfe wrote:
> -		uint32_t hash_value = htonl(entry->hash);
> -		hashwrite(f, &hash_value, sizeof(hash_value));
> +		hashwrite_be32(f, entry->hash);

This is an obviously correct translation of what's already written, and
indeed it is shorter and easier to read.

Unfortunately, I think there is some more subtlety here since the hash
cache isn't guarenteed to be aligned, and so blindly calling htonl()
(either directly in write_hash_cache(), or indirectly in
hashwrite_be32()) might cause tools like ASan to complain when loading
data on architectures that don't support fast unaligned reads.

So, I think that we could do one of three things, depending on how much
you care about improving this case ;-).

  - leave your patch alone, accepting that this case which was broken
    before will remain broken, and leave it as #leftoverbits

  - discard your patch as-is, and replace the 'htonl' with 'get_be32()'
    before handing it off to 'hashwrite()', or

  - change the 'hashwrite_beXX()' implementations to use the correct
    'get_beXX' wrappers which behave like htonl() on architectures with
    fast unaligned loads, and fall back to byte reads and shifts on
    architectures that don't.

Credit goes to Peff for finding this issue in GitHub's fork. For what
it's worth, we were planning on sending those patches to the list soon,
but they are tied up with a longer series in the meantime.

For what it's worth, I think doing any of the above would be fine.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] pack-bitmap-write: use hashwrite_be32() in write_hash_cache()
  2020-09-06 19:02 ` Taylor Blau
@ 2020-09-07  2:23   ` Jeff King
  2020-09-07  2:30     ` Taylor Blau
  0 siblings, 1 reply; 4+ messages in thread
From: Jeff King @ 2020-09-07  2:23 UTC (permalink / raw)
  To: Taylor Blau; +Cc: René Scharfe, Git Mailing List, Junio C Hamano

On Sun, Sep 06, 2020 at 03:02:35PM -0400, Taylor Blau wrote:

> On Sun, Sep 06, 2020 at 10:59:06AM +0200, René Scharfe wrote:
> > -		uint32_t hash_value = htonl(entry->hash);
> > -		hashwrite(f, &hash_value, sizeof(hash_value));
> > +		hashwrite_be32(f, entry->hash);
> 
> This is an obviously correct translation of what's already written, and
> indeed it is shorter and easier to read.
> 
> Unfortunately, I think there is some more subtlety here since the hash
> cache isn't guarenteed to be aligned, and so blindly calling htonl()
> (either directly in write_hash_cache(), or indirectly in
> hashwrite_be32()) might cause tools like ASan to complain when loading
> data on architectures that don't support fast unaligned reads.

I think the alignment here is fine. We're just writing out an individual
value. So in the original entry->hash and our local hash_value are both
properly aligned, since they're declared as uint32_t. We pass the
pointer to hashwrite(), but it doesn't expect any particular alignment.
After the patch, the situation is the same, except that we're working
with the uint32_t parameter to hashwrite_be32(), which is also properly
aligned.

> So, I think that we could do one of three things, depending on how much
> you care about improving this case ;-).
> 
>   - leave your patch alone, accepting that this case which was broken
>     before will remain broken, and leave it as #leftoverbits

So I think this is what we should do. :)

>   - discard your patch as-is, and replace the 'htonl' with 'get_be32()'
>     before handing it off to 'hashwrite()', or

No need here; get_be32() is for when we're reading in from an unaligned
mmap.

>   - change the 'hashwrite_beXX()' implementations to use the correct
>     'get_beXX' wrappers which behave like htonl() on architectures with
>     fast unaligned loads, and fall back to byte reads and shifts on
>     architectures that don't.

Likewise, I don't think there's any reason to do this. hashwrite_be32()
gets its parameter as a value, not a pointer. So even if it were coming
from an unaligned mmap, it's actually the _caller_ who would have to
use get_be32() when passing it.

> Credit goes to Peff for finding this issue in GitHub's fork. For what
> it's worth, we were planning on sending those patches to the list soon,
> but they are tied up with a longer series in the meantime.

There is a bug in our fork, but I don't think it's upstream. The
relevant spot for the name-hash cache is in show_objects_for_type(),
which reads from the bitmap->hashes pointer (that points into our
unaligned mmap). But it does:

        if (bitmap_git->hashes)
                hash = get_be32(bitmap_git->hashes + entry->nr);

which is correct (using htonl() would not be). The bug is that in our
fork, we have a custom bit-cache extension[1] which does use htonl(),
and should be get_be32(). That's something we'll need to clean up when
we send those patches upstream.

-Peff

[1] For the curious, the point is it keep a cache of the bit position of
    each object, which lets us ask "is this object's bit set" without
    having to load the revindex. It's helpful for bitmap-optimizing some
    algorithms like "branch --contains", though I think we should
    re-evaluate how much it helps now that we have commit-graphs with
    generation numbers.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] pack-bitmap-write: use hashwrite_be32() in write_hash_cache()
  2020-09-07  2:23   ` Jeff King
@ 2020-09-07  2:30     ` Taylor Blau
  0 siblings, 0 replies; 4+ messages in thread
From: Taylor Blau @ 2020-09-07  2:30 UTC (permalink / raw)
  To: Jeff King
  Cc: Taylor Blau, René Scharfe, Git Mailing List, Junio C Hamano

On Sun, Sep 06, 2020 at 10:23:40PM -0400, Jeff King wrote:
> On Sun, Sep 06, 2020 at 03:02:35PM -0400, Taylor Blau wrote:
>
> > On Sun, Sep 06, 2020 at 10:59:06AM +0200, René Scharfe wrote:
> > > -		uint32_t hash_value = htonl(entry->hash);
> > > -		hashwrite(f, &hash_value, sizeof(hash_value));
> > > +		hashwrite_be32(f, entry->hash);
> >
> > This is an obviously correct translation of what's already written, and
> > indeed it is shorter and easier to read.
> >
> > Unfortunately, I think there is some more subtlety here since the hash
> > cache isn't guarenteed to be aligned, and so blindly calling htonl()
> > (either directly in write_hash_cache(), or indirectly in
> > hashwrite_be32()) might cause tools like ASan to complain when loading
> > data on architectures that don't support fast unaligned reads.
>
> I think the alignment here is fine. We're just writing out an individual
> value. So in the original entry->hash and our local hash_value are both
> properly aligned, since they're declared as uint32_t. We pass the
> pointer to hashwrite(), but it doesn't expect any particular alignment.
> After the patch, the situation is the same, except that we're working
> with the uint32_t parameter to hashwrite_be32(), which is also properly
> aligned.

Ack; I would blame it on skimming the patch, but this is far too obvious
for that. The bug is on the *reading* end in GitHub's fork, and in a
(custom) extension (which it looks like you describe below).
Embarrassing.

> > So, I think that we could do one of three things, depending on how much
> > you care about improving this case ;-).
> >
> >   - leave your patch alone, accepting that this case which was broken
> >     before will remain broken, and leave it as #leftoverbits
>
> So I think this is what we should do. :)

Yep, this patch is correct as-is.

> >   - change the 'hashwrite_beXX()' implementations to use the correct
> >     'get_beXX' wrappers which behave like htonl() on architectures with
> >     fast unaligned loads, and fall back to byte reads and shifts on
> >     architectures that don't.
>
> Likewise, I don't think there's any reason to do this. hashwrite_be32()
> gets its parameter as a value, not a pointer. So even if it were coming
> from an unaligned mmap, it's actually the _caller_ who would have to
> use get_be32() when passing it.

Right.

> > Credit goes to Peff for finding this issue in GitHub's fork. For what
> > it's worth, we were planning on sending those patches to the list soon,
> > but they are tied up with a longer series in the meantime.
>
> There is a bug in our fork, but I don't think it's upstream. [...]

Agreed with all of that, too.


Taylor

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-09-07  2:31 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-06  8:59 [PATCH] pack-bitmap-write: use hashwrite_be32() in write_hash_cache() René Scharfe
2020-09-06 19:02 ` Taylor Blau
2020-09-07  2:23   ` Jeff King
2020-09-07  2:30     ` Taylor Blau

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).