* [PATCH] pack-bitmap-write: use hashwrite_be32() in write_hash_cache() @ 2020-09-06 8:59 René Scharfe 2020-09-06 19:02 ` Taylor Blau 0 siblings, 1 reply; 4+ messages in thread From: René Scharfe @ 2020-09-06 8:59 UTC (permalink / raw) To: Git Mailing List; +Cc: Junio C Hamano, Taylor Blau Call hashwrite_be32() instead of open-coding it. This is shorter and easier to read. Signed-off-by: René Scharfe <l.s.r@web.de> --- pack-bitmap-write.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index a7a4964b50..5e998bdaa7 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -503,8 +503,7 @@ static void write_hash_cache(struct hashfile *f, for (i = 0; i < index_nr; ++i) { struct object_entry *entry = (struct object_entry *)index[i]; - uint32_t hash_value = htonl(entry->hash); - hashwrite(f, &hash_value, sizeof(hash_value)); + hashwrite_be32(f, entry->hash); } } -- 2.28.0 ^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] pack-bitmap-write: use hashwrite_be32() in write_hash_cache() 2020-09-06 8:59 [PATCH] pack-bitmap-write: use hashwrite_be32() in write_hash_cache() René Scharfe @ 2020-09-06 19:02 ` Taylor Blau 2020-09-07 2:23 ` Jeff King 0 siblings, 1 reply; 4+ messages in thread From: Taylor Blau @ 2020-09-06 19:02 UTC (permalink / raw) To: René Scharfe Cc: Git Mailing List, Junio C Hamano, Taylor Blau, Jeff King Hi René, On Sun, Sep 06, 2020 at 10:59:06AM +0200, René Scharfe wrote: > - uint32_t hash_value = htonl(entry->hash); > - hashwrite(f, &hash_value, sizeof(hash_value)); > + hashwrite_be32(f, entry->hash); This is an obviously correct translation of what's already written, and indeed it is shorter and easier to read. Unfortunately, I think there is some more subtlety here since the hash cache isn't guarenteed to be aligned, and so blindly calling htonl() (either directly in write_hash_cache(), or indirectly in hashwrite_be32()) might cause tools like ASan to complain when loading data on architectures that don't support fast unaligned reads. So, I think that we could do one of three things, depending on how much you care about improving this case ;-). - leave your patch alone, accepting that this case which was broken before will remain broken, and leave it as #leftoverbits - discard your patch as-is, and replace the 'htonl' with 'get_be32()' before handing it off to 'hashwrite()', or - change the 'hashwrite_beXX()' implementations to use the correct 'get_beXX' wrappers which behave like htonl() on architectures with fast unaligned loads, and fall back to byte reads and shifts on architectures that don't. Credit goes to Peff for finding this issue in GitHub's fork. For what it's worth, we were planning on sending those patches to the list soon, but they are tied up with a longer series in the meantime. For what it's worth, I think doing any of the above would be fine. Thanks, Taylor ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] pack-bitmap-write: use hashwrite_be32() in write_hash_cache() 2020-09-06 19:02 ` Taylor Blau @ 2020-09-07 2:23 ` Jeff King 2020-09-07 2:30 ` Taylor Blau 0 siblings, 1 reply; 4+ messages in thread From: Jeff King @ 2020-09-07 2:23 UTC (permalink / raw) To: Taylor Blau; +Cc: René Scharfe, Git Mailing List, Junio C Hamano On Sun, Sep 06, 2020 at 03:02:35PM -0400, Taylor Blau wrote: > On Sun, Sep 06, 2020 at 10:59:06AM +0200, René Scharfe wrote: > > - uint32_t hash_value = htonl(entry->hash); > > - hashwrite(f, &hash_value, sizeof(hash_value)); > > + hashwrite_be32(f, entry->hash); > > This is an obviously correct translation of what's already written, and > indeed it is shorter and easier to read. > > Unfortunately, I think there is some more subtlety here since the hash > cache isn't guarenteed to be aligned, and so blindly calling htonl() > (either directly in write_hash_cache(), or indirectly in > hashwrite_be32()) might cause tools like ASan to complain when loading > data on architectures that don't support fast unaligned reads. I think the alignment here is fine. We're just writing out an individual value. So in the original entry->hash and our local hash_value are both properly aligned, since they're declared as uint32_t. We pass the pointer to hashwrite(), but it doesn't expect any particular alignment. After the patch, the situation is the same, except that we're working with the uint32_t parameter to hashwrite_be32(), which is also properly aligned. > So, I think that we could do one of three things, depending on how much > you care about improving this case ;-). > > - leave your patch alone, accepting that this case which was broken > before will remain broken, and leave it as #leftoverbits So I think this is what we should do. :) > - discard your patch as-is, and replace the 'htonl' with 'get_be32()' > before handing it off to 'hashwrite()', or No need here; get_be32() is for when we're reading in from an unaligned mmap. > - change the 'hashwrite_beXX()' implementations to use the correct > 'get_beXX' wrappers which behave like htonl() on architectures with > fast unaligned loads, and fall back to byte reads and shifts on > architectures that don't. Likewise, I don't think there's any reason to do this. hashwrite_be32() gets its parameter as a value, not a pointer. So even if it were coming from an unaligned mmap, it's actually the _caller_ who would have to use get_be32() when passing it. > Credit goes to Peff for finding this issue in GitHub's fork. For what > it's worth, we were planning on sending those patches to the list soon, > but they are tied up with a longer series in the meantime. There is a bug in our fork, but I don't think it's upstream. The relevant spot for the name-hash cache is in show_objects_for_type(), which reads from the bitmap->hashes pointer (that points into our unaligned mmap). But it does: if (bitmap_git->hashes) hash = get_be32(bitmap_git->hashes + entry->nr); which is correct (using htonl() would not be). The bug is that in our fork, we have a custom bit-cache extension[1] which does use htonl(), and should be get_be32(). That's something we'll need to clean up when we send those patches upstream. -Peff [1] For the curious, the point is it keep a cache of the bit position of each object, which lets us ask "is this object's bit set" without having to load the revindex. It's helpful for bitmap-optimizing some algorithms like "branch --contains", though I think we should re-evaluate how much it helps now that we have commit-graphs with generation numbers. ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] pack-bitmap-write: use hashwrite_be32() in write_hash_cache() 2020-09-07 2:23 ` Jeff King @ 2020-09-07 2:30 ` Taylor Blau 0 siblings, 0 replies; 4+ messages in thread From: Taylor Blau @ 2020-09-07 2:30 UTC (permalink / raw) To: Jeff King Cc: Taylor Blau, René Scharfe, Git Mailing List, Junio C Hamano On Sun, Sep 06, 2020 at 10:23:40PM -0400, Jeff King wrote: > On Sun, Sep 06, 2020 at 03:02:35PM -0400, Taylor Blau wrote: > > > On Sun, Sep 06, 2020 at 10:59:06AM +0200, René Scharfe wrote: > > > - uint32_t hash_value = htonl(entry->hash); > > > - hashwrite(f, &hash_value, sizeof(hash_value)); > > > + hashwrite_be32(f, entry->hash); > > > > This is an obviously correct translation of what's already written, and > > indeed it is shorter and easier to read. > > > > Unfortunately, I think there is some more subtlety here since the hash > > cache isn't guarenteed to be aligned, and so blindly calling htonl() > > (either directly in write_hash_cache(), or indirectly in > > hashwrite_be32()) might cause tools like ASan to complain when loading > > data on architectures that don't support fast unaligned reads. > > I think the alignment here is fine. We're just writing out an individual > value. So in the original entry->hash and our local hash_value are both > properly aligned, since they're declared as uint32_t. We pass the > pointer to hashwrite(), but it doesn't expect any particular alignment. > After the patch, the situation is the same, except that we're working > with the uint32_t parameter to hashwrite_be32(), which is also properly > aligned. Ack; I would blame it on skimming the patch, but this is far too obvious for that. The bug is on the *reading* end in GitHub's fork, and in a (custom) extension (which it looks like you describe below). Embarrassing. > > So, I think that we could do one of three things, depending on how much > > you care about improving this case ;-). > > > > - leave your patch alone, accepting that this case which was broken > > before will remain broken, and leave it as #leftoverbits > > So I think this is what we should do. :) Yep, this patch is correct as-is. > > - change the 'hashwrite_beXX()' implementations to use the correct > > 'get_beXX' wrappers which behave like htonl() on architectures with > > fast unaligned loads, and fall back to byte reads and shifts on > > architectures that don't. > > Likewise, I don't think there's any reason to do this. hashwrite_be32() > gets its parameter as a value, not a pointer. So even if it were coming > from an unaligned mmap, it's actually the _caller_ who would have to > use get_be32() when passing it. Right. > > Credit goes to Peff for finding this issue in GitHub's fork. For what > > it's worth, we were planning on sending those patches to the list soon, > > but they are tied up with a longer series in the meantime. > > There is a bug in our fork, but I don't think it's upstream. [...] Agreed with all of that, too. Taylor ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2020-09-07 2:31 UTC | newest] Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-09-06 8:59 [PATCH] pack-bitmap-write: use hashwrite_be32() in write_hash_cache() René Scharfe 2020-09-06 19:02 ` Taylor Blau 2020-09-07 2:23 ` Jeff King 2020-09-07 2:30 ` Taylor Blau
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).