From: Jeff King
To: Victoria Dye
Cc: Derrick Stolee, git@vger.kernel.org
Subject: Re: t9210-scalar.sh fails with SANITIZE=undefined
Date: Thu, 22 Sep 2022 18:27:19 -0400
In-Reply-To: <50c57a60-8346-6952-93d9-432a70ef74c5@github.com>

On Thu, Sep 22, 2022 at 03:09:52PM -0700, Victoria Dye wrote:

> Other than allowing us to use a (non-packed) 'struct ondisk_cache_entry' to
> parse the index entries, is there any reason why the on-disk index entries
> should (or need to be) 4-byte aligned? If not, we could update how we read
> the 'ondisk' index entry in 'create_from_disk()' to avoid the misalignment.

I don't think so. And indeed, we already use get_be16(), etc, for most
of the access (which is mostly there for endian-fixing, but also
resolves alignment problems).

> ------------------8<------------------8<------------------8<------------------
> diff --git a/read-cache.c b/read-cache.c
> index b09128b188..f132a3f256 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1875,7 +1875,7 @@ static int read_index_extension(struct index_state *istate,
>  
>  static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
>  					    unsigned int version,
> -					    struct ondisk_cache_entry *ondisk,
> +					    const char *ondisk,
>  					    unsigned long *ent_size,
>  					    const struct cache_entry *previous_ce)
>  {
> @@ -1883,7 +1883,7 @@ static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
>  	size_t len;
>  	const char *name;
>  	const unsigned hashsz = the_hash_algo->rawsz;
> -	const uint16_t *flagsp = (const uint16_t *)(ondisk->data + hashsz);
> +	const char *flagsp = ondisk + offsetof(struct ondisk_cache_entry, data) + hashsz;
>  	unsigned int flags;
>  	size_t copy_len = 0;
>  	/*
> ------------------>8------------------>8------------------>8------------------
>
> the do the same sort of conversion with the rest of the function. It's
> certainly uglier than just using the 'struct ondisk_index_entry *' pointer,
> but it should avoid the misaligned addressing error.

Yeah, I think that's probably the only reasonable solution.
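
To illustrate why byte-wise access sidesteps the misaligned-load error,
here is a minimal sketch. It is not Git's code: read_be16() is a
hypothetical stand-in for a get_be16()-style helper, and read_field16()
and its offset parameter are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a get_be16()-style helper. It assembles the
 * value one byte at a time, so it never issues a multi-byte load and is
 * valid at any address, aligned or not.
 */
static uint16_t read_be16(const void *p)
{
	const unsigned char *b = p;

	assert(b != NULL);
	return (uint16_t)((b[0] << 8) | b[1]);
}

/*
 * Read a 16-bit big-endian field at an arbitrary byte offset within a raw
 * on-disk entry. Casting (entry + offset) to "const uint16_t *" and
 * dereferencing it would be undefined behavior when the address is odd;
 * going through read_be16() avoids that.
 */
static uint16_t read_field16(const unsigned char *entry, size_t offset)
{
	return read_be16(entry + offset);
}
```

With a buffer of { 0xff, 0x12, 0x34 }, read_field16(buf, 1) reads the
field at an odd offset and returns 0x1234 without any aligned load.
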
I thought ditching ondisk_cache_entry entirely (which is basically what
this is doing) would be a tough sell, but a quick "grep" shows it really
isn't used in all that many spots.

I also wondered why other versions do not have a similar problem. After
all, cache entries contain pathnames which are going to be of varying
lengths. But this seems telling:

  $ git grep -m1 -B1 -A2 align_padding_size
  read-cache.c-/* These are only used for v3 or lower */
  read-cache.c:#define align_padding_size(size, len) ((size + (len) + 8) & ~7) - (size + len)
  read-cache.c-#define align_flex_name(STRUCT,len) ((offsetof(struct STRUCT,data) + (len) + 8) & ~7)
  read-cache.c-#define ondisk_cache_entry_size(len) align_flex_name(ondisk_cache_entry,len)

So we actually pad the entries in earlier versions to align them, but
don't in v4. I'm not sure if that was a conscious choice to save space,
or an unintended consequence (though it is mentioned in the docs, I
think that came after the code).

That's probably all obvious to people who work with the index a lot.
It's the one part of Git I've mostly managed to remain oblivious to. :)

-Peff
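
As a footnote to the padding macros quoted above, the rounding they
perform can be sketched as plain functions. This is only an illustration
under assumptions: padded_entry_size() and padding_size() are
hypothetical names, and "fixed" stands in for the bytes that precede the
name rather than being computed from Git's struct.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sketch of the rounding done by align_flex_name(): add 8, then clear the
 * low three bits. "fixed" is an assumed count of bytes preceding the name,
 * not a value read from Git's headers.
 */
static size_t padded_entry_size(size_t fixed, size_t namelen)
{
	return (fixed + namelen + 8) & ~(size_t)7;
}

/* Number of padding bytes appended after the name: always 1 through 8. */
static size_t padding_size(size_t fixed, size_t namelen)
{
	size_t pad = padded_entry_size(fixed, namelen) - (fixed + namelen);

	assert(pad >= 1 && pad <= 8);
	return pad;
}
```

With an illustrative fixed size of 62 bytes, a 1-byte name gives a
64-byte entry (1 byte of padding) and a 2-byte name gives 72 bytes
(8 bytes of padding). Every padded size is a multiple of 8, so each entry
starts on an 8-byte boundary as long as the first one does. Without this
padding, as in v4, the next entry can start at any offset.
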