From: Jeff King <peff@peff.net>
To: Victoria Dye <vdye@github.com>
Cc: Derrick Stolee <derrickstolee@github.com>, git@vger.kernel.org
Subject: Re: t9210-scalar.sh fails with SANITIZE=undefined
Date: Thu, 22 Sep 2022 18:27:19 -0400
Message-ID: <YyzhR8CGu2CNQMfJ@coredump.intra.peff.net>
In-Reply-To: <50c57a60-8346-6952-93d9-432a70ef74c5@github.com>

On Thu, Sep 22, 2022 at 03:09:52PM -0700, Victoria Dye wrote:

> Other than allowing us to use a (non-packed) 'struct ondisk_cache_entry' to
> parse the index entries, is there any reason why the on-disk index entries
> should (or need to be) 4-byte aligned? If not, we could update how we read
> the 'ondisk' index entry in 'create_from_disk()' to avoid the misalignment.

I don't think so. And indeed, we already use get_be16(), etc, for most
of the access (which is mostly there for endian-fixing, but also
resolves alignment problems).
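
To illustrate why byte-wise accessors sidestep the alignment issue, here
is a minimal sketch. The names read_be16() and read_be32() are
hypothetical stand-ins, not Git's actual get_be16()/get_be32()
implementations. They only ever dereference single bytes, so the pointer
can sit at any offset in the mapped index:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helpers (not Git's real get_be16/get_be32): assemble a
 * big-endian value one byte at a time. No load wider than a byte is
 * issued, so "ptr" may be arbitrarily misaligned.
 */
static inline uint16_t read_be16(const void *ptr)
{
	const unsigned char *p = ptr;
	return (uint16_t)((p[0] << 8) | p[1]);
}

static inline uint32_t read_be32(const void *ptr)
{
	const unsigned char *p = ptr;
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

Casting the same address to a "const uint16_t *" and dereferencing it is
the kind of access that SANITIZE=undefined flags when the address is
odd.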

> ------------------8<------------------8<------------------8<------------------
> diff --git a/read-cache.c b/read-cache.c
> index b09128b188..f132a3f256 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1875,7 +1875,7 @@ static int read_index_extension(struct index_state *istate,
>  
>  static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
>  					    unsigned int version,
> -					    struct ondisk_cache_entry *ondisk,
> +					    const char *ondisk,
>  					    unsigned long *ent_size,
>  					    const struct cache_entry *previous_ce)
>  {
> @@ -1883,7 +1883,7 @@ static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
>  	size_t len;
>  	const char *name;
>  	const unsigned hashsz = the_hash_algo->rawsz;
> -	const uint16_t *flagsp = (const uint16_t *)(ondisk->data + hashsz);
> +	const char *flagsp = ondisk + offsetof(struct ondisk_cache_entry, data) + hashsz;
>  	unsigned int flags;
>  	size_t copy_len = 0;
>  	/*
> ------------------>8------------------>8------------------>8------------------
> 
> then do the same sort of conversion with the rest of the function. It's
> certainly uglier than just using the 'struct ondisk_cache_entry *' pointer,
> but it should avoid the misaligned addressing error.

Yeah, I think that's probably the only reasonable solution. I thought
ditching ondisk_cache_entry entirely (which is basically what this is
doing) would be a tough sell, but a quick "grep" shows it really isn't
used in all that many spots.

I also wondered why other versions do not have a similar problem. After
all, cache entries contain pathnames which are going to be of varying
lengths. But this seems telling:

  $ git grep -m1 -B1 -A2 align_padding_size
  read-cache.c-/* These are only used for v3 or lower */
  read-cache.c:#define align_padding_size(size, len) ((size + (len) + 8) & ~7) - (size + len)
  read-cache.c-#define align_flex_name(STRUCT,len) ((offsetof(struct STRUCT,data) + (len) + 8) & ~7)
  read-cache.c-#define ondisk_cache_entry_size(len) align_flex_name(ondisk_cache_entry,len)

So we actually pad the entries in earlier versions to align them, but
don't in v4. I'm not sure if that was a conscious choice to save space,
or an unintended consequence (though it is mentioned in the docs, I
think that came after the code).
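
For anyone else who does not live in the index code, the arithmetic in
those macros can be sketched as plain functions. This is only an
illustration: padded_entry_size() and padding_size() are hypothetical
names, and the header length is passed in as a parameter rather than
taken from offsetof(struct ondisk_cache_entry, data):

```c
#include <stddef.h>

/*
 * Hypothetical helper mirroring the "(x + len + 8) & ~7" pattern:
 * round header_len + name_len up to a multiple of 8, always leaving
 * room for at least one trailing NUL byte.
 */
static size_t padded_entry_size(size_t header_len, size_t name_len)
{
	return (header_len + name_len + 8) & ~(size_t)7;
}

/* Number of padding bytes that follow the name (always 1 through 8). */
static size_t padding_size(size_t header_len, size_t name_len)
{
	return padded_entry_size(header_len, name_len) -
	       (header_len + name_len);
}
```

With an assumed header length of 62 bytes, a 1-byte name gives a 64-byte
entry (1 byte of padding) and a 2-byte name gives a 72-byte entry (8
bytes of padding), so every entry starts on an 8-byte boundary relative
to the first one. Without that padding, an entry can start at any
offset.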

That's probably all obvious to people who work with the index a lot.
It's the one part of Git I've mostly managed to remain oblivious to. :)

-Peff
