git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Victoria Dye <vdye@github.com>
To: Jeff King <peff@peff.net>, Derrick Stolee <derrickstolee@github.com>
Cc: git@vger.kernel.org
Subject: Re: t9210-scalar.sh fails with SANITIZE=undefined
Date: Thu, 22 Sep 2022 15:09:52 -0700	[thread overview]
Message-ID: <50c57a60-8346-6952-93d9-432a70ef74c5@github.com> (raw)
In-Reply-To: <Yyyzk3FVjmms7dkO@coredump.intra.peff.net>

Jeff King wrote:
> On Thu, Sep 22, 2022 at 08:35:22AM -0400, Derrick Stolee wrote:
> 
>>> I didn't dig further. It's obviously new in v2.38.0-rc1, but I'm not
>>> sure it's a show-stopper. It _might_ have been there all along, and is
>>> just now surfacing. Or it might be in an existing experimental feature
>>> that just wasn't exercised enough in the tests. Or if it really is new
>>> in scalar, then it will only hurt people using scalar, which didn't
>>> exist before. So I don't think it's a regression in the strictest sense,
>>> but it might be nice to get a more accurate understanding of the problem
>>> before the release.
>>
>> Interesting find!
>>
>> Here are the index-related settings that Scalar sets as of -rc1:
>>
>> * core.preloadIndex=true
>> * index.threads=true
>> * index.version=4
>>
>> My gut feeling is that index.version=4 might be the culprit. I thought
>> we tested GIT_TEST_INDEX_VERSION=4 in some CI modes, but apparently we
>> do not. Do you get the same error in other tests with that environment
>> variable?
> 
> Yeah, that seems by far the most likely of those three. And indeed,
> running with GIT_TEST_INDEX_VERSION=4 causes even t0000 to fail with the
> same problem. A minimal reproduction in git.git is just:
> 
>   make SANITIZE=undefined
>   git clone . tmp
>   cd tmp
>   rm .git/index
>   export GIT_INDEX_VERSION=4
>   ../git reset --hard ;# ok, writes v4 index
>   ../git reset --hard ;# fails reading unaligned v4 index
> 
> So it seems like a problem with the v4 format in general. Which...yikes.

Other than allowing us to use a (non-packed) 'struct ondisk_cache_entry' to
parse the index entries, is there any reason why the on-disk index entries
should (or need to be) 4-byte aligned? If not, we could update how we read
the 'ondisk' index entry in 'create_from_disk()' to avoid the misalignment.
E.g.:

------------------8<------------------8<------------------8<------------------
diff --git a/read-cache.c b/read-cache.c
index b09128b188..f132a3f256 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1875,7 +1875,7 @@ static int read_index_extension(struct index_state *istate,
 
 static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
 					    unsigned int version,
-					    struct ondisk_cache_entry *ondisk,
+					    const char *ondisk,
 					    unsigned long *ent_size,
 					    const struct cache_entry *previous_ce)
 {
@@ -1883,7 +1883,7 @@ static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
 	size_t len;
 	const char *name;
 	const unsigned hashsz = the_hash_algo->rawsz;
-	const uint16_t *flagsp = (const uint16_t *)(ondisk->data + hashsz);
+	const char *flagsp = ondisk + offsetof(struct ondisk_cache_entry, data) + hashsz;
 	unsigned int flags;
 	size_t copy_len = 0;
 	/*
------------------>8------------------>8------------------>8------------------

the do the same sort of conversion with the rest of the function. It's
certainly uglier than just using the 'struct ondisk_index_entry *' pointer,
but it should avoid the misaligned addressing error.

> 
> -Peff


  reply	other threads:[~2022-09-22 22:10 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-22 10:04 t9210-scalar.sh fails with SANITIZE=undefined Jeff King
2022-09-22 12:35 ` Derrick Stolee
2022-09-22 19:12   ` Jeff King
2022-09-22 22:09     ` Victoria Dye [this message]
2022-09-22 22:27       ` Jeff King
2022-09-22 22:56         ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=50c57a60-8346-6952-93d9-432a70ef74c5@github.com \
    --to=vdye@github.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).