From: Phillip Wood <phillip.wood123@gmail.com>
To: Victoria Dye via GitGitGadget <gitgitgadget@gmail.com>,
git@vger.kernel.org
Cc: peff@peff.net, derrickstolee@github.com, gitster@pobox.com,
Victoria Dye <vdye@github.com>
Subject: Re: [PATCH] read-cache: avoid misaligned reads in index v4
Date: Sun, 25 Sep 2022 09:25:08 +0100
Message-ID: <bb3a2470-7ff5-e4a6-040a-96e0e3833978@gmail.com>
In-Reply-To: <pull.1366.git.1663962236069.gitgitgadget@gmail.com>
Hi Victoria
On 23/09/2022 20:43, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@github.com>
>
> The process for reading the index into memory from disk is to first read its
> contents into a single memory-mapped file buffer (type 'char *'), then
> sequentially convert each on-disk index entry into a corresponding incore
> 'cache_entry'. To access the contents of the on-disk entry for processing, a
> moving pointer within the memory-mapped file is cast to type 'struct
> ondisk_cache_entry *'.
>
> In index v4, the entries in the on-disk index file are written *without*
> aligning their first byte to a 4-byte boundary; entries are a variable
> length (depending on the entry name and whether or not extended flags are
> used). As a result, casting the 'char *' buffer pointer to 'struct
> ondisk_cache_entry *' then accessing its contents in a 'SANITIZE=undefined'
> build can trigger the following error:
>
> read-cache.c:1886:46: runtime error: member access within misaligned
> address <address> for type 'struct ondisk_cache_entry', which requires 4
> byte alignment
>
> Avoid this error by reading fields directly from the 'char *' buffer, using
> 'offsetof' to locate the individual fields in 'struct ondisk_cache_entry'.
I was initially confused about why this is safe, since we are still
reading the individual fields from unaligned addresses. It is safe
because we read the fields with get_be16() and get_be32(), and those
functions handle unaligned access. I wonder if it is worth adding a
note to the commit message to clarify that if you re-roll.
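To illustrate why a byte-wise read is safe at any address, here is a
minimal sketch. It is not Git's actual get_be32() implementation; the
names read_be32() and demo_unaligned_read() are hypothetical and exist
only for this example. The value is assembled from individual
'unsigned char' loads, so no load wider than one byte is ever issued.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not Git's get_be32()): build a big-endian
 * 32-bit value from four single-byte loads. A byte load has no
 * alignment requirement, so ptr may point anywhere in a buffer.
 */
static uint32_t read_be32(const void *ptr)
{
	const unsigned char *p = ptr;

	return ((uint32_t)p[0] << 24) |
	       ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) |
	       (uint32_t)p[3];
}

/* Read a value that starts one byte into the buffer. */
static uint32_t demo_unaligned_read(void)
{
	static const unsigned char buf[] = { 0xff, 0x12, 0x34, 0x56, 0x78 };

	return read_be32(buf + 1);
}
```

By contrast, casting 'buf + 1' to 'uint32_t *' and dereferencing it
would be the kind of misaligned access that 'SANITIZE=undefined'
reports.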
Best Wishes
Phillip
>
> Reported-by: Jeff King <peff@peff.net>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
> read-cache: avoid misaligned reads in index v4
>
> This fixes the bug reported in [1], where unaligned index entries in the
> memory-mapped index file triggered a 'SANITIZE=undefined' error due to
> casting unaligned data to, and accessing it through, a 4-byte-aligned
> 'struct ondisk_cache_entry *' type.
>
> In addition to the originally-reported 't9210-scalar.sh' now passing in
> a 'SANITIZE=undefined' build, I did some light testing by first writing
> a v4 index with a released version of Git (v2.37), then running some
> index-modifying operations ('git status', 'git add') with this patch's
> changes, then again running 'git status' with the stable version. I
> didn't see anything out of the ordinary but, considering how critical
> "reading the index" is, I'd very much appreciate some extra-thorough
> reviews on this patch. :)
>
> Thanks!
>
> * Victoria
>
> [1]
> https://lore.kernel.org/git/YywzNTzd72tox8Z+@coredump.intra.peff.net/
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1366%2Fvdye%2Fbugfix%2Findex-v4-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1366/vdye/bugfix/index-v4-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1366
>
> read-cache.c | 42 +++++++++++++++++++++++-------------------
> 1 file changed, 23 insertions(+), 19 deletions(-)
>
> diff --git a/read-cache.c b/read-cache.c
> index b09128b1884..d16eb979060 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1875,7 +1875,7 @@ static int read_index_extension(struct index_state *istate,
>
> static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
> unsigned int version,
> - struct ondisk_cache_entry *ondisk,
> + const char *ondisk,
> unsigned long *ent_size,
> const struct cache_entry *previous_ce)
> {
> @@ -1883,7 +1883,7 @@ static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
> size_t len;
> const char *name;
> const unsigned hashsz = the_hash_algo->rawsz;
> - const uint16_t *flagsp = (const uint16_t *)(ondisk->data + hashsz);
> + const char *flagsp = ondisk + offsetof(struct ondisk_cache_entry, data) + hashsz;
> unsigned int flags;
> size_t copy_len = 0;
> /*
> @@ -1901,15 +1901,15 @@ static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
>
> if (flags & CE_EXTENDED) {
> int extended_flags;
> - extended_flags = get_be16(flagsp + 1) << 16;
> + extended_flags = get_be16(flagsp + sizeof(uint16_t)) << 16;
> /* We do not yet understand any bit out of CE_EXTENDED_FLAGS */
> if (extended_flags & ~CE_EXTENDED_FLAGS)
> die(_("unknown index entry format 0x%08x"), extended_flags);
> flags |= extended_flags;
> - name = (const char *)(flagsp + 2);
> + name = (const char *)(flagsp + 2 * sizeof(uint16_t));
> }
> else
> - name = (const char *)(flagsp + 1);
> + name = (const char *)(flagsp + sizeof(uint16_t));
>
> if (expand_name_field) {
> const unsigned char *cp = (const unsigned char *)name;
> @@ -1935,20 +1935,24 @@ static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
>
> ce = mem_pool__ce_alloc(ce_mem_pool, len);
>
> - ce->ce_stat_data.sd_ctime.sec = get_be32(&ondisk->ctime.sec);
> - ce->ce_stat_data.sd_mtime.sec = get_be32(&ondisk->mtime.sec);
> - ce->ce_stat_data.sd_ctime.nsec = get_be32(&ondisk->ctime.nsec);
> - ce->ce_stat_data.sd_mtime.nsec = get_be32(&ondisk->mtime.nsec);
> - ce->ce_stat_data.sd_dev = get_be32(&ondisk->dev);
> - ce->ce_stat_data.sd_ino = get_be32(&ondisk->ino);
> - ce->ce_mode = get_be32(&ondisk->mode);
> - ce->ce_stat_data.sd_uid = get_be32(&ondisk->uid);
> - ce->ce_stat_data.sd_gid = get_be32(&ondisk->gid);
> - ce->ce_stat_data.sd_size = get_be32(&ondisk->size);
> + ce->ce_stat_data.sd_ctime.sec = get_be32(ondisk + offsetof(struct ondisk_cache_entry, ctime)
> + + offsetof(struct cache_time, sec));
> + ce->ce_stat_data.sd_mtime.sec = get_be32(ondisk + offsetof(struct ondisk_cache_entry, mtime)
> + + offsetof(struct cache_time, sec));
> + ce->ce_stat_data.sd_ctime.nsec = get_be32(ondisk + offsetof(struct ondisk_cache_entry, ctime)
> + + offsetof(struct cache_time, nsec));
> + ce->ce_stat_data.sd_mtime.nsec = get_be32(ondisk + offsetof(struct ondisk_cache_entry, mtime)
> + + offsetof(struct cache_time, nsec));
> + ce->ce_stat_data.sd_dev = get_be32(ondisk + offsetof(struct ondisk_cache_entry, dev));
> + ce->ce_stat_data.sd_ino = get_be32(ondisk + offsetof(struct ondisk_cache_entry, ino));
> + ce->ce_mode = get_be32(ondisk + offsetof(struct ondisk_cache_entry, mode));
> + ce->ce_stat_data.sd_uid = get_be32(ondisk + offsetof(struct ondisk_cache_entry, uid));
> + ce->ce_stat_data.sd_gid = get_be32(ondisk + offsetof(struct ondisk_cache_entry, gid));
> + ce->ce_stat_data.sd_size = get_be32(ondisk + offsetof(struct ondisk_cache_entry, size));
> ce->ce_flags = flags & ~CE_NAMEMASK;
> ce->ce_namelen = len;
> ce->index = 0;
> - oidread(&ce->oid, ondisk->data);
> + oidread(&ce->oid, (const unsigned char *)ondisk + offsetof(struct ondisk_cache_entry, data));
>
> if (expand_name_field) {
> if (copy_len)
> @@ -2117,12 +2121,12 @@ static unsigned long load_cache_entry_block(struct index_state *istate,
> unsigned long src_offset = start_offset;
>
> for (i = offset; i < offset + nr; i++) {
> - struct ondisk_cache_entry *disk_ce;
> struct cache_entry *ce;
> unsigned long consumed;
>
> - disk_ce = (struct ondisk_cache_entry *)(mmap + src_offset);
> - ce = create_from_disk(ce_mem_pool, istate->version, disk_ce, &consumed, previous_ce);
> + ce = create_from_disk(ce_mem_pool, istate->version,
> + mmap + src_offset,
> + &consumed, previous_ce);
> set_index_entry(istate, i, ce);
>
> src_offset += consumed;
>
> base-commit: 1b3d6e17fe83eb6f79ffbac2f2c61bbf1eaef5f8
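
The pattern used throughout the patch, computing a field's address as
"buffer + offsetof(struct, field)" instead of casting the buffer to a
struct pointer, can be sketched in isolation as follows. The struct
'demo_ondisk_entry' and the helpers demo_be32() and demo_entry_mode()
are hypothetical, simplified stand-ins, not Git's real definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified on-disk layout; only used for offsetof(). */
struct demo_ondisk_entry {
	uint32_t size;
	uint32_t mode;
	unsigned char data[20];
};

/* Byte-wise big-endian read, safe at any address (illustrative only). */
static uint32_t demo_be32(const char *ptr)
{
	const unsigned char *p = (const unsigned char *)ptr;

	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * The entry is never accessed through a 'struct demo_ondisk_entry *',
 * so the struct's alignment requirement does not apply; the struct
 * definition only supplies the byte offset of each field.
 */
static uint32_t demo_entry_mode(const char *ondisk)
{
	return demo_be32(ondisk + offsetof(struct demo_ondisk_entry, mode));
}
```

This only works when the struct has no padding between the fields
being read, so that its in-memory offsets match the on-disk layout;
that holds for this demo struct on common platforms and is assumed
here.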
Thread overview: 12+ messages
2022-09-23 19:43 [PATCH] read-cache: avoid misaligned reads in index v4 Victoria Dye via GitGitGadget
2022-09-23 21:39 ` Jeff King
2022-09-23 22:04 ` Junio C Hamano
2022-09-26 15:39 ` Victoria Dye
2022-09-26 17:35 ` Jeff King
2022-09-26 19:08 ` Jeff King
2022-09-26 19:31 ` Jeff King
2022-09-26 23:02 ` Junio C Hamano
2022-09-25 8:25 ` Phillip Wood [this message]
2022-09-26 17:54 ` Junio C Hamano
2022-09-28 17:19 ` [PATCH v2] " Victoria Dye via GitGitGadget
2022-09-28 17:34 ` Junio C Hamano