From: Taylor Blau <me@ttaylorr.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
Derrick Stolee <derrickstolee@github.com>,
Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
Subject: Re: [PATCH 4/6] pack-bitmap.c: factor out manual `map_pos` manipulation
Date: Fri, 24 Mar 2023 19:08:56 -0400 [thread overview]
Message-ID: <ZB4tiLfNsD3LW/JS@nand.local> (raw)
In-Reply-To: <20230321175612.GF3119834@coredump.intra.peff.net>
On Tue, Mar 21, 2023 at 01:56:12PM -0400, Jeff King wrote:
> I like the idea of centralizing the bounds checks here.
>
> I do think copying lseek() is a little awkward, though. As you note, we
> only care about SEEK_SET and SEEK_CUR, and we have to BUG() on anything
> else. Which means we have a run-time check, rather than a compile time
> check. If we had two functions like:
>
> void bitmap_index_seek_to(struct bitmap_index *bitmap_git, size_t pos)
> {
> bitmap_git->map_pos = pos;
> if (bitmap_git->map_pos >= bitmap_git->map_size)
> ...etc...
> }
>
> void bitmap_index_incr(struct bitmap_index *bitmap_git, size_t offset)
> {
> bitmap_index_seek_to(bitmap_git, st_add(bitmap_git->map_pos, offset));
> }
>
> then the compiler can see all cases directly. I dunno how much it
> matters.
Yeah, I think that trying to copy the interface of lseek() was a
mistake, since it ended up creating more awkwardness than anything else.
I like the combination of bitmap_index_seek_to() and _incr(), though
though I called the latter bitmap_index_seek_set() instead.
We've got to be a little reminiscent of lseek() after all ;-).
> The other thing that's interesting from a bounds-checking perspective is
> that you're checking the seek _after_ you've read an item. Which has two
> implications:
>
> - I don't think we could ever overflow size_t here.
Yep, agreed. Let's drop the st_add() call there entirely, since it's not
doing anything useful and just serving to confuse the reader.
> - The more interesting case is that we are not overflowing size_t, but
> simply walking past bitmap_git->map_size. But again, we are reading
> first and then seeking. So if our seek does go past, then by
> definition we just read garbage bytes, which is undefined behavior.
>
> For a bounds-check, wouldn't we want it the other way around? To
> make sure we have 4 bytes available, and then if so read them and
> increment the offset?
I thought on first blush that this would be a much larger change, but
actually it's quite small. The EWAH code already checks its reads ahead
of time, so we don't have to worry about those. It's really that
read_be32() and read_u8() do the read-then-seek.
So I think that the change here is limited to making sure that we can
read enough bytes in those functions before actually executing the read.
> > + if (bitmap_git->map_pos >= bitmap_git->map_size)
> > + BUG("bitmap position exceeds size (%"PRIuMAX" >= %"PRIuMAX")",
> > + (uintmax_t)bitmap_git->map_pos,
> > + (uintmax_t)bitmap_git->map_size);
>
> This uses ">=", which is good, because it is valid to walk the pointer
> to one past the end of the map, which is effectively EOF. But as above,
> in that condition the callers should be checking for this EOF state
> before reading the bytes.
I think that having both is useful. Checking before you read avoids
undefined behavior. Checking after you seek ensures that we never have a
bitmap_index struct with a bogus map_pos value.
Thanks,
Taylor
next prev parent reply other threads:[~2023-03-24 23:09 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-03-20 20:02 [PATCH 0/6] pack-bitmap: miscellaneous mmap read hardening Taylor Blau
2023-03-20 20:02 ` [PATCH 1/6] pack-bitmap.c: hide bitmap internals in `read_u8()` Taylor Blau
2023-03-21 17:35 ` Jeff King
2023-03-24 17:52 ` Derrick Stolee
2023-03-20 20:02 ` [PATCH 2/6] pack-bitmap.c: hide bitmap internals in `read_be32()` Taylor Blau
2023-03-20 20:02 ` [PATCH 3/6] pack-bitmap.c: drop unnecessary 'inline's Taylor Blau
2023-03-21 17:40 ` Jeff King
2023-03-20 20:02 ` [PATCH 4/6] pack-bitmap.c: factor out manual `map_pos` manipulation Taylor Blau
2023-03-21 17:56 ` Jeff King
2023-03-24 18:04 ` Derrick Stolee
2023-03-24 18:29 ` Jeff King
2023-03-24 23:23 ` Taylor Blau
2023-03-25 4:57 ` Jeff King
2023-03-24 23:13 ` Taylor Blau
2023-03-24 23:24 ` Taylor Blau
2023-03-24 23:08 ` Taylor Blau [this message]
2023-03-20 20:02 ` [PATCH 5/6] pack-bitmap.c: use `bitmap_index_seek()` where possible Taylor Blau
2023-03-21 18:05 ` Jeff King
2023-03-24 18:06 ` Derrick Stolee
2023-03-24 18:35 ` Jeff King
2023-03-24 19:43 ` Junio C Hamano
2023-03-24 20:37 ` Jeff King
2023-03-24 21:38 ` Junio C Hamano
2023-03-24 22:57 ` Taylor Blau
2023-03-20 20:02 ` [PATCH 6/6] pack-bitmap.c: factor out `bitmap_index_seek_commit()` Taylor Blau
2023-03-21 18:13 ` Jeff King
2023-03-21 18:16 ` Taylor Blau
2023-03-21 18:27 ` Jeff King
2023-03-24 18:09 ` Derrick Stolee
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZB4tiLfNsD3LW/JS@nand.local \
--to=me@ttaylorr.com \
--cc=chakrabortyabhradeep79@gmail.com \
--cc=derrickstolee@github.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).