git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Derrick Stolee <derrickstolee@github.com>,
	Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
Subject: Re: [PATCH 4/6] pack-bitmap.c: factor out manual `map_pos` manipulation
Date: Fri, 24 Mar 2023 19:08:56 -0400	[thread overview]
Message-ID: <ZB4tiLfNsD3LW/JS@nand.local> (raw)
In-Reply-To: <20230321175612.GF3119834@coredump.intra.peff.net>

On Tue, Mar 21, 2023 at 01:56:12PM -0400, Jeff King wrote:
> I like the idea of centralizing the bounds checks here.
>
> I do think copying lseek() is a little awkward, though. As you note, we
> only care about SEEK_SET and SEEK_CUR, and we have to BUG() on anything
> else. Which means we have a run-time check, rather than a compile time
> check. If we had two functions like:
>
>   void bitmap_index_seek_to(struct bitmap_index *bitmap_git, size_t pos)
>   {
> 	bitmap_git->map_pos = pos;
> 	if (bitmap_git->map_pos >= bitmap_git->map_size)
> 	   ...etc...
>   }
>
>   void bitmap_index_incr(struct bitmap_index *bitmap_git, size_t offset)
>   {
> 	bitmap_index_seek_to(bitmap_git, st_add(bitmap_git->map_pos, offset));
>   }
>
> then the compiler can see all cases directly. I dunno how much it
> matters.

Yeah, I think that trying to copy the interface of lseek() was a
mistake, since it ended up creating more awkwardness than anything else.
I like the combination of bitmap_index_seek_to() and _incr(), though
though I called the latter bitmap_index_seek_set() instead.

We've got to be a little reminiscent of lseek() after all ;-).

> The other thing that's interesting from a bounds-checking perspective is
> that you're checking the seek _after_ you've read an item. Which has two
> implications:
>
>   - I don't think we could ever overflow size_t here.

Yep, agreed. Let's drop the st_add() call there entirely, since it's not
doing anything useful and just serving to confuse the reader.

>   - The more interesting case is that we are not overflowing size_t, but
>     simply walking past bitmap_git->map_size. But again, we are reading
>     first and then seeking. So if our seek does go past, then by
>     definition we just read garbage bytes, which is undefined behavior.
>
>     For a bounds-check, wouldn't we want it the other way around? To
>     make sure we have 4 bytes available, and then if so read them and
>     increment the offset?

I thought on first blush that this would be a much larger change, but
actually it's quite small. The EWAH code already checks its reads ahead
of time, so we don't have to worry about those. It's really that
read_be32() and read_u8() do the read-then-seek.

So I think that the change here is limited to making sure that we can
read enough bytes in those functions before actually executing the read.

> > +	if (bitmap_git->map_pos >= bitmap_git->map_size)
> > +		BUG("bitmap position exceeds size (%"PRIuMAX" >= %"PRIuMAX")",
> > +		    (uintmax_t)bitmap_git->map_pos,
> > +		    (uintmax_t)bitmap_git->map_size);
>
> This uses ">=", which is good, because it is valid to walk the pointer
> to one past the end of the map, which is effectively EOF. But as above,
> in that condition the callers should be checking for this EOF state
> before reading the bytes.

I think that having both is useful. Checking before you read avoids
undefined behavior. Checking after you seek ensures that we never have a
bitmap_index struct with a bogus map_pos value.

Thanks,
Taylor

  parent reply	other threads:[~2023-03-24 23:09 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-20 20:02 [PATCH 0/6] pack-bitmap: miscellaneous mmap read hardening Taylor Blau
2023-03-20 20:02 ` [PATCH 1/6] pack-bitmap.c: hide bitmap internals in `read_u8()` Taylor Blau
2023-03-21 17:35   ` Jeff King
2023-03-24 17:52     ` Derrick Stolee
2023-03-20 20:02 ` [PATCH 2/6] pack-bitmap.c: hide bitmap internals in `read_be32()` Taylor Blau
2023-03-20 20:02 ` [PATCH 3/6] pack-bitmap.c: drop unnecessary 'inline's Taylor Blau
2023-03-21 17:40   ` Jeff King
2023-03-20 20:02 ` [PATCH 4/6] pack-bitmap.c: factor out manual `map_pos` manipulation Taylor Blau
2023-03-21 17:56   ` Jeff King
2023-03-24 18:04     ` Derrick Stolee
2023-03-24 18:29       ` Jeff King
2023-03-24 23:23         ` Taylor Blau
2023-03-25  4:57           ` Jeff King
2023-03-24 23:13       ` Taylor Blau
2023-03-24 23:24         ` Taylor Blau
2023-03-24 23:08     ` Taylor Blau [this message]
2023-03-20 20:02 ` [PATCH 5/6] pack-bitmap.c: use `bitmap_index_seek()` where possible Taylor Blau
2023-03-21 18:05   ` Jeff King
2023-03-24 18:06     ` Derrick Stolee
2023-03-24 18:35       ` Jeff King
2023-03-24 19:43         ` Junio C Hamano
2023-03-24 20:37           ` Jeff King
2023-03-24 21:38             ` Junio C Hamano
2023-03-24 22:57               ` Taylor Blau
2023-03-20 20:02 ` [PATCH 6/6] pack-bitmap.c: factor out `bitmap_index_seek_commit()` Taylor Blau
2023-03-21 18:13   ` Jeff King
2023-03-21 18:16     ` Taylor Blau
2023-03-21 18:27       ` Jeff King
2023-03-24 18:09         ` Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZB4tiLfNsD3LW/JS@nand.local \
    --to=me@ttaylorr.com \
    --cc=chakrabortyabhradeep79@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).