git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: "René Scharfe" <l.s.r@web.de>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Dragan Simic <dsimic@manjaro.org>,
	Kristoffer Haugsbakk <code@khaugsbakk.name>,
	Manlio Perillo <manlio.perillo@gmail.com>
Subject: Re: [PATCH 11/15] find multi-byte comment chars in unterminated buffers
Date: Tue, 12 Mar 2024 04:05:25 -0400	[thread overview]
Message-ID: <20240312080525.GB47852@coredump.intra.peff.net> (raw)
In-Reply-To: <3f823e48-572c-4e19-ab76-e6d7cab9461f@web.de>

On Thu, Mar 07, 2024 at 08:42:22PM +0100, René Scharfe wrote:

> > Arguably starts_with() and this new function should both be inlined,
> > like we do for skip_prefix(), but I think that's out of scope for this
> > series.
> 
> Inlining would allow the compiler to unroll the loop for string
> constants.  I doubt it would do that for variables, as in the code
> below.
> 
> Inlining the strlen()+memcmp() version above might allow the compiler
> to push the strlen() call out of a loop.
> 
> Would any of that improve performance noticeably?  For the call sites
> below I doubt it.  But it would probably increase the object text size.

Good point. With non-constant prefixes in these cases, it probably
wouldn't buy much. There are a lot of other cases with actual string
constants. A compiler in theory could turn starts_with(str, "foo") into
a few instructions. But it's not even clear that it's in very many hot
paths. It would definitely be something we'd have to measure.

> > And it's possible I was simply too dumb to figure out xstrncmpz() here.
> > I'm waiting for René to show up and tell me how to do it. ;)
> 
> Nah, it's not a good fit, as it requires the two strings to have the
> same length.

Thanks for confirming I wasn't missing anything. :)

> > @@ -2562,7 +2562,7 @@ static int parse_insn_line(struct repository *r, struct todo_item *item,
> >  	/* left-trim */
> >  	bol += strspn(bol, " \t");
> >
> > -	if (bol == eol || *bol == '\r' || *bol == comment_line_char) {
> > +	if (bol == eol || *bol == '\r' || starts_with_mem(bol, eol - bol, comment_line_str)) {
> 
> If the strspn() call is safe (which it is, as the caller expects the
> string to be NUL-terminated) then you could use starts_with() here and
> avoid the length calculation.  But that would also match
> comment_line_str values that contain LF, which the _mem version does not
> and that's better.

I try not to read too much into the use of string functions on what
otherwise appears to be an unterminated buffer. While in Git it is quite
often terminated at allocation time (coming from a strbuf, etc) I feel
like I've fixed a number of out-of-bounds reads simply due to sloppy
practices. And even if something is correct today, it is easy for it to
change, since the assumption is made far away from allocation.

So I dunno. Like you said, fewer computations is fewer opportunity to
mess things up. I don't like the idea of introducing a new hand-grenade
that might blow up later, but maybe if it's right next to a strspn()
call that's already a problem, it's not materially making anything
worse.

> > +int starts_with_mem(const char *str, size_t len, const char *prefix)
> > +{
> > +	const char *end = str + len;
> > +	for (; ; str++, prefix++) {
> > +		if (!*prefix)
> > +			return 1;
> > +		else if (str == end || *str != *prefix)
> > +			return 0;
> > +	}
> > +}
> 
> So this checks whether a length-limited string has a prefix given as a
> NUL-terminated string.  I'd have called it mem_starts_with() and have
> expected starts_with_mem() to check a NUL-terminated string for a
> length-limited prefix (think !strncmp(str, prefix, prefixlen)).

I was going for consistency with skip_prefix_mem() and strip_suffix_mem().
To be fair, I probably also named those ones, but I think it's pretty
established. We've never needed the length-limited prefix variant yet,
so I don't know that we're squatting on anything too valuable.

> > @@ -882,7 +882,7 @@ static size_t find_trailer_block_start(const char *buf, size_t len)
> >
> >  	/* The first paragraph is the title and cannot be trailers */
> >  	for (s = buf; s < buf + len; s = next_line(s)) {
> > -		if (s[0] == comment_line_char)
> > +		if (starts_with_mem(s, buf + len - s, comment_line_str))
> >  			continue;
> >  		if (is_blank_line(s))
> 
> Another case where starts_with() would be safe to use, as
> is_blank_line() expects (and gets) a NUL-terminated string, but it would
> allow matching comment_line_str values that contain LF.

Hmm. Yes, it is a NUL-terminated string always, but the caller has told
us not to look past end_of_log_message(). I suspect that if there is no
newline in comment_line_str() it's probably impossible to go past "len"
(just because the end of the log surely ends with either a NUL or a
newline). But it feels iffy to me. I dunno.

-Peff


  parent reply	other threads:[~2024-03-12  8:05 UTC|newest]

Thread overview: 82+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-05  8:43 Clarify the meaning of "character" in the documentation Manlio Perillo
2024-03-05  9:00 ` Kristoffer Haugsbakk
2024-03-05 15:32   ` Junio C Hamano
2024-03-05 15:42     ` Dragan Simic
2024-03-05 16:38       ` Junio C Hamano
2024-03-05 17:28         ` Dragan Simic
2024-03-06  8:08         ` [messy PATCH] multi-byte core.commentChar Jeff King
2024-03-07  9:14           ` [PATCH 0/15] allow " Jeff King
2024-03-07  9:15             ` [PATCH 01/15] strbuf: simplify comment-handling in add_lines() helper Jeff King
2024-03-07  9:16             ` [PATCH 02/15] strbuf: avoid static variables in strbuf_add_commented_lines() Jeff King
2024-03-07  9:18             ` [PATCH 03/15] commit: refactor base-case of adjust_comment_line_char() Jeff King
2024-03-07  9:19             ` [PATCH 04/15] strbuf: avoid shadowing global comment_line_char name Jeff King
2024-03-07  9:20             ` [PATCH 05/15] environment: store comment_line_char as a string Jeff King
2024-03-07  9:21             ` [PATCH 06/15] strbuf: accept a comment string for strbuf_stripspace() Jeff King
2024-03-07  9:53               ` Jeff King
2024-03-07  9:22             ` [PATCH 07/15] strbuf: accept a comment string for strbuf_commented_addf() Jeff King
2024-03-07  9:23             ` [PATCH 08/15] strbuf: accept a comment string for strbuf_add_commented_lines() Jeff King
2024-03-07  9:23             ` [PATCH 09/15] prefer comment_line_str to comment_line_char for printing Jeff King
2024-03-07  9:24             ` [PATCH 10/15] find multi-byte comment chars in NUL-terminated strings Jeff King
2024-03-07  9:26             ` [PATCH 11/15] find multi-byte comment chars in unterminated buffers Jeff King
2024-03-07 11:08               ` Jeff King
2024-03-07 19:41                 ` René Scharfe
2024-03-07 19:47                   ` René Scharfe
2024-03-07 19:42               ` René Scharfe
2024-03-08 10:17                 ` Phillip Wood
2024-03-08 15:58                   ` Junio C Hamano
2024-03-08 16:20                     ` Phillip Wood
2024-03-12  8:19                       ` Jeff King
2024-03-12 14:36                         ` phillip.wood123
2024-03-13  6:23                           ` Jeff King
2024-03-12  8:05                 ` Jeff King [this message]
2024-03-14 19:37                   ` René Scharfe
2024-03-07  9:27             ` [PATCH 12/15] sequencer: handle multi-byte comment characters when writing todo list Jeff King
2024-03-08 10:20               ` Phillip Wood
2024-03-12  8:21                 ` Jeff King
2024-03-07  9:28             ` [PATCH 13/15] wt-status: drop custom comment-char stringification Jeff King
2024-03-07  9:30             ` [PATCH 14/15] environment: drop comment_line_char compatibility macro Jeff King
2024-03-07  9:34             ` [PATCH 15/15] config: allow multi-byte core.commentChar Jeff King
2024-03-08 11:07             ` [PATCH 0/15] " Phillip Wood
2024-03-12  9:10             ` [PATCH v2 0/16] " Jeff King
2024-03-12  9:17               ` [PATCH v2 01/16] config: forbid newline as core.commentChar Jeff King
2024-03-12  9:17               ` [PATCH v2 02/16] strbuf: simplify comment-handling in add_lines() helper Jeff King
2024-03-12  9:17               ` [PATCH v2 03/16] strbuf: avoid static variables in strbuf_add_commented_lines() Jeff King
2024-03-12  9:17               ` [PATCH v2 04/16] commit: refactor base-case of adjust_comment_line_char() Jeff King
2024-03-12  9:17               ` [PATCH v2 05/16] strbuf: avoid shadowing global comment_line_char name Jeff King
2024-03-12  9:17               ` [PATCH v2 06/16] environment: store comment_line_char as a string Jeff King
2024-03-12  9:17               ` [PATCH v2 07/16] strbuf: accept a comment string for strbuf_stripspace() Jeff King
2024-03-12  9:17               ` [PATCH v2 08/16] strbuf: accept a comment string for strbuf_commented_addf() Jeff King
2024-03-12  9:17               ` [PATCH v2 09/16] strbuf: accept a comment string for strbuf_add_commented_lines() Jeff King
2024-03-12  9:17               ` [PATCH v2 10/16] prefer comment_line_str to comment_line_char for printing Jeff King
2024-03-12  9:17               ` [PATCH v2 11/16] find multi-byte comment chars in NUL-terminated strings Jeff King
2024-03-12  9:17               ` [PATCH v2 12/16] find multi-byte comment chars in unterminated buffers Jeff King
2024-03-12  9:17               ` [PATCH v2 13/16] sequencer: handle multi-byte comment characters when writing todo list Jeff King
2024-03-12  9:17               ` [PATCH v2 14/16] wt-status: drop custom comment-char stringification Jeff King
2024-03-12  9:17               ` [PATCH v2 15/16] environment: drop comment_line_char compatibility macro Jeff King
2024-03-12  9:17               ` [PATCH v2 16/16] config: allow multi-byte core.commentChar Jeff King
2024-03-13 18:23                 ` Kristoffer Haugsbakk
2024-03-13 18:39                   ` Junio C Hamano
2024-03-15  5:59                   ` Jeff King
2024-03-15  7:16                     ` Kristoffer Haugsbakk
2024-03-15  8:10                       ` Jeff King
2024-03-15 13:30                         ` Kristoffer Haugsbakk
2024-03-15 15:40                         ` Junio C Hamano
2024-03-16  5:50                           ` Jeff King
2024-03-26 22:10                         ` Junio C Hamano
2024-03-26 22:12                           ` Kristoffer Haugsbakk
2024-03-27  7:46                           ` Jeff King
2024-03-27  8:19                             ` [PATCH 17/16] config: add core.commentString Jeff King
2024-03-27 12:45                               ` Chris Torek
2024-03-27 16:13                               ` Junio C Hamano
2024-03-28  9:47                                 ` Jeff King
2024-03-27 14:53                             ` [PATCH v2 16/16] config: allow multi-byte core.commentChar Junio C Hamano
2024-03-12 14:40               ` [PATCH v2 0/16] " phillip.wood123
2024-03-12 20:30                 ` Junio C Hamano
2024-03-05 16:58       ` Clarify the meaning of "character" in the documentation Kristoffer Haugsbakk
2024-03-05 17:20         ` Dragan Simic
2024-03-05 17:37           ` Kristoffer Haugsbakk
2024-03-05 21:19             ` Dragan Simic
2024-03-05 16:51     ` Kristoffer Haugsbakk
2024-03-05 17:37       ` Junio C Hamano
2024-03-05 17:49         ` Kristoffer Haugsbakk
2024-03-05 22:48   ` brian m. carlson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240312080525.GB47852@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=code@khaugsbakk.name \
    --cc=dsimic@manjaro.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=l.s.r@web.de \
    --cc=manlio.perillo@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).