git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Thalia Archibald <thalia@archibald.dev>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, Patrick Steinhardt <ps@pks.im>,
	Chris Torek <chris.torek@gmail.com>,
	Elijah Newren <newren@gmail.com>
Subject: Re: [PATCH v4 1/8] fast-import: tighten path unquoting
Date: Sat, 13 Apr 2024 00:07:38 +0000	[thread overview]
Message-ID: <41A4023D-E468-4D63-9881-57392D06D852@archibald.dev> (raw)
In-Reply-To: <xmqqle5io7zf.fsf@gitster.g>

On Apr 12, 2024, at 09:34, Junio C Hamano <gitster@pobox.com> wrote:
> Thalia Archibald <thalia@archibald.dev> writes:
>> diff --git a/builtin/fast-import.c b/builtin/fast-import.c
>> index 782bda007c..ce9231afe6 100644
>> --- a/builtin/fast-import.c
>> +++ b/builtin/fast-import.c
>> @@ -2258,10 +2258,56 @@ static uintmax_t parse_mark_ref_space(const char **p)
>> return mark;
>> }
>> 
>> +/*
>> + * Parse the path string into the strbuf. It may be quoted with escape sequences
>> + * or unquoted without escape sequences. When unquoted, it may only contain a
>> + * space if `include_spaces` is nonzero.
>> + */
> 
> It took me three reads to understand the last sentence.  It would
> have been easier if it were written as "it may contain a space only
> if ...".  I'd also named it "allow_unquoted_spaces"---it is not like
> this function includes extra spaces on top of whatever was given.

Patrick commented on this earlier too:

> On Mar 28, 2024, at 01:21, Patrick Steinhardt <ps@pks.im> wrote:
>> 
>> On Fri, Mar 22, 2024 at 12:03:18AM +0000, Thalia Archibald wrote:
>>> +/*
>>> + * Parse the path string into the strbuf. It may be quoted with escape sequences
>>> + * or unquoted without escape sequences. When unquoted, it may only contain a
>>> + * space if `allow_spaces` is nonzero.
>>> + */
>>> +static void parse_path(struct strbuf *sb, const char *p, const char **endp, int allow_spaces, const char *field)
>>> +{
>>> + strbuf_reset(sb);
>>> + if (*p == '"') {
>>> + if (unquote_c_style(sb, p, endp))
>>> + die("Invalid %s: %s", field, command_buf.buf);
>>> + } else {
>>> + if (allow_spaces)
>>> + *endp = p + strlen(p);
>> 
>> I wonder whether `stop_at_unquoted_space` might be more telling. It's
>> not like we disallow spaces here, it's that we treat them as the
>> separator to the next field.
> 
> I agree, but I’d rather something shorter, so I’ve changed it to `include_spaces`.

With all that in mind, I think Patrick is right that the best way to
think of this is that space functions as a field separator, conditional
on this flag. In practice, that leads to restrictions on whether you
can write paths that contain spaces without quotes.

As to naming, `allow_spaces` and `include_spaces` are problematic for
the reasons you both have pointed out. I think `stop_at_unquoted_space`
is problematic, because that’s not where it stops when quoted, but
rather at the close quote. I think that `include_unquoted_spaces` is
good, because it describes that spaces are included in this field when
it is an unquoted string. `allow_unquoted_spaces` implies that its an
error to have a space, but no such error is raised here.

How’s this change? I’ve reworded the relevant sentence and specified any
“it”s and replaced the “when unquoted, …” qualifier with “unquoted
strings may …” to reduce ambiguity.

diff --git a/builtin/fast-import.c b/builtin/fast-import.c
index fd23a00150..2070c78c56 100644
--- a/builtin/fast-import.c
+++ b/builtin/fast-import.c
@@ -2259,12 +2259,13 @@ static uintmax_t parse_mark_ref_space(const char **p)
}

/*
- * Parse the path string into the strbuf. It may be quoted with escape sequences
- * or unquoted without escape sequences. When unquoted, it may only contain a
- * space if `include_spaces` is nonzero.
+ * Parse the path string into the strbuf. The path can either be quoted with
+ * escape sequences or unquoted without escape sequences. Unquoted strings may
+ * contain spaces only if `include_unquoted_spaces` is nonzero; otherwise, it
+ * stops parsing at the first space.
 */
static void parse_path(struct strbuf *sb, const char *p, const char **endp,
-		int include_spaces, const char *field)
+		int include_unquoted_spaces, const char *field)
{
	if (*p == '"') {
		if (unquote_c_style(sb, p, endp))
@@ -2272,7 +2273,7 @@ static void parse_path(struct strbuf *sb, const char *p, const char **endp,
		if (strlen(sb->buf) != sb->len)
			die("NUL in %s: %s", field, command_buf.buf);
	} else {
-		if (include_spaces)
+		if (include_unquoted_spaces)
			*endp = p + strlen(p);
		else
			*endp = strchrnul(p, ' ');
@@ -2282,7 +2283,7 @@ static void parse_path(struct strbuf *sb, const char *p, const char **endp,

/*
 * Parse the path string into the strbuf, and complain if this is not the end of
- * the string. It may contain spaces even when unquoted.
+ * the string. Unquoted strings may contain spaces.
 */
static void parse_path_eol(struct strbuf *sb, const char *p, const char *field)
{
@@ -2295,7 +2296,7 @@ static void parse_path_eol(struct strbuf *sb, const char *p, const char *field)

/*
 * Parse the path string into the strbuf, and ensure it is followed by a space.
- * It may not contain spaces when unquoted. Update *endp to point to the first
+ * Unquoted strings may not contain spaces. Update *endp to point to the first
 * character after the space.
 */
static void parse_path_space(struct strbuf *sb, const char *p,


Thalia


  reply	other threads:[~2024-04-13  0:08 UTC|newest]

Thread overview: 84+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-22  0:03 [PATCH 0/6] fast-import: tighten parsing of paths Thalia Archibald
2024-03-22  0:03 ` [PATCH 1/6] " Thalia Archibald
2024-03-22  0:11   ` Thalia Archibald
2024-03-28  8:21   ` Patrick Steinhardt
     [not found]     ` <E01C617F-3720-42C0-83EE-04BB01643C86@archibald.dev>
2024-04-01  9:05       ` Thalia Archibald
2024-03-22  0:03 ` [PATCH 2/6] fast-import: directly use strbufs for paths Thalia Archibald
2024-03-28  8:21   ` Patrick Steinhardt
2024-03-22  0:03 ` [PATCH 3/6] fast-import: release unfreed strbufs Thalia Archibald
2024-03-28  8:21   ` Patrick Steinhardt
2024-04-01  9:06     ` Thalia Archibald
2024-03-22  0:03 ` [PATCH 4/6] fast-import: remove dead strbuf Thalia Archibald
2024-03-28  8:21   ` Patrick Steinhardt
2024-03-22  0:03 ` [PATCH 5/6] fast-import: document C-style escapes for paths Thalia Archibald
2024-03-28  8:21   ` Patrick Steinhardt
2024-04-01  9:06     ` Thalia Archibald
2024-03-22  0:03 ` [PATCH 6/6] fast-import: forbid escaped NUL in paths Thalia Archibald
2024-04-01  9:02 ` [PATCH v2 0/8] fast-import: tighten parsing of paths Thalia Archibald
2024-04-01  9:02   ` [PATCH v2 1/8] fast-import: tighten path unquoting Thalia Archibald
2024-04-10  6:27     ` Patrick Steinhardt
2024-04-10  8:18       ` Chris Torek
2024-04-10  8:44         ` Thalia Archibald
2024-04-10  8:51           ` Chris Torek
2024-04-10  9:14             ` Thalia Archibald
2024-04-10  9:42               ` Patrick Steinhardt
2024-04-10  9:16             ` Thalia Archibald
2024-04-10  9:12       ` Thalia Archibald
2024-04-01  9:03   ` [PATCH v2 2/8] fast-import: directly use strbufs for paths Thalia Archibald
2024-04-10  6:27     ` Patrick Steinhardt
2024-04-10 10:07       ` Thalia Archibald
2024-04-10 10:18         ` Patrick Steinhardt
2024-04-01  9:03   ` [PATCH v2 3/8] fast-import: allow unquoted empty path for root Thalia Archibald
2024-04-10  6:27     ` Patrick Steinhardt
2024-04-01  9:03   ` [PATCH v2 4/8] fast-import: remove dead strbuf Thalia Archibald
2024-04-01  9:03   ` [PATCH v2 5/8] fast-import: improve documentation for unquoted paths Thalia Archibald
2024-04-01  9:03   ` [PATCH v2 6/8] fast-import: document C-style escapes for paths Thalia Archibald
2024-04-01  9:03   ` [PATCH v2 7/8] fast-import: forbid escaped NUL in paths Thalia Archibald
2024-04-01  9:03   ` [PATCH v2 8/8] fast-import: make comments more precise Thalia Archibald
2024-04-07 21:19   ` [PATCH v2 0/8] fast-import: tighten parsing of paths Thalia Archibald
2024-04-07 23:46     ` Eric Sunshine
2024-04-08  6:25       ` Patrick Steinhardt
2024-04-08  7:15         ` Thalia Archibald
2024-04-08  9:07           ` Patrick Steinhardt
2024-04-08 14:52         ` Junio C Hamano
2024-04-10  9:54   ` [PATCH v3 " Thalia Archibald
2024-04-10  9:55     ` [PATCH v3 1/8] fast-import: tighten path unquoting Thalia Archibald
2024-04-10  9:55     ` [PATCH v3 2/8] fast-import: directly use strbufs for paths Thalia Archibald
2024-04-10  9:55     ` [PATCH v3 3/8] fast-import: allow unquoted empty path for root Thalia Archibald
2024-04-11 19:59       ` Junio C Hamano
2024-04-10  9:55     ` [PATCH v3 4/8] fast-import: remove dead strbuf Thalia Archibald
2024-04-11 19:53       ` Junio C Hamano
2024-04-10  9:55     ` [PATCH v3 5/8] fast-import: improve documentation for unquoted paths Thalia Archibald
2024-04-11 19:51       ` Junio C Hamano
2024-04-10  9:56     ` [PATCH v3 6/8] fast-import: document C-style escapes for paths Thalia Archibald
2024-04-10 18:28       ` Junio C Hamano
2024-04-10 22:50         ` Thalia Archibald
2024-04-11  5:32           ` Junio C Hamano
2024-04-11  9:14             ` Patrick Steinhardt
2024-04-10  9:56     ` [PATCH v3 7/8] fast-import: forbid escaped NUL in paths Thalia Archibald
2024-04-10 18:51       ` Junio C Hamano
2024-04-10  9:56     ` [PATCH v3 8/8] fast-import: make comments more precise Thalia Archibald
2024-04-10 19:21       ` Junio C Hamano
2024-04-12  8:01     ` [PATCH v4 0/8] fast-import: tighten parsing of paths Thalia Archibald
2024-04-12  8:02       ` [PATCH v4 1/8] fast-import: tighten path unquoting Thalia Archibald
2024-04-12 16:34         ` Junio C Hamano
2024-04-13  0:07           ` Thalia Archibald [this message]
2024-04-13 18:33             ` Junio C Hamano
2024-04-12  8:03       ` [PATCH v4 2/8] fast-import: directly use strbufs for paths Thalia Archibald
2024-04-12  8:03       ` [PATCH v4 3/8] fast-import: allow unquoted empty path for root Thalia Archibald
2024-04-12  8:03       ` [PATCH v4 4/8] fast-import: remove dead strbuf Thalia Archibald
2024-04-12  8:03       ` [PATCH v4 5/8] fast-import: improve documentation for path quoting Thalia Archibald
2024-04-12  8:03       ` [PATCH v4 6/8] fast-import: document C-style escapes for paths Thalia Archibald
2024-04-12  8:03       ` [PATCH v4 7/8] fast-import: forbid escaped NUL in paths Thalia Archibald
2024-04-12  8:03       ` [PATCH v4 8/8] fast-import: make comments more precise Thalia Archibald
2024-04-14  1:11       ` [PATCH v5 0/8] fast-import: tighten parsing of paths Thalia Archibald
2024-04-14  1:11         ` [PATCH v5 1/8] fast-import: tighten path unquoting Thalia Archibald
2024-04-14  1:11         ` [PATCH v5 2/8] fast-import: directly use strbufs for paths Thalia Archibald
2024-04-14  1:11         ` [PATCH v5 3/8] fast-import: allow unquoted empty path for root Thalia Archibald
2024-04-14  1:11         ` [PATCH v5 4/8] fast-import: remove dead strbuf Thalia Archibald
2024-04-14  1:12         ` [PATCH v5 5/8] fast-import: improve documentation for path quoting Thalia Archibald
2024-04-14  1:12         ` [PATCH v5 6/8] fast-import: document C-style escapes for paths Thalia Archibald
2024-04-14  1:12         ` [PATCH v5 7/8] fast-import: forbid escaped NUL in paths Thalia Archibald
2024-04-14  1:12         ` [PATCH v5 8/8] fast-import: make comments more precise Thalia Archibald
2024-04-15  7:06         ` [PATCH v5 0/8] fast-import: tighten parsing of paths Patrick Steinhardt
2024-04-15 17:07           ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=41A4023D-E468-4D63-9881-57392D06D852@archibald.dev \
    --to=thalia@archibald.dev \
    --cc=chris.torek@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=newren@gmail.com \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).