git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Thalia Archibald <thalia@archibald.dev>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org, Elijah Newren <newren@gmail.com>
Subject: Re: [PATCH 1/6] fast-import: tighten parsing of paths
Date: Mon, 01 Apr 2024 09:05:48 +0000	[thread overview]
Message-ID: <4523FBB5-68AE-4D98-BB5F-6B498ADF1C25@archibald.dev> (raw)
In-Reply-To: <E01C617F-3720-42C0-83EE-04BB01643C86@archibald.dev>

(Sorry for first sending as HTML)

On Mar 28, 2024, at 01:21, Patrick Steinhardt <ps@pks.im> wrote:
> 
> So this is part of the "filemodify" section with the following syntax:
> 
>    'M' SP <mode> SP <dataref> SP <path> LF
> 
> The way I interpret this change is that <path> could previously be empty
> (`SP LF`), but now it needs to be quoted (`SP '"' '"' LF). This seems to
> be related to cases (1) and (3) of your commit messages, where
> "filemodify" could contain an unquoted empty string whereas "filecopy"
> and "filerename" would complain about such an unquoted string.
> 
> In any case I don't see a strong argument why exactly it should be
> forbidden to have an unquoted empty path here, and I do wonder whether
> it would break existing writers of the format when we retroactively
> tighten this case. Isn't it possible to instead loosen it such that all
> three of the above actions know to handle unquoted empty paths?

At first, I strongly thought that we should forbid this case of unquoted empty
paths. It's a somewhat peculiar case in that it refers to the root of the repo
and few front-ends use it. I surveyed git fast-export, git-filter-repo,
Reposurgeon, hg-fast-export, cvs-fast-export (by Eric S. Raymond),
cvs-fast-export (by Roger Vaughn), svn2git, bzr-fastexport, and bk fast-export,
but none of them ever target the root of the repo. I assumed that if an unquoted
empty path was ever emitted, it was likely an bug that should not be accepted
(e.g., a null byte array somehow).

However, most occurrences of <path> in the grammar have allowed unquoted empty
strings to mean the root for 14 years and documentation has implied that it's
allowed for 13 years. It's just the two cases of the destination paths of
filecopy and filerename that don't allow it, and those are less-used operations,
so front-ends may never encounter that error.

Some assumed errors in emitting empty paths are caught by validation with file
modes, so even if it's loosened it's still fairly safe. filemodify must be
040000 when it targets the root, and filecopy and filerename to the root must
have a source path that's a directory. The worst case is filedelete
unintentionally targeting the root, but that's been allowed to be an unquoted
empty path for almost its entire lifetime, so I don't think should be changed.

I've changed it to allow unquoted empty paths for all operations in patch v2
3/8 (fast-import: allow unquoted empty path for root).

> This is loosening the condition so that we also accept unquoted paths
> now. Okay.

On the contrary, v1 tightens all paths to forbid unquoted empty strings. v2 now
allows it and should make that change more clear.

> On Fri, Mar 22, 2024 at 12:03:18AM +0000, Thalia Archibald wrote:
>> +/*
>> + * Parse the path string into the strbuf. It may be quoted with escape sequences
>> + * or unquoted without escape sequences. When unquoted, it may only contain a
>> + * space if `allow_spaces` is nonzero.
>> + */
>> +static void parse_path(struct strbuf *sb, const char *p, const char **endp, int allow_spaces, const char *field)
>> +{
>> + strbuf_reset(sb);
>> + if (*p == '"') {
>> + if (unquote_c_style(sb, p, endp))
>> + die("Invalid %s: %s", field, command_buf.buf);
>> + } else {
>> + if (allow_spaces)
>> + *endp = p + strlen(p);
> 
> I wonder whether `stop_at_unquoted_space` might be more telling. It's
> not like we disallow spaces here, it's that we treat them as the
> separator to the next field.

I agree, but I’d rather something shorter, so I’ve changed it to `include_spaces`.

>> + else
>> + *endp = strchr(p, ' ');
>> + if (*endp == p)
>> + die("Missing %s: %s", field, command_buf.buf);
> 
> Error messages should start with a lower-case letter and be
> translateable. But these are simply moved over from the previous code,
> so I don't mind much if you want to keep them as-is.
> 
>> + strbuf_add(sb, p, *endp - p);
>> + }
>> +}


fast-import isn’t a porcelain command, AFAIK, so I thought the convention is
that its output isn't translated?

From po/README.md:
> 
> The output from Git's plumbing utilities will primarily be read by
> programs and would break scripts under non-C locales if it was
> translated. Plumbing strings should not be translated, since
> they're part of Git's API.


I would, however, like to improve its error messages. For example, diagnosing
errors more precisely or changing the outdated “GIT” to “Git”.

To what extent should the exact error messages be considered part of Git's API?

Thalia


  parent reply	other threads:[~2024-04-01  9:06 UTC|newest]

Thread overview: 84+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-22  0:03 [PATCH 0/6] fast-import: tighten parsing of paths Thalia Archibald
2024-03-22  0:03 ` [PATCH 1/6] " Thalia Archibald
2024-03-22  0:11   ` Thalia Archibald
2024-03-28  8:21   ` Patrick Steinhardt
     [not found]     ` <E01C617F-3720-42C0-83EE-04BB01643C86@archibald.dev>
2024-04-01  9:05       ` Thalia Archibald [this message]
2024-03-22  0:03 ` [PATCH 2/6] fast-import: directly use strbufs for paths Thalia Archibald
2024-03-28  8:21   ` Patrick Steinhardt
2024-03-22  0:03 ` [PATCH 3/6] fast-import: release unfreed strbufs Thalia Archibald
2024-03-28  8:21   ` Patrick Steinhardt
2024-04-01  9:06     ` Thalia Archibald
2024-03-22  0:03 ` [PATCH 4/6] fast-import: remove dead strbuf Thalia Archibald
2024-03-28  8:21   ` Patrick Steinhardt
2024-03-22  0:03 ` [PATCH 5/6] fast-import: document C-style escapes for paths Thalia Archibald
2024-03-28  8:21   ` Patrick Steinhardt
2024-04-01  9:06     ` Thalia Archibald
2024-03-22  0:03 ` [PATCH 6/6] fast-import: forbid escaped NUL in paths Thalia Archibald
2024-04-01  9:02 ` [PATCH v2 0/8] fast-import: tighten parsing of paths Thalia Archibald
2024-04-01  9:02   ` [PATCH v2 1/8] fast-import: tighten path unquoting Thalia Archibald
2024-04-10  6:27     ` Patrick Steinhardt
2024-04-10  8:18       ` Chris Torek
2024-04-10  8:44         ` Thalia Archibald
2024-04-10  8:51           ` Chris Torek
2024-04-10  9:14             ` Thalia Archibald
2024-04-10  9:42               ` Patrick Steinhardt
2024-04-10  9:16             ` Thalia Archibald
2024-04-10  9:12       ` Thalia Archibald
2024-04-01  9:03   ` [PATCH v2 2/8] fast-import: directly use strbufs for paths Thalia Archibald
2024-04-10  6:27     ` Patrick Steinhardt
2024-04-10 10:07       ` Thalia Archibald
2024-04-10 10:18         ` Patrick Steinhardt
2024-04-01  9:03   ` [PATCH v2 3/8] fast-import: allow unquoted empty path for root Thalia Archibald
2024-04-10  6:27     ` Patrick Steinhardt
2024-04-01  9:03   ` [PATCH v2 4/8] fast-import: remove dead strbuf Thalia Archibald
2024-04-01  9:03   ` [PATCH v2 5/8] fast-import: improve documentation for unquoted paths Thalia Archibald
2024-04-01  9:03   ` [PATCH v2 6/8] fast-import: document C-style escapes for paths Thalia Archibald
2024-04-01  9:03   ` [PATCH v2 7/8] fast-import: forbid escaped NUL in paths Thalia Archibald
2024-04-01  9:03   ` [PATCH v2 8/8] fast-import: make comments more precise Thalia Archibald
2024-04-07 21:19   ` [PATCH v2 0/8] fast-import: tighten parsing of paths Thalia Archibald
2024-04-07 23:46     ` Eric Sunshine
2024-04-08  6:25       ` Patrick Steinhardt
2024-04-08  7:15         ` Thalia Archibald
2024-04-08  9:07           ` Patrick Steinhardt
2024-04-08 14:52         ` Junio C Hamano
2024-04-10  9:54   ` [PATCH v3 " Thalia Archibald
2024-04-10  9:55     ` [PATCH v3 1/8] fast-import: tighten path unquoting Thalia Archibald
2024-04-10  9:55     ` [PATCH v3 2/8] fast-import: directly use strbufs for paths Thalia Archibald
2024-04-10  9:55     ` [PATCH v3 3/8] fast-import: allow unquoted empty path for root Thalia Archibald
2024-04-11 19:59       ` Junio C Hamano
2024-04-10  9:55     ` [PATCH v3 4/8] fast-import: remove dead strbuf Thalia Archibald
2024-04-11 19:53       ` Junio C Hamano
2024-04-10  9:55     ` [PATCH v3 5/8] fast-import: improve documentation for unquoted paths Thalia Archibald
2024-04-11 19:51       ` Junio C Hamano
2024-04-10  9:56     ` [PATCH v3 6/8] fast-import: document C-style escapes for paths Thalia Archibald
2024-04-10 18:28       ` Junio C Hamano
2024-04-10 22:50         ` Thalia Archibald
2024-04-11  5:32           ` Junio C Hamano
2024-04-11  9:14             ` Patrick Steinhardt
2024-04-10  9:56     ` [PATCH v3 7/8] fast-import: forbid escaped NUL in paths Thalia Archibald
2024-04-10 18:51       ` Junio C Hamano
2024-04-10  9:56     ` [PATCH v3 8/8] fast-import: make comments more precise Thalia Archibald
2024-04-10 19:21       ` Junio C Hamano
2024-04-12  8:01     ` [PATCH v4 0/8] fast-import: tighten parsing of paths Thalia Archibald
2024-04-12  8:02       ` [PATCH v4 1/8] fast-import: tighten path unquoting Thalia Archibald
2024-04-12 16:34         ` Junio C Hamano
2024-04-13  0:07           ` Thalia Archibald
2024-04-13 18:33             ` Junio C Hamano
2024-04-12  8:03       ` [PATCH v4 2/8] fast-import: directly use strbufs for paths Thalia Archibald
2024-04-12  8:03       ` [PATCH v4 3/8] fast-import: allow unquoted empty path for root Thalia Archibald
2024-04-12  8:03       ` [PATCH v4 4/8] fast-import: remove dead strbuf Thalia Archibald
2024-04-12  8:03       ` [PATCH v4 5/8] fast-import: improve documentation for path quoting Thalia Archibald
2024-04-12  8:03       ` [PATCH v4 6/8] fast-import: document C-style escapes for paths Thalia Archibald
2024-04-12  8:03       ` [PATCH v4 7/8] fast-import: forbid escaped NUL in paths Thalia Archibald
2024-04-12  8:03       ` [PATCH v4 8/8] fast-import: make comments more precise Thalia Archibald
2024-04-14  1:11       ` [PATCH v5 0/8] fast-import: tighten parsing of paths Thalia Archibald
2024-04-14  1:11         ` [PATCH v5 1/8] fast-import: tighten path unquoting Thalia Archibald
2024-04-14  1:11         ` [PATCH v5 2/8] fast-import: directly use strbufs for paths Thalia Archibald
2024-04-14  1:11         ` [PATCH v5 3/8] fast-import: allow unquoted empty path for root Thalia Archibald
2024-04-14  1:11         ` [PATCH v5 4/8] fast-import: remove dead strbuf Thalia Archibald
2024-04-14  1:12         ` [PATCH v5 5/8] fast-import: improve documentation for path quoting Thalia Archibald
2024-04-14  1:12         ` [PATCH v5 6/8] fast-import: document C-style escapes for paths Thalia Archibald
2024-04-14  1:12         ` [PATCH v5 7/8] fast-import: forbid escaped NUL in paths Thalia Archibald
2024-04-14  1:12         ` [PATCH v5 8/8] fast-import: make comments more precise Thalia Archibald
2024-04-15  7:06         ` [PATCH v5 0/8] fast-import: tighten parsing of paths Patrick Steinhardt
2024-04-15 17:07           ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4523FBB5-68AE-4D98-BB5F-6B498ADF1C25@archibald.dev \
    --to=thalia@archibald.dev \
    --cc=git@vger.kernel.org \
    --cc=newren@gmail.com \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).