git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "Philippe Blain via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Michael J Gruber <git@grubix.eu>,
	Matthieu Moy <git@matthieu-moy.fr>,
	John Keeping <john@keeping.me.uk>,
	Karthik Nayak <karthik.188@gmail.com>, Jeff King <peff@peff.net>,
	Alex Henrie <alexhenrie24@gmail.com>,
	Philippe Blain <levraiphilippeblain@gmail.com>
Subject: Re: [PATCH v2 2/3] ref-filter: fix the API to correctly handle CRLF
Date: Tue, 10 Mar 2020 10:50:51 -0700	[thread overview]
Message-ID: <xmqqk13sjdz8.fsf@gitster.c.googlers.com> (raw)
In-Reply-To: <aab1f45ba976d088a8c68573a21ed2458915d6a6.1583807093.git.gitgitgadget@gmail.com> (Philippe Blain via GitGitGadget's message of "Tue, 10 Mar 2020 02:24:52 +0000")

"Philippe Blain via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Philippe Blain <levraiphilippeblain@gmail.com>
>
> The ref-filter API does not correctly handle commit or tag messages that
> ...

(I won't repeat myself here; see 0/3)

> Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
> ---
>  ref-filter.c             | 19 +++++++++++++++++--
>  t/t3203-branch-output.sh | 26 +++++++++++++++++++++-----
>  t/t6300-for-each-ref.sh  |  5 +++++
>  t/t7004-tag.sh           |  7 +++++++
>  4 files changed, 50 insertions(+), 7 deletions(-)
>
> diff --git a/ref-filter.c b/ref-filter.c
> index 79bb5206783..537cc4de42c 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -1050,10 +1050,18 @@ static char *copy_subject(const char *buf, unsigned long len)
>  {
>  	char *r = xmemdupz(buf, len);
>  	int i;
> +	struct strbuf sb = STRBUF_INIT;
>  
> -	for (i = 0; i < len; i++)
> +	strbuf_attach(&sb, r, len, len + 1);
> +	for (i = 0; i < sb.len; i++) {
>  		if (r[i] == '\n')
>  			r[i] = ' ';
> +		if (r[i] == '\r') {
> +			strbuf_remove(&sb, i, 1);
> +			i -= 1;
> +		}
> +	}
> +	strbuf_detach(&sb, NULL);
>  	return r;
>  }

So, the chosen solution is to only remove CR and do so only in the
first paragraph.  Even if the second and subsequent paragraphs use
CRLF line endings, those CRs are retained.  Also, a lone CR in the
first paragraph that is not part of CRLF end-of-line marker is lost,
but other control characters like "\a" are retained.  That sounds
like an almost "minimum" change but not quite.

The way strbuf is used in the implementation is a bit curious and
risky.  Currently we do not realloc to shrink a strbuf, but when we
start doing so, this code would break because you are relying on the
fact that just before calling strbuf_detach(), sb.buf happens to be
still the same as r.

As the point of the function is "we want to return a copy of what is
in buf[0..len] but the input is a (possibly multi-line) paragraph,
and we want a single line 'title', so replace end-of-line with a
SP", a minimal translation that is more faithful to the intended
meaning of the function would be:

	static char *copy_subject(const char *buf, unsigned long len)
	{
		struct strbuf sb = STRBUF_INIT;

		for (i = 0; i < len; i++) {
                	if (buf[i] == '\r' &&
			    i + 1 < len && buf[i + 1] == '\n')
                            continue; /* ignore CR in CRLF */

			if (buf[i] == '\n')
				strbuf_addch(&sb, ' ');
			else
				strbuf_addch(&sb, buf[i]);
		}
                return strbuf_detach(&sb, NULL);
	}

perhaps?  This retains CR in the middle if exists just like BEL in
the middle of the line, and uses strbuf in a safe way, I think.

> @@ -1184,15 +1192,22 @@ static void find_subpos(const char *buf,
>  		eol = strchrnul(buf, '\n');
>  		if (*eol)
>  			eol++;
> +		/*  protect against messages that might contain \r\n */
> +		if (*eol == '\r')
> +			eol++;

This is quite convoluted.  You found a LF and then are hoping to see
if the byte after LF is CR (i.e. you are looking for LFCR, not
CRLF).

>  		buf = eol;
>  	}
>  	*sublen = buf - *sub;
>  	/* drop trailing newline, if present */
>  	if (*sublen && (*sub)[*sublen - 1] == '\n')
>  		*sublen -= 1;
> +	/*  protect against commit messages that might contain \r\n */
> +	else if (*sublen && (*sub)[*sublen - 1] == '\r')
> +		*sublen -= 3; /* drop '\r\n\r' */

Yeek.  To find CR-LF-CR-LF, you look for CR-LF-CR?  You only checked
that the previous byte is NOT LF (because you are in else-if, so the
previous if must have failed) and you have at least one previous byte
that is CR.  What gives us OK that *sublen is sufficiently long that
we can safely subtract 3 from it (we only checked that it is not 0;
who says it is 3 or more in this code???) and the two bytes before
the CR we are looking at here are CRLF???

  reply	other threads:[~2020-03-10 17:51 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-08 18:29 [PATCH 0/3] Teach ref-filter API to correctly handle CRLF in messages Philippe Blain via GitGitGadget
2020-03-08 18:29 ` [PATCH 1/3] t: add lib-crlf-messages.sh for messages containing CRLF Philippe Blain via GitGitGadget
2020-03-08 18:29 ` [PATCH 2/3] ref-filter: teach the API to correctly handle CRLF Philippe Blain via GitGitGadget
2020-03-08 18:29 ` [PATCH 3/3] log: add tests for messages containing CRLF to t4202 Philippe Blain via GitGitGadget
2020-03-09 15:14 ` [PATCH 0/3] Teach ref-filter API to correctly handle CRLF in messages Junio C Hamano
2020-03-10  2:19   ` Philippe Blain
2020-03-10  2:24 ` [PATCH v2 " Philippe Blain via GitGitGadget
2020-03-10  2:24   ` [PATCH v2 1/3] t: add lib-crlf-messages.sh for messages containing CRLF Philippe Blain via GitGitGadget
2020-03-10  2:24   ` [PATCH v2 2/3] ref-filter: fix the API to correctly handle CRLF Philippe Blain via GitGitGadget
2020-03-10 17:50     ` Junio C Hamano [this message]
2020-03-10  2:24   ` [PATCH v2 3/3] log: add tests for messages containing CRLF to t4202 Philippe Blain via GitGitGadget
2020-03-10  3:31   ` [PATCH v2 0/3] Teach ref-filter API to correctly handle CRLF in messages Junio C Hamano
2020-03-10 17:24   ` Junio C Hamano
2020-10-12 18:09   ` [PATCH v3 0/3] ref-filter: handle CRLF at end-of-line more gracefully Philippe Blain via GitGitGadget
2020-10-12 18:09     ` [PATCH v3 1/3] t: add lib-crlf-messages.sh for messages containing CRLF Philippe Blain via GitGitGadget
2020-10-12 22:22       ` Junio C Hamano
2020-10-14 13:22         ` Philippe Blain
2020-10-12 22:47       ` Eric Sunshine
2020-10-14 13:20         ` Philippe Blain
2020-10-14 13:45           ` Eric Sunshine
2020-10-14 13:52             ` Philippe Blain
2020-10-14 23:01               ` Eric Sunshine
2020-10-22  3:09         ` Philippe Blain
2020-10-12 18:09     ` [PATCH v3 2/3] ref-filter: handle CRLF at end-of-line more gracefully Philippe Blain via GitGitGadget
2020-10-12 22:24       ` Junio C Hamano
2020-10-14 13:09         ` Philippe Blain
2020-10-12 18:09     ` [PATCH v3 3/3] log, show: add tests for messages containing CRLF Philippe Blain via GitGitGadget
2020-10-22  3:01     ` [PATCH v4 0/2] ref-filter: handle CRLF at end-of-line more gracefully Philippe Blain via GitGitGadget
2020-10-22  3:01       ` [PATCH v4 1/2] " Philippe Blain via GitGitGadget
2020-10-23  0:52         ` Junio C Hamano
2020-10-23  1:46           ` Philippe Blain
2020-10-28 20:24             ` Junio C Hamano
2020-10-29  1:29               ` Philippe Blain
2020-10-22  3:01       ` [PATCH v4 2/2] log, show: add tests for messages containing CRLF Philippe Blain via GitGitGadget
2020-10-22 19:24         ` Philippe Blain
2020-10-29 12:48       ` [PATCH v5 0/2] ref-filter: handle CRLF at end-of-line more gracefully Philippe Blain via GitGitGadget
2020-10-29 12:48         ` [PATCH v5 1/2] " Philippe Blain via GitGitGadget
2020-10-29 12:48         ` [PATCH v5 2/2] log, show: add tests for messages containing CRLF Philippe Blain via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqk13sjdz8.fsf@gitster.c.googlers.com \
    --to=gitster@pobox.com \
    --cc=alexhenrie24@gmail.com \
    --cc=git@grubix.eu \
    --cc=git@matthieu-moy.fr \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=john@keeping.me.uk \
    --cc=karthik.188@gmail.com \
    --cc=levraiphilippeblain@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).