git@vger.kernel.org mailing list mirror (one of many)
From: Junio C Hamano <gitster@pobox.com>
To: "Philippe Blain via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Michael J Gruber <git@grubix.eu>,
	Matthieu Moy <git@matthieu-moy.fr>,
	John Keeping <john@keeping.me.uk>,
	Karthik Nayak <karthik.188@gmail.com>, Jeff King <peff@peff.net>,
	Alex Henrie <alexhenrie24@gmail.com>,
	Philippe Blain <levraiphilippeblain@gmail.com>
Subject: Re: [PATCH v2 0/3] Teach ref-filter API to correctly handle CRLF in messages
Date: Tue, 10 Mar 2020 10:24:34 -0700
Message-ID: <xmqqo8t4jf71.fsf@gitster.c.googlers.com>
In-Reply-To: <pull.576.v2.git.1583807093.gitgitgadget@gmail.com> (Philippe Blain via GitGitGadget's message of "Tue, 10 Mar 2020 02:24:50 +0000")

"Philippe Blain via GitGitGadget" <gitgitgadget@gmail.com> writes:

> The function find_subpos in ref-filter.c looks for two consecutive '\n' to
> find the end of the subject line, a sequence which is absent in messages
> using CRLF. This results in the whole message being parsed as the subject
> line (%(contents:subject)), and the body of the message (%(contents:body))
> being empty.

To be honest, I suspect that it is not a bug in the parser that
parsed out %(contents:subject), but a user error that left the log
message in CRLF endings ;-).

So "correctly handle CRLF" is probably a tad unfair to those who
wrote the current ref-filter code; a description that is more fair
to them is probably along the lines of "handle malformed log
messages more gracefully", I would think.

> Moreover, in copy_subject, '\n' is replaced by space, but '\r' is untouched,
> resulting in the escape sequence '^M' being output verbatim in most terminal
> emulators:
>
> $ git branch --verbose
> * crlf    2113b0e Subject first line^M ^M Body first line^M Body second line
>
> This bug is a regression for git branch --verbose, which bisects down to
> 949af06 (branch: use ref-filter printing APIs, 2017-01-10).

I am not sure where you want to go with this.  Whether it is shown
in the ^X notation (and some terminals even reverse color to
highlight them), or it is shown literally (i.e. causing the next
byte to overwrite the same line starting from the left-edge), you
would be annoyed either way, no?  I suspect that the latter would
annoy you even more.  Isn't what "most terminal emulators" do,
i.e. to show it in the ^X notation instead of emitting it literally,
a good thing?  IOW, "resulting in ..." is not correctly telling us
what you think is wrong---you don't have to blame terminals.

It is not limited to CR, and is not limited to control characters at
the end of the lines, no?  If you had "\a" (or "\r") in the middle
of the title, either the current or the old code would ring a bell
(or cause the next character to appear at the start of the same line)
or when piped to "less" you'd see "^G" (or "^M") in the middle of
the line.

The old code used the pretty.c::pretty_print_commit() mechanism;
pretty.c::format_subject() uses pretty.c::is_blank_line() to trim
whitespace at the right end while trying to notice where the first
paragraph break is, so any whitespace at the end of each line in the
first paragraph is removed, and each end of line got replaced by a
SP, but it did not do anything special to control characters in the
middle of the lines.  So while the old code happened to cleanse CR
at the end of the lines, it wasn't doing enough.

I think fixing _that_ is (and should be) outside the scope of this
series, of course.
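
To illustrate the trailing-whitespace trimming described above, here
is a minimal sketch in C.  The function name trimmed_line_is_blank()
is hypothetical and this is not the actual code in pretty.c; it only
assumes that "whitespace" means whatever isspace() accepts, which
includes CR.  It shows why a CR at the end of a line disappears while
a CR in the middle of a line survives.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper (an illustration, not git's implementation):
 * shrink *len so that trailing whitespace, including CR and LF, is
 * excluded, and report whether the line held nothing but whitespace.
 * Bytes in the middle of the line are never inspected.
 */
int trimmed_line_is_blank(const char *line, size_t *len)
{
	size_t n = *len;

	while (n && isspace((unsigned char)line[n - 1]))
		n--;
	*len = n;
	return n == 0;
}
```

With this sketch, a line holding only "\r\n" is reported as blank,
and "Subject\r\n" is trimmed to "Subject", but "a\rb\r\n" is trimmed
only to "a\rb".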

>  2:  c68bc2b3788 ! 2:  aab1f45ba97 ref-filter: teach the API to correctly handle CRLF
>      @@ -1,26 +1,49 @@
>       Author: Philippe Blain <levraiphilippeblain@gmail.com>
>       
>      -    ref-filter: teach the API to correctly handle CRLF
>      +    ref-filter: fix the API to correctly handle CRLF

API is not changed (i.e. the callers do not have to do anything
special); only the implementation.

	ref-filter: handle CR at the end of the lines more gracefully

perhaps?

>           The ref-filter API does not correctly handle commit or tag messages that
>           use CRLF as the line terminator. Such messages can be created with the
>           `--verbatim` option of `git commit` and `git tag`, or by using `git
>           commit-tree` directly.
>       
>      +    This impacts the output `git branch`, `git tag` and `git for-each-ref`
>      +    when used with a `--format` argument containing the atoms
>      +    `%(contents:subject)` or `%(contents:body)`, as well as the output of
>      +    `git branch --verbose`, which uses `%(contents:subject)` internally.

In other words...

	When a commit or a tag object uses CRLF line endings, the
	ref-filter machinery does not identify the end of the first
	paragraph as intended by the writer, because it only looks
	for two consecutive LFs and CR-LF-CR-LF does not look like a
	blank line that separates paragraphs to it.  "git branch",
	"git tag" and "git for-each-ref" all rely on the message
	being split correctly into "%(contents:subject)" and
	"%(contents:body)" placeholders, and end up showing
	everything as the subject.
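
The failure mode described above can be sketched in a few lines of
C.  This is an illustration under assumptions, not the code in
ref-filter.c: the name subject_len_lf_only() is made up, and the
search is reduced to a plain strstr() for two consecutive LFs.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch of a "two consecutive LFs" search.  Returns the
 * length of the first paragraph, or the length of the whole message
 * when no LF-LF sequence is found.
 */
size_t subject_len_lf_only(const char *msg)
{
	const char *eos = strstr(msg, "\n\n");

	return eos ? (size_t)(eos - msg) : strlen(msg);
}
```

For "Subject\n\nBody\n" the sketch stops after "Subject".  For
"Subject\r\n\r\nBody\r\n" there is no LF-LF pair anywhere, so the
whole message is taken as the subject.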

Now based on what I hinted in the far-above part, there can be two
valid solutions here.

 * recognize CRLF as a valid line ending, but still retain ^M in the
   message.  The replacement for "%(contents:subject)" would still
   end with "^M", and we add LF to it, which makes the resulting
   output end with CRLF and all is well.  This will keep "\a" and
   "\r" in the middle of the line in the output.

 * strip CR and any control character other than LF from everywhere.
   This will cleanse "\a" and "\r" in the middle of, or anywhere on,
   the line, so that "%(contents:subject)", "%(contents:body)" and
   "%(contents)" all are "clean".

I am not offhand sure which one is better (I haven't read the patch
to see which one you chose to implement).
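
A rough sketch in C of the two alternatives listed above may help
compare them.  Both function names are hypothetical and neither is
taken from the patch or from ref-filter.c.  Note that iscntrl() also
matches TAB, so the second function drops tabs as well, which may or
may not be what one wants.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Alternative 1 (hypothetical): accept both LF-LF and LF-CR-LF as a
 * paragraph break.  The returned subject length still includes the
 * CR that precedes the first LF, i.e. "^M" is retained.
 */
size_t subject_len_crlf_aware(const char *msg)
{
	const char *p = msg;

	while ((p = strchr(p, '\n')) != NULL) {
		if (p[1] == '\n' || (p[1] == '\r' && p[2] == '\n'))
			return (size_t)(p - msg);
		p++;
	}
	return strlen(msg);
}

/*
 * Alternative 2 (hypothetical): drop every control character except
 * LF, in place, so that "\a" and "\r" vanish wherever they appear.
 */
void strip_control_chars(char *s)
{
	char *dst = s;

	for (; *s; s++)
		if (*s == '\n' || !iscntrl((unsigned char)*s))
			*dst++ = *s;
	*dst = '\0';
}
```

With the first function, "Subject\r\n\r\nBody\r\n" yields a subject
of "Subject\r".  With the second, "Sub\aject\r\n\r\nBo\rdy\r\n"
becomes "Subject\n\nBody\n", after which an LF-only search for the
paragraph break would work unchanged.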

>      +    The function find_subpos in ref-filter.c looks for two consecutive '\n'
>      +    to find the end of the subject line, a sequence which is absent in
>      +    messages using CRLF. This results in the whole message being parsed as
>      +    the subject line (`%(contents:subject)`), and the body of the message
>      +    (`%(contents:body)`)  being empty.



>      +    Moreover, in copy_subject, '\n' is replaced by space, but '\r' is
>      +    untouched, resulting in the escape sequence '^M' being output verbatim
>      +    in most terminal emulators:
>      ...
>      +    This bug is a regression for `git branch --verbose`, which
>      +    bisects down to 949af0684c (branch: use ref-filter printing APIs,
>      +    2017-01-10).
>      +
>      +    Fix this bug in ref-filter by hardening the logic in `copy_subject` and
>      +    `find_subpos` to correctly parse messages containing CRFL.

The above few lines may need revising (based on what I said to the
cover); --- even if they don't, CRFL here needs to become CRLF ;-)

Thanks for working on this.
