git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org, peff@peff.net
Subject: Re: [RFC/PATCH 3/3] mailinfo: handle in-body header continuations
Date: Fri, 16 Sep 2016 13:17:11 -0700	[thread overview]
Message-ID: <xmqq8turk3aw.fsf@gitster.mtv.corp.google.com> (raw)
In-Reply-To: <0152df30db0972d61ff45b2b099ad1242aacd431.1474047135.git.jonathantanmy@google.com> (Jonathan Tan's message of "Fri, 16 Sep 2016 10:37:24 -0700")

Jonathan Tan <jonathantanmy@google.com> writes:

> Instead of repeatedly calling "check_header" (as in this patch), one
> alternate method to accomplish this would be to have a buffer of
> potential header text in struct mailinfo to be flushed whenever a header
> is known to end (for example, if we detect the start of a new header),
> but this makes the logic more complicated - for example, the flushing
> would not only invoke check_header but would also need to reconstruct
> the original lines, possibly decode them into UTF-8, and store them in
> log_message, and any failures would be noticed a few "lines" away from
> the original failure point. Also, care would need to be taken to flush
> the buffer at all appropriate places.

I am not sure how much the UTF-8 decoding argument above matters.

The current way handle_commit_msg() is structured (before any of
your patches) is for it to take one raw line at a time and:

    - If we haven't seen a non-header line (i.e. at the beginning,
      or we were reading in-body headers), return without doing
      anything.

    - If we are told to honor in-body headers and if we haven't seen
      a non-header line, see if the line itself looks like a header
      and if so, handle it as an in-body header and return.  If that
      line is not an in-body header, continue processing.

    - If the processing reaches at this point, we are done with the
      headers (i.e. mi->header_stage is set to 0).

    - Make sure the line is in utf8.

    - If it is a scissors line and we are told to honor scissors
      lines, ignore what we have read so far and go back to "we
      haven't seen a non-header line" state and return.

    - If it is a patch break, return and signal the caller we are
      done with the log message.

    - Otherwise accumulate the line as part of the log message.

The bug we want to address is in the second step.  We only look at
the first line of folded in-body header line, because we are fed one
line at a time.

If we keep the location of UTF8 conversion, and buffered the in-body
header in "struct mailinfo *mi" (like you seem to do in this patch),
what we will queue there will be _before_ conversion.  We'd call
check_header() on it once we know one logical line of a header is
accumulated, and check_header() would do the right conversion via
decode_header() etc., so I do not see why you need to worry about
the encoding issues at all.

I wonder if the simplest would be to introduce another state in the
state machine that is "we know we are processing in-body header, and
we have read early part of an in-body header line that may not be
complete".

In other words, wouldn't something like the illustration at the end
of this message sufficient?  If the body consists solely of in-body
header without any patch or patchbreak, we may reach EOF with
something in mi->in_line_header buffer and nothing in
mi->log_message and without this function getting any chance to
return 1, so a careful caller may want to flush in_line_header, but
the overall result of the mailinfo subsystem in such a case would be
an error ("you didn't have any patch or a message?"), so it may not
matter too much.

What am I missing?

handle_commit_msg(...)
{
	if (mi->in_line_header->len) {
		/* we have read the beginning of one in-line header */
		if (line->len && isspace(*line->buf))
			append to mi->in_line_header strbuf;
                        return 0;
		/* otherwise we know mi->in_line_header is now complete */
		check_header(mi, mi->in_line_header, ...);
		strbuf_reset(&mi->in_line_header);
	}

	if (mi->header_stage && (it is a blank line))
		return 0;

	if (mi->use_inbody_headers && mi->header_stage &&
	    (the line looks like beginning of 2822 header)) {
		strbuf_addbuf(&mi->in_line_header, line);
		return 0;
	}
        /* otherwise we are no longer looking at headers */
        mi->header_stage = 0;

	/* normalize the log message to UTF-8. */
	if (convert_to_utf8(mi, line, mi->charset.buf))
		return 0; /* mi->input_error already set */

	if (mi->use_scissors && is_scissors_line(line)) {
		int i;

		strbuf_setlen(&mi->log_message, 0);
		mi->header_stage = 1;

		/*
		 * We may have already read "secondary headers"; purge
		 * them to give ourselves a clean restart.
		 */
		for (i = 0; header[i]; i++) {
			if (mi->s_hdr_data[i])
				strbuf_release(mi->s_hdr_data[i]);
			mi->s_hdr_data[i] = NULL;
		}
		return 0;
	}

	if (patchbreak(line)) {
		if (mi->message_id)
			strbuf_addf(&mi->log_message,
				    "Message-Id: %s\n", mi->message_id);
		return 1;
	}

	strbuf_addbuf(&mi->log_message, line);
	return 0;
}
 

  reply	other threads:[~2016-09-16 20:17 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-02 19:58 [PATCH] sequencer: support folding in rfc2822 footer Jonathan Tan
2016-09-03  2:23 ` Junio C Hamano
2016-09-06 22:08   ` Jonathan Tan
2016-09-06 23:30     ` Jonathan Tan
2016-09-07  6:38       ` Jeff King
2016-09-16 17:37         ` [RFC/PATCH 0/3] handle multiline in-body headers Jonathan Tan
2016-09-16 18:29           ` Junio C Hamano
2016-09-16 17:37         ` [RFC/PATCH 1/3] mailinfo: refactor commit message processing Jonathan Tan
2016-09-16 19:12           ` Junio C Hamano
2016-09-16 21:46             ` Jeff King
2016-09-16 17:37         ` [RFC/PATCH 2/3] mailinfo: correct malformed test example Jonathan Tan
2016-09-16 19:19           ` Junio C Hamano
2016-09-16 22:42             ` Jonathan Tan
2016-09-16 22:55               ` Junio C Hamano
2016-09-17  0:31                 ` Jonathan Tan
2016-09-17  3:48                   ` Junio C Hamano
2016-09-16 17:37         ` [RFC/PATCH 3/3] mailinfo: handle in-body header continuations Jonathan Tan
2016-09-16 20:17           ` Junio C Hamano [this message]
2016-09-16 20:49             ` Jonathan Tan
2016-09-16 20:59               ` Junio C Hamano
2016-09-16 22:36                 ` Jonathan Tan
2016-09-16 23:04                   ` Junio C Hamano
2016-09-17  0:22                     ` Jonathan Tan
2016-09-16 21:51           ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqq8turk3aw.fsf@gitster.mtv.corp.google.com \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).