git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Miklos Vajna <vmiklos@frugalware.org>
Cc: Jakub Narebski <jnareb@gmail.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re* [GIT PULL] sh updates for 2.6.25
Date: Wed, 16 Apr 2008 12:58:54 -0700	[thread overview]
Message-ID: <7v3aplr2pt.fsf_-_@gitster.siamese.dyndns.org> (raw)
In-Reply-To: <20080416084435.GJ8387@genesis.frugalware.org> (Miklos Vajna's message of "Wed, 16 Apr 2008 10:44:35 +0200")

Miklos Vajna <vmiklos@frugalware.org> writes:

> On Tue, Apr 15, 2008 at 08:25:28PM -0700, Junio C Hamano <gitster@pobox.com> wrote:
>> > If we are at it, I had a similar bugreport: If one doesn't use an empty
>> > line after the first line in the commit message, a git-format-patch +
>> > git-am combo will strip newlines from the commit message:
>> 
>> That's not a bug but an intended behaviour.  You are triggering "RFC 2822
>> line folding" of Subject: header.
>
> Hm, then is it git-am that would have to be fixed up to properly unfold
> such a subject? (I mean not stripping newlines.)

"Properly unfolding" by definition means remove LF before WSP at the
beginning of the continued lines, so it does not solve it.  "git am" is
primarily an email acceptance tool, and the mail pathway between the
sender and the recipient can splice the "Subject: " at any point, so the
receiver cannot assume that the lines were folded at the place the
originator intended them to be folded (often, the sender wanted a single
line).

But let's step back a bit.

First of all, if you are using git as "fast filesystem that lets you build
your own SCM on top of", aka "plumbing", you can have your commit log
message in any way you want.  "git-commit-tree" allows you to record
arbitrary commit log message without whitespace cleansing (it might eat
NUL, but that would be a bug if it did so), and "git-cat-file commit"
would give you what you recorded literally.

However, if you are using git as a full featured SCM, aka "Porcelain", you
have to work within the rules of how the world works, which were not set
arbitrarily but came from real world constraints.  "git-format-patch" and
"git-am" are two examples of such Porcelain programs.

Unlike olden days when people used CVS and recorded a two-week's worth of
work as a single huge commit, we encourage working with many small
commits, and we have tools to give you overview of the history without
drowning you in the sea of information.  "git reflog", "git branch -v",
"git-log --pretty=oneline", "git show-branch", "git fmt-merge-msg",
"gitk", and "git shortlog" all are built around the notion that the first
line of the commit is _special_ in that by reading only that line, there
should be enough information for the reader to tell what the commit is
about.  Also, emailed patch has the "Subject: " line to serve the same
purpose and by definition that is a single liner.

When one adopts the notion of "a single line at the top summarizes what
the commit is about", it is very natural to call that a "title", and
having a blank line between the title and the body to separate them also
becomes natural, and it matches how a patch is presented in email, as a
bonus, so it matches people's expectation.

So this format is merely a convention when viewed at the "plumbing" level,
but it is more important than just a convention if you are living at the
"Porcelain" level; if you deviate from that, "Porcelain" would not work
very well for you.

We can do two things about "would not work very well" part above when you
do have a commit whose first paragraph has more then one lines, and
dealing with such a commit gracefully is important.

People who are used to other systems without a good history summarization
tools can and do write such log messages.  People who make commits on such
systems whose commits are imported to git (perhaps even without them
knowing about it) do not have an incentive to use a short-and-clear single
line summary in each of their commits, as their system may not give a good
way to make use of the result of such a practice.

Very old git literally took "the first line is the summary" approach,
which meant that the first line of such a multi-line paragraph at the
beginning became "Subject: ", and the message body started in the mid
sentence.  "git log --pretty=oneline" appeared to chomp a sentence in the
middle (while in fact the guilty party who chomped the sentence in the
middle was the committer). People who migrated from CVS hated the loss of
information. Worse yet, because "rebase" is implemented in terms of
"format-patch piped to am" (which we can eventually change to use
git-sequencer), if you rebase such a commit, you will get an extra empty
line between the first line and the subsequent lines.

These days, format-patch was taught to use "the first paragraph" as the
summarizing first line to avoid chomping a sentence in the middle.  This
change did not hurt people who use git "Porcelain", as the commit log
message for them is always "a single line summary, a blank line, and the
body".  The first paragraph is the same as the first line for them.  But
for commits that have a multi-line paragraph at the beginning, information
lossage is avoided this way.  Now the first chunk of the message, even if
it is splattered over two physical lines, is used as the summary.

The tools that allocate only one display line for each commit (again,
format-patch is one of them because there is only one "Subject: " line)
still need to cope with this, as they have only one line to work with.
The way format-patch and friends do so is to take it as a logically single
line folded into multiple lines.  format-patch (and --pretty=email)
happens to know that the output medium (i.e. rfc 2822 messages) allows to
express the logically single line as multiple physical lines, so its
output "preserves" the original line breaks, but "git am" is in no
position to honor it, at least without an extra option to tell it that it
is safe and meaningful to do so.

So in short, when you use "am", it by design unfolds the "Subject: " line
and there is no bug there.  "rebase" being implemented in terms of
"format-patch piped to am" does mangle the message because of this, but
if anything that is a bug in rebase, and not "am".

And this is a potential fix to the issue, which was made possible only
because recently "rebase" started passing an extra option to "am".

-- >8 --
From: Junio C Hamano <gitster@pobox.com>
Date: Wed, 16 Apr 2008 12:50:48 -0700
Subject: [PATCH] rebase: do not munge commit log message

Traditionally git-rebase was implemented in terms of "format-patch" piped
to "am -3", to strike balance between speed (because it avoids a rather
expensive read-tree/merge-recursive machinery most of the time) and
flexibility (the magic "-3" allows it to fall back to 3-way merge as
necessary).  However, this combination has one flaw when dealing with a
nonstandard commit log message format that has more than one lines in the
first paragraph, because such a "first line" is formatted as logically a
single line, and unfolded at the applying end.

This teaches "git am --rebasing" to take advantage of the fact that the
mbox message "git rebase" prepares for it records the original commit
object name, and that such a commit _is_ available locally.  It reads the
log message from the original commit object instead.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 git-am.sh                    |   19 ++++++++++++++-----
 t/t3408-rebase-multi-line.sh |   41 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+), 5 deletions(-)
 create mode 100755 t/t3408-rebase-multi-line.sh

diff --git a/git-am.sh b/git-am.sh
index 245e1db..5a7695e 100755
--- a/git-am.sh
+++ b/git-am.sh
@@ -327,11 +327,20 @@ do
 			echo "Patch is empty.  Was it split wrong?"
 			stop_here $this
 		}
-		SUBJECT="$(sed -n '/^Subject/ s/Subject: //p' "$dotest/info")"
-		case "$keep_subject" in -k)  SUBJECT="[PATCH] $SUBJECT" ;; esac
-
-		(echo "$SUBJECT" ; echo ; cat "$dotest/msg") |
-			git stripspace > "$dotest/msg-clean"
+		if test -f "$dotest/rebasing" &&
+			commit=$(sed -e 's/^From \([0-9a-f]*\) .*/\1/' \
+				-e q "$dotest/$msgnum") &&
+			test "$(git cat-file -t "$commit")" = commit
+		then
+			git cat-file commit "$commit" |
+			sed -e '1,/^$/d' >"$dotest/msg-clean"
+		else
+			SUBJECT="$(sed -n '/^Subject/ s/Subject: //p' "$dotest/info")"
+			case "$keep_subject" in -k)  SUBJECT="[PATCH] $SUBJECT" ;; esac
+
+			(echo "$SUBJECT" ; echo ; cat "$dotest/msg") |
+				git stripspace > "$dotest/msg-clean"
+		fi
 		;;
 	esac
 
diff --git a/t/t3408-rebase-multi-line.sh b/t/t3408-rebase-multi-line.sh
new file mode 100755
index 0000000..e12cd57
--- /dev/null
+++ b/t/t3408-rebase-multi-line.sh
@@ -0,0 +1,41 @@
+#!/bin/sh
+
+test_description='rebasing a commit with multi-line first paragraph.'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+
+	>file &&
+	git add file &&
+	test_tick &&
+	git commit -m initial &&
+
+	echo hello >file &&
+	test_tick &&
+	git commit -a -m "A sample commit log message that has a long
+summary that spills over multiple lines.
+
+But otherwise with a sane description."
+
+	git branch side &&
+
+	git reset --hard HEAD^ &&
+	>elif &&
+	git add elif &&
+	test_tick &&
+	git commit -m second
+
+'
+
+test_expect_success rebase '
+
+	git checkout side &&
+	git rebase master &&
+	git cat-file commit HEAD | sed -e "1,/^$/d" >actual &&
+	git cat-file commit side@{1} | sed -e "1,/^$/d" >expect &&
+	test_cmp expect actual
+
+'
+
+test_done
-- 
1.5.5.120.gea9a0

  reply	other threads:[~2008-04-16 19:59 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20080415172333.GA29489@linux-sh.org>
2008-04-15 18:01 ` [GIT PULL] sh updates for 2.6.25 Linus Torvalds
2008-04-15 18:18   ` Linus Torvalds
2008-04-15 18:30   ` Paul Mundt
2008-04-15 19:56     ` Linus Torvalds
2008-04-16 18:54       ` Alex Riesen
2008-04-16 20:02       ` Junio C Hamano
2008-04-16 20:17         ` Linus Torvalds
2008-04-15 18:41   ` Junio C Hamano
2008-04-15 18:43   ` Jakub Narebski
2008-04-16  0:37     ` Miklos Vajna
2008-04-16  1:06       ` [PATCH] format-patch: Make sure the subject is always a one-liner Miklos Vajna
2008-04-16  3:25       ` [GIT PULL] sh updates for 2.6.25 Junio C Hamano
2008-04-16  8:44         ` Miklos Vajna
2008-04-16 19:58           ` Junio C Hamano [this message]
2008-04-16 20:22             ` Re* " Jakub Narebski
2008-04-17 21:38             ` Miklos Vajna
2008-04-27 19:04   ` David Woodhouse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7v3aplr2pt.fsf_-_@gitster.siamese.dyndns.org \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=jnareb@gmail.com \
    --cc=torvalds@linux-foundation.org \
    --cc=vmiklos@frugalware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).