git@vger.kernel.org mailing list mirror (one of many)
* [PATCH/RFC 0/3] post-receive-email: explicitly set Content-Type header
@ 2013-08-02 23:21 Jonathan Nieder
  2013-08-02 23:22 ` [PATCH 1/3] hooks/post-receive-email: use plumbing instead of git log/show Jonathan Nieder
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Jonathan Nieder @ 2013-08-02 23:21 UTC
  To: git; +Cc: Alexey Shumkin, Jakub Narebski, Alexander Gerasiov

Hi all,

This is a revival of [1], which declares encoding in emails to make it
more likely that they can be read.  I like to think it avoids the
mistakes of previous attempts, but I'll let you judge. :)

Sorry for the long delay.  Thoughts of all kinds welcome, as always.

Thanks,
Jonathan

Gerrit Pape (1):
  hooks/post-receive-email: set declared encoding to utf-8

Jonathan Nieder (2):
  hooks/post-receive-email: use plumbing instead of 'git log' and 'git show'
  hooks/post-receive-email: force log messages in UTF-8

 contrib/hooks/post-receive-email | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

[1] http://thread.gmane.org/gmane.comp.version-control.git/181737/focus=183070

* [PATCH 1/3] hooks/post-receive-email: use plumbing instead of git log/show
  2013-08-02 23:21 [PATCH/RFC 0/3] post-receive-email: explicitly set Content-Type header Jonathan Nieder
@ 2013-08-02 23:22 ` Jonathan Nieder
  2013-08-02 23:23 ` [PATCH 2/3] hooks/post-receive-email: force log messages in UTF-8 Jonathan Nieder
  2013-08-02 23:24 ` [PATCH 3/3] hooks/post-receive-email: set declared encoding to utf-8 Jonathan Nieder
  2 siblings, 0 replies; 7+ messages in thread
From: Jonathan Nieder @ 2013-08-02 23:22 UTC
  To: git; +Cc: Alexey Shumkin, Jakub Narebski, Alexander Gerasiov

This way the hook doesn't have to keep being tweaked as porcelain
learns new features like color and pagination.

While at it, replace the "git rev-list | git shortlog" idiom with
plain "git shortlog" for simplicity.

Except for depending less on the value of settings like '[log]
abbrevCommit', no change in output intended.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 contrib/hooks/post-receive-email | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)
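The config dependence mentioned in the commit message can be observed with a small sketch: it builds a throwaway repository, sets log.abbrevCommit, and prints the object name reported by `git show` and by `git diff-tree`. It assumes git is on $PATH; the helper names `first_field` and `demo_abbrev`, the scratch repository, and the identity are made up for illustration.

```shell
#!/bin/sh
# Sketch (not part of the patch): porcelain "git show" honors
# log.abbrevCommit, while plumbing "git diff-tree" ignores it.
# Assumes git is on $PATH; the scratch repository, identity, and
# commit message are hypothetical.

# Keep only the first space-separated field of the first input line.
first_field() {
	sed -n '1s/ .*//p'
}

demo_abbrev() {
	repo=$(mktemp -d) || return 1
	(
		cd "$repo" || exit 1
		git init -q
		git -c user.name=Example -c user.email=example@example.com \
			commit -q --allow-empty -m "Initial commit"
		git config log.abbrevCommit true

		echo "full:      $(git rev-parse HEAD)"
		# Porcelain: abbreviated because of log.abbrevCommit.
		echo "show:      $(git show -s --pretty=oneline HEAD | first_field)"
		# Plumbing, as the hook uses after this patch: full object name.
		echo "diff-tree: $(git diff-tree -s --always --pretty=oneline HEAD | first_field)"
	)
	rm -rf "$repo"
}
```

Running `demo_abbrev` should print an abbreviated name on the `show:` line, while the `diff-tree:` line should match the full name on the `full:` line.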

diff --git a/contrib/hooks/post-receive-email b/contrib/hooks/post-receive-email
index 15311502..72084511 100755
--- a/contrib/hooks/post-receive-email
+++ b/contrib/hooks/post-receive-email
@@ -471,7 +471,7 @@ generate_delete_branch_email()
 	echo "       was  $oldrev"
 	echo ""
 	echo $LOGBEGIN
-	git show -s --pretty=oneline $oldrev
+	git diff-tree -s --always --pretty=oneline $oldrev
 	echo $LOGEND
 }
 
@@ -547,11 +547,11 @@ generate_atag_email()
 		# performed on them
 		if [ -n "$prevtag" ]; then
 			# Show changes since the previous release
-			git rev-list --pretty=short "$prevtag..$newrev" | git shortlog
+			git shortlog "$prevtag..$newrev"
 		else
 			# No previous tag, show all the changes since time
 			# began
-			git rev-list --pretty=short $newrev | git shortlog
+			git shortlog $newrev
 		fi
 		;;
 	*)
@@ -571,7 +571,7 @@ generate_delete_atag_email()
 	echo "       was  $oldrev"
 	echo ""
 	echo $LOGBEGIN
-	git show -s --pretty=oneline $oldrev
+	git diff-tree -s --always --pretty=oneline $oldrev
 	echo $LOGEND
 }
 
@@ -617,7 +617,7 @@ generate_general_email()
 	echo ""
 	if [ "$newrev_type" = "commit" ]; then
 		echo $LOGBEGIN
-		git show --no-color --root -s --pretty=medium $newrev
+		git diff-tree -s --always --pretty=medium $newrev
 		echo $LOGEND
 	else
 		# What can we do here?  The tag marks an object that is not
@@ -636,7 +636,7 @@ generate_delete_general_email()
 	echo "       was  $oldrev"
 	echo ""
 	echo $LOGBEGIN
-	git show -s --pretty=oneline $oldrev
+	git diff-tree -s --always --pretty=oneline $oldrev
 	echo $LOGEND
 }
 
-- 
1.8.4.rc1

* [PATCH 2/3] hooks/post-receive-email: force log messages in UTF-8
  2013-08-02 23:21 [PATCH/RFC 0/3] post-receive-email: explicitly set Content-Type header Jonathan Nieder
  2013-08-02 23:22 ` [PATCH 1/3] hooks/post-receive-email: use plumbing instead of git log/show Jonathan Nieder
@ 2013-08-02 23:23 ` Jonathan Nieder
  2013-08-04 14:54   ` Alexey Shumkin
  2013-08-02 23:24 ` [PATCH 3/3] hooks/post-receive-email: set declared encoding to utf-8 Jonathan Nieder
  2 siblings, 1 reply; 7+ messages in thread
From: Jonathan Nieder @ 2013-08-02 23:23 UTC
  To: git; +Cc: Alexey Shumkin, Jakub Narebski, Alexander Gerasiov

Git commands write commit messages in UTF-8 by default, but that
default can be overridden by the [i18n] commitEncoding and
logOutputEncoding settings.  With such a setting, the emails written
by the post-receive-email hook use a mixture of encodings:

 1. Log messages use the configured log output encoding, which is
    meant to be whatever encoding works best with local terminals
    (and does not have much to do with what encoding should be used
    for email)

 2. Filenames are left as is: on Linux, usually UTF-8, and in the Mingw
    port (which uses Unicode filesystem APIs), always UTF-8

 3. The "This is an automated email" preface uses a project description
    from .git/description, which is typically in UTF-8 to support
    gitweb.

So (1) is configurable, and (2) and (3) are unconfigurable and
typically UTF-8.  Override the log output encoding to always use UTF-8
when writing the email to get the best chance of a comprehensible
single-encoding email.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 contrib/hooks/post-receive-email | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
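The mixed-encoding problem described above can be reproduced with a sketch: it stores a Windows-1251 subject under `i18n.commitEncoding = windows-1251` and dumps, as hex, the bytes `git diff-tree` prints with and without `--encoding=UTF-8`. It assumes a git built with iconv support; `subject_hex`, `demo_encoding`, and the sample subject are hypothetical.

```shell
#!/bin/sh
# Sketch (not part of the patch): with i18n.commitEncoding set to a
# legacy encoding, diff-tree prints log messages in that encoding
# unless --encoding=UTF-8 is given.  Assumes git is on $PATH and was
# built with iconv; the repository and subject are hypothetical.

# Print the subject of HEAD as hex bytes, passing any extra options
# (such as --encoding=UTF-8) through to git diff-tree.
subject_hex() {
	git diff-tree -s --always "$@" --pretty=format:%s HEAD |
		od -A n -t x1 | tr -d ' \n'
}

demo_encoding() {
	repo=$(mktemp -d) || return 1
	(
		cd "$repo" || exit 1
		git init -q
		git config i18n.commitEncoding windows-1251
		# Cyrillic "test" encoded in Windows-1251 (one byte per letter).
		printf '\362\345\361\362\n' >msg
		git -c user.name=Example -c user.email=example@example.com \
			commit -q --allow-empty -F msg

		echo "default: $(subject_hex)"
		echo "utf8:    $(subject_hex --encoding=UTF-8)"
	)
	rm -rf "$repo"
}
```

With a typical glibc-based git, the `default:` line should show the four single-byte Windows-1251 codes and the `utf8:` line the two-byte-per-letter UTF-8 form of the same word.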

diff --git a/contrib/hooks/post-receive-email b/contrib/hooks/post-receive-email
index 72084511..ba93a0d8 100755
--- a/contrib/hooks/post-receive-email
+++ b/contrib/hooks/post-receive-email
@@ -471,7 +471,7 @@ generate_delete_branch_email()
 	echo "       was  $oldrev"
 	echo ""
 	echo $LOGBEGIN
-	git diff-tree -s --always --pretty=oneline $oldrev
+	git diff-tree -s --always --encoding=UTF-8 --pretty=oneline $oldrev
 	echo $LOGEND
 }
 
@@ -571,7 +571,7 @@ generate_delete_atag_email()
 	echo "       was  $oldrev"
 	echo ""
 	echo $LOGBEGIN
-	git diff-tree -s --always --pretty=oneline $oldrev
+	git diff-tree -s --always --encoding=UTF-8 --pretty=oneline $oldrev
 	echo $LOGEND
 }
 
@@ -617,7 +617,7 @@ generate_general_email()
 	echo ""
 	if [ "$newrev_type" = "commit" ]; then
 		echo $LOGBEGIN
-		git diff-tree -s --always --pretty=medium $newrev
+		git diff-tree -s --always --encoding=UTF-8 --pretty=medium $newrev
 		echo $LOGEND
 	else
 		# What can we do here?  The tag marks an object that is not
@@ -636,7 +636,7 @@ generate_delete_general_email()
 	echo "       was  $oldrev"
 	echo ""
 	echo $LOGBEGIN
-	git diff-tree -s --always --pretty=oneline $oldrev
+	git diff-tree -s --always --encoding=UTF-8 --pretty=oneline $oldrev
 	echo $LOGEND
 }
 
-- 
1.8.4.rc1

* [PATCH 3/3] hooks/post-receive-email: set declared encoding to utf-8
  2013-08-02 23:21 [PATCH/RFC 0/3] post-receive-email: explicitly set Content-Type header Jonathan Nieder
  2013-08-02 23:22 ` [PATCH 1/3] hooks/post-receive-email: use plumbing instead of git log/show Jonathan Nieder
  2013-08-02 23:23 ` [PATCH 2/3] hooks/post-receive-email: force log messages in UTF-8 Jonathan Nieder
@ 2013-08-02 23:24 ` Jonathan Nieder
  2 siblings, 0 replies; 7+ messages in thread
From: Jonathan Nieder @ 2013-08-02 23:24 UTC
  To: git; +Cc: Alexey Shumkin, Jakub Narebski, Alexander Gerasiov

From: Gerrit Pape <pape@smarden.org>
Date: Thu, 11 Dec 2008 20:27:21 +0000

Some email clients (e.g., claws-mail) display the message body
incorrectly when the charset is not defined explicitly in a
Content-Type header.  "git log" generates logs in UTF-8 encoding by
default, so add a Content-Type header declaring that encoding to
the emails the post-receive-email example hook sends.

[jn: also setting the Content-Transfer-Encoding so MTAs know what
 kind of mangling might be needed when sending to a non 8-bit clean
 SMTP host]

Requested-by: Alexander Gerasiov <gq@debian.org>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
That's the end of the series.  Thanks for reading.

 contrib/hooks/post-receive-email | 3 +++
 1 file changed, 3 insertions(+)
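Declaring `charset=utf-8` is only truthful if every byte of the body is UTF-8, and filenames are passed through unconverted. A minimal sketch of a check a hook could run before trusting that header, assuming an iconv(1) that exits non-zero on malformed input; `body_is_utf8` and `mime_headers` are hypothetical helpers, not part of post-receive-email.

```shell
#!/bin/sh
# Sketch (hypothetical helpers, not part of post-receive-email).

# Succeed only if standard input is valid UTF-8, so a caller could
# decide whether "charset=utf-8" describes the body accurately.
# Assumes iconv(1) rejects malformed input with a non-zero exit.
body_is_utf8() {
	iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Print the same three MIME header lines this patch adds to the hook.
mime_headers() {
	printf '%s\n' 'MIME-Version: 1.0' \
		'Content-Type: text/plain; charset=utf-8' \
		'Content-Transfer-Encoding: 8bit'
}
```

For example, `generate_email | body_is_utf8` (with the hook's own body generator) would fail for a body containing Windows-1251 filenames.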

diff --git a/contrib/hooks/post-receive-email b/contrib/hooks/post-receive-email
index ba93a0d8..8ee410f8 100755
--- a/contrib/hooks/post-receive-email
+++ b/contrib/hooks/post-receive-email
@@ -242,6 +242,9 @@ generate_email_header()
 	cat <<-EOF
 	To: $recipients
 	Subject: ${emailprefix}$projectdesc $refname_type $short_refname ${change_type}d. $describe
+	MIME-Version: 1.0
+	Content-Type: text/plain; charset=utf-8
+	Content-Transfer-Encoding: 8bit
 	X-Git-Refname: $refname
 	X-Git-Reftype: $refname_type
 	X-Git-Oldrev: $oldrev
-- 
1.8.4.rc1

* Re: [PATCH 2/3] hooks/post-receive-email: force log messages in UTF-8
  2013-08-02 23:23 ` [PATCH 2/3] hooks/post-receive-email: force log messages in UTF-8 Jonathan Nieder
@ 2013-08-04 14:54   ` Alexey Shumkin
  2013-08-04 18:14     ` Jonathan Nieder
  0 siblings, 1 reply; 7+ messages in thread
From: Alexey Shumkin @ 2013-08-04 14:54 UTC
  To: Jonathan Nieder; +Cc: git, Jakub Narebski, Alexander Gerasiov

On Fri, Aug 02, 2013 at 04:23:38PM -0700, Jonathan Nieder wrote:
> Git commands write commit messages in UTF-8 by default, but that
> default can be overridden by the [i18n] commitEncoding and
> logOutputEncoding settings.  With such a setting, the emails written
> by the post-receive-email hook use a mixture of encodings:
> 
>  1. Log messages use the configured log output encoding, which is
>     meant to be whatever encoding works best with local terminals
>     (and does not have much to do with what encoding should be used
>     for email)
> 
>  2. Filenames are left as is: on Linux, usually UTF-8, and in the Mingw
>     port (which uses Unicode filesystem APIs), always UTF-8
I cannot say exactly if it makes sense for THIS patch, but I'd like to
remind you about the Cygwin port, which definitely does not use UTF-8
encoding (in my case it is Windows-1251) for filenames.
> 
>  3. The "This is an automated email" preface uses a project description
>     from .git/description, which is typically in UTF-8 to support
>     gitweb.
> 
> So (1) is configurable, and (2) and (3) are unconfigurable and
> typically UTF-8.  Override the log output encoding to always use UTF-8
> when writing the email to get the best chance of a comprehensible
> single-encoding email.
I cannot agree to receiving UTF-8-only e-mails for Windows projects
that use a non-UTF-8 encoding. I want to read correctly formed e-mail,
without corrupted characters in place of filenames (that is the main
problem here, since filenames, unlike log messages, are not converted).

-- 
Alexey Shumkin

* Re: [PATCH 2/3] hooks/post-receive-email: force log messages in UTF-8
  2013-08-04 14:54   ` Alexey Shumkin
@ 2013-08-04 18:14     ` Jonathan Nieder
  2013-08-05  8:45       ` Alexey Shumkin
  0 siblings, 1 reply; 7+ messages in thread
From: Jonathan Nieder @ 2013-08-04 18:14 UTC
  To: Alexey Shumkin; +Cc: git, Jakub Narebski, Alexander Gerasiov

Alexey Shumkin wrote:
> On Fri, Aug 02, 2013 at 04:23:38PM -0700, Jonathan Nieder wrote:

>>  1. Log messages use the configured log output encoding, which is
>>     meant to be whatever encoding works best with local terminals
>>     (and does not have much to do with what encoding should be used
>>     for email)
>>
>>  2. Filenames are left as is: on Linux, usually UTF-8, and in the Mingw
>>     port (which uses Unicode filesystem APIs), always UTF-8
>
> I cannot say exactly if it makes sense for THIS patch, but I'd like to
> remind you about the Cygwin port, which definitely does not use UTF-8
> encoding (in my case it is Windows-1251) for filenames.
>
>> 
>>  3. The "This is an automated email" preface uses a project description
>>     from .git/description, which is typically in UTF-8 to support
>>     gitweb.

Thanks for clarifying.  So in the context you describe, (1) is
configurable, (2) is Windows-1251, (3) is unconfigurably UTF-8, and
there is no way with current git facilities to force the email to use
a single encoding unless (3) happens to contain no special characters.

What is the value of the "[i18n] commitEncoding" setting in your
project?  What encoding do the raw commit messages (shown with
"git log --format=raw") use for their text, and what do they declare
with an in-commit 'encoding' header, if any?

Does everyone on this project use Cygwin?  That should be fine, but
I'd expect there to be problems as soon as someone wants to try the
Mingw port ("Git for Windows").

I wonder if there should be an "[i18n] repositoryPathEncoding"
configuration item to support this kind of repository.  Then git could
be aware of the intended encoding of paths, could recode them for
display to a terminal, and at least on Linux and Mingw could recode
them for use in filenames on disk.  "repositoryPathEncoding = none"
would mean the current behavior of treating paths as raw sequences of
bytes.

What do you think?
Jonathan
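Until something like the proposed setting exists, a hook serving such a repository could recode path names itself. A sketch, assuming the paths are stored as Windows-1251 and that iconv(1) accepts the `WINDOWS-1251` charset name; `recode_paths` and `PATH_ENCODING` are hypothetical names, not git features.

```shell
#!/bin/sh
# Sketch (hypothetical workaround, not a git feature): recode raw path
# bytes from a known legacy encoding to UTF-8 before they go into an
# email.  Assumes iconv(1) knows the "WINDOWS-1251" charset name (true
# for glibc and GNU libiconv); override PATH_ENCODING for other repos.
PATH_ENCODING=${PATH_ENCODING:-WINDOWS-1251}

# Read path names on stdin, write them re-encoded as UTF-8 on stdout.
recode_paths() {
	iconv -f "$PATH_ENCODING" -t UTF-8
}
```

Usage in a hook might look like `git -c core.quotePath=false diff-tree -r --name-only "$oldrev" "$newrev" | recode_paths`; `core.quotePath=false` matters because by default git prints such bytes as quoted octal escapes.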

* Re: [PATCH 2/3] hooks/post-receive-email: force log messages in UTF-8
  2013-08-04 18:14     ` Jonathan Nieder
@ 2013-08-05  8:45       ` Alexey Shumkin
  0 siblings, 0 replies; 7+ messages in thread
From: Alexey Shumkin @ 2013-08-05  8:45 UTC
  To: Jonathan Nieder; +Cc: git, Jakub Narebski, Alexander Gerasiov

On Sun, Aug 04, 2013 at 11:14:40AM -0700, Jonathan Nieder wrote:
> Alexey Shumkin wrote:
> > On Fri, Aug 02, 2013 at 04:23:38PM -0700, Jonathan Nieder wrote:
> 
> >>  1. Log messages use the configured log output encoding, which is
> >>     meant to be whatever encoding works best with local terminals
> >>     (and does not have much to do with what encoding should be used
> >>     for email)
> >>
> >>  2. Filenames are left as is: on Linux, usually UTF-8, and in the Mingw
> >>     port (which uses Unicode filesystem APIs), always UTF-8
> >
> > I cannot say exactly if it makes sense for THIS patch, but I'd like to
> > remind you about the Cygwin port, which definitely does not use UTF-8
> > encoding (in my case it is Windows-1251) for filenames.
> >
> >> 
> >>  3. The "This is an automated email" preface uses a project description
> >>     from .git/description, which is typically in UTF-8 to support
> >>     gitweb.
> 
> Thanks for clarifying.  So in the context you describe, (1) is
> configurable, (2) is Windows-1251, (3) is unconfigurably UTF-8, and
> there is no way with current git facilities to force the email to use
> a single encoding unless (3) happens to contain no special characters.
> 
> What is the value of the "[i18n] commitEncoding" setting in your
> project?
commitEncoding is the same as the filename encoding: Windows-1251, of course.

> What encoding do the raw commit messages (shown with
> "git log --format=raw") use for their text, and what do they declare
> with an in-commit 'encoding' header, if any?
Well, despite what `git log --help` says,
--8<--
raw
           The raw format shows the entire commit exactly as stored in
           the commit object.
--8<--
on a Linux box (UTF-8) I see "readable" commit messages even though
they are stored in Windows-1251, so they are converted to UTF-8. To be
sure, I checked their actual content with `git cat-file commit`.
To be honest, I usually use a modified version of Git (see
ecaee8050cec23eb4cf082512e907e3e52c20b57 in the 'next' branch), which
could affect the results, so I checked `git log --format=raw` with an
unmodified v1.8.3.3 of Git.

But back to your question: the encoding declared in the header of the
raw commit objects is 'Windows-1251'.
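The check described here can be scripted: `git cat-file commit` prints the object exactly as stored, so the declared encoding is whatever follows `encoding ` in the header block, before the first blank line. A sketch; `commit_encoding` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch (hypothetical helper): print the value of the "encoding"
# header of a raw commit object, or nothing if the header is absent
# (which means the message is meant to be UTF-8).
commit_encoding() {
	# Stop at the blank line that ends the header block, so a message
	# line starting with "encoding " cannot be mistaken for the header.
	git cat-file commit "$1" | sed -n -e '/^$/q' -e 's/^encoding //p'
}
```

For example, `commit_encoding HEAD` in the repository described above would be expected to print `Windows-1251`.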
> 
> Does everyone on this project use Cygwin?
This is a "closed" (commercial) project and every developer uses Cygwin,
except me. I use a Linux box as my desktop (mail, IM, web browsing; but
development happens in Cygwin). Sometimes I run the project's utility
scripts on my desktop, since Linux handles files much faster than
Cygwin does ;))
Also, the Git server is a coLinux box (http://www.colinux.org/) on
Windows Server 2003, but I guess it does not matter much here.
>  That should be fine, but
> I'd expect there to be problems as soon as someone wants to try the
> Mingw port ("Git for Windows").
Yep, one of our developers tried a modern version of TortoiseGit with
the MinGW port of Git. That was a failure, because since v1.7.9 the
MinGW port transcodes filenames to store them internally in UTF-8. The
problem could be solved by converting the non-ASCII filenames to UTF-8
once, but I do not want to use the MinGW port. I like the Cygwin
"infrastructure", which is more Linux-like than MinGW.
> 
> I wonder if there should be an "[i18n] repositoryPathEncoding"
> configuration item to support this kind of repository.  Then git could
> be aware of the intended encoding of paths, could recode them for
> display to a terminal, and at least on Linux and Mingw could recode
> them for use in filenames on disk.  "repositoryPathEncoding = none"
> would mean the current behavior of treating paths as raw sequences of
> bytes.
I'd be happy if such a setting existed. It could solve many problems
for cross-platform projects with non-ASCII filenames.
Indeed, the MinGW port does solve that problem, in its own way!
> 
> What do you think?
> Jonathan

-- 
Alexey Shumkin

Thread overview: 7+ messages
2013-08-02 23:21 [PATCH/RFC 0/3] post-receive-email: explicitly set Content-Type header Jonathan Nieder
2013-08-02 23:22 ` [PATCH 1/3] hooks/post-receive-email: use plumbing instead of git log/show Jonathan Nieder
2013-08-02 23:23 ` [PATCH 2/3] hooks/post-receive-email: force log messages in UTF-8 Jonathan Nieder
2013-08-04 14:54   ` Alexey Shumkin
2013-08-04 18:14     ` Jonathan Nieder
2013-08-05  8:45       ` Alexey Shumkin
2013-08-02 23:24 ` [PATCH 3/3] hooks/post-receive-email: set declared encoding to utf-8 Jonathan Nieder

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git
