git@vger.kernel.org mailing list mirror (one of many)
* git-svn dcommits messages not in UTF-8 charset to mod_dav_svn?
@ 2009-05-27 16:16 Matthias Andree
  2009-05-28  8:08 ` Eric Wong
  0 siblings, 1 reply; 7+ messages in thread
From: Matthias Andree @ 2009-05-27 16:16 UTC
  To: git, users

Greetings,

I had tried to use git cvsimport and git svn to transfer a CVS repository  
(I have access to it) to an SVN repository (where I don't have access to  
the repo, so I cannot use cvs2svn or similar).

The problem is that the CVS repo had non-UTF-8 commit log messages, and I  
didn't bother to convert them to UTF-8. However, SVN insists on encoding  
filenames and log messages in UTF-8.

"git svn dcommit" (which uses the SVN Perl bindings under the hood)  
happily committed such a non-UTF-8 message and br0ke the repo. The actual  
reason is that the SVN server (https://...) is now wedged, as in:

$ svn log -r130
svn: REPORT of '/repos/!svn/bc/130': 200 OK (https://svn-serv...de)

$ svn --xml log -r130  2>/dev/null
[stdout:]
<?xml version="1.0"?>
<log>
[stderr:]
svn: REPORT of '/repos/!svn/bc/130': 200 OK (https://svn-serv...de)

$ svn propget --revprop svn:log -r130 https://svn-serv...de/path/
aktuelle version (disclaimer)
kopf und fu?\223zeilen
etc.


While mod_dav_svn arguably shouldn't accept b0rked messages, git-svn  
shouldn't attempt to commit them either. It seems that the svn command  
line utilities validate the message format by themselves, and apparently  
the svn server module (likely mod_dav_svn - or are there others?) does not.

So, could
a) git-svn be modified to refuse dcommitting non-UTF-8 messages?
b) mod_dav_svn be modified to refuse commits/propedits/propsets with  
non-UTF-8 messages?

I'm sorry to say I don't know how the SVN server is configured
or which version it's running.

Thanks.

-- 
Matthias Andree
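
Before running "git svn dcommit" on imported history, it can help to find the commits whose messages are not valid UTF-8. The following is a minimal sketch, not part of git-svn: the helper names is_valid_utf8 and find_non_utf8_commits are hypothetical, and it assumes an iconv command that rejects malformed input. It checks the whole raw commit object, so a non-UTF-8 author name is flagged as well.

```shell
#!/bin/sh
# Sketch: list commits whose raw commit object is not valid UTF-8.
# Both function names are hypothetical helpers, not git commands.

# Succeeds only if stdin decodes cleanly as UTF-8.
is_valid_utf8() {
	iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Print the ID of every commit reachable from the given revision
# (default: HEAD) whose commit object fails the UTF-8 check.
find_non_utf8_commits() {
	git rev-list "${1:-HEAD}" |
	while read -r commit
	do
		git cat-file commit "$commit" | is_valid_utf8 ||
			echo "$commit"
	done
}
```

Running find_non_utf8_commits inside the git repository prints one commit ID per offending commit; empty output suggests every message is already valid UTF-8.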

* Re: git-svn dcommits messages not in UTF-8 charset to mod_dav_svn?
  2009-05-27 16:16 git-svn dcommits messages not in UTF-8 charset to mod_dav_svn? Matthias Andree
@ 2009-05-28  8:08 ` Eric Wong
  2009-05-28  8:18   ` [PATCH] git-svn: refuse to dcommit non-UTF-8 messages Eric Wong
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Wong @ 2009-05-28  8:08 UTC
  To: Matthias Andree; +Cc: git, users

Matthias Andree <matthias.andree@gmx.de> wrote:
> Greetings,
>
> I had tried to use git cvsimport and git svn to transfer a CVS repository 
> (I have access to it) to an SVN repository (where I don't have access to  
> the repo, so I cannot use cvs2svn or similar).
>
> The problem is that the CVS repo had non-UTF-8 commit log messages, and I 
> didn't bother to convert them to UTF-8. However, SVN insists on encoding  
> filenames and log messages in UTF-8.
>
> "git svn dcommit" (which uses the SVN Perl bindings under the hood)  
> happily committed such a non-UTF-8 message and br0ke the repo. The actual 
> reason is that the SVN server (https://...) is now wedged, as in:

> While mod_dav_svn arguably shouldn't accept b0rked messages, git-svn  
> shouldn't attempt to commit them either. It seems that the svn command  
> line utilities validate the message format by themselves, and apparently  
> the svn server module (likely mod_dav_svn - or are there others?) does 
> not.

This was partially fixed in commit
16fc08e2d86dad152194829d21bc55b2ef0c8fb1.  You just need to manually
specify i18n.commitencoding in your .git/config.
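
Setting that option might look like the sketch below. ISO-8859-1 is only an assumed example, since the thread does not say which encoding the CVS messages used, and the scratch repository exists only to keep the sketch self-contained.

```shell
#!/bin/sh
# Scratch repository so the sketch runs on its own; in practice, run the
# "git config" line inside the existing git-svn clone instead.
repo=$(mktemp -d)
git init -q "$repo"

# Record the encoding the commit messages were written in.
# ISO-8859-1 is an assumption; substitute the real encoding.
git -C "$repo" config i18n.commitencoding ISO-8859-1

# Show the value that git-svn will read back.
git -C "$repo" config --get i18n.commitencoding
```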

> So, could
> a) git-svn be modified to refuse dcommitting non-UTF-8 messages?

On the way is a patch that makes git-svn refuse to dcommit messages
that are malformed UTF-8 and tell the user about i18n.commitencoding.

Thanks for reminding me.

-- 
Eric Wong

* [PATCH] git-svn: refuse to dcommit non-UTF-8 messages
  2009-05-28  8:08 ` Eric Wong
@ 2009-05-28  8:18   ` Eric Wong
  2009-05-29  7:09     ` Junio C Hamano
  2009-05-29  7:56     ` Junio C Hamano
  0 siblings, 2 replies; 7+ messages in thread
From: Eric Wong @ 2009-05-28  8:18 UTC
  To: Junio C Hamano; +Cc: git, users, Matthias Andree

...without i18n.commitencoding set in the config.

SVN tries to store all commit messages in UTF-8; however, it is
up to the clients to enforce this rule.  SVN servers themselves
do not always enforce it, which allows clients to commit
malformed UTF-8 messages and break repositories.

So git-svn will enforce this and tell the user to set
i18n.commitencoding when a git commit is not in UTF-8.

Signed-off-by: Eric Wong <normalperson@yhbt.net>
---

 Also pushed to git://git.bogomips.org/git-svn.git

 git-svn.perl                               |   17 ++++++++--
 t/t9139-git-svn-non-utf8-commitencoding.sh |   47 ++++++++++++++++++++++++++++
 2 files changed, 61 insertions(+), 3 deletions(-)
 create mode 100755 t/t9139-git-svn-non-utf8-commitencoding.sh

diff --git a/git-svn.perl b/git-svn.perl
index a70c7d7..3301797 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -1178,16 +1178,27 @@ sub get_commit_entry {
 	}
 	rename $commit_editmsg, $commit_msg or croak $!;
 	{
+		require Encode;
 		# SVN requires messages to be UTF-8 when entering the repo
 		local $/;
 		open $log_fh, '<', $commit_msg or croak $!;
 		binmode $log_fh;
 		chomp($log_entry{log} = <$log_fh>);
 
-		if (my $enc = Git::config('i18n.commitencoding')) {
-			require Encode;
-			Encode::from_to($log_entry{log}, $enc, 'UTF-8');
+		my $enc = Git::config('i18n.commitencoding') || 'UTF-8';
+		my $msg = $log_entry{log};
+
+		eval { $msg = Encode::decode($enc, $msg, 1) };
+		if ($@) {
+			die "Could not decode as $enc:\n", $msg,
+			    "\nPerhaps you need to set i18n.commitencoding\n";
 		}
+
+		eval { $msg = Encode::encode('UTF-8', $msg, 1) };
+		die "Could not encode as UTF-8:\n$msg\n" if $@;
+
+		$log_entry{log} = $msg;
+
 		close $log_fh or croak $!;
 	}
 	unlink $commit_msg;
diff --git a/t/t9139-git-svn-non-utf8-commitencoding.sh b/t/t9139-git-svn-non-utf8-commitencoding.sh
new file mode 100755
index 0000000..2b1db97
--- /dev/null
+++ b/t/t9139-git-svn-non-utf8-commitencoding.sh
@@ -0,0 +1,47 @@
+#!/bin/sh
+#
+# Copyright (c) 2009 Eric Wong
+
+test_description='git svn refuses to dcommit non-UTF8 messages'
+
+. ./lib-git-svn.sh
+
+# ISO-2022-JP can pass for valid UTF-8, so skipping that in this test
+
+for H in ISO-8859-1 EUCJP
+do
+	test_expect_success "$H setup" '
+		mkdir $H &&
+		svn_cmd import -m "$H test" $H "$svnrepo"/$H &&
+		git svn clone "$svnrepo"/$H $H
+	'
+done
+
+for H in ISO-8859-1 EUCJP
+do
+	test_expect_success "$H commit on git side" '
+	(
+		cd $H &&
+		git config i18n.commitencoding $H &&
+		git checkout -b t refs/remotes/git-svn &&
+		echo $H >F &&
+		git add F &&
+		git commit -a -F "$TEST_DIRECTORY"/t3900/$H.txt &&
+		E=$(git cat-file commit HEAD | sed -ne "s/^encoding //p") &&
+		test "z$E" = "z$H"
+	)
+	'
+done
+
+for H in ISO-8859-1 EUCJP
+do
+	test_expect_success "$H dcommit to svn" '
+	(
+		cd $H &&
+		git config --unset i18n.commitencoding &&
+		! git svn dcommit
+	)
+	'
+done
+
+test_done
-- 
Eric Wong
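
The decode-then-encode check in the patch above can be approximated outside git-svn with iconv. This is a rough sketch rather than what git-svn does (the patch uses Perl's Encode module): to_utf8_or_fail is a hypothetical helper, and its error text only imitates the patch's wording. Like the patch, it falls back to UTF-8 when no encoding is given.

```shell
#!/bin/sh
# Sketch: convert a message on stdin from a declared encoding to UTF-8,
# failing loudly if the bytes are malformed in that encoding.
# to_utf8_or_fail is a hypothetical helper, not part of git or git-svn.
to_utf8_or_fail() {
	enc=${1:-UTF-8}	# default mirrors the patch: assume UTF-8
	iconv -f "$enc" -t UTF-8 || {
		echo "Could not decode as $enc" >&2
		echo "Perhaps you need to set i18n.commitencoding" >&2
		return 1
	}
}
```

For example, piping a Latin-1 message through "to_utf8_or_fail ISO-8859-1" yields UTF-8 output, while piping the same bytes through it with no argument fails, which is the case the patch turns into a refused dcommit.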

* Re: [PATCH] git-svn: refuse to dcommit non-UTF-8 messages
  2009-05-28  8:18   ` [PATCH] git-svn: refuse to dcommit non-UTF-8 messages Eric Wong
@ 2009-05-29  7:09     ` Junio C Hamano
  2009-05-29  7:56     ` Junio C Hamano
  1 sibling, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2009-05-29  7:09 UTC
  To: Eric Wong; +Cc: git, users, Matthias Andree

Thanks.

* Re: [PATCH] git-svn: refuse to dcommit non-UTF-8 messages
  2009-05-28  8:18   ` [PATCH] git-svn: refuse to dcommit non-UTF-8 messages Eric Wong
  2009-05-29  7:09     ` Junio C Hamano
@ 2009-05-29  7:56     ` Junio C Hamano
  2009-05-29 14:11       ` Brandon Casey
  1 sibling, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2009-05-29  7:56 UTC
  To: Eric Wong; +Cc: git, Brandon Casey

Eric Wong <normalperson@yhbt.net> writes:

>  t/t9139-git-svn-non-utf8-commitencoding.sh |   47 ++++++++++++++++++++++++++++

Hmm.

> +# Copyright (c) 2009 Eric Wong
> +
> +test_description='git svn refuses to dcommit non-UTF8 messages'
> +
> +. ./lib-git-svn.sh

This passes when merged to 'master', but together with bc/old-iconv branch
cooking in 'next' it breaks.

* Re: [PATCH] git-svn: refuse to dcommit non-UTF-8 messages
  2009-05-29  7:56     ` Junio C Hamano
@ 2009-05-29 14:11       ` Brandon Casey
  2009-05-30  0:14         ` [PATCH] t9139 uses ancient, backwards-compatible iconv names Eric Wong
  0 siblings, 1 reply; 7+ messages in thread
From: Brandon Casey @ 2009-05-29 14:11 UTC
  To: Junio C Hamano; +Cc: Eric Wong, git, Brandon Casey

Junio C Hamano wrote:
> Eric Wong <normalperson@yhbt.net> writes:
> 
>>  t/t9139-git-svn-non-utf8-commitencoding.sh |   47 ++++++++++++++++++++++++++++
> 
> Hmm.
> 
>> +# Copyright (c) 2009 Eric Wong
>> +
>> +test_description='git svn refuses to dcommit non-UTF8 messages'
>> +
>> +. ./lib-git-svn.sh
> 
> This passes when merged to 'master', but together with bc/old-iconv branch
> cooking in 'next' it breaks.

Yeah, it's the second for loop, which accesses the files in t/t3900/.
bc/old-iconv replaces each occurrence of ISO-8859-1 with ISO8859-1
and EUCJP with eucJP, since old Solaris only knows the older names
while modern platforms handle either.  The text files in t/t3900/
were renamed accordingly.

It would be nice to use these older names here too, even though I
won't be able to test it since svn is not installed on the older
platforms I have access to.

-brandon
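
Whether a given platform's iconv accepts a particular spelling can be probed directly. The sketch below is an illustration only: iconv_knows is a hypothetical helper, and it tests the iconv command-line tool, which may not use the same alias table as the iconv library git is linked against or as Perl's Encode module.

```shell
#!/bin/sh
# Sketch: report which encoding-name spellings the local iconv accepts.
# iconv_knows is a hypothetical helper name.
iconv_knows() {
	# Empty input: only the charset lookup can fail here.
	iconv -f "$1" -t UTF-8 </dev/null >/dev/null 2>&1
}

for name in ISO-8859-1 ISO8859-1 EUCJP eucJP
do
	if iconv_knows "$name"
	then
		echo "$name: supported"
	else
		echo "$name: not supported"
	fi
done
```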

* [PATCH] t9139 uses ancient, backwards-compatible iconv names
  2009-05-29 14:11       ` Brandon Casey
@ 2009-05-30  0:14         ` Eric Wong
  0 siblings, 0 replies; 7+ messages in thread
From: Eric Wong @ 2009-05-30  0:14 UTC
  To: Junio C Hamano; +Cc: git, Brandon Casey

This is needed to work with
5ae93dfdccfe9457bdb1f54b33c76359f6c3b861:
  t3900: use ancient iconv names for backward compatibility

Signed-off-by: Eric Wong <normalperson@yhbt.net>
---

  Brandon Casey <casey@nrlssc.navy.mil> wrote:
  > Junio C Hamano wrote:
  > > Eric Wong <normalperson@yhbt.net> writes:
  > > 
  > >>  t/t9139-git-svn-non-utf8-commitencoding.sh |   47 ++++++++++++++++++++++++++++
  > > 
  > > Hmm.
  > > 
  > >> +# Copyright (c) 2009 Eric Wong
  > >> +
  > >> +test_description='git svn refuses to dcommit non-UTF8 messages'
  > >> +
  > >> +. ./lib-git-svn.sh
  > > 
  > > This passes when merged to 'master', but together with bc/old-iconv branch
  > > cooking in 'next' it breaks.
  > 
  > Yeah, it's the second for loop, which accesses the files in t/t3900/.
  > bc/old-iconv replaces each occurrence of ISO-8859-1 with ISO8859-1
  > and EUCJP with eucJP, since old Solaris only knows the older names
  > while modern platforms handle either.  The text files in t/t3900/
  > were renamed accordingly.
  > 
  > It would be nice to use these older names here too, even though I
  > won't be able to test it since svn is not installed on the older
  > platforms I have access to.

 t/t9139-git-svn-non-utf8-commitencoding.sh |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/t/t9139-git-svn-non-utf8-commitencoding.sh b/t/t9139-git-svn-non-utf8-commitencoding.sh
index 2b1db97..f337959 100755
--- a/t/t9139-git-svn-non-utf8-commitencoding.sh
+++ b/t/t9139-git-svn-non-utf8-commitencoding.sh
@@ -8,7 +8,7 @@ test_description='git svn refuses to dcommit non-UTF8 messages'
 
 # ISO-2022-JP can pass for valid UTF-8, so skipping that in this test
 
-for H in ISO-8859-1 EUCJP
+for H in ISO8859-1 eucJP
 do
 	test_expect_success "$H setup" '
 		mkdir $H &&
@@ -17,7 +17,7 @@ do
 	'
 done
 
-for H in ISO-8859-1 EUCJP
+for H in ISO8859-1 eucJP
 do
 	test_expect_success "$H commit on git side" '
 	(
@@ -33,7 +33,7 @@ do
 	'
 done
 
-for H in ISO-8859-1 EUCJP
+for H in ISO8859-1 eucJP
 do
 	test_expect_success "$H dcommit to svn" '
 	(
-- 
Eric Wong
