* git-svn dcommits messages not in UTF-8 charset to mod_dav_svn?
From: Matthias Andree @ 2009-05-27 16:16 UTC
To: git, users
Greetings,
I tried to use git cvsimport and git svn to transfer a CVS repository
(I have access to it) to an SVN repository (where I don't have access to
the repo, so I cannot use cvs2svn or similar).
The problem is that the CVS repo had non-UTF-8 commit log messages, and I
didn't bother to convert them to UTF-8. However, SVN insists on encoding
filenames and log messages in UTF-8.
"git svn dcommit" (which uses the SVN Perl bindings under the hood)
happily committed such a non-UTF-8 message and br0ke the repo. More
precisely, the SVN server (https://...) is now wedged, as in:
$ svn log -r130
svn: REPORT of '/repos/!svn/bc/130': 200 OK (https://svn-serv...de)
$ svn --xml log -r130 2>/dev/null
[stdout:]
<?xml version="1.0"?>
<log>
[stderr:]
svn: REPORT of '/repos/!svn/bc/130': 200 OK (https://svn-serv...de)
$ svn propget --revprop svn:log -r130 https://svn-serv...de/path/
aktuelle version (disclaimer)
kopf und fu?\223zeilen
etc.
While mod_dav_svn arguably shouldn't accept b0rked messages, git-svn
shouldn't attempt to commit them either. It seems that the svn command
line utilities validate the message format by themselves, and apparently
the svn server module (likely mod_dav_svn - or are there others?) does not.
So, could
a) git-svn be modified to refuse dcommitting non-UTF-8 messages?
b) mod_dav_svn be modified to refuse commits/propedits/propsets with
non-UTF-8 messages?
I'm sorry to say I don't have information on how the SVN server is
configured or which version it's running.
Thanks.
--
Matthias Andree
* Re: git-svn dcommits messages not in UTF-8 charset to mod_dav_svn?
From: Eric Wong @ 2009-05-28 8:08 UTC
To: Matthias Andree; +Cc: git, users
Matthias Andree <matthias.andree@gmx.de> wrote:
> Greetings,
>
> I had tried to use git cvsimport and git svn to transfer a CVS repository
> (I have access to it) to an SVN repository (where I don't have access to
> the repo, so I cannot use cvs2svn or similar).
>
> The problem is that the CVS repo had non-UTF-8 commit log messages, and I
> didn't bother to convert them to UTF-8. However, SVN insists on encoding
> filenames and log messages in UTF-8.
>
> "git svn dcommit" (which uses the SVN Perl bindings under the hood)
> happily committed such a non-UTF-8 message and br0ke the repo. The actual
> reason is that the SVN server (https://...) is now wedged, as in:
> While mod_dav_svn arguably shouldn't accept b0rked messages, git-svn
> shouldn't attempt to commit them either. It seems that the svn command
> line utilities validate the message format by themselves, and apparently
> the svn server module (likely mod_dav_svn - or are there others?) does
> not.
This was partially fixed in commit
16fc08e2d86dad152194829d21bc55b2ef0c8fb1. You just need to manually
set i18n.commitencoding in your .git/config.
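For illustration only (this sketch is not part of git-svn): with
i18n.commitencoding set, git-svn re-encodes the log message from that
encoding to UTF-8 before it reaches SVN. The shell function below imitates
that step with iconv. The name to_utf8 and the sample bytes are assumptions
made for this example.

```shell
# Tell git which encoding the imported commit messages use, e.g.:
#   git config i18n.commitencoding ISO-8859-1

# to_utf8 ENCODING: hypothetical helper; re-encode stdin from ENCODING to UTF-8.
to_utf8() {
	iconv -f "$1" -t UTF-8
}

# The bytes 66 75 df are "fu" plus sharp s in ISO-8859-1;
# in UTF-8 the same text is 66 75 c3 9f.
printf 'fu\337' | to_utf8 ISO-8859-1 | od -An -tx1
```

The od call only makes the byte-level change visible; the message text
itself is unchanged.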
> So, could
> a) git-svn be modified to refuse dcommiting non-UTF-8 messages?
On the way is a patch that makes git-svn refuse to dcommit messages
that are malformed UTF-8 and tell the user about i18n.commitencoding.
Thanks for reminding me.
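A rough shell sketch of that kind of check, assuming iconv is available and
exits non-zero on malformed input (as the common implementations do). The
name check_commit_msg is hypothetical, and this is not the code git-svn
itself uses:

```shell
# check_commit_msg [ENCODING]: hypothetical helper, not git-svn code.
# Reads a log message on stdin and prints it as UTF-8, or fails with a
# hint when the bytes are not valid in ENCODING (default: UTF-8).
check_commit_msg() {
	enc=${1:-UTF-8}
	if ! iconv -f "$enc" -t UTF-8 2>/dev/null
	then
		echo "Could not decode as $enc" >&2
		echo "Perhaps you need to set i18n.commitencoding" >&2
		return 1
	fi
}

# Valid UTF-8 passes through; a lone ISO-8859-1 byte (0xdf) is rejected.
printf 'fu\303\237zeilen\n' | check_commit_msg >/dev/null && echo accepted
printf 'fu\337zeilen\n' | check_commit_msg >/dev/null 2>&1 || echo refused
```

Passing the real encoding as the argument (for example ISO-8859-1) lets the
same message through, which is the effect setting i18n.commitencoding is
meant to have.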
--
Eric Wong
* [PATCH] git-svn: refuse to dcommit non-UTF-8 messages
From: Eric Wong @ 2009-05-28 8:18 UTC
To: Junio C Hamano; +Cc: git, users, Matthias Andree
...without i18n.commitencoding set in the config.
SVN tries to store all commit messages in UTF-8; however, it is
up to the clients to enforce this rule. SVN servers themselves
do not always enforce it, which allows clients to commit
malformed UTF-8 messages and break repositories.
So git-svn will enforce this and tell the user to set
i18n.commitencoding when a git commit is not in UTF-8.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
---
Also pushed to git://git.bogomips.org/git-svn.git
git-svn.perl | 17 ++++++++--
t/t9139-git-svn-non-utf8-commitencoding.sh | 47 ++++++++++++++++++++++++++++
2 files changed, 61 insertions(+), 3 deletions(-)
create mode 100755 t/t9139-git-svn-non-utf8-commitencoding.sh
diff --git a/git-svn.perl b/git-svn.perl
index a70c7d7..3301797 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -1178,16 +1178,27 @@ sub get_commit_entry {
 	}
 	rename $commit_editmsg, $commit_msg or croak $!;
 	{
+		require Encode;
 		# SVN requires messages to be UTF-8 when entering the repo
 		local $/;
 		open $log_fh, '<', $commit_msg or croak $!;
 		binmode $log_fh;
 		chomp($log_entry{log} = <$log_fh>);
 
-		if (my $enc = Git::config('i18n.commitencoding')) {
-			require Encode;
-			Encode::from_to($log_entry{log}, $enc, 'UTF-8');
+		my $enc = Git::config('i18n.commitencoding') || 'UTF-8';
+		my $msg = $log_entry{log};
+
+		eval { $msg = Encode::decode($enc, $msg, 1) };
+		if ($@) {
+			die "Could not decode as $enc:\n", $msg,
+			    "\nPerhaps you need to set i18n.commitencoding\n";
 		}
+
+		eval { $msg = Encode::encode('UTF-8', $msg, 1) };
+		die "Could not encode as UTF-8:\n$msg\n" if $@;
+
+		$log_entry{log} = $msg;
+
 		close $log_fh or croak $!;
 	}
 	unlink $commit_msg;
diff --git a/t/t9139-git-svn-non-utf8-commitencoding.sh b/t/t9139-git-svn-non-utf8-commitencoding.sh
new file mode 100755
index 0000000..2b1db97
--- /dev/null
+++ b/t/t9139-git-svn-non-utf8-commitencoding.sh
@@ -0,0 +1,47 @@
+#!/bin/sh
+#
+# Copyright (c) 2009 Eric Wong
+
+test_description='git svn refuses to dcommit non-UTF8 messages'
+
+. ./lib-git-svn.sh
+
+# ISO-2022-JP can pass for valid UTF-8, so skipping that in this test
+
+for H in ISO-8859-1 EUCJP
+do
+	test_expect_success "$H setup" '
+		mkdir $H &&
+		svn_cmd import -m "$H test" $H "$svnrepo"/$H &&
+		git svn clone "$svnrepo"/$H $H
+	'
+done
+
+for H in ISO-8859-1 EUCJP
+do
+	test_expect_success "$H commit on git side" '
+	(
+		cd $H &&
+		git config i18n.commitencoding $H &&
+		git checkout -b t refs/remotes/git-svn &&
+		echo $H >F &&
+		git add F &&
+		git commit -a -F "$TEST_DIRECTORY"/t3900/$H.txt &&
+		E=$(git cat-file commit HEAD | sed -ne "s/^encoding //p") &&
+		test "z$E" = "z$H"
+	)
+	'
+done
+
+for H in ISO-8859-1 EUCJP
+do
+	test_expect_success "$H dcommit to svn" '
+	(
+		cd $H &&
+		git config --unset i18n.commitencoding &&
+		! git svn dcommit
+	)
+	'
+done
+
+test_done
--
Eric Wong
* Re: [PATCH] git-svn: refuse to dcommit non-UTF-8 messages
From: Junio C Hamano @ 2009-05-29 7:09 UTC
To: Eric Wong; +Cc: git, users, Matthias Andree
Thanks.
* Re: [PATCH] git-svn: refuse to dcommit non-UTF-8 messages
From: Junio C Hamano @ 2009-05-29 7:56 UTC
To: Eric Wong; +Cc: git, Brandon Casey
Eric Wong <normalperson@yhbt.net> writes:
> t/t9139-git-svn-non-utf8-commitencoding.sh | 47 ++++++++++++++++++++++++++++
Hmm.
> +# Copyright (c) 2009 Eric Wong
> +
> +test_description='git svn refuses to dcommit non-UTF8 messages'
> +
> +. ./lib-git-svn.sh
This passes when merged to 'master', but together with the bc/old-iconv
branch cooking in 'next', it breaks.
* Re: [PATCH] git-svn: refuse to dcommit non-UTF-8 messages
From: Brandon Casey @ 2009-05-29 14:11 UTC
To: Junio C Hamano; +Cc: Eric Wong, git, Brandon Casey
Junio C Hamano wrote:
> Eric Wong <normalperson@yhbt.net> writes:
>
>> t/t9139-git-svn-non-utf8-commitencoding.sh | 47 ++++++++++++++++++++++++++++
>
> Hmm.
>
>> +# Copyright (c) 2009 Eric Wong
>> +
>> +test_description='git svn refuses to dcommit non-UTF8 messages'
>> +
>> +. ./lib-git-svn.sh
>
> This passes when merged to 'master', but together with bc/old-iconv branch
> cooking in 'next' it breaks.
Yeah, it's the second for loop which accesses the files in t/t3900/.
bc/old-iconv replaces each occurrence of ISO-8859-1 with ISO8859-1
and EUCJP with eucJP, since old Solaris only recognizes the latter
spellings while modern platforms handle either name. The text files
in t/t3900/ were renamed accordingly.
It would be nice to use these older names here too, even though I
won't be able to test it since svn is not installed on the older
platforms I have access to.
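One way to see which spellings a given platform accepts is to try each name
with the local iconv. This is only a sketch; iconv_knows is a hypothetical
helper and the results depend entirely on the platform's iconv:

```shell
# iconv_knows NAME: hypothetical helper; succeeds if the local iconv
# accepts NAME as a source encoding.
iconv_knows() {
	printf 'x' | iconv -f "$1" -t UTF-8 >/dev/null 2>&1
}

for name in ISO-8859-1 ISO8859-1 EUCJP eucJP
do
	if iconv_knows "$name"
	then
		echo "$name: supported"
	else
		echo "$name: not supported"
	fi
done
```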
-brandon
* [PATCH] t9139 uses ancient, backwards-compatible iconv names
From: Eric Wong @ 2009-05-30 0:14 UTC
To: Junio C Hamano; +Cc: git, Brandon Casey
This is needed to work with
5ae93dfdccfe9457bdb1f54b33c76359f6c3b861:
t3900: use ancient iconv names for backward compatibility
Signed-off-by: Eric Wong <normalperson@yhbt.net>
---
Brandon Casey <casey@nrlssc.navy.mil> wrote:
> Junio C Hamano wrote:
> > Eric Wong <normalperson@yhbt.net> writes:
> >
> >> t/t9139-git-svn-non-utf8-commitencoding.sh | 47 ++++++++++++++++++++++++++++
> >
> > Hmm.
> >
> >> +# Copyright (c) 2009 Eric Wong
> >> +
> >> +test_description='git svn refuses to dcommit non-UTF8 messages'
> >> +
> >> +. ./lib-git-svn.sh
> >
> > This passes when merged to 'master', but together with bc/old-iconv branch
> > cooking in 'next' it breaks.
>
> Yeah, it's the second for loop which accesses the files in t/t3900/.
> bc/old-iconv replaces each occurrence of ISO-8859-1 with ISO8859-1
> and EUCJP with eucJP since old Solaris didn't know both names and
> modern platforms handle either name. The text files in t/t3900/
> were renamed accordingly.
>
> It would be nice to use these older names here too, even though I
> won't be able to test it since svn is not installed on the older
> platforms I have access to.
t/t9139-git-svn-non-utf8-commitencoding.sh | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/t/t9139-git-svn-non-utf8-commitencoding.sh b/t/t9139-git-svn-non-utf8-commitencoding.sh
index 2b1db97..f337959 100755
--- a/t/t9139-git-svn-non-utf8-commitencoding.sh
+++ b/t/t9139-git-svn-non-utf8-commitencoding.sh
@@ -8,7 +8,7 @@ test_description='git svn refuses to dcommit non-UTF8 messages'
 
 # ISO-2022-JP can pass for valid UTF-8, so skipping that in this test
 
-for H in ISO-8859-1 EUCJP
+for H in ISO8859-1 eucJP
 do
 	test_expect_success "$H setup" '
 		mkdir $H &&
@@ -17,7 +17,7 @@ do
 	'
 done
 
-for H in ISO-8859-1 EUCJP
+for H in ISO8859-1 eucJP
 do
 	test_expect_success "$H commit on git side" '
 	(
@@ -33,7 +33,7 @@ do
 	'
 done
 
-for H in ISO-8859-1 EUCJP
+for H in ISO8859-1 eucJP
 do
 	test_expect_success "$H dcommit to svn" '
 	(
--
Eric Wong