* [PATCH v4] git-p4: add config git-p4.pathEncoding
From: larsxschneider @ 2015-09-02 14:57 UTC
To: git; +Cc: luke, gitster, tboegi, Lars Schneider
From: Lars Schneider <larsxschneider@gmail.com>
Diff to v3:
* add proper commit message
* remove command line option "--path-encoding" and add config "git-p4.pathEncoding"
* change TC number to 9822
* escape UTF-8 characters in TC
* change test encoding used in TC from cp1251 to ISO-8859-1
* use static ISO-8859-1 encoded string in TC
* check content of test file used in TC
* shorten core.quotepath usage
Thanks to Torsten and Junio for feedback!
Cheers,
Lars
Lars Schneider (1):
git-p4: add config git-p4.pathEncoding
Documentation/git-p4.txt | 7 +++++
git-p4.py | 3 ++
t/t9822-git-p4-path-encoding.sh | 65 +++++++++++++++++++++++++++++++++++++++++
3 files changed, 75 insertions(+)
create mode 100755 t/t9822-git-p4-path-encoding.sh
--
1.9.5 (Apple Git-50.3)
* [PATCH v4] git-p4: add config git-p4.pathEncoding
From: larsxschneider @ 2015-09-02 14:57 UTC
To: git; +Cc: luke, gitster, tboegi, Lars Schneider
From: Lars Schneider <larsxschneider@gmail.com>
Perforce keeps the encoding of a path as given by the originating OS.
Git expects paths encoded as UTF-8. Add a config to tell git-p4 what
encoding Perforce had used for the paths. This encoding is used to
transcode the paths to UTF-8. As an example, Perforce on my Windows
box uses “cp1252” to encode path names.
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
Documentation/git-p4.txt | 7 +++++
git-p4.py | 3 ++
t/t9822-git-p4-path-encoding.sh | 65 +++++++++++++++++++++++++++++++++++++++++
3 files changed, 75 insertions(+)
create mode 100755 t/t9822-git-p4-path-encoding.sh
diff --git a/Documentation/git-p4.txt b/Documentation/git-p4.txt
index 82aa5d6..bf3adf9 100644
--- a/Documentation/git-p4.txt
+++ b/Documentation/git-p4.txt
@@ -510,6 +510,13 @@ git-p4.useClientSpec::
option '--use-client-spec'. See the "CLIENT SPEC" section above.
This variable is a boolean, not the name of a p4 client.
+git-p4.pathEncoding::
+ Perforce keeps the encoding of a path as given by the originating OS.
+ Git expects paths encoded as UTF-8. Use this config to tell git-p4
+ what encoding Perforce had used for the paths. This encoding is used
+ to transcode the paths to UTF-8. As an example, Perforce on my Windows
+ box uses “cp1252” to encode path names.
+
Submit variables
~~~~~~~~~~~~~~~~
git-p4.detectRenames::
diff --git a/git-p4.py b/git-p4.py
index 073f87b..706fcdc 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -2213,6 +2213,9 @@ class P4Sync(Command, P4UserMap):
text = regexp.sub(r'$\1$', text)
contents = [ text ]
+ if gitConfig("git-p4.pathEncoding"):
+ relPath = relPath.decode(gitConfig("git-p4.pathEncoding")).encode('utf8', 'replace')
+
self.gitStream.write("M %s inline %s\n" % (git_mode, relPath))
# total length...
diff --git a/t/t9822-git-p4-path-encoding.sh b/t/t9822-git-p4-path-encoding.sh
new file mode 100755
index 0000000..3a1779a
--- /dev/null
+++ b/t/t9822-git-p4-path-encoding.sh
@@ -0,0 +1,65 @@
+#!/bin/sh
+
+test_description='Clone repositories with non ASCII paths'
+
+. ./lib-git-p4.sh
+
+UTF8_ESCAPED="a-\303\244_o-\303\266_u-\303\274.txt"
+ISO8859_ESCAPED="\141\55\344\137\157\55\366\137\165\55\374\56\164\170\164"
+
+# You can generate the ISO8859_ESCAPED with the following command:
+# printf "$UTF8_ESCAPED" | \
+# iconv -f utf-8 -t iso8859-1 | \
+# xxd -ps -u -c 1 | xargs bash -c 'for v; do echo "ibase=16; obase=8; $v" | bc; done' bash | \
+# tr "\n" "\\"
+
+test_expect_success 'start p4d' '
+ start_p4d
+'
+
+test_expect_success 'Create a repo containing iso8859-1 encoded paths' '
+ cd "$cli" &&
+
+ ISO8859="$(printf "$ISO8859_ESCAPED")" &&
+ echo content123 >"$ISO8859" &&
+ p4 add "$ISO8859" &&
+ p4 submit -d "test commit"
+'
+
+test_expect_success 'Clone repo containing iso8859-1 encoded paths without git-p4.pathEncoding' '
+ git p4 clone --destination="$git" //depot &&
+ test_when_finished cleanup_git &&
+ (
+ cd "$git" &&
+ UTF8="$(printf "$UTF8_ESCAPED")" &&
+ echo $UTF8 >expect &&
+ git -c core.quotepath=false ls-files >actual &&
+ test_must_fail test_cmp expect actual
+ )
+'
+
+test_expect_success 'Clone repo containing iso8859-1 encoded paths with git-p4.pathEncoding' '
+
+ test_when_finished cleanup_git &&
+ (
+ cd "$git" &&
+ git init . &&
+ git config git-p4.pathEncoding iso8859-1 &&
+ git p4 clone --use-client-spec --destination="$git" //depot &&
+ UTF8="$(printf "$UTF8_ESCAPED")" &&
+ echo $UTF8 >expect &&
+ git -c core.quotepath=false ls-files >actual &&
+ test_cmp expect actual &&
+ cat >expect <<-\EOF &&
+ content123
+ EOF
+ cat $UTF8 >actual &&
+ test_cmp expect actual
+ )
+'
+
+test_expect_success 'kill p4d' '
+ kill_p4d
+'
+
+test_done
--
1.9.5 (Apple Git-50.3)
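For readers following along, the transcoding step added to git-p4.py can be sketched in isolation. This is only an illustration under stated assumptions: it is written for Python 3, whereas the patch targets Python 2 where `str` is a byte string, and the helper name `transcode_path` is hypothetical and not part of git-p4.

```python
# Python 3 sketch of the path transcoding idea; not the git-p4 code itself.
# transcode_path is a hypothetical helper name used only for illustration.

def transcode_path(raw_path: bytes, path_encoding: str) -> bytes:
    """Decode raw_path from path_encoding and re-encode it as UTF-8."""
    return raw_path.decode(path_encoding).encode("utf-8")


# The same bytes that ISO8859_ESCAPED expands to in the test script.
iso_path = b"a-\xe4_o-\xf6_u-\xfc.txt"

utf8_path = transcode_path(iso_path, "iso8859-1")

# The result matches the bytes that UTF8_ESCAPED expands to.
print(utf8_path == b"a-\xc3\xa4_o-\xc3\xb6_u-\xc3\xbc.txt")
```

In this sketch decoding is strict, so a byte that the chosen encoding does not define (for example 0x81 in cp1252) raises `UnicodeDecodeError` rather than being replaced.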
* Re: [PATCH v4] git-p4: add config git-p4.pathEncoding
From: Junio C Hamano @ 2015-09-02 16:30 UTC
To: larsxschneider; +Cc: git, luke, tboegi
larsxschneider@gmail.com writes:
> From: Lars Schneider <larsxschneider@gmail.com>
>
> Perforce keeps the encoding of a path as given by the originating OS.
> Git expects paths encoded as UTF-8. Add a config to tell git-p4 what
> encoding Perforce had used for the paths. This encoding is used to
> transcode the paths to UTF-8. As an example, Perforce on my Windows
> box uses “cp1252” to encode path names.
>
> Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
> ---
Thanks.
> +git-p4.pathEncoding::
> + Perforce keeps the encoding of a path as given by the originating OS.
> + Git expects paths encoded as UTF-8. Use this config to tell git-p4
> + what encoding Perforce had used for the paths. This encoding is used
> + to transcode the paths to UTF-8. As an example, Perforce on my Windows
> + box uses “cp1252” to encode path names.
> +
The log message is a sort of personal statement, but "my Windows"
smells out of place here, as there is no clear writer in the
documentation. Perhaps rephrase it to:
    For example, Perforce often uses "cp1252" to encode path names
    on a Windows box.
or something?
> diff --git a/t/t9822-git-p4-path-encoding.sh b/t/t9822-git-p4-path-encoding.sh
> new file mode 100755
> index 0000000..3a1779a
> --- /dev/null
> +++ b/t/t9822-git-p4-path-encoding.sh
> @@ -0,0 +1,65 @@
> +#!/bin/sh
> +
> +test_description='Clone repositories with non ASCII paths'
> +
> +. ./lib-git-p4.sh
> +
> +UTF8_ESCAPED="a-\303\244_o-\303\266_u-\303\274.txt"
> +ISO8859_ESCAPED="\141\55\344\137\157\55\366\137\165\55\374\56\164\170\164"
> +
> +# You can generate the ISO8859_ESCAPED with the following command:
Please don't. You don't want to encode ".txt" and other things that
are ASCII, which you didn't encode in the original UTF8_ESCAPED.
> +# printf "$UTF8_ESCAPED" | \
> +# iconv -f utf-8 -t iso8859-1 | \
> +# xxd -ps -u -c 1 | xargs bash -c 'for v; do echo "ibase=16; obase=8; $v" | bc; done' bash | \
> +# tr "\n" "\\"
Besides, you somehow came up with UTF8_ESCAPED with a procedure that
is not documented (and it does avoid hiding obvious things like
".txt" in the backslashes). I do not think it is necessary or even
a good idea to give a procedure that is not very portable (xxd?
bash?) and does not produce what we want to see, only for ISO8859.
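As an illustration of the style asked for above (ASCII left readable, only the other bytes escaped), here is a minimal Python 3 sketch. The helper name `octal_escape_non_ascii` is hypothetical, and this is one possible way to derive such strings, not a procedure proposed in the thread. Backslash and percent are escaped as well, on the assumption that the result is used as a printf format string as in the test.

```python
# Python 3 sketch; octal_escape_non_ascii is a hypothetical helper name.

def octal_escape_non_ascii(data: bytes) -> str:
    """Keep printable ASCII literal and write every other byte as \\ooo.

    Backslash and percent are escaped too, because the result is meant
    to be passed to printf as its format string.
    """
    parts = []
    for byte in data:
        if 0x20 <= byte < 0x7F and byte not in (0x5C, 0x25):
            parts.append(chr(byte))
        else:
            parts.append("\\%03o" % byte)
    return "".join(parts)


name = "a-\u00e4_o-\u00f6_u-\u00fc.txt"

# UTF-8 form: a-\303\244_o-\303\266_u-\303\274.txt
print(octal_escape_non_ascii(name.encode("utf-8")))

# ISO-8859-1 form: a-\344_o-\366_u-\374.txt
print(octal_escape_non_ascii(name.encode("iso8859-1")))
```

The first line printed equals the UTF8_ESCAPED value in the patch; the second is the ISO-8859-1 equivalent with ".txt" and the other ASCII characters left visible.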
* Re: [PATCH v4] git-p4: add config git-p4.pathEncoding
From: Eric Sunshine @ 2015-09-02 18:13 UTC
To: larsxschneider@gmail.com
Cc: git@vger.kernel.org, luke@diamand.org, gitster@pobox.com,
tboegi@web.de
On Wednesday, September 2, 2015, <larsxschneider@gmail.com> wrote:
> Perforce keeps the encoding of a path as given by the originating OS.
> Git expects paths encoded as UTF-8. Add a config to tell git-p4 what
> encoding Perforce had used for the paths. This encoding is used to
> transcode the paths to UTF-8. As an example, Perforce on my Windows
> box uses “cp1252” to encode path names.
>
> Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
> ---
> +test_expect_success 'start p4d' '
> + start_p4d
> +'
> +
> +test_expect_success 'Create a repo containing iso8859-1 encoded paths' '
> + cd "$cli" &&
Torsten, I think, mentioned previously that the 'cd' and code
following should be wrapped in a subshell.
> + ISO8859="$(printf "$ISO8859_ESCAPED")" &&
> + echo content123 >"$ISO8859" &&
> + p4 add "$ISO8859" &&
> + p4 submit -d "test commit"
And, it's odd that this test doesn't "cd ..", which means that
subsequent tests are running in directory $cli. Is that intentional?
If so, it probably ought to be done in a more explicit and clear
fashion, perhaps by having each test do "cd $cli/$git" rather than
just "cd $git".
> +'
> +
> +test_expect_success 'Clone repo containing iso8859-1 encoded paths without git-p4.pathEncoding' '
> + git p4 clone --destination="$git" //depot &&
> + test_when_finished cleanup_git &&
> + (
> + cd "$git" &&
> + UTF8="$(printf "$UTF8_ESCAPED")" &&
> + echo $UTF8 >expect &&
> + git -c core.quotepath=false ls-files >actual &&
> + test_must_fail test_cmp expect actual
> + )
> +'
> +
> +test_expect_success 'Clone repo containing iso8859-1 encoded paths with git-p4.pathEncoding' '
> +
> + test_when_finished cleanup_git &&
> + (
> + cd "$git" &&
> + git init . &&
> + git config git-p4.pathEncoding iso8859-1 &&
> + git p4 clone --use-client-spec --destination="$git" //depot &&
> + UTF8="$(printf "$UTF8_ESCAPED")" &&
> + echo $UTF8 >expect &&
> + git -c core.quotepath=false ls-files >actual &&
> + test_cmp expect actual &&
> + cat >expect <<-\EOF &&
> + content123
> + EOF
> + cat $UTF8 >actual &&
> + test_cmp expect actual
> + )
> +'
> +
> +test_expect_success 'kill p4d' '
> + kill_p4d
> +'
> +
> +test_done
> --
> 1.9.5 (Apple Git-50.3)