git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
From: Tzadik Vanderhoof <tzadik.vanderhoof@gmail.com>
To: Junio C Hamano <gitster@pobox.com>,
	Eric Sunshine <sunshine@sunshineco.com>,
	Git List <git@vger.kernel.org>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	Luke Diamand <luke@diamand.org>, Pete Wyckoff <pw@padd.com>
Cc: Tzadik Vanderhoof <tzadik.vanderhoof@gmail.com>
Subject: [PATCH v6] Add git-p4.fallbackEncoding
Date: Thu, 29 Apr 2021 00:39:05 -0700	[thread overview]
Message-ID: <20210429073905.837-1-tzadik.vanderhoof@gmail.com> (raw)
In-Reply-To: <20210428145824.43c4t7hkjfqjyspb@tb-raspi4>

Add git-p4.fallbackEncoding config variable, to prevent git-p4 from
crashing on non UTF-8 changeset descriptions.

When git-p4 reads the output from a p4 command, it assumes it will
be 100% UTF-8. If even one character in the output of one p4 command is
not UTF-8, git-p4 crashes with:

    File "C:/Program Files/Git/bin/git-p4.py", line 774, in p4CmdList
        value = value.decode() UnicodeDecodeError: 'utf-8' codec can't
        decode byte Ox93 in position 42: invalid start byte

This is especially a problem for the "git p4 clone ... @all" command,
where git-p4 needs to read thousands of changeset descriptions, one of
which may have a stray smart quote, causing the whole clone operation
to fail.

Add a new config setting, allowing git-p4 to try a fallback encoding
(for example, "cp1252") and/or use the Unicode replacement character,
to prevent the whole program from crashing on such a minor problem.

Signed-off-by: Tzadik Vanderhoof <tzadik.vanderhoof@gmail.com>
---
 Documentation/git-p4.txt                   |  9 ++
 git-p4.py                                  | 11 ++-
 t/t9836-git-p4-config-fallback-encoding.sh | 98 ++++++++++++++++++++++
 3 files changed, 117 insertions(+), 1 deletion(-)
 create mode 100755 t/t9836-git-p4-config-fallback-encoding.sh

diff --git a/Documentation/git-p4.txt b/Documentation/git-p4.txt
index f89e68b424..86d3ffa644 100644
--- a/Documentation/git-p4.txt
+++ b/Documentation/git-p4.txt
@@ -638,6 +638,15 @@ git-p4.pathEncoding::
 	to transcode the paths to UTF-8. As an example, Perforce on Windows
 	often uses "cp1252" to encode path names.
 
+git-p4.fallbackEncoding::
+	Perforce changeset descriptions can be stored in any encoding.
+	Git-p4 first tries to interpret each description as UTF-8. If that
+	fails, this config allows another encoding to be tried. You can specify,
+	for example, "cp1252". If git-p4.fallbackEncoding is "replace", UTF-8 will
+	be used, with invalid UTF-8 characters replaced by the Unicode replacement
+	character. The default is "none": there is no fallback, and any non UTF-8
+	character will cause git-p4 to immediately fail.
+
 git-p4.largeFileSystem::
 	Specify the system that is used for large (binary) files. Please note
 	that large file systems do not support the 'git p4 submit' command.
diff --git a/git-p4.py b/git-p4.py
index 09c9e93ac4..202fb01bdf 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -771,7 +771,16 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
                 for key, value in entry.items():
                     key = key.decode()
                     if isinstance(value, bytes) and not (key in ('data', 'path', 'clientFile') or key.startswith('depotFile')):
-                        value = value.decode()
+                        try:
+                            value = value.decode()
+                        except UnicodeDecodeError:
+                            fallbackEncoding = gitConfig("git-p4.fallbackEncoding").lower() or 'none'
+                            if fallbackEncoding == 'none':
+                                raise Exception("UTF-8 decoding failed. Consider using git config git-p4.fallbackEncoding")
+                            elif fallbackEncoding == 'replace':
+                                value = value.decode(errors='replace')
+                            else:
+                                value = value.decode(encoding=fallbackEncoding)
                     decoded_entry[key] = value
                 # Parse out data if it's an error response
                 if decoded_entry.get('code') == 'error' and 'data' in decoded_entry:
diff --git a/t/t9836-git-p4-config-fallback-encoding.sh b/t/t9836-git-p4-config-fallback-encoding.sh
new file mode 100755
index 0000000000..901bb3759d
--- /dev/null
+++ b/t/t9836-git-p4-config-fallback-encoding.sh
@@ -0,0 +1,98 @@
+#!/bin/sh
+
+test_description='test git-p4.fallbackEncoding config'
+
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+. ./lib-git-p4.sh
+
+test_expect_success 'start p4d' '
+	start_p4d
+'
+
+test_expect_success 'add Unicode description' '
+	cd "$cli" &&
+	echo file1 >file1 &&
+	p4 add file1 &&
+	p4 submit -d documentación
+'
+
+# Unicode descriptions cause "git p4 clone" to crash with a UnicodeDecodeError in some
+# environments. This test determines if that is the case in our environment. If so,
+# we create a file called "clone_fails". In subsequent tests, we check whether that
+# file exists to determine what behavior to expect.
+
+clone_fails="$TRASH_DIRECTORY/clone_fails"
+
+# If clone fails with git-p4.fallbackEncoding set to "none", create the "clone_fails" file,
+# and make sure the error message is correct
+
+test_expect_success 'clone with git-p4.fallbackEncoding set to "none"' '
+	git config --global git-p4.fallbackEncoding none &&
+	test_when_finished cleanup_git && {
+		git p4 clone --dest="$git" //depot@all 2>error || (
+			>"$clone_fails" &&
+			grep "UTF-8 decoding failed. Consider using git config git-p4.fallbackEncoding" error
+		)
+	}
+'
+
+# If clone fails with git-p4.fallbackEncoding set to "none", it should also fail when it's unset,
+# also with the correct error message.  Otherwise the clone should succeed.
+
+test_expect_success 'clone with git-p4.fallbackEncoding unset' '
+	git config --global --unset git-p4.fallbackEncoding &&
+	test_when_finished cleanup_git && {
+		(
+			test -f "$clone_fails" &&
+			test_must_fail git p4 clone --dest="$git" //depot@all 2>error &&
+			grep "UTF-8 decoding failed. Consider using git config git-p4.fallbackEncoding" error
+		) ||
+		(
+			! test -f "$clone_fails" &&
+			git p4 clone --dest="$git" //depot@all 2>error
+		)
+	}
+'
+
+# Whether or not "clone_fails" exists, setting git-p4.fallbackEncoding
+# to "cp1252" should cause clone to succeed and get the right description
+
+test_expect_success 'clone with git-p4.fallbackEncoding set to "cp1252"' '
+	git config --global git-p4.fallbackEncoding cp1252 &&
+	test_when_finished cleanup_git &&
+	(
+		git p4 clone --dest="$git" //depot@all &&
+		cd "$git" &&
+		git log --oneline >log &&
+		desc=$(head -1 log | cut -d" " -f2) &&
+		test "$desc" = "documentación"
+	)
+'
+
+# Setting git-p4.fallbackEncoding to "replace" should always cause clone to succeed.
+# If "clone_fails" exists, the description should contain the Unicode replacement
+# character, otherwise the description should be correct (since we're on a system that
+# doesn't have the Unicode issue)
+
+test_expect_success 'clone with git-p4.fallbackEncoding set to "replace"' '
+	git config --global git-p4.fallbackEncoding replace &&
+	test_when_finished cleanup_git &&
+	(
+		git p4 clone --dest="$git" //depot@all &&
+		cd "$git" &&
+		git log --oneline >log &&
+		desc=$(head -1 log | cut -d" " -f2) &&
+		{
+			(test -f "$clone_fails" &&
+				test "$desc" = "documentaci�n"
+			) ||
+			(! test -f "$clone_fails" &&
+				test "$desc" = "documentación"
+			)
+		}
+	)
+'
+
+test_done
-- 
2.31.1


  reply	other threads:[~2021-04-29  7:43 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-08 19:28 git-p4 crashes on non UTF-8 output from p4 Tzadik Vanderhoof
2021-04-09 15:38 ` Torsten Bögershausen
2021-04-11  7:16   ` Tzadik Vanderhoof
2021-04-11  9:37     ` Torsten Bögershausen
2021-04-11 20:21       ` Tzadik Vanderhoof
2021-04-12  4:06         ` Torsten Bögershausen
2021-04-21  8:46           ` [PATCH] add git-p4.fallbackEncoding config variable, to prevent git-p4 from crashing on non UTF-8 changeset descriptions Tzadik Vanderhoof
2021-04-21  8:55             ` Tzadik Vanderhoof
2021-04-22  5:05               ` [PATCH v3] add git-p4.fallbackEncoding config setting, " Tzadik Vanderhoof
2021-04-22 15:50                 ` Torsten Bögershausen
2021-04-22 16:17                   ` Eric Sunshine
2021-04-22 22:33                     ` Eric Sunshine
2021-04-23  6:36                       ` [PATCH] add git-p4.fallbackEncoding config variable, " Tzadik Vanderhoof
2021-04-23  6:44                         ` Tzadik Vanderhoof
2021-04-23 19:08                           ` Tzadik Vanderhoof
2021-04-24  8:14                             ` Torsten Bögershausen
2021-04-27  5:39                               ` [PATCH v5] " Tzadik Vanderhoof
2021-04-27  5:45                                 ` Tzadik Vanderhoof
2021-04-28  4:39                                 ` Junio C Hamano
2021-04-28 14:58                                   ` Torsten Bögershausen
2021-04-29  7:39                                     ` Tzadik Vanderhoof [this message]
2021-04-29  8:36                                       ` [PATCH v6] Add git-p4.fallbackEncoding Luke Diamand
2021-04-29 17:29                                         ` Tzadik Vanderhoof
     [not found]                                     ` <20210429074458.891-1-tzadik.vanderhoof@gmail.com>
     [not found]                                       ` <c4c48615-d1f4-fd37-0960-979535907f15@web.de>
2021-04-29 17:14                                         ` Tzadik Vanderhoof

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210429073905.837-1-tzadik.vanderhoof@gmail.com \
    --to=tzadik.vanderhoof@gmail.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=luke@diamand.org \
    --cc=pw@padd.com \
    --cc=sunshine@sunshineco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).