From: "Kilian Kilger via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Tao Klerks <tao@klerks.biz>, Kilian Kilger <kkilger@gmail.com>
Subject: [PATCH v2] git-p4: fix bug with encoding of p4 client name
Date: Mon, 18 Jul 2022 08:57:59 +0000
Message-ID: <pull.1285.v2.git.git.1658134679233.gitgitgadget@gmail.com>
In-Reply-To: <pull.1285.git.git.1657267260405.gitgitgadget@gmail.com>
From: Kilian Kilger <kkilger@gmail.com>
The Perforce client name can contain arbitrary bytes that
are not valid UTF-8. Apply the fallback strategy implemented
in metadata_stream_to_writable_bytes() to the client name
as well.
Signed-off-by: Kilian Kilger <kkilger@gmail.com>
---
git-p4: Fix bug with encoding of P4 client name
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1285%2Fcohomology%2Fmaint-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1285/cohomology/maint-v2
Pull-Request: https://github.com/git/git/pull/1285
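
For readers unfamiliar with the fallback strategy mentioned in the commit
message, here is a minimal sketch of the general idea. It is not the code of
metadata_stream_to_writable_bytes(); the function name to_writable_utf8 and
the default fallback encoding cp1252 are assumptions made for illustration.

```python
def to_writable_utf8(raw, fallback_encoding="cp1252"):
    """Return UTF-8 bytes for a metadata value reported by p4.

    Hypothetical helper: the name and the cp1252 default are assumptions,
    not part of git-p4.
    """
    if not isinstance(raw, bytes):
        # Already text: encode it as UTF-8.
        return raw.encode("utf-8")
    try:
        # Valid UTF-8 passes through unchanged.
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        # Reinterpret the bytes in the fallback encoding and re-encode.
        # Bytes undefined in the fallback encoding still raise here.
        return raw.decode(fallback_encoding).encode("utf-8")


client_name = to_writable_utf8(b"caf\xe9-workspace")
```

A client name that is already valid UTF-8 is returned as-is, so the fallback
only affects names that would otherwise fail to decode.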
Range-diff vs v1:
1: 7393b59c642 ! 1: 3280a9579bc git-p4: fix bug with encoding of p4 client name
@@
## Metadata ##
-Author: Kilian Kilger <kilian.kilger@sap.com>
+Author: Kilian Kilger <kkilger@gmail.com>
## Commit message ##
git-p4: fix bug with encoding of p4 client name
@@ Commit message
Signed-off-by: Kilian Kilger <kkilger@gmail.com>
## git-p4.py ##
+@@ git-p4.py: def isModeExecChanged(src_mode, dst_mode):
+ return isModeExec(src_mode) != isModeExec(dst_mode)
+
+
++def p4KeysContainingNonUtf8Chars():
++ """Returns all keys which may contain non UTF-8 encoded strings
++ for which a fallback strategy has to be applied.
++ """
++ return ['desc', 'client', 'FullName']
++
++
++def p4KeysContainingBinaryData():
++ """Returns all keys which may contain arbitrary binary data
++ """
++ return ['data']
++
++
++def p4KeyContainsFilePaths(key):
++ """Returns True if the key contains file paths. These are handled by decode_path().
++ Otherwise False.
++ """
++ return key.startswith('depotFile') or key in ['path', 'clientFile']
++
++
++def p4KeyWhichCanBeDirectlyDecoded(key):
++ """Returns True if the key can be directly decoded as UTF-8 string
++ Otherwise False.
++
++ Keys which can not be encoded directly:
++ - `data` which may contain arbitrary binary data
++ - `desc` or `client` or `FullName` which may contain non-UTF8 encoded text
++ - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path()
++ """
++ if key in p4KeysContainingNonUtf8Chars() or \
++ key in p4KeysContainingBinaryData() or \
++ p4KeyContainsFilePaths(key):
++ return False
++ return True
++
++
+ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
+ errors_as_exceptions=False, *k, **kw):
+
@@ git-p4.py: def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
+ try:
+ while True:
+ entry = marshal.load(p4.stdout)
++
if bytes is not str:
- # Decode unmarshalled dict to use str keys and values, except for:
- # - `data` which may contain arbitrary binary data
+- # Decode unmarshalled dict to use str keys and values, except for:
+- # - `data` which may contain arbitrary binary data
- # - `desc` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes
-+ # - `desc` or `client` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes
- # - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path()
+- # - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path()
++ # Decode unmarshalled dict to use str keys and values. Special cases are handled below.
decoded_entry = {}
for key, value in entry.items():
key = key.decode()
- if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile') or key.startswith('depotFile')):
-+ if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile', 'client') or key.startswith('depotFile')):
++ if isinstance(value, bytes) and p4KeyWhichCanBeDirectlyDecoded(key):
value = value.decode()
decoded_entry[key] = value
# Parse out data if it's an error response
@@ git-p4.py: def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
+ if skip_info:
+ if 'code' in entry and entry['code'] == 'info':
continue
- if 'desc' in entry:
- entry['desc'] = metadata_stream_to_writable_bytes(entry['desc'])
-+ if 'client' in entry:
-+ entry['client'] = metadata_stream_to_writable_bytes(entry['client'])
- if 'FullName' in entry:
- entry['FullName'] = metadata_stream_to_writable_bytes(entry['FullName'])
+- if 'desc' in entry:
+- entry['desc'] = metadata_stream_to_writable_bytes(entry['desc'])
+- if 'FullName' in entry:
+- entry['FullName'] = metadata_stream_to_writable_bytes(entry['FullName'])
++ for key in p4KeysContainingNonUtf8Chars():
++ if key in entry:
++ entry[key] = metadata_stream_to_writable_bytes(entry[key])
if cb is not None:
+ cb(entry)
+ else:
git-p4.py | 51 ++++++++++++++++++++++++++++++++++++++++++---------
1 file changed, 42 insertions(+), 9 deletions(-)
diff --git a/git-p4.py b/git-p4.py
index 8fbf6eb1fe3..9323b943c68 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -822,6 +822,42 @@ def isModeExecChanged(src_mode, dst_mode):
     return isModeExec(src_mode) != isModeExec(dst_mode)
 
 
+def p4KeysContainingNonUtf8Chars():
+    """Returns all keys which may contain non UTF-8 encoded strings
+    for which a fallback strategy has to be applied.
+    """
+    return ['desc', 'client', 'FullName']
+
+
+def p4KeysContainingBinaryData():
+    """Returns all keys which may contain arbitrary binary data
+    """
+    return ['data']
+
+
+def p4KeyContainsFilePaths(key):
+    """Returns True if the key contains file paths. These are handled by decode_path().
+    Otherwise False.
+    """
+    return key.startswith('depotFile') or key in ['path', 'clientFile']
+
+
+def p4KeyWhichCanBeDirectlyDecoded(key):
+    """Returns True if the key can be directly decoded as UTF-8 string
+    Otherwise False.
+
+    Keys which can not be encoded directly:
+    - `data` which may contain arbitrary binary data
+    - `desc` or `client` or `FullName` which may contain non-UTF8 encoded text
+    - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path()
+    """
+    if key in p4KeysContainingNonUtf8Chars() or \
+       key in p4KeysContainingBinaryData() or \
+       p4KeyContainsFilePaths(key):
+        return False
+    return True
+
+
 def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
         errors_as_exceptions=False, *k, **kw):
 
@@ -851,15 +887,13 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
     try:
         while True:
             entry = marshal.load(p4.stdout)
+
             if bytes is not str:
-                # Decode unmarshalled dict to use str keys and values, except for:
-                # - `data` which may contain arbitrary binary data
-                # - `desc` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes
-                # - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path()
+                # Decode unmarshalled dict to use str keys and values. Special cases are handled below.
                 decoded_entry = {}
                 for key, value in entry.items():
                     key = key.decode()
-                    if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile') or key.startswith('depotFile')):
+                    if isinstance(value, bytes) and p4KeyWhichCanBeDirectlyDecoded(key):
                         value = value.decode()
                     decoded_entry[key] = value
                 # Parse out data if it's an error response
@@ -869,10 +903,9 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
             if skip_info:
                 if 'code' in entry and entry['code'] == 'info':
                     continue
-            if 'desc' in entry:
-                entry['desc'] = metadata_stream_to_writable_bytes(entry['desc'])
-            if 'FullName' in entry:
-                entry['FullName'] = metadata_stream_to_writable_bytes(entry['FullName'])
+            for key in p4KeysContainingNonUtf8Chars():
+                if key in entry:
+                    entry[key] = metadata_stream_to_writable_bytes(entry[key])
             if cb is not None:
                 cb(entry)
             else:
base-commit: e4a4b31577c7419497ac30cebe30d755b97752c5
--
gitgitgadget
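
To illustrate how the key classification in the patch affects one
unmarshalled entry, here is a small standalone sketch. The helper names
(can_decode_directly, decode_entry) and the sample entry are hypothetical and
only mirror the logic of the patch; they are not taken from git-p4.py.

```python
# Keys whose values may hold non-UTF-8 text (mirrors the patch's list).
NON_UTF8_KEYS = ("desc", "client", "FullName")
# Keys whose values may hold arbitrary binary data.
BINARY_KEYS = ("data",)


def is_path_key(key):
    """True for keys that carry file paths."""
    return key.startswith("depotFile") or key in ("path", "clientFile")


def can_decode_directly(key):
    """True if the value for this key is safe to decode as UTF-8."""
    return not (key in NON_UTF8_KEYS or key in BINARY_KEYS or is_path_key(key))


def decode_entry(entry):
    """Decode bytes keys, and bytes values only where that is safe."""
    decoded = {}
    for raw_key, value in entry.items():
        key = raw_key.decode()
        if isinstance(value, bytes) and can_decode_directly(key):
            value = value.decode()
        decoded[key] = value
    return decoded


# Made-up entry: the client name contains a Latin-1 byte (0xfc).
sample = {
    b"code": b"stat",
    b"change": b"42",
    b"client": b"b\xfcro-ws",
    b"depotFile0": b"//depot/a.txt",
}
result = decode_entry(sample)
```

In this sketch 'code' and 'change' become str, while 'client' and
'depotFile0' stay as bytes so that a later step can apply the fallback
strategy or path decoding to them.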
Thread overview: 9+ messages
2022-07-08 8:01 [PATCH] git-p4: fix bug with encoding of p4 client name Kilian Kilger via GitGitGadget
2022-07-08 11:28 ` Tao Klerks
2022-07-08 15:05 ` Junio C Hamano
2022-07-18 8:57 ` Kilian Kilger via GitGitGadget [this message]
2022-07-18 16:36 ` [PATCH v2] " Junio C Hamano
2022-07-21 9:07 ` [PATCH v3 0/2] git-p4: Fix bug with encoding of P4 " Kilian Kilger via GitGitGadget
2022-07-21 9:07 ` [PATCH v3 1/2] git-p4: fix bug with encoding of p4 " Kilian Kilger via GitGitGadget
2022-07-21 9:07 ` [PATCH v3 2/2] git-p4: refactoring of p4CmdList() Kilian Kilger via GitGitGadget
2022-07-21 16:46 ` [PATCH v3 0/2] git-p4: Fix bug with encoding of P4 client name Junio C Hamano