From: "Kilian Kilger via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Tao Klerks <tao@klerks.biz>, Kilian Kilger <kkilger@gmail.com>
Subject: [PATCH v2] git-p4: fix bug with encoding of p4 client name
Date: Mon, 18 Jul 2022 08:57:59 +0000
Message-ID: <pull.1285.v2.git.git.1658134679233.gitgitgadget@gmail.com>
In-Reply-To: <pull.1285.git.git.1657267260405.gitgitgadget@gmail.com>

From: Kilian Kilger <kkilger@gmail.com>

The Perforce client name can contain arbitrary bytes that are
not valid UTF-8, so decoding it directly can fail. Apply the
fallback strategy implemented in
metadata_stream_to_writable_bytes() to the client name as well.

Signed-off-by: Kilian Kilger <kkilger@gmail.com>
---
    git-p4: Fix bug with encoding of P4 client name

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1285%2Fcohomology%2Fmaint-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1285/cohomology/maint-v2
Pull-Request: https://github.com/git/git/pull/1285
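
For illustration only (not part of the patch): a minimal sketch of the
failure mode described in the commit message, assuming a client name
that was stored in a single-byte encoding. The helper
to_writable_bytes() and the cp1252 fallback are hypothetical stand-ins
chosen for the example; the real metadata_stream_to_writable_bytes() is
not reproduced here and may behave differently.

```python
# Hypothetical client name containing the byte 0xE4 ("ä" in cp1252),
# which is not valid UTF-8 on its own.
raw_client = b"build-j\xe4ger"


def to_writable_bytes(value, fallback_encoding="cp1252"):
    """Return UTF-8 bytes, re-encoding from a fallback encoding if needed.

    Illustrative stand-in only; not git-p4's actual implementation.
    """
    if not isinstance(value, bytes):
        return value.encode("utf-8")
    try:
        value.decode("utf-8")
        return value  # already valid UTF-8, pass through unchanged
    except UnicodeDecodeError:
        # Assumption: the bytes are text in the fallback encoding.
        return value.decode(fallback_encoding).encode("utf-8")


# A plain decode() is the step that fails for such a client name.
try:
    raw_client.decode()
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

fixed = to_writable_bytes(raw_client)
```

With this input, the plain decode() raises UnicodeDecodeError, while the
fallback path yields bytes that are safe to decode as UTF-8 afterwards.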

Range-diff vs v1:

 1:  7393b59c642 ! 1:  3280a9579bc git-p4: fix bug with encoding of p4 client name
     @@
       ## Metadata ##
     -Author: Kilian Kilger <kilian.kilger@sap.com>
     +Author: Kilian Kilger <kkilger@gmail.com>
      
       ## Commit message ##
          git-p4: fix bug with encoding of p4 client name
     @@ Commit message
          Signed-off-by: Kilian Kilger <kkilger@gmail.com>
      
       ## git-p4.py ##
     +@@ git-p4.py: def isModeExecChanged(src_mode, dst_mode):
     +     return isModeExec(src_mode) != isModeExec(dst_mode)
     + 
     + 
     ++def p4KeysContainingNonUtf8Chars():
     ++    """Returns all keys which may contain non UTF-8 encoded strings
     ++       for which a fallback strategy has to be applied.
     ++       """
     ++    return ['desc', 'client', 'FullName']
     ++
     ++
     ++def p4KeysContainingBinaryData():
     ++    """Returns all keys which may contain arbitrary binary data
     ++       """
     ++    return ['data']
     ++
     ++
     ++def p4KeyContainsFilePaths(key):
     ++    """Returns True if the key contains file paths. These are handled by decode_path().
     ++       Otherwise False.
     ++       """
     ++    return key.startswith('depotFile') or key in ['path', 'clientFile']
     ++
     ++
     ++def p4KeyWhichCanBeDirectlyDecoded(key):
     ++    """Returns True if the key can be directly decoded as UTF-8 string
     ++       Otherwise False.
     ++
     ++       Keys which cannot be decoded directly:
     ++         - `data` which may contain arbitrary binary data
     ++         - `desc` or `client` or `FullName` which may contain non-UTF8 encoded text
     ++         - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path()
     ++       """
     ++    if key in p4KeysContainingNonUtf8Chars() or \
     ++       key in p4KeysContainingBinaryData() or  \
     ++       p4KeyContainsFilePaths(key):
     ++        return False
     ++    return True
     ++
     ++
     + def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
     +         errors_as_exceptions=False, *k, **kw):
     + 
      @@ git-p4.py: def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
     +     try:
     +         while True:
     +             entry = marshal.load(p4.stdout)
     ++
                   if bytes is not str:
     -                 # Decode unmarshalled dict to use str keys and values, except for:
     -                 #   - `data` which may contain arbitrary binary data
     +-                # Decode unmarshalled dict to use str keys and values, except for:
     +-                #   - `data` which may contain arbitrary binary data
      -                #   - `desc` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes
     -+                #   - `desc` or `client` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes
     -                 #   - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path()
     +-                #   - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path()
     ++                # Decode unmarshalled dict to use str keys and values. Special cases are handled below.
                       decoded_entry = {}
                       for key, value in entry.items():
                           key = key.decode()
      -                    if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile') or key.startswith('depotFile')):
     -+                    if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile', 'client') or key.startswith('depotFile')):
     ++                    if isinstance(value, bytes) and p4KeyWhichCanBeDirectlyDecoded(key):
                               value = value.decode()
                           decoded_entry[key] = value
                       # Parse out data if it's an error response
      @@ git-p4.py: def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
     +             if skip_info:
     +                 if 'code' in entry and entry['code'] == 'info':
                           continue
     -             if 'desc' in entry:
     -                 entry['desc'] = metadata_stream_to_writable_bytes(entry['desc'])
     -+            if 'client' in entry:
     -+                entry['client'] = metadata_stream_to_writable_bytes(entry['client'])
     -             if 'FullName' in entry:
     -                 entry['FullName'] = metadata_stream_to_writable_bytes(entry['FullName'])
     +-            if 'desc' in entry:
     +-                entry['desc'] = metadata_stream_to_writable_bytes(entry['desc'])
     +-            if 'FullName' in entry:
     +-                entry['FullName'] = metadata_stream_to_writable_bytes(entry['FullName'])
     ++            for key in p4KeysContainingNonUtf8Chars():
     ++                if key in entry:
     ++                    entry[key] = metadata_stream_to_writable_bytes(entry[key])
                   if cb is not None:
     +                 cb(entry)
     +             else:


 git-p4.py | 51 ++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 42 insertions(+), 9 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index 8fbf6eb1fe3..9323b943c68 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -822,6 +822,42 @@ def isModeExecChanged(src_mode, dst_mode):
     return isModeExec(src_mode) != isModeExec(dst_mode)
 
 
+def p4KeysContainingNonUtf8Chars():
+    """Returns all keys which may contain non UTF-8 encoded strings
+       for which a fallback strategy has to be applied.
+       """
+    return ['desc', 'client', 'FullName']
+
+
+def p4KeysContainingBinaryData():
+    """Returns all keys which may contain arbitrary binary data
+       """
+    return ['data']
+
+
+def p4KeyContainsFilePaths(key):
+    """Returns True if the key contains file paths. These are handled by decode_path().
+       Otherwise False.
+       """
+    return key.startswith('depotFile') or key in ['path', 'clientFile']
+
+
+def p4KeyWhichCanBeDirectlyDecoded(key):
+    """Returns True if the key can be directly decoded as UTF-8 string
+       Otherwise False.
+
+       Keys which cannot be decoded directly:
+         - `data` which may contain arbitrary binary data
+         - `desc` or `client` or `FullName` which may contain non-UTF8 encoded text
+         - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path()
+       """
+    if key in p4KeysContainingNonUtf8Chars() or \
+       key in p4KeysContainingBinaryData() or  \
+       p4KeyContainsFilePaths(key):
+        return False
+    return True
+
+
 def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
         errors_as_exceptions=False, *k, **kw):
 
@@ -851,15 +887,13 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
     try:
         while True:
             entry = marshal.load(p4.stdout)
+
             if bytes is not str:
-                # Decode unmarshalled dict to use str keys and values, except for:
-                #   - `data` which may contain arbitrary binary data
-                #   - `desc` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes
-                #   - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path()
+                # Decode unmarshalled dict to use str keys and values. Special cases are handled below.
                 decoded_entry = {}
                 for key, value in entry.items():
                     key = key.decode()
-                    if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile') or key.startswith('depotFile')):
+                    if isinstance(value, bytes) and p4KeyWhichCanBeDirectlyDecoded(key):
                         value = value.decode()
                     decoded_entry[key] = value
                 # Parse out data if it's an error response
@@ -869,10 +903,9 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
             if skip_info:
                 if 'code' in entry and entry['code'] == 'info':
                     continue
-            if 'desc' in entry:
-                entry['desc'] = metadata_stream_to_writable_bytes(entry['desc'])
-            if 'FullName' in entry:
-                entry['FullName'] = metadata_stream_to_writable_bytes(entry['FullName'])
+            for key in p4KeysContainingNonUtf8Chars():
+                if key in entry:
+                    entry[key] = metadata_stream_to_writable_bytes(entry[key])
             if cb is not None:
                 cb(entry)
             else:

base-commit: e4a4b31577c7419497ac30cebe30d755b97752c5
-- 
gitgitgadget
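
For illustration only (not part of the patch): a self-contained sketch
of the selective decoding that p4CmdList() performs after this change.
The function names can_decode_directly() and decode_entry() and the
sample record are hypothetical; the key lists are copied from the
helpers added above.

```python
def can_decode_directly(key):
    """Condensed version of the patch's key classification."""
    non_utf8_keys = ("desc", "client", "FullName")
    binary_keys = ("data",)
    is_path = key.startswith("depotFile") or key in ("path", "clientFile")
    return not (key in non_utf8_keys or key in binary_keys or is_path)


def decode_entry(entry):
    """Decode every key; decode a value only when that is known to be safe."""
    decoded = {}
    for key, value in entry.items():
        key = key.decode()
        if isinstance(value, bytes) and can_decode_directly(key):
            value = value.decode()
        decoded[key] = value
    return decoded


# Hypothetical record with bytes keys and values, as the patch's
# key.decode() call implies for Python 3.
sample = {
    b"code": b"stat",
    b"client": b"build-j\xe4ger",      # not valid UTF-8, stays bytes
    b"depotFile0": b"//depot/a.txt",   # path, left for decode_path()
    b"change": b"1234",
}
result = decode_entry(sample)
```

Here 'code' and 'change' come back as str, while 'client' and
'depotFile0' remain bytes for the later fallback and path handling.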

Thread overview: 9+ messages
2022-07-08  8:01 [PATCH] git-p4: fix bug with encoding of p4 client name Kilian Kilger via GitGitGadget
2022-07-08 11:28 ` Tao Klerks
2022-07-08 15:05   ` Junio C Hamano
2022-07-18  8:57 ` Kilian Kilger via GitGitGadget [this message]
2022-07-18 16:36   ` [PATCH v2] " Junio C Hamano
2022-07-21  9:07   ` [PATCH v3 0/2] git-p4: Fix bug with encoding of P4 " Kilian Kilger via GitGitGadget
2022-07-21  9:07     ` [PATCH v3 1/2] git-p4: fix bug with encoding of p4 " Kilian Kilger via GitGitGadget
2022-07-21  9:07     ` [PATCH v3 2/2] git-p4: refactoring of p4CmdList() Kilian Kilger via GitGitGadget
2022-07-21 16:46     ` [PATCH v3 0/2] git-p4: Fix bug with encoding of P4 client name Junio C Hamano
