git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] git-p4: fix bug with encoding of p4 client name
@ 2022-07-08  8:01 Kilian Kilger via GitGitGadget
  2022-07-08 11:28 ` Tao Klerks
  2022-07-18  8:57 ` [PATCH v2] " Kilian Kilger via GitGitGadget
  0 siblings, 2 replies; 9+ messages in thread
From: Kilian Kilger via GitGitGadget @ 2022-07-08  8:01 UTC (permalink / raw)
  To: git; +Cc: Tao Klerks, Kilian Kilger, Kilian Kilger

From: Kilian Kilger <kilian.kilger@sap.com>

The Perforce client name can contain arbitrary characters
which do not decode to UTF-8. Use the fallback strategy
implemented in metadata_stream_to_writable_bytes() also
for the client name.

Signed-off-by: Kilian Kilger <kkilger@gmail.com>
---
    git-p4: Fix bug with encoding of P4 client name

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1285%2Fcohomology%2Fmaint-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1285/cohomology/maint-v1
Pull-Request: https://github.com/git/git/pull/1285

 git-p4.py | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index 8fbf6eb1fe3..e65d6a2b0e1 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -854,12 +854,12 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
             if bytes is not str:
                 # Decode unmarshalled dict to use str keys and values, except for:
                 #   - `data` which may contain arbitrary binary data
-                #   - `desc` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes
+                #   - `desc` or `client` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes
                 #   - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path()
                 decoded_entry = {}
                 for key, value in entry.items():
                     key = key.decode()
-                    if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile') or key.startswith('depotFile')):
+                    if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile', 'client') or key.startswith('depotFile')):
                         value = value.decode()
                     decoded_entry[key] = value
                 # Parse out data if it's an error response
@@ -871,6 +871,8 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
                     continue
             if 'desc' in entry:
                 entry['desc'] = metadata_stream_to_writable_bytes(entry['desc'])
+            if 'client' in entry:
+                entry['client'] = metadata_stream_to_writable_bytes(entry['client'])
             if 'FullName' in entry:
                 entry['FullName'] = metadata_stream_to_writable_bytes(entry['FullName'])
             if cb is not None:

base-commit: e4a4b31577c7419497ac30cebe30d755b97752c5
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] git-p4: fix bug with encoding of p4 client name
  2022-07-08  8:01 [PATCH] git-p4: fix bug with encoding of p4 client name Kilian Kilger via GitGitGadget
@ 2022-07-08 11:28 ` Tao Klerks
  2022-07-08 15:05   ` Junio C Hamano
  2022-07-18  8:57 ` [PATCH v2] " Kilian Kilger via GitGitGadget
  1 sibling, 1 reply; 9+ messages in thread
From: Tao Klerks @ 2022-07-08 11:28 UTC (permalink / raw)
  To: Kilian Kilger via GitGitGadget; +Cc: git, Kilian Kilger, Kilian Kilger

This makes sense to me, and I don't see anything wrong with the "form"
(and nor does GitGitGadget).

Not sure whether formal review sign-off is used on this list, I don't
tend to see it, but do I see "Reviewed-by" on patches, so FWIW:

Reviewed-by: Tao Klerks <tao@klerks.biz>


On Fri, Jul 8, 2022 at 10:01 AM Kilian Kilger via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Kilian Kilger <kilian.kilger@sap.com>
>
> The Perforce client name can contain arbitrary characters
> which do not decode to UTF-8. Use the fallback strategy
> implemented in metadata_stream_to_writable_bytes() also
> for the client name.
>
> Signed-off-by: Kilian Kilger <kkilger@gmail.com>
> ---
>     git-p4: Fix bug with encoding of P4 client name
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1285%2Fcohomology%2Fmaint-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1285/cohomology/maint-v1
> Pull-Request: https://github.com/git/git/pull/1285
>
>  git-p4.py | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/git-p4.py b/git-p4.py
> index 8fbf6eb1fe3..e65d6a2b0e1 100755
> --- a/git-p4.py
> +++ b/git-p4.py
> @@ -854,12 +854,12 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
>              if bytes is not str:
>                  # Decode unmarshalled dict to use str keys and values, except for:
>                  #   - `data` which may contain arbitrary binary data
> -                #   - `desc` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes
> +                #   - `desc` or `client` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes
>                  #   - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path()
>                  decoded_entry = {}
>                  for key, value in entry.items():
>                      key = key.decode()
> -                    if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile') or key.startswith('depotFile')):
> +                    if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile', 'client') or key.startswith('depotFile')):
>                          value = value.decode()
>                      decoded_entry[key] = value
>                  # Parse out data if it's an error response
> @@ -871,6 +871,8 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
>                      continue
>              if 'desc' in entry:
>                  entry['desc'] = metadata_stream_to_writable_bytes(entry['desc'])
> +            if 'client' in entry:
> +                entry['client'] = metadata_stream_to_writable_bytes(entry['client'])
>              if 'FullName' in entry:
>                  entry['FullName'] = metadata_stream_to_writable_bytes(entry['FullName'])
>              if cb is not None:
>
> base-commit: e4a4b31577c7419497ac30cebe30d755b97752c5
> --
> gitgitgadget

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] git-p4: fix bug with encoding of p4 client name
  2022-07-08 11:28 ` Tao Klerks
@ 2022-07-08 15:05   ` Junio C Hamano
  0 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2022-07-08 15:05 UTC (permalink / raw)
  To: Tao Klerks
  Cc: Kilian Kilger via GitGitGadget, git, Kilian Kilger, Kilian Kilger

Tao Klerks <tao@klerks.biz> writes:

>
>
> On Fri, Jul 8, 2022 at 10:01 AM Kilian Kilger via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Kilian Kilger <kilian.kilger@sap.com>
>>
>> The Perforce client name can contain arbitrary characters
>> which do not decode to UTF-8. Use the fallback strategy
>> implemented in metadata_stream_to_writable_bytes() also
>> for the client name.
>>
>> Signed-off-by: Kilian Kilger <kkilger@gmail.com>
>> ---
>> ...
>>
>> @@ -871,6 +871,8 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
>>                      continue
>>              if 'desc' in entry:
>>                  entry['desc'] = metadata_stream_to_writable_bytes(entry['desc'])
>> +            if 'client' in entry:
>> +                entry['client'] = metadata_stream_to_writable_bytes(entry['client'])
>>              if 'FullName' in entry:
>>                  entry['FullName'] = metadata_stream_to_writable_bytes(entry['FullName']

We had two repetitions and now we have three, which is a good time
to see if it makes sense to reduce the temptation for future
developers to add the fourth repetition in the next round, e.g.

	for e in ["client", "desc", "FullName"]:
		if e in entry:
			entry[e] = metadata_stream_to_writable_bytes(entry[e])

or something like that?

> This makes sense to me, and I don't see anything wrong with the "form"
> (and nor does GitGitGadget).

One thing that is a bit problematic is that in-body From does not
match the sign-off.  Kilian, which identity do you want to use in
your contribution to this project? 

> Not sure whether formal review sign-off is used on this list, I don't
> tend to see it, but do I see "Reviewed-by" on patches, so FWIW:
>
> Reviewed-by: Tao Klerks <tao@klerks.biz>

Thanks.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v2] git-p4: fix bug with encoding of p4 client name
  2022-07-08  8:01 [PATCH] git-p4: fix bug with encoding of p4 client name Kilian Kilger via GitGitGadget
  2022-07-08 11:28 ` Tao Klerks
@ 2022-07-18  8:57 ` Kilian Kilger via GitGitGadget
  2022-07-18 16:36   ` Junio C Hamano
  2022-07-21  9:07   ` [PATCH v3 0/2] git-p4: Fix bug with encoding of P4 " Kilian Kilger via GitGitGadget
  1 sibling, 2 replies; 9+ messages in thread
From: Kilian Kilger via GitGitGadget @ 2022-07-18  8:57 UTC (permalink / raw)
  To: git; +Cc: Tao Klerks, Kilian Kilger, Kilian Kilger

From: Kilian Kilger <kkilger@gmail.com>

The Perforce client name can contain arbitrary characters
which do not decode to UTF-8. Use the fallback strategy
implemented in metadata_stream_to_writable_bytes() also
for the client name.

Signed-off-by: Kilian Kilger <kkilger@gmail.com>
---
    git-p4: Fix bug with encoding of P4 client name

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1285%2Fcohomology%2Fmaint-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1285/cohomology/maint-v2
Pull-Request: https://github.com/git/git/pull/1285

Range-diff vs v1:

 1:  7393b59c642 ! 1:  3280a9579bc git-p4: fix bug with encoding of p4 client name
     @@
       ## Metadata ##
     -Author: Kilian Kilger <kilian.kilger@sap.com>
     +Author: Kilian Kilger <kkilger@gmail.com>
      
       ## Commit message ##
          git-p4: fix bug with encoding of p4 client name
     @@ Commit message
          Signed-off-by: Kilian Kilger <kkilger@gmail.com>
      
       ## git-p4.py ##
     +@@ git-p4.py: def isModeExecChanged(src_mode, dst_mode):
     +     return isModeExec(src_mode) != isModeExec(dst_mode)
     + 
     + 
     ++def p4KeysContainingNonUtf8Chars():
     ++    """Returns all keys which may contain non UTF-8 encoded strings
     ++       for which a fallback strategy has to be applied.
     ++       """
     ++    return ['desc', 'client', 'FullName']
     ++
     ++
     ++def p4KeysContainingBinaryData():
     ++    """Returns all keys which may contain arbitrary binary data
     ++       """
     ++    return ['data']
     ++
     ++
     ++def p4KeyContainsFilePaths(key):
     ++    """Returns True if the key contains file paths. These are handled by decode_path().
     ++       Otherwise False.
     ++       """
     ++    return key.startswith('depotFile') or key in ['path', 'clientFile']
     ++
     ++
     ++def p4KeyWhichCanBeDirectlyDecoded(key):
     ++    """Returns True if the key can be directly decoded as UTF-8 string
     ++       Otherwise False.
     ++
     ++       Keys which can not be encoded directly:
     ++         - `data` which may contain arbitrary binary data
     ++         - `desc` or `client` or `FullName` which may contain non-UTF8 encoded text
     ++         - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path()
     ++       """
     ++    if key in p4KeysContainingNonUtf8Chars() or \
     ++       key in p4KeysContainingBinaryData() or  \
     ++       p4KeyContainsFilePaths(key):
     ++        return False
     ++    return True
     ++
     ++
     + def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
     +         errors_as_exceptions=False, *k, **kw):
     + 
      @@ git-p4.py: def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
     +     try:
     +         while True:
     +             entry = marshal.load(p4.stdout)
     ++
                   if bytes is not str:
     -                 # Decode unmarshalled dict to use str keys and values, except for:
     -                 #   - `data` which may contain arbitrary binary data
     +-                # Decode unmarshalled dict to use str keys and values, except for:
     +-                #   - `data` which may contain arbitrary binary data
      -                #   - `desc` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes
     -+                #   - `desc` or `client` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes
     -                 #   - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path()
     +-                #   - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path()
     ++                # Decode unmarshalled dict to use str keys and values. Special cases are handled below.
                       decoded_entry = {}
                       for key, value in entry.items():
                           key = key.decode()
      -                    if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile') or key.startswith('depotFile')):
     -+                    if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile', 'client') or key.startswith('depotFile')):
     ++                    if isinstance(value, bytes) and p4KeyWhichCanBeDirectlyDecoded(key):
                               value = value.decode()
                           decoded_entry[key] = value
                       # Parse out data if it's an error response
      @@ git-p4.py: def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
     +             if skip_info:
     +                 if 'code' in entry and entry['code'] == 'info':
                           continue
     -             if 'desc' in entry:
     -                 entry['desc'] = metadata_stream_to_writable_bytes(entry['desc'])
     -+            if 'client' in entry:
     -+                entry['client'] = metadata_stream_to_writable_bytes(entry['client'])
     -             if 'FullName' in entry:
     -                 entry['FullName'] = metadata_stream_to_writable_bytes(entry['FullName'])
     +-            if 'desc' in entry:
     +-                entry['desc'] = metadata_stream_to_writable_bytes(entry['desc'])
     +-            if 'FullName' in entry:
     +-                entry['FullName'] = metadata_stream_to_writable_bytes(entry['FullName'])
     ++            for key in p4KeysContainingNonUtf8Chars():
     ++                if key in entry:
     ++                    entry[key] = metadata_stream_to_writable_bytes(entry[key])
                   if cb is not None:
     +                 cb(entry)
     +             else:


 git-p4.py | 51 ++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 42 insertions(+), 9 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index 8fbf6eb1fe3..9323b943c68 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -822,6 +822,42 @@ def isModeExecChanged(src_mode, dst_mode):
     return isModeExec(src_mode) != isModeExec(dst_mode)
 
 
+def p4KeysContainingNonUtf8Chars():
+    """Returns all keys which may contain non UTF-8 encoded strings
+       for which a fallback strategy has to be applied.
+       """
+    return ['desc', 'client', 'FullName']
+
+
+def p4KeysContainingBinaryData():
+    """Returns all keys which may contain arbitrary binary data
+       """
+    return ['data']
+
+
+def p4KeyContainsFilePaths(key):
+    """Returns True if the key contains file paths. These are handled by decode_path().
+       Otherwise False.
+       """
+    return key.startswith('depotFile') or key in ['path', 'clientFile']
+
+
+def p4KeyWhichCanBeDirectlyDecoded(key):
+    """Returns True if the key can be directly decoded as UTF-8 string
+       Otherwise False.
+
+       Keys which can not be encoded directly:
+         - `data` which may contain arbitrary binary data
+         - `desc` or `client` or `FullName` which may contain non-UTF8 encoded text
+         - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path()
+       """
+    if key in p4KeysContainingNonUtf8Chars() or \
+       key in p4KeysContainingBinaryData() or  \
+       p4KeyContainsFilePaths(key):
+        return False
+    return True
+
+
 def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
         errors_as_exceptions=False, *k, **kw):
 
@@ -851,15 +887,13 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
     try:
         while True:
             entry = marshal.load(p4.stdout)
+
             if bytes is not str:
-                # Decode unmarshalled dict to use str keys and values, except for:
-                #   - `data` which may contain arbitrary binary data
-                #   - `desc` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes
-                #   - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path()
+                # Decode unmarshalled dict to use str keys and values. Special cases are handled below.
                 decoded_entry = {}
                 for key, value in entry.items():
                     key = key.decode()
-                    if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile') or key.startswith('depotFile')):
+                    if isinstance(value, bytes) and p4KeyWhichCanBeDirectlyDecoded(key):
                         value = value.decode()
                     decoded_entry[key] = value
                 # Parse out data if it's an error response
@@ -869,10 +903,9 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
             if skip_info:
                 if 'code' in entry and entry['code'] == 'info':
                     continue
-            if 'desc' in entry:
-                entry['desc'] = metadata_stream_to_writable_bytes(entry['desc'])
-            if 'FullName' in entry:
-                entry['FullName'] = metadata_stream_to_writable_bytes(entry['FullName'])
+            for key in p4KeysContainingNonUtf8Chars():
+                if key in entry:
+                    entry[key] = metadata_stream_to_writable_bytes(entry[key])
             if cb is not None:
                 cb(entry)
             else:

base-commit: e4a4b31577c7419497ac30cebe30d755b97752c5
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] git-p4: fix bug with encoding of p4 client name
  2022-07-18  8:57 ` [PATCH v2] " Kilian Kilger via GitGitGadget
@ 2022-07-18 16:36   ` Junio C Hamano
  2022-07-21  9:07   ` [PATCH v3 0/2] git-p4: Fix bug with encoding of P4 " Kilian Kilger via GitGitGadget
  1 sibling, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2022-07-18 16:36 UTC (permalink / raw)
  To: Kilian Kilger via GitGitGadget; +Cc: git, Tao Klerks, Kilian Kilger

"Kilian Kilger via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Kilian Kilger <kkilger@gmail.com>
>
> The Perforce client name can contain arbitrary characters
> which do not decode to UTF-8. Use the fallback strategy
> implemented in metadata_stream_to_writable_bytes() also
> for the client name.
>
> Signed-off-by: Kilian Kilger <kkilger@gmail.com>
> ---
>     git-p4: Fix bug with encoding of P4 client name

Sorry, is this to improve the topic that has already last Monday?

If so, it is way too late to update with a replacement patch, like
this.  Instead, please send in a follow-up patch on top of what has
already been merged.

The description above seems to be identical to what was in the
previous round, but the patch seems to use a lot more code, so
such a follow-up patch would have plenty to explain how the new
strategy improves over the previous one.

Thanks.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v3 0/2] git-p4: Fix bug with encoding of P4 client name
  2022-07-18  8:57 ` [PATCH v2] " Kilian Kilger via GitGitGadget
  2022-07-18 16:36   ` Junio C Hamano
@ 2022-07-21  9:07   ` Kilian Kilger via GitGitGadget
  2022-07-21  9:07     ` [PATCH v3 1/2] git-p4: fix bug with encoding of p4 " Kilian Kilger via GitGitGadget
                       ` (2 more replies)
  1 sibling, 3 replies; 9+ messages in thread
From: Kilian Kilger via GitGitGadget @ 2022-07-21  9:07 UTC (permalink / raw)
  To: git; +Cc: Tao Klerks, Kilian Kilger

Kilian Kilger (2):
  git-p4: fix bug with encoding of p4 client name
  git-p4: refactoring of p4CmdList()

 git-p4.py | 51 ++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 42 insertions(+), 9 deletions(-)


base-commit: e4a4b31577c7419497ac30cebe30d755b97752c5
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1285%2Fcohomology%2Fmaint-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1285/cohomology/maint-v3
Pull-Request: https://github.com/git/git/pull/1285

Range-diff vs v2:

 -:  ----------- > 1:  87e7809b75a git-p4: fix bug with encoding of p4 client name
 1:  3280a9579bc ! 2:  4a81423f0e8 git-p4: fix bug with encoding of p4 client name
     @@ Metadata
      Author: Kilian Kilger <kkilger@gmail.com>
      
       ## Commit message ##
     -    git-p4: fix bug with encoding of p4 client name
     +    git-p4: refactoring of p4CmdList()
      
     -    The Perforce client name can contain arbitrary characters
     -    which do not decode to UTF-8. Use the fallback strategy
     -    implemented in metadata_stream_to_writable_bytes() also
     -    for the client name.
     +    The function p4CmdList executes a Perforce command and
     +    decodes the marshalled python dictionary. Special care has to be
     +    taken for certain dictionary values which contain non-unicode characters.
     +    The old handling contained separate hacks for each of the corresponding
     +    dictionary keys. This commit tries to refactor the coding to handle the
     +    special cases uniformely.
      
          Signed-off-by: Kilian Kilger <kkilger@gmail.com>
      
     @@ git-p4.py: def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=F
                   if bytes is not str:
      -                # Decode unmarshalled dict to use str keys and values, except for:
      -                #   - `data` which may contain arbitrary binary data
     --                #   - `desc` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes
     +-                #   - `desc` or `client` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes
      -                #   - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path()
      +                # Decode unmarshalled dict to use str keys and values. Special cases are handled below.
                       decoded_entry = {}
                       for key, value in entry.items():
                           key = key.decode()
     --                    if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile') or key.startswith('depotFile')):
     +-                    if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile', 'client') or key.startswith('depotFile')):
      +                    if isinstance(value, bytes) and p4KeyWhichCanBeDirectlyDecoded(key):
                               value = value.decode()
                           decoded_entry[key] = value
     @@ git-p4.py: def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=F
                           continue
      -            if 'desc' in entry:
      -                entry['desc'] = metadata_stream_to_writable_bytes(entry['desc'])
     +-            if 'client' in entry:
     +-                entry['client'] = metadata_stream_to_writable_bytes(entry['client'])
      -            if 'FullName' in entry:
      -                entry['FullName'] = metadata_stream_to_writable_bytes(entry['FullName'])
      +            for key in p4KeysContainingNonUtf8Chars():

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v3 1/2] git-p4: fix bug with encoding of p4 client name
  2022-07-21  9:07   ` [PATCH v3 0/2] git-p4: Fix bug with encoding of P4 " Kilian Kilger via GitGitGadget
@ 2022-07-21  9:07     ` Kilian Kilger via GitGitGadget
  2022-07-21  9:07     ` [PATCH v3 2/2] git-p4: refactoring of p4CmdList() Kilian Kilger via GitGitGadget
  2022-07-21 16:46     ` [PATCH v3 0/2] git-p4: Fix bug with encoding of P4 client name Junio C Hamano
  2 siblings, 0 replies; 9+ messages in thread
From: Kilian Kilger via GitGitGadget @ 2022-07-21  9:07 UTC (permalink / raw)
  To: git; +Cc: Tao Klerks, Kilian Kilger, Kilian Kilger

From: Kilian Kilger <kilian.kilger@sap.com>

The Perforce client name can contain arbitrary characters
which do not decode to UTF-8. Use the fallback strategy
implemented in metadata_stream_to_writable_bytes() also
for the client name.

Signed-off-by: Kilian Kilger <kkilger@gmail.com>
Reviewed-by: Tao Klerks <tao@klerks.biz>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 git-p4.py | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index 8fbf6eb1fe3..e65d6a2b0e1 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -854,12 +854,12 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
             if bytes is not str:
                 # Decode unmarshalled dict to use str keys and values, except for:
                 #   - `data` which may contain arbitrary binary data
-                #   - `desc` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes
+                #   - `desc` or `client` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes
                 #   - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path()
                 decoded_entry = {}
                 for key, value in entry.items():
                     key = key.decode()
-                    if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile') or key.startswith('depotFile')):
+                    if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile', 'client') or key.startswith('depotFile')):
                         value = value.decode()
                     decoded_entry[key] = value
                 # Parse out data if it's an error response
@@ -871,6 +871,8 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
                     continue
             if 'desc' in entry:
                 entry['desc'] = metadata_stream_to_writable_bytes(entry['desc'])
+            if 'client' in entry:
+                entry['client'] = metadata_stream_to_writable_bytes(entry['client'])
             if 'FullName' in entry:
                 entry['FullName'] = metadata_stream_to_writable_bytes(entry['FullName'])
             if cb is not None:
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v3 2/2] git-p4: refactoring of p4CmdList()
  2022-07-21  9:07   ` [PATCH v3 0/2] git-p4: Fix bug with encoding of P4 " Kilian Kilger via GitGitGadget
  2022-07-21  9:07     ` [PATCH v3 1/2] git-p4: fix bug with encoding of p4 " Kilian Kilger via GitGitGadget
@ 2022-07-21  9:07     ` Kilian Kilger via GitGitGadget
  2022-07-21 16:46     ` [PATCH v3 0/2] git-p4: Fix bug with encoding of P4 client name Junio C Hamano
  2 siblings, 0 replies; 9+ messages in thread
From: Kilian Kilger via GitGitGadget @ 2022-07-21  9:07 UTC (permalink / raw)
  To: git; +Cc: Tao Klerks, Kilian Kilger, Kilian Kilger

From: Kilian Kilger <kkilger@gmail.com>

The function p4CmdList executes a Perforce command and
decodes the marshalled python dictionary. Special care has to be
taken for certain dictionary values which contain non-unicode characters.
The old handling contained separate hacks for each of the corresponding
dictionary keys. This commit tries to refactor the coding to handle the
special cases uniformely.

Signed-off-by: Kilian Kilger <kkilger@gmail.com>
---
 git-p4.py | 53 ++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 42 insertions(+), 11 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index e65d6a2b0e1..9323b943c68 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -822,6 +822,42 @@ def isModeExecChanged(src_mode, dst_mode):
     return isModeExec(src_mode) != isModeExec(dst_mode)
 
 
+def p4KeysContainingNonUtf8Chars():
+    """Returns all keys which may contain non UTF-8 encoded strings
+       for which a fallback strategy has to be applied.
+       """
+    return ['desc', 'client', 'FullName']
+
+
+def p4KeysContainingBinaryData():
+    """Returns all keys which may contain arbitrary binary data
+       """
+    return ['data']
+
+
+def p4KeyContainsFilePaths(key):
+    """Returns True if the key contains file paths. These are handled by decode_path().
+       Otherwise False.
+       """
+    return key.startswith('depotFile') or key in ['path', 'clientFile']
+
+
+def p4KeyWhichCanBeDirectlyDecoded(key):
+    """Returns True if the key can be directly decoded as UTF-8 string
+       Otherwise False.
+
+       Keys which can not be encoded directly:
+         - `data` which may contain arbitrary binary data
+         - `desc` or `client` or `FullName` which may contain non-UTF8 encoded text
+         - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path()
+       """
+    if key in p4KeysContainingNonUtf8Chars() or \
+       key in p4KeysContainingBinaryData() or  \
+       p4KeyContainsFilePaths(key):
+        return False
+    return True
+
+
 def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
         errors_as_exceptions=False, *k, **kw):
 
@@ -851,15 +887,13 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
     try:
         while True:
             entry = marshal.load(p4.stdout)
+
             if bytes is not str:
-                # Decode unmarshalled dict to use str keys and values, except for:
-                #   - `data` which may contain arbitrary binary data
-                #   - `desc` or `client` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes
-                #   - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path()
+                # Decode unmarshalled dict to use str keys and values. Special cases are handled below.
                 decoded_entry = {}
                 for key, value in entry.items():
                     key = key.decode()
-                    if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile', 'client') or key.startswith('depotFile')):
+                    if isinstance(value, bytes) and p4KeyWhichCanBeDirectlyDecoded(key):
                         value = value.decode()
                     decoded_entry[key] = value
                 # Parse out data if it's an error response
@@ -869,12 +903,9 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
             if skip_info:
                 if 'code' in entry and entry['code'] == 'info':
                     continue
-            if 'desc' in entry:
-                entry['desc'] = metadata_stream_to_writable_bytes(entry['desc'])
-            if 'client' in entry:
-                entry['client'] = metadata_stream_to_writable_bytes(entry['client'])
-            if 'FullName' in entry:
-                entry['FullName'] = metadata_stream_to_writable_bytes(entry['FullName'])
+            for key in p4KeysContainingNonUtf8Chars():
+                if key in entry:
+                    entry[key] = metadata_stream_to_writable_bytes(entry[key])
             if cb is not None:
                 cb(entry)
             else:
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 0/2] git-p4: Fix bug with encoding of P4 client name
  2022-07-21  9:07   ` [PATCH v3 0/2] git-p4: Fix bug with encoding of P4 " Kilian Kilger via GitGitGadget
  2022-07-21  9:07     ` [PATCH v3 1/2] git-p4: fix bug with encoding of p4 " Kilian Kilger via GitGitGadget
  2022-07-21  9:07     ` [PATCH v3 2/2] git-p4: refactoring of p4CmdList() Kilian Kilger via GitGitGadget
@ 2022-07-21 16:46     ` Junio C Hamano
  2 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2022-07-21 16:46 UTC (permalink / raw)
  To: Kilian Kilger via GitGitGadget; +Cc: git, Tao Klerks, Kilian Kilger

"Kilian Kilger via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Kilian Kilger (2):
>   git-p4: fix bug with encoding of p4 client name
>   git-p4: refactoring of p4CmdList()

[PATCH v3 1/2] seems to be identical to 34f67c96 (git-p4: fix bug
with encoding of p4 client name, 2022-07-08) that was already merged
to 'next' yesterday, so I can safely ignore it and take [PATCH v3
2/2] as if it were a single follow-up patch on the same topic and
queue it on top.

Thanks.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-07-21 16:46 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-08  8:01 [PATCH] git-p4: fix bug with encoding of p4 client name Kilian Kilger via GitGitGadget
2022-07-08 11:28 ` Tao Klerks
2022-07-08 15:05   ` Junio C Hamano
2022-07-18  8:57 ` [PATCH v2] " Kilian Kilger via GitGitGadget
2022-07-18 16:36   ` Junio C Hamano
2022-07-21  9:07   ` [PATCH v3 0/2] git-p4: Fix bug with encoding of P4 " Kilian Kilger via GitGitGadget
2022-07-21  9:07     ` [PATCH v3 1/2] git-p4: fix bug with encoding of p4 " Kilian Kilger via GitGitGadget
2022-07-21  9:07     ` [PATCH v3 2/2] git-p4: refactoring of p4CmdList() Kilian Kilger via GitGitGadget
2022-07-21 16:46     ` [PATCH v3 0/2] git-p4: Fix bug with encoding of P4 client name Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).