git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* git-diff passes <rev>:<path> args to GIT_EXTERNAL_DIFF incorrectly?
@ 2019-08-27 18:24 Dmitry Nikulin
  2019-08-27 22:25 ` Dmitry Nikulin
  2019-08-29  3:54 ` Junio C Hamano
  0 siblings, 2 replies; 12+ messages in thread
From: Dmitry Nikulin @ 2019-08-27 18:24 UTC (permalink / raw)
  To: git

I wrote a very simple Python script to see which arguments git-diff
passes to the external diff program when comparing files across
branches:

$ env GIT_EXTERNAL_DIFF=./print_argv.py git diff
origin/branch1:file1.txt origin/branch2:file2.txt
['./print_argv.py',
 'file1.txt',
 '/tmp/QRaIJ1_file1.txt',
 '802b1c4ed7b06162b2ce09b7db72a576695b96e5',
 '100644',
 '/tmp/AZuOJ1_file2.txt',
 '076e8e37a712d8a66c0c3d1a103050dc509ca6ff',
 '100644',
 'file2.txt',
 'index 802b1c4..076e8e3 100644\n']

According to the docs
(https://www.git-scm.com/docs/git/2.22.0#Documentation/git.txt-codeGITEXTERNALDIFFcode),
git-diff is supposed to pass 7 parameters:

path old-file old-hex old-mode new-file new-hex new-mode

This is not what I am seeing here. Is this a bug or
incorrect/incomplete documentation?

Tested with git 2.22.0 and 2.17.1.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: git-diff passes <rev>:<path> args to GIT_EXTERNAL_DIFF incorrectly?
  2019-08-27 18:24 git-diff passes <rev>:<path> args to GIT_EXTERNAL_DIFF incorrectly? Dmitry Nikulin
@ 2019-08-27 22:25 ` Dmitry Nikulin
  2019-08-29  3:54 ` Junio C Hamano
  1 sibling, 0 replies; 12+ messages in thread
From: Dmitry Nikulin @ 2019-08-27 22:25 UTC (permalink / raw)
  To: git

I have put up a demo repo here: https://github.com/dniku/git-external-diff-argv

On Tue, 27 Aug 2019 at 21:24, Dmitry Nikulin <pastafariant@gmail.com> wrote:
>
> I wrote a very simple Python script to see which arguments git-diff
> passes to the external diff program when comparing files across
> branches:
>
> $ env GIT_EXTERNAL_DIFF=./print_argv.py git diff
> origin/branch1:file1.txt origin/branch2:file2.txt
> ['./print_argv.py',
>  'file1.txt',
>  '/tmp/QRaIJ1_file1.txt',
>  '802b1c4ed7b06162b2ce09b7db72a576695b96e5',
>  '100644',
>  '/tmp/AZuOJ1_file2.txt',
>  '076e8e37a712d8a66c0c3d1a103050dc509ca6ff',
>  '100644',
>  'file2.txt',
>  'index 802b1c4..076e8e3 100644\n']
>
> According to the docs
> (https://www.git-scm.com/docs/git/2.22.0#Documentation/git.txt-codeGITEXTERNALDIFFcode),
> git-diff is supposed to pass 7 parameters:
>
> path old-file old-hex old-mode new-file new-hex new-mode
>
> This is not what I am seeing here. Is this a bug or
> incorrect/incomplete documentation?
>
> Tested with git 2.22.0 and 2.17.1.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: git-diff passes <rev>:<path> args to GIT_EXTERNAL_DIFF incorrectly?
  2019-08-27 18:24 git-diff passes <rev>:<path> args to GIT_EXTERNAL_DIFF incorrectly? Dmitry Nikulin
  2019-08-27 22:25 ` Dmitry Nikulin
@ 2019-08-29  3:54 ` Junio C Hamano
  2019-08-29 14:36   ` Dmitry Nikulin
  1 sibling, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2019-08-29  3:54 UTC (permalink / raw)
  To: Dmitry Nikulin; +Cc: git

Dmitry Nikulin <pastafariant@gmail.com> writes:

> $ env GIT_EXTERNAL_DIFF=./print_argv.py git diff
> origin/branch1:file1.txt origin/branch2:file2.txt

I didn't even know external-diff driver is called (and does not even segfaut)
in the "compare two blobs" hack codepath.

The syntax <tree-ish>:<path-in-the-tree> you have on the command
line resolves to a blob object name.  There is no leading directory
name, there is no permission bits or executable bit, there is no
filename, when "diff" is told to compare two blob objects this way.
THe "diff" machinery that drives the external (or internal for that
matter) diff will only get two 40-hex blob object names and nothing
else.  The only pieces of information you can trust among those the
external program may receive are the blob object name(s) and the
contents stored in the temporary files given to it.  The location of
these temporary files or their mode bits have no relation to the
"files" in some tree in the original repository, as that information
is long lost when you write <tree-ish>:<path-in-the-tree> to tell
Git to use that as a blob object name.

    $ git diff -M branch1 branch2 -- file1 file2

if file1 and file2 have similar-enough contents, may have a better
chance of what you wanted to ask Git (if I am guessing what it is,
that is).







^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: git-diff passes <rev>:<path> args to GIT_EXTERNAL_DIFF incorrectly?
  2019-08-29  3:54 ` Junio C Hamano
@ 2019-08-29 14:36   ` Dmitry Nikulin
  2019-08-30 10:16     ` Phillip Wood
  0 siblings, 1 reply; 12+ messages in thread
From: Dmitry Nikulin @ 2019-08-29 14:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Thank you for the reply.

On Thu, 29 Aug 2019 at 06:54, Junio C Hamano <gitster@pobox.com> wrote:
>     $ git diff -M branch1 branch2 -- file1 file2
>
> if file1 and file2 have similar-enough contents, may have a better
> chance of what you wanted to ask Git (if I am guessing what it is,
> that is).

The context here is that I am trying to diff two Jupyter notebooks
using an external tool (git-nbdiffdriver in my case). Therefore, for
me it is crucial to use the external tool, and not Git's internal
machinery.

For the particular command that you suggested as the replacement, on
my demo repository it does not produce anything interesting, as it
does not detect renames and calls my honeypot twice:

$ env GIT_EXTERNAL_DIFF=./print_argv.py git diff -M origin/branch1
origin/branch2 -- file1.txt file2.txt
['./print_argv.py',
 'file1.txt',
 '/tmp/2IEKCw_file1.txt',
 '802b1c4ed7b06162b2ce09b7db72a576695b96e5',
 '100644',
 '/dev/null',
 '.',
 '.']
['./print_argv.py',
 'file2.txt',
 '/dev/null',
 '.',
 '.',
 '/tmp/oAMdDx_file2.txt',
 '076e8e37a712d8a66c0c3d1a103050dc509ca6ff',
 '100644']

However, for the original repository where I first faced this problem
(https://github.com/yandexdataschool/Practical_RL), Git passes a very
weird set of args to the external diff:

$ env GIT_EXTERNAL_DIFF=./print_argv.py git diff -M master coursera --
week02_value_based/seminar_vi.ipynb
week2_model_based/practice_vi.ipynb
['./print_argv.py',
 'week02_value_based/seminar_vi.ipynb',
 '/tmp/amudWz_seminar_vi.ipynb',
 '8f8016963c888b7dd8dd20f60b7d6fdb41b26c1d',
 '100644',
 '/tmp/Ub7zPz_practice_vi.ipynb',
 '21db80f53b632d975a9af0acbaf397eb717cde2c',
 '100644',
 'week2_model_based/practice_vi.ipynb',
 'similarity index 82%\n'
 'rename from week02_value_based/seminar_vi.ipynb\n'
 'rename to week2_model_based/practice_vi.ipynb\n'
 'index 8f80169..21db80f 100644\n']

I would guess that this is a bug. There can clearly be a hotfix (after
all, Git passes all of the information to the external that it should
per the spec, that is, <old|new>-path, <old|new>-hex, <old|new>-mode;
adding, however, some garbage). I do not know though to what extent
this information is correct. You say that this information is lost
when I use the <tree-ish>:<path> notation; however, Git seems to pass
paths and hexes correctly. This only leaves open the question of file
mode. Perhaps it could be preserved at least for some cases, such as
when the blob is retrieved from a path in a tree?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: git-diff passes <rev>:<path> args to GIT_EXTERNAL_DIFF incorrectly?
  2019-08-29 14:36   ` Dmitry Nikulin
@ 2019-08-30 10:16     ` Phillip Wood
  2019-08-30 13:23       ` Dmitry Nikulin
  0 siblings, 1 reply; 12+ messages in thread
From: Phillip Wood @ 2019-08-30 10:16 UTC (permalink / raw)
  To: Dmitry Nikulin, Junio C Hamano; +Cc: git

Hi Dmitry

On 29/08/2019 15:36, Dmitry Nikulin wrote:
> Thank you for the reply.
> [...]
> However, for the original repository where I first faced this problem
> (https://github.com/yandexdataschool/Practical_RL), Git passes a very
> weird set of args to the external diff:
> 
> $ env GIT_EXTERNAL_DIFF=./print_argv.py git diff -M master coursera --
> week02_value_based/seminar_vi.ipynb
> week2_model_based/practice_vi.ipynb
> ['./print_argv.py',
>   'week02_value_based/seminar_vi.ipynb',
>   '/tmp/amudWz_seminar_vi.ipynb',
>   '8f8016963c888b7dd8dd20f60b7d6fdb41b26c1d',
>   '100644',
>   '/tmp/Ub7zPz_practice_vi.ipynb',
>   '21db80f53b632d975a9af0acbaf397eb717cde2c',
>   '100644',
>   'week2_model_based/practice_vi.ipynb',
>   'similarity index 82%\n'
>   'rename from week02_value_based/seminar_vi.ipynb\n'
>   'rename to week2_model_based/practice_vi.ipynb\n'
>   'index 8f80169..21db80f 100644\n']
> 
> I would guess that this is a bug. There can clearly be a hotfix (after
> all, Git passes all of the information to the external that it should
> per the spec, that is, <old|new>-path, <old|new>-hex, <old|new>-mode;
> adding, however, some garbage). 

When git detects a rename it adds two parameters to the end of the 
argument list for the external diff program. The first extra argument is 
the new name and the second is the header with the similarity 
information[1]. I'm not sure why the last argument is being split in 
your example. It is not split in the example below

$git mv git.c renamed.c
$env GIT_EXTERNAL_DIFF='printf "|%s|\\n"' git diff HEAD
|git.c|
|/tmp/lMQpP8_git.c|
|c1ee7124edcfb0417539134d50212e997dc71c1f|
|100644|
|renamed.c|
|c1ee7124edcfb0417539134d50212e997dc71c1f|
|100644|
|renamed.c|
|similarity index 100%
rename from git.c
rename to renamed.c
|

Best Wishes

Phillip

[1] https://github.com/git/git/blob/master/diff.c#L4204

>I do not know though to what extent
> this information is correct. You say that this information is lost
> when I use the <tree-ish>:<path> notation; however, Git seems to pass
> paths and hexes correctly. This only leaves open the question of file
> mode. Perhaps it could be preserved at least for some cases, such as
> when the blob is retrieved from a path in a tree?
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: git-diff passes <rev>:<path> args to GIT_EXTERNAL_DIFF incorrectly?
  2019-08-30 10:16     ` Phillip Wood
@ 2019-08-30 13:23       ` Dmitry Nikulin
  2019-08-30 14:17         ` Phillip Wood
                           ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Dmitry Nikulin @ 2019-08-30 13:23 UTC (permalink / raw)
  To: phillip.wood; +Cc: Junio C Hamano, git

On Fri, 30 Aug 2019 at 13:16, Phillip Wood <phillip.wood123@gmail.com> wrote:
> I'm not sure why the last argument is being split in
> your example. It is not split in the example below

I have replicated the splitting issue on my small demo repo [1]:

$ env GIT_EXTERNAL_DIFF=./print_argv.py git diff -M origin/branch1
origin/branch1-mv -- file1.txt file1-mv.txt
['./print_argv.py',
 'file1.txt',
 '/tmp/EWaCSc_file1.txt',
 '2bef330804cb3f6962e45a72a12a3071ee9b5888',
 '100644',
 '/tmp/mtEiSc_file1-mv.txt',
 'f8fd6737fbe5a45c97ba9c9de495dc46ff11eccd',
 '100644',
 'file1-mv.txt',
 'similarity index 90%\n'
 'rename from file1.txt\n'
 'rename to file1-mv.txt\n'
 'index 2bef330..f8fd673 100644\n']

This is, however, tangential to the original problem: documenting the
external diff CLI interface for diffing two blobs. Here is what I am
seeing:

$ env GIT_EXTERNAL_DIFF=./print_argv.py git diff
origin/branch1:file1.txt origin/branch1-mv:file1-mv.txt
['./print_argv.py',
 'file1.txt',
 '/tmp/n9USvy_file1.txt',
 '2bef330804cb3f6962e45a72a12a3071ee9b5888',
 '100644',
 '/tmp/Zst0uy_file1-mv.txt',
 'f8fd6737fbe5a45c97ba9c9de495dc46ff11eccd',
 '100644',
 'file1-mv.txt',
 'index 2bef330..f8fd673 100644\n']

The meaning and origin of the last arg remains mysterious, and the
other args do not conform to the published documentation[2], which
states that the args should be:

    path old-file old-hex old-mode new-file new-hex new-mode

Instead the args that are passed are:

    path old-filename old-file old-hex old-mode new-file new-hex
new-mode new-filename something

[1]: https://github.com/dniku/git-external-diff-argv
[2]: https://www.git-scm.com/docs/git#Documentation/git.txt-codeGITEXTERNALDIFFcode

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: git-diff passes <rev>:<path> args to GIT_EXTERNAL_DIFF incorrectly?
  2019-08-30 13:23       ` Dmitry Nikulin
@ 2019-08-30 14:17         ` Phillip Wood
  2019-08-30 14:27         ` Jeff King
  2019-08-30 14:29         ` SZEDER Gábor
  2 siblings, 0 replies; 12+ messages in thread
From: Phillip Wood @ 2019-08-30 14:17 UTC (permalink / raw)
  To: Dmitry Nikulin, phillip.wood; +Cc: Junio C Hamano, git

On 30/08/2019 14:23, Dmitry Nikulin wrote:
> On Fri, 30 Aug 2019 at 13:16, Phillip Wood <phillip.wood123@gmail.com> wrote:
>> I'm not sure why the last argument is being split in
>> your example. It is not split in the example below
> 
> I have replicated the splitting issue on my small demo repo [1]:
> 
> $ env GIT_EXTERNAL_DIFF=./print_argv.py git diff -M origin/branch1
> origin/branch1-mv -- file1.txt file1-mv.txt
> ['./print_argv.py',
>   'file1.txt',
>   '/tmp/EWaCSc_file1.txt',
>   '2bef330804cb3f6962e45a72a12a3071ee9b5888',
>   '100644',
>   '/tmp/mtEiSc_file1-mv.txt',
>   'f8fd6737fbe5a45c97ba9c9de495dc46ff11eccd',
>   '100644',
>   'file1-mv.txt',
>   'similarity index 90%\n'
>   'rename from file1.txt\n'
>   'rename to file1-mv.txt\n'
>   'index 2bef330..f8fd673 100644\n']

That's strange - What OS are you using? Does python do any 
pre-processing of arguments with newlines in them?

> This is, however, tangential to the original problem: documenting the
> external diff CLI interface for diffing two blobs. Here is what I am
> seeing:
> 
> $ env GIT_EXTERNAL_DIFF=./print_argv.py git diff
> origin/branch1:file1.txt origin/branch1-mv:file1-mv.txt
> ['./print_argv.py',
>   'file1.txt',
>   '/tmp/n9USvy_file1.txt',
>   '2bef330804cb3f6962e45a72a12a3071ee9b5888',
>   '100644',
>   '/tmp/Zst0uy_file1-mv.txt',
>   'f8fd6737fbe5a45c97ba9c9de495dc46ff11eccd',
>   '100644',
>   'file1-mv.txt',
>   'index 2bef330..f8fd673 100644\n']
> 
> The meaning and origin of the last arg remains mysterious, and the
> other args do not conform to the published documentation[2], which
> states that the args should be:

The documentation is incomplete it should document the extra fields 
passed when it detects renames.

>      path old-file old-hex old-mode new-file new-hex new-mode
> 
> Instead the args that are passed are:
> 
>      path old-filename old-file old-hex old-mode new-file new-hex
> new-mode new-filename something

I think what is happening is that because you're passing different 
filenames in to get the blobs the 'pass rename information' code is 
being triggered so it shows the new filename as the name used to get the 
  second blob and then passes the header which only has an index line as 
there isn't any real rename information.

Best Wishes

Phillip

> [1]: https://github.com/dniku/git-external-diff-argv
> [2]: https://www.git-scm.com/docs/git#Documentation/git.txt-codeGITEXTERNALDIFFcode
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: git-diff passes <rev>:<path> args to GIT_EXTERNAL_DIFF incorrectly?
  2019-08-30 13:23       ` Dmitry Nikulin
  2019-08-30 14:17         ` Phillip Wood
@ 2019-08-30 14:27         ` Jeff King
  2019-08-30 15:28           ` Dmitry Nikulin
  2019-08-30 14:29         ` SZEDER Gábor
  2 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2019-08-30 14:27 UTC (permalink / raw)
  To: Dmitry Nikulin; +Cc: phillip.wood, Junio C Hamano, git

On Fri, Aug 30, 2019 at 04:23:13PM +0300, Dmitry Nikulin wrote:

> On Fri, 30 Aug 2019 at 13:16, Phillip Wood <phillip.wood123@gmail.com> wrote:
> > I'm not sure why the last argument is being split in
> > your example. It is not split in the example below
> 
> I have replicated the splitting issue on my small demo repo [1]:
> 
> $ env GIT_EXTERNAL_DIFF=./print_argv.py git diff -M origin/branch1
> origin/branch1-mv -- file1.txt file1-mv.txt
> ['./print_argv.py',
>  'file1.txt',
>  '/tmp/EWaCSc_file1.txt',
>  '2bef330804cb3f6962e45a72a12a3071ee9b5888',
>  '100644',
>  '/tmp/mtEiSc_file1-mv.txt',
>  'f8fd6737fbe5a45c97ba9c9de495dc46ff11eccd',
>  '100644',
>  'file1-mv.txt',
>  'similarity index 90%\n'
>  'rename from file1.txt\n'
>  'rename to file1-mv.txt\n'
>  'index 2bef330..f8fd673 100644\n']

Interesting. I _don't_ see that splitting when I run the same command in
your demo repo (nor, looking at Git's code, do I see how it could
happen; we always add the metainfo as a single argument).

> This is, however, tangential to the original problem: documenting the
> external diff CLI interface for diffing two blobs. Here is what I am
> seeing:
> 
> $ env GIT_EXTERNAL_DIFF=./print_argv.py git diff
> origin/branch1:file1.txt origin/branch1-mv:file1-mv.txt
> ['./print_argv.py',
>  'file1.txt',
>  '/tmp/n9USvy_file1.txt',
>  '2bef330804cb3f6962e45a72a12a3071ee9b5888',
>  '100644',
>  '/tmp/Zst0uy_file1-mv.txt',
>  'f8fd6737fbe5a45c97ba9c9de495dc46ff11eccd',
>  '100644',
>  'file1-mv.txt',
>  'index 2bef330..f8fd673 100644\n']
> 
> The meaning and origin of the last arg remains mysterious, and the
> other args do not conform to the published documentation[2], which
> states that the args should be:
> 
>     path old-file old-hex old-mode new-file new-hex new-mode
> 
> Instead the args that are passed are:
> 
>     path old-filename old-file old-hex old-mode new-file new-hex
> new-mode new-filename something

I think you have an extra "old-filename" in the second list. Without
that, the first 7 arguments are as documented.

The last two are:

  - when the destination path differs from the source path, we append
    it. This is generally a sign of a rename/copy, though it triggers
    in the blob case because Git has been manually given a pair of files
    with non-matching names.

  - the final one is metainfo that Git typically prints between the
    "diff" header and the diff itself. This is added only when we add
    the filename, and would generally carry information about the rename
    (but of course since there isn't one, it has only the index line).

These were both added by 427dcb4bca ([PATCH] Diff overhaul, adding half
of copy detection., 2005-05-21). I think the intent was that existing
diff commands would/could ignore the extra arguments.

Certainly the world would be a better place if those were described in
the external-diff documentation (specifically in relation to renames,
which is their intended use).

As for the behavior in the blob-diff case, I think it's _pretty_
reasonable. It's certainly useful to give the new-filename. The metainfo
is mostly useless in this case, and potentially could be suppressed if
the pair is not an actual rename/copy. But I also think it's not hurting
much (and a script can tell that's what's going on by the lack of rename
lines in the metainfo).

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: git-diff passes <rev>:<path> args to GIT_EXTERNAL_DIFF incorrectly?
  2019-08-30 13:23       ` Dmitry Nikulin
  2019-08-30 14:17         ` Phillip Wood
  2019-08-30 14:27         ` Jeff King
@ 2019-08-30 14:29         ` SZEDER Gábor
  2019-08-30 16:26           ` Junio C Hamano
  2 siblings, 1 reply; 12+ messages in thread
From: SZEDER Gábor @ 2019-08-30 14:29 UTC (permalink / raw)
  To: Dmitry Nikulin; +Cc: phillip.wood, Junio C Hamano, git

On Fri, Aug 30, 2019 at 04:23:13PM +0300, Dmitry Nikulin wrote:
> On Fri, 30 Aug 2019 at 13:16, Phillip Wood <phillip.wood123@gmail.com> wrote:
> > I'm not sure why the last argument is being split in
> > your example. It is not split in the example below
> 
> I have replicated the splitting issue on my small demo repo [1]:
> 
> $ env GIT_EXTERNAL_DIFF=./print_argv.py git diff -M origin/branch1
> origin/branch1-mv -- file1.txt file1-mv.txt
> ['./print_argv.py',
>  'file1.txt',
>  '/tmp/EWaCSc_file1.txt',
>  '2bef330804cb3f6962e45a72a12a3071ee9b5888',
>  '100644',
>  '/tmp/mtEiSc_file1-mv.txt',
>  'f8fd6737fbe5a45c97ba9c9de495dc46ff11eccd',
>  '100644',
>  'file1-mv.txt',
>  'similarity index 90%\n'
>  'rename from file1.txt\n'
>  'rename to file1-mv.txt\n'
>  'index 2bef330..f8fd673 100644\n']

Another Python 2 vs 3 issue, perhaps?

  # Python2, good:
  $ GIT_EXTERNAL_DIFF='python2 ./print_argv.py' git --no-pager diff -M origin/branch1 origin/branch1-mv  -- file1.txt file1-mv.txt
  ['./print_argv.py',
   'file1.txt',
   '/tmp/ViB58B_file1.txt',
   '2bef330804cb3f6962e45a72a12a3071ee9b5888',
   '100644',
   '/tmp/p6OH9B_file1-mv.txt',
   'f8fd6737fbe5a45c97ba9c9de495dc46ff11eccd',
   '100644',
   'file1-mv.txt',
   'similarity index 90%\nrename from file1.txt\nrename to
  file1-mv.txt\nindex 2bef330..f8fd673 100644\n']
  # Python3, bad:
  $ GIT_EXTERNAL_DIFF='python3 ./print_argv.py' git --no-pager diff -M origin/branch1 origin/branch1-mv  -- file1.txt file1-mv.txt
  ['./print_argv.py',
   'file1.txt',
   '/tmp/5xD2DS_file1.txt',
   '2bef330804cb3f6962e45a72a12a3071ee9b5888',
   '100644',
   '/tmp/vvHQGS_file1-mv.txt',
   'f8fd6737fbe5a45c97ba9c9de495dc46ff11eccd',
   '100644',
   'file1-mv.txt',
   'similarity index 90%\n'
   'rename from file1.txt\n'
   'rename to file1-mv.txt\n'
   'index 2bef330..f8fd673 100644\n']

Let's just stick to plain old printf for now, as suggested by Phillip
earlier, to reduce unnecessary confusion.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: git-diff passes <rev>:<path> args to GIT_EXTERNAL_DIFF incorrectly?
  2019-08-30 14:27         ` Jeff King
@ 2019-08-30 15:28           ` Dmitry Nikulin
  2019-08-30 15:51             ` Dmitry Nikulin
  0 siblings, 1 reply; 12+ messages in thread
From: Dmitry Nikulin @ 2019-08-30 15:28 UTC (permalink / raw)
  To: Jeff King; +Cc: phillip.wood, Junio C Hamano, git, szeder.dev

On Fri, 30 Aug 2019 at 17:27, Jeff King <peff@peff.net> wrote:

> I think you have an extra "old-filename" in the second list.
Agreed. My misunderstanding was that I had thought that when
documentation said "path", it meant "argv[0], the path to the diff
executable".

> Interesting. I _don't_ see that splitting when I run the same command in
> your demo repo (nor, looking at Git's code, do I see how it could
> happen; we always add the metainfo as a single argument).
Szeder Gábor below has guessed at the cause. I have tested it a little more:

# Test script to ensure that shell does not get in the way
$ cat test.py
import sys
import subprocess

subprocess.check_call([
    'python%s' % sys.argv[1],
    './print_argv.py',
    'similarity index 90%\nrename from file1.txt\nrename to
file1-mv.txt\nindex 2bef330..f8fd673 100644\n',
])

# Python 2, expected behavior
$ python3 test.py 2
['./print_argv.py',
 'similarity index 90%\nrename from file1.txt\nrename to
file1-mv.txt\nindex 2bef330..f8fd673 100644\n']

# Python 3, broken behavior observed before
$ python3 test.py 3
['./print_argv.py',
 'similarity index 90%\n'
 'rename from file1.txt\n'
 'rename to file1-mv.txt\n'
 'index 2bef330..f8fd673 100644\n']

# Directly via shell. Python 2, expected behavior
$ python2 print_argv.py 'similarity index 90%\nrename from
file1.txt\nrename to file1-mv.txt\nindex 2bef330..f8fd673 100644\n'
['print_argv.py',
 'similarity index 90%\\nrename from file1.txt\\nrename to
file1-mv.txt\\nindex 2bef330..f8fd673 100644\\n']

# Python 3, WTF
$ python3 print_argv.py 'similarity index 90%\nrename from
file1.txt\nrename to file1-mv.txt\nindex 2bef330..f8fd673 100644\n'
['print_argv.py',
 'similarity index 90%\\nrename from file1.txt\\nrename to '
 'file1-mv.txt\\nindex 2bef330..f8fd673 100644\\n']

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: git-diff passes <rev>:<path> args to GIT_EXTERNAL_DIFF incorrectly?
  2019-08-30 15:28           ` Dmitry Nikulin
@ 2019-08-30 15:51             ` Dmitry Nikulin
  0 siblings, 0 replies; 12+ messages in thread
From: Dmitry Nikulin @ 2019-08-30 15:51 UTC (permalink / raw)
  To: Jeff King; +Cc: phillip.wood, Junio C Hamano, git, szeder.dev

I've figured it out. It's not a bug, it's a peculiarity of pprint().
It splits strings to avoid long lines and abuses the fact that in
Python strings split with whitespace are concatenated by the parser.

Also, in my example newlines are not parsed correctly in the shell.
Escaping them, I get the exact same behavior as in
subprocess.check_call():

$ python2 print_argv.py $'similarity index 90%\nrename from
file1.txt\nrename to file1-mv.txt\nindex 2bef330..f8fd673 100644\n'
['print_argv.py',
 'similarity index 90%\nrename from file1.txt\nrename to
file1-mv.txt\nindex 2bef330..f8fd673 100644\n']
$ python3 print_argv.py $'similarity index 90%\nrename from
file1.txt\nrename to file1-mv.txt\nindex 2bef330..f8fd673 100644\n'
['print_argv.py',
 'similarity index 90%\n'
 'rename from file1.txt\n'
 'rename to file1-mv.txt\n'
 'index 2bef330..f8fd673 100644\n']

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: git-diff passes <rev>:<path> args to GIT_EXTERNAL_DIFF incorrectly?
  2019-08-30 14:29         ` SZEDER Gábor
@ 2019-08-30 16:26           ` Junio C Hamano
  0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2019-08-30 16:26 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Dmitry Nikulin, phillip.wood, git

SZEDER Gábor <szeder.dev@gmail.com> writes:

> Another Python 2 vs 3 issue, perhaps?
>
>   # Python2, good:
>    ...
>    'f8fd6737fbe5a45c97ba9c9de495dc46ff11eccd',
>    '100644',
>    'file1-mv.txt',
>    'similarity index 90%\nrename from file1.txt\nrename to
>   file1-mv.txt\nindex 2bef330..f8fd673 100644\n']
>   # Python3, bad:
>   ...
>    'f8fd6737fbe5a45c97ba9c9de495dc46ff11eccd',
>    '100644',
>    'file1-mv.txt',
>    'similarity index 90%\n'
>    'rename from file1.txt\n'
>    'rename to file1-mv.txt\n'
>    'index 2bef330..f8fd673 100644\n']
>
> Let's just stick to plain old printf for now, as suggested by Phillip
> earlier, to reduce unnecessary confusion.

I notice that the latter is pretty-printed.  Pay attention to the
last few lines that end without a trailing comma.  Literal string
concatenation taking place to form a single string here, in which
case both are giving us the same string?

By the way, sorry for an earlier response based on what I
misremembered, which may have confused the discussion unnecessarily.
Ever since 65056021 ("built-in diff.", 2006-04-28) made "git diff" a
built-in, we had used (or at least attempted to use) <path> from
<treeish>:<path>, so it is not unexpected to see paths in two blob
diff output.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-08-30 16:27 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-27 18:24 git-diff passes <rev>:<path> args to GIT_EXTERNAL_DIFF incorrectly? Dmitry Nikulin
2019-08-27 22:25 ` Dmitry Nikulin
2019-08-29  3:54 ` Junio C Hamano
2019-08-29 14:36   ` Dmitry Nikulin
2019-08-30 10:16     ` Phillip Wood
2019-08-30 13:23       ` Dmitry Nikulin
2019-08-30 14:17         ` Phillip Wood
2019-08-30 14:27         ` Jeff King
2019-08-30 15:28           ` Dmitry Nikulin
2019-08-30 15:51             ` Dmitry Nikulin
2019-08-30 14:29         ` SZEDER Gábor
2019-08-30 16:26           ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).