* Re: git-diff passes <rev>:<path> args to GIT_EXTERNAL_DIFF incorrectly?
2019-08-30 13:23 ` Dmitry Nikulin
@ 2019-08-30 14:17 ` Phillip Wood
2019-08-30 14:27 ` Jeff King
2019-08-30 14:29 ` SZEDER Gábor
2 siblings, 0 replies; 12+ messages in thread
From: Phillip Wood @ 2019-08-30 14:17 UTC (permalink / raw)
To: Dmitry Nikulin, phillip.wood; +Cc: Junio C Hamano, git
On 30/08/2019 14:23, Dmitry Nikulin wrote:
> On Fri, 30 Aug 2019 at 13:16, Phillip Wood <phillip.wood123@gmail.com> wrote:
>> I'm not sure why the last argument is being split in
>> your example. It is not split in the example below
>
> I have replicated the splitting issue on my small demo repo [1]:
>
> $ env GIT_EXTERNAL_DIFF=./print_argv.py git diff -M origin/branch1
> origin/branch1-mv -- file1.txt file1-mv.txt
> ['./print_argv.py',
> 'file1.txt',
> '/tmp/EWaCSc_file1.txt',
> '2bef330804cb3f6962e45a72a12a3071ee9b5888',
> '100644',
> '/tmp/mtEiSc_file1-mv.txt',
> 'f8fd6737fbe5a45c97ba9c9de495dc46ff11eccd',
> '100644',
> 'file1-mv.txt',
> 'similarity index 90%\n'
> 'rename from file1.txt\n'
> 'rename to file1-mv.txt\n'
> 'index 2bef330..f8fd673 100644\n']
That's strange - What OS are you using? Does python do any
pre-processing of arguments with newlines in them?
> This is, however, tangential to the original problem: documenting the
> external diff CLI interface for diffing two blobs. Here is what I am
> seeing:
>
> $ env GIT_EXTERNAL_DIFF=./print_argv.py git diff
> origin/branch1:file1.txt origin/branch1-mv:file1-mv.txt
> ['./print_argv.py',
> 'file1.txt',
> '/tmp/n9USvy_file1.txt',
> '2bef330804cb3f6962e45a72a12a3071ee9b5888',
> '100644',
> '/tmp/Zst0uy_file1-mv.txt',
> 'f8fd6737fbe5a45c97ba9c9de495dc46ff11eccd',
> '100644',
> 'file1-mv.txt',
> 'index 2bef330..f8fd673 100644\n']
>
> The meaning and origin of the last arg remains mysterious, and the
> other args do not conform to the published documentation[2], which
> states that the args should be:
The documentation is incomplete it should document the extra fields
passed when it detects renames.
> path old-file old-hex old-mode new-file new-hex new-mode
>
> Instead the args that are passed are:
>
> path old-filename old-file old-hex old-mode new-file new-hex
> new-mode new-filename something
I think what is happening is that because you're passing different
filenames in to get the blobs the 'pass rename information' code is
being triggered so it shows the new filename as the name used to get the
second blob and then passes the header which only has an index line as
there isn't any real rename information.
Best Wishes
Phillip
> [1]: https://github.com/dniku/git-external-diff-argv
> [2]: https://www.git-scm.com/docs/git#Documentation/git.txt-codeGITEXTERNALDIFFcode
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: git-diff passes <rev>:<path> args to GIT_EXTERNAL_DIFF incorrectly?
2019-08-30 13:23 ` Dmitry Nikulin
2019-08-30 14:17 ` Phillip Wood
@ 2019-08-30 14:27 ` Jeff King
2019-08-30 15:28 ` Dmitry Nikulin
2019-08-30 14:29 ` SZEDER Gábor
2 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2019-08-30 14:27 UTC (permalink / raw)
To: Dmitry Nikulin; +Cc: phillip.wood, Junio C Hamano, git
On Fri, Aug 30, 2019 at 04:23:13PM +0300, Dmitry Nikulin wrote:
> On Fri, 30 Aug 2019 at 13:16, Phillip Wood <phillip.wood123@gmail.com> wrote:
> > I'm not sure why the last argument is being split in
> > your example. It is not split in the example below
>
> I have replicated the splitting issue on my small demo repo [1]:
>
> $ env GIT_EXTERNAL_DIFF=./print_argv.py git diff -M origin/branch1
> origin/branch1-mv -- file1.txt file1-mv.txt
> ['./print_argv.py',
> 'file1.txt',
> '/tmp/EWaCSc_file1.txt',
> '2bef330804cb3f6962e45a72a12a3071ee9b5888',
> '100644',
> '/tmp/mtEiSc_file1-mv.txt',
> 'f8fd6737fbe5a45c97ba9c9de495dc46ff11eccd',
> '100644',
> 'file1-mv.txt',
> 'similarity index 90%\n'
> 'rename from file1.txt\n'
> 'rename to file1-mv.txt\n'
> 'index 2bef330..f8fd673 100644\n']
Interesting. I _don't_ see that splitting when I run the same command in
your demo repo (nor, looking at Git's code, do I see how it could
happen; we always add the metainfo as a single argument).
> This is, however, tangential to the original problem: documenting the
> external diff CLI interface for diffing two blobs. Here is what I am
> seeing:
>
> $ env GIT_EXTERNAL_DIFF=./print_argv.py git diff
> origin/branch1:file1.txt origin/branch1-mv:file1-mv.txt
> ['./print_argv.py',
> 'file1.txt',
> '/tmp/n9USvy_file1.txt',
> '2bef330804cb3f6962e45a72a12a3071ee9b5888',
> '100644',
> '/tmp/Zst0uy_file1-mv.txt',
> 'f8fd6737fbe5a45c97ba9c9de495dc46ff11eccd',
> '100644',
> 'file1-mv.txt',
> 'index 2bef330..f8fd673 100644\n']
>
> The meaning and origin of the last arg remains mysterious, and the
> other args do not conform to the published documentation[2], which
> states that the args should be:
>
> path old-file old-hex old-mode new-file new-hex new-mode
>
> Instead the args that are passed are:
>
> path old-filename old-file old-hex old-mode new-file new-hex
> new-mode new-filename something
I think you have an extra "old-filename" in the second list. Without
that, the first 7 arguments are as documented.
The last two are:
- when the destination path differs from the source path, we append
it. This is generally a sign of a rename/copy, though it triggers
in the blob case because Git has been manually given a pair of files
with non-matching names.
- the final one is metainfo that Git typically prints between the
"diff" header and the diff itself. This is added only when we add
the filename, and would generally carry information about the rename
(but of course since there isn't one, it has only the index line).
These were both added by 427dcb4bca ([PATCH] Diff overhaul, adding half
of copy detection., 2005-05-21). I think the intent was that existing
diff commands would/could ignore the extra arguments.
Certainly the world would be a better place if those were described in
the external-diff documentation (specifically in relation to renames,
which is their intended use).
As for the behavior in the blob-diff case, I think it's _pretty_
reasonable. It's certainly useful to give the new-filename. The metainfo
is mostly useless in this case, and potentially could be suppressed if
the pair is not an actual rename/copy. But I also think it's not hurting
much (and a script can tell that's what's going on by the lack of rename
lines in the metainfo).
-Peff
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: git-diff passes <rev>:<path> args to GIT_EXTERNAL_DIFF incorrectly?
2019-08-30 14:27 ` Jeff King
@ 2019-08-30 15:28 ` Dmitry Nikulin
2019-08-30 15:51 ` Dmitry Nikulin
0 siblings, 1 reply; 12+ messages in thread
From: Dmitry Nikulin @ 2019-08-30 15:28 UTC (permalink / raw)
To: Jeff King; +Cc: phillip.wood, Junio C Hamano, git, szeder.dev
On Fri, 30 Aug 2019 at 17:27, Jeff King <peff@peff.net> wrote:
> I think you have an extra "old-filename" in the second list.
Agreed. My misunderstanding was that I had thought that when
documentation said "path", it meant "argv[0], the path to the diff
executable".
> Interesting. I _don't_ see that splitting when I run the same command in
> your demo repo (nor, looking at Git's code, do I see how it could
> happen; we always add the metainfo as a single argument).
Szeder Gábor below has guessed at the cause. I have tested it a little more:
# Test script to ensure that shell does not get in the way
$ cat test.py
import sys
import subprocess
subprocess.check_call([
'python%s' % sys.argv[1],
'./print_argv.py',
'similarity index 90%\nrename from file1.txt\nrename to
file1-mv.txt\nindex 2bef330..f8fd673 100644\n',
])
# Python 2, expected behavior
$ python3 test.py 2
['./print_argv.py',
'similarity index 90%\nrename from file1.txt\nrename to
file1-mv.txt\nindex 2bef330..f8fd673 100644\n']
# Python 3, broken behavior observed before
$ python3 test.py 3
['./print_argv.py',
'similarity index 90%\n'
'rename from file1.txt\n'
'rename to file1-mv.txt\n'
'index 2bef330..f8fd673 100644\n']
# Directly via shell. Python 2, expected behavior
$ python2 print_argv.py 'similarity index 90%\nrename from
file1.txt\nrename to file1-mv.txt\nindex 2bef330..f8fd673 100644\n'
['print_argv.py',
'similarity index 90%\\nrename from file1.txt\\nrename to
file1-mv.txt\\nindex 2bef330..f8fd673 100644\\n']
# Python 3, WTF
$ python3 print_argv.py 'similarity index 90%\nrename from
file1.txt\nrename to file1-mv.txt\nindex 2bef330..f8fd673 100644\n'
['print_argv.py',
'similarity index 90%\\nrename from file1.txt\\nrename to '
'file1-mv.txt\\nindex 2bef330..f8fd673 100644\\n']
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: git-diff passes <rev>:<path> args to GIT_EXTERNAL_DIFF incorrectly?
2019-08-30 15:28 ` Dmitry Nikulin
@ 2019-08-30 15:51 ` Dmitry Nikulin
0 siblings, 0 replies; 12+ messages in thread
From: Dmitry Nikulin @ 2019-08-30 15:51 UTC (permalink / raw)
To: Jeff King; +Cc: phillip.wood, Junio C Hamano, git, szeder.dev
I've figured it out. It's not a bug, it's a peculiarity of pprint().
It splits strings to avoid long lines and abuses the fact that in
Python strings split with whitespace are concatenated by the parser.
Also, in my example newlines are not parsed correctly in the shell.
Escaping them, I get the exact same behavior as in
subprocess.check_call():
$ python2 print_argv.py $'similarity index 90%\nrename from
file1.txt\nrename to file1-mv.txt\nindex 2bef330..f8fd673 100644\n'
['print_argv.py',
'similarity index 90%\nrename from file1.txt\nrename to
file1-mv.txt\nindex 2bef330..f8fd673 100644\n']
$ python3 print_argv.py $'similarity index 90%\nrename from
file1.txt\nrename to file1-mv.txt\nindex 2bef330..f8fd673 100644\n'
['print_argv.py',
'similarity index 90%\n'
'rename from file1.txt\n'
'rename to file1-mv.txt\n'
'index 2bef330..f8fd673 100644\n']
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: git-diff passes <rev>:<path> args to GIT_EXTERNAL_DIFF incorrectly?
2019-08-30 13:23 ` Dmitry Nikulin
2019-08-30 14:17 ` Phillip Wood
2019-08-30 14:27 ` Jeff King
@ 2019-08-30 14:29 ` SZEDER Gábor
2019-08-30 16:26 ` Junio C Hamano
2 siblings, 1 reply; 12+ messages in thread
From: SZEDER Gábor @ 2019-08-30 14:29 UTC (permalink / raw)
To: Dmitry Nikulin; +Cc: phillip.wood, Junio C Hamano, git
On Fri, Aug 30, 2019 at 04:23:13PM +0300, Dmitry Nikulin wrote:
> On Fri, 30 Aug 2019 at 13:16, Phillip Wood <phillip.wood123@gmail.com> wrote:
> > I'm not sure why the last argument is being split in
> > your example. It is not split in the example below
>
> I have replicated the splitting issue on my small demo repo [1]:
>
> $ env GIT_EXTERNAL_DIFF=./print_argv.py git diff -M origin/branch1
> origin/branch1-mv -- file1.txt file1-mv.txt
> ['./print_argv.py',
> 'file1.txt',
> '/tmp/EWaCSc_file1.txt',
> '2bef330804cb3f6962e45a72a12a3071ee9b5888',
> '100644',
> '/tmp/mtEiSc_file1-mv.txt',
> 'f8fd6737fbe5a45c97ba9c9de495dc46ff11eccd',
> '100644',
> 'file1-mv.txt',
> 'similarity index 90%\n'
> 'rename from file1.txt\n'
> 'rename to file1-mv.txt\n'
> 'index 2bef330..f8fd673 100644\n']
Another Python 2 vs 3 issue, perhaps?
# Python2, good:
$ GIT_EXTERNAL_DIFF='python2 ./print_argv.py' git --no-pager diff -M origin/branch1 origin/branch1-mv -- file1.txt file1-mv.txt
['./print_argv.py',
'file1.txt',
'/tmp/ViB58B_file1.txt',
'2bef330804cb3f6962e45a72a12a3071ee9b5888',
'100644',
'/tmp/p6OH9B_file1-mv.txt',
'f8fd6737fbe5a45c97ba9c9de495dc46ff11eccd',
'100644',
'file1-mv.txt',
'similarity index 90%\nrename from file1.txt\nrename to
file1-mv.txt\nindex 2bef330..f8fd673 100644\n']
# Python3, bad:
$ GIT_EXTERNAL_DIFF='python3 ./print_argv.py' git --no-pager diff -M origin/branch1 origin/branch1-mv -- file1.txt file1-mv.txt
['./print_argv.py',
'file1.txt',
'/tmp/5xD2DS_file1.txt',
'2bef330804cb3f6962e45a72a12a3071ee9b5888',
'100644',
'/tmp/vvHQGS_file1-mv.txt',
'f8fd6737fbe5a45c97ba9c9de495dc46ff11eccd',
'100644',
'file1-mv.txt',
'similarity index 90%\n'
'rename from file1.txt\n'
'rename to file1-mv.txt\n'
'index 2bef330..f8fd673 100644\n']
Let's just stick to plain old printf for now, as suggested by Phillip
earlier, to reduce unnecessary confusion.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: git-diff passes <rev>:<path> args to GIT_EXTERNAL_DIFF incorrectly?
2019-08-30 14:29 ` SZEDER Gábor
@ 2019-08-30 16:26 ` Junio C Hamano
0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2019-08-30 16:26 UTC (permalink / raw)
To: SZEDER Gábor; +Cc: Dmitry Nikulin, phillip.wood, git
SZEDER Gábor <szeder.dev@gmail.com> writes:
> Another Python 2 vs 3 issue, perhaps?
>
> # Python2, good:
> ...
> 'f8fd6737fbe5a45c97ba9c9de495dc46ff11eccd',
> '100644',
> 'file1-mv.txt',
> 'similarity index 90%\nrename from file1.txt\nrename to
> file1-mv.txt\nindex 2bef330..f8fd673 100644\n']
> # Python3, bad:
> ...
> 'f8fd6737fbe5a45c97ba9c9de495dc46ff11eccd',
> '100644',
> 'file1-mv.txt',
> 'similarity index 90%\n'
> 'rename from file1.txt\n'
> 'rename to file1-mv.txt\n'
> 'index 2bef330..f8fd673 100644\n']
>
> Let's just stick to plain old printf for now, as suggested by Phillip
> earlier, to reduce unnecessary confusion.
I notice that the latter is pretty-printed. Pay attention to the
last few lines that end without a trailing comma. Literal string
concatenation taking place to form a single string here, in which
case both are giving us the same string?
By the way, sorry for an earlier response based on what I
misremembered, which may have confused the discussion unnecessarily.
Ever since 65056021 ("built-in diff.", 2006-04-28) made "git diff" a
built-in, we had used (or at least attempted to use) <path> from
<treeish>:<path>, so it is not unexpected to see paths in two blob
diff output.
^ permalink raw reply [flat|nested] 12+ messages in thread