git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* Force git diff to create a binary patch?
@ 2020-07-13  4:58 Jason Xu
  2020-07-13 14:51 ` Junio C Hamano
       [not found] ` <20200713223906.GH8360@camp.crustytoothpaste.net>
  0 siblings, 2 replies; 7+ messages in thread
From: Jason Xu @ 2020-07-13  4:58 UTC (permalink / raw)
  To: git

Hello everyone,

Here are two small files with non-ASCII data, yet git diff considers
them text instead of binary.

    echo -n -e '\x01\xff\xdf' > bin1
    echo -n -e '\x01\xdf\xff' > bin2
    git diff --binary bin1 bin2

Result:

    diff --git a/bin1 b/bin2
    index 802dc8e..c39b638 100644
    --- a/bin1
    +++ b/bin2
    @@ -1 +1 @@
    -^A<FF><DF>
    \ No newline at end of file
    +^A<DF><FF>
    \ No newline at end of file

I think `--binary` for `git diff` should force the creation of git
binary patches if there are any non-printable-ASCII characters, since
my understanding is that `--binary` is for safe encoding for email.
Otherwise `-a` can be used.

Original post: https://stackoverflow.com/questions/62858327/how-can-i-force-git-diff-to-create-a-git-binary-patch

Thanks

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Force git diff to create a binary patch?
  2020-07-13  4:58 Force git diff to create a binary patch? Jason Xu
@ 2020-07-13 14:51 ` Junio C Hamano
       [not found] ` <20200713223906.GH8360@camp.crustytoothpaste.net>
  1 sibling, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2020-07-13 14:51 UTC (permalink / raw)
  To: Jason Xu; +Cc: git

Jason Xu <jasonx98@gmail.com> writes:

> Here are two small files with non-ASCII data, yet git diff considers
> them text instead of binary.

Because it is based on heuristics, without a NUL in that short file,
the file may not be judged to be binary.

    echo '* -diff' >.git/info/attributes

would tell Git that they need to be treated as binary junk not
suited for textual diff (you'd need to tweak the pattern to your
need).




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Force git diff to create a binary patch?
       [not found] ` <20200713223906.GH8360@camp.crustytoothpaste.net>
@ 2020-07-14  4:09   ` Jason Xu
  2020-07-14  4:34     ` Junio C Hamano
  2020-07-14  4:34     ` Junio C Hamano
  0 siblings, 2 replies; 7+ messages in thread
From: Jason Xu @ 2020-07-14  4:09 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git

So then it would not be possible to produce a diff that mixed binary
and text diffs?

If `--binary` is for email patches, then it should still be easy to
detect non-email-safe characters (probably anything that isn't a
printable ASCII character).

Also if git's binary detection is really just checking for a NUL byte,
then it could be improved. Example: checking if bytes form any common
encoding, like UTF-8, and if not then marking it as a binary. Or more
advanced `file` like heuristics. I don't know if this is worth the
effort, or if others think the current binary detection is fine (it
likely works fine for actual executables, but not short binary data
like my example).

On Mon, Jul 13, 2020, 6:39 PM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> On 2020-07-13 at 04:58:23, Jason Xu wrote:
> > Hello everyone,
> >
> > Here are two small files with non-ASCII data, yet git diff considers
> > them text instead of binary.
> >
> >     echo -n -e '\x01\xff\xdf' > bin1
> >     echo -n -e '\x01\xdf\xff' > bin2
> >     git diff --binary bin1 bin2
> >
> > Result:
> >
> >     diff --git a/bin1 b/bin2
> >     index 802dc8e..c39b638 100644
> >     --- a/bin1
> >     +++ b/bin2
> >     @@ -1 +1 @@
> >     -^A<FF><DF>
> >     \ No newline at end of file
> >     +^A<DF><FF>
> >     \ No newline at end of file
> >
> > I think `--binary` for `git diff` should force the creation of git
> > binary patches if there are any non-printable-ASCII characters, since
> > my understanding is that `--binary` is for safe encoding for email.
> > Otherwise `-a` can be used.
>
> I was the one that asked you to post this here.  While Junio's comment
> explains _why_ this happens, I think that --binary should, well, produce
> a binary patch, regardless of whether the file is text or binary.  After
> all, --text does the opposite.
>
> The description for the --binary option reads as follows:
>
>   In addition to --full-index, output a binary diff that can be applied
>   with git-apply. Implies --patch.
>
> So we need to fix either the documentation or the code.
>
> I looked into what it takes to fix the code to do this; it's fairly
> straightforward, but it does cause some testsuite failures which assume
> the current behavior and will likely involve a small series.  So if
> other folks agree, I'm happy to pick this up in the next couple of weeks
> and add support for it that would hopefully hit Git 2.29.
> --
> brian m. carlson: Houston, Texas, US

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Force git diff to create a binary patch?
  2020-07-14  4:09   ` Jason Xu
@ 2020-07-14  4:34     ` Junio C Hamano
  2020-07-14  4:34     ` Junio C Hamano
  1 sibling, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2020-07-14  4:34 UTC (permalink / raw)
  To: Jason Xu; +Cc: brian m. carlson, git

Jason Xu <jasonx98@gmail.com> writes:

>> The description for the --binary option reads as follows:
>>
>>   In addition to --full-index, output a binary diff that can be applied
>>   with git-apply. Implies --patch.
>>
>> So we need to fix either the documentation or the code.
>>
>> I looked into what it takes to fix the code to do this; it's fairly
>> straightforward, but it does cause some testsuite failures which assume
>> the current behavior and will likely involve a small series.  So if
>> other folks agree, I'm happy to pick this up in the next couple of weeks
>> and add support for it that would hopefully hit Git 2.29.

I am not convinced.  

The "--binary" option was invented as a way to tell Git to produce
something that can be applied, where Git stopped at saying "binary
files differ".  So a commit that touches two paths, one text and the
other binary, used to produce a textual patch for one and a useless
"binary files differ" for the other in "git show".  Such a commit
can be made more useful with "git show --binary" to tell the former
to still produce textual and readable patch while showing the xdelta
based "binary patch" Git invented.

So, no, I am less convinced "--binary" that forces "all paths are
binary, so show binary patch" is a good idea.

And viewed with the knowledge of that history,

>>   In addition to --full-index, output a binary diff that can be applied
>>   with git-apply. Implies --patch.

this description is correct---the choice is not between "showing a
binary diff and showing a useless textual diff"; the choice is
between showing "binary files differ" and appliable "binary patch".


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Force git diff to create a binary patch?
  2020-07-14  4:09   ` Jason Xu
  2020-07-14  4:34     ` Junio C Hamano
@ 2020-07-14  4:34     ` Junio C Hamano
  2020-07-14  4:56       ` Jason Xu
  1 sibling, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2020-07-14  4:34 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Jason Xu, git

Jason Xu <jasonx98@gmail.com> writes:

>> The description for the --binary option reads as follows:
>>
>>   In addition to --full-index, output a binary diff that can be applied
>>   with git-apply. Implies --patch.
>>
>> So we need to fix either the documentation or the code.
>>
>> I looked into what it takes to fix the code to do this; it's fairly
>> straightforward, but it does cause some testsuite failures which assume
>> the current behavior and will likely involve a small series.  So if
>> other folks agree, I'm happy to pick this up in the next couple of weeks
>> and add support for it that would hopefully hit Git 2.29.

I am not convinced.  

The "--binary" option was invented as a way to tell Git to produce
something that can be applied, where Git stopped at saying "binary
files differ".  So a commit that touches two paths, one text and the
other binary, used to produce a textual patch for one and a useless
"binary files differ" for the other in "git show".  Such a commit
can be made more useful with "git show --binary" to tell the former
to still produce textual and readable patch while showing the xdelta
based "binary patch" Git invented.

So, no, I am less convinced "--binary" that forces "all paths are
binary, so show binary patch" is a good idea.

And viewed with the knowledge of that history,

>>   In addition to --full-index, output a binary diff that can be applied
>>   with git-apply. Implies --patch.

this description is correct---the choice is not between "showing a
binary diff and showing a useless textual diff"; the choice is
between showing "binary files differ" and appliable "binary patch".

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Force git diff to create a binary patch?
  2020-07-14  4:34     ` Junio C Hamano
@ 2020-07-14  4:56       ` Jason Xu
  2020-07-14 15:35         ` Junio C Hamano
  0 siblings, 1 reply; 7+ messages in thread
From: Jason Xu @ 2020-07-14  4:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: brian m. carlson, git

> The "--binary" option was invented as a way to tell Git to produce something that can be applied, where Git stopped at saying "binary files differ".

Doesn't `--text` already do that? Albeit with whitespace warnings.

> So, no, I am less convinced "--binary" that forces "all paths are binary, so show binary patch" is a good idea.

I'm not sure exactly what this means, but if one binary and one text
file changed, then there should be a binary diff and a text diff, like
there is currently.
That's why I propose better binary file detection, instead of (what I
understand to be) "make all patches in a patch file GIT binary
patches, regardless if a file is text". But there is merit to that if
you don't care about readability in a "fully binary patch", or you
insist on not mixing text and binary in patches.

The terminology is confusing because within a file produced by diffing
two directories, there are multiple diff sections for each file
diffed. So while the documentation says "output a binary diff", to be
clearer it should say "output a binary diff only for binary files".

On Tue, Jul 14, 2020 at 12:34 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Jason Xu <jasonx98@gmail.com> writes:
>
> >> The description for the --binary option reads as follows:
> >>
> >>   In addition to --full-index, output a binary diff that can be applied
> >>   with git-apply. Implies --patch.
> >>
> >> So we need to fix either the documentation or the code.
> >>
> >> I looked into what it takes to fix the code to do this; it's fairly
> >> straightforward, but it does cause some testsuite failures which assume
> >> the current behavior and will likely involve a small series.  So if
> >> other folks agree, I'm happy to pick this up in the next couple of weeks
> >> and add support for it that would hopefully hit Git 2.29.
>
> I am not convinced.
>
> The "--binary" option was invented as a way to tell Git to produce
> something that can be applied, where Git stopped at saying "binary
> files differ".  So a commit that touches two paths, one text and the
> other binary, used to produce a textual patch for one and a useless
> "binary files differ" for the other in "git show".  Such a commit
> can be made more useful with "git show --binary" to tell the former
> to still produce textual and readable patch while showing the xdelta
> based "binary patch" Git invented.
>
> So, no, I am less convinced "--binary" that forces "all paths are
> binary, so show binary patch" is a good idea.
>
> And viewed with the knowledge of that history,
>
> >>   In addition to --full-index, output a binary diff that can be applied
> >>   with git-apply. Implies --patch.
>
> this description is correct---the choice is not between "showing a
> binary diff and showing a useless textual diff"; the choice is
> between showing "binary files differ" and appliable "binary patch".

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Force git diff to create a binary patch?
  2020-07-14  4:56       ` Jason Xu
@ 2020-07-14 15:35         ` Junio C Hamano
  0 siblings, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2020-07-14 15:35 UTC (permalink / raw)
  To: Jason Xu; +Cc: brian m. carlson, git

Jason Xu <jasonx98@gmail.com> writes:

>> The "--binary" option was invented as a way to tell Git to
>> produce something that can be applied, where Git stopped at
>> saying "binary files differ".
>
> Doesn't `--text` already do that? Albeit with whitespace warnings.

I do not think so.  --text is about forcing everything to be treated
text, so you'll not see binary patches but the patch-looking thing
with binary garbage you wrote in the message that started this
thread.  IOW, that is the opposite of what you want.  And I agree
that such a "diff -a" output is prone to corruption during transfer
(e.g. over e-mail) and a way to tell exactly which paths should be
shown as binary patches is a good thing to have.

> That's why I propose better binary file detection, instead of (what I
> understand to be) "make all patches in a patch file GIT binary
> patches, regardless if a file is text".

Oh, I agree that "treat everything as binary and produce binary
patch for all paths" is a nonsense option no sane person would want
to use.  The users are better off exchanging bundles at that point,
as such a binary-only patches are unreadable anyway.

There is no "better detection"; the only thing you could get is
"better heuristics" and there will always be a limit to what
heuristics can do for you.  

And that is why you were given gitattributes very early in the
discussion ;-)  That is the mechanism to tell exactly which paths
should be treated as binary.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-07-14 15:36 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-13  4:58 Force git diff to create a binary patch? Jason Xu
2020-07-13 14:51 ` Junio C Hamano
     [not found] ` <20200713223906.GH8360@camp.crustytoothpaste.net>
2020-07-14  4:09   ` Jason Xu
2020-07-14  4:34     ` Junio C Hamano
2020-07-14  4:34     ` Junio C Hamano
2020-07-14  4:56       ` Jason Xu
2020-07-14 15:35         ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).