* Force git diff to create a binary patch? @ 2020-07-13 4:58 Jason Xu 2020-07-13 14:51 ` Junio C Hamano [not found] ` <20200713223906.GH8360@camp.crustytoothpaste.net> 0 siblings, 2 replies; 7+ messages in thread From: Jason Xu @ 2020-07-13 4:58 UTC (permalink / raw) To: git Hello everyone, Here are two small files with non-ASCII data, yet git diff considers them text instead of binary. echo -n -e '\x01\xff\xdf' > bin1 echo -n -e '\x01\xdf\xff' > bin2 git diff --binary bin1 bin2 Result: diff --git a/bin1 b/bin2 index 802dc8e..c39b638 100644 --- a/bin1 +++ b/bin2 @@ -1 +1 @@ -^A<FF><DF> \ No newline at end of file +^A<DF><FF> \ No newline at end of file I think `--binary` for `git diff` should force the creation of git binary patches if there are any non-printable-ASCII characters, since my understanding is that `--binary` is for safe encoding for email. Otherwise `-a` can be used. Original post: https://stackoverflow.com/questions/62858327/how-can-i-force-git-diff-to-create-a-git-binary-patch Thanks ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Force git diff to create a binary patch? 2020-07-13 4:58 Force git diff to create a binary patch? Jason Xu @ 2020-07-13 14:51 ` Junio C Hamano [not found] ` <20200713223906.GH8360@camp.crustytoothpaste.net> 1 sibling, 0 replies; 7+ messages in thread From: Junio C Hamano @ 2020-07-13 14:51 UTC (permalink / raw) To: Jason Xu; +Cc: git Jason Xu <jasonx98@gmail.com> writes: > Here are two small files with non-ASCII data, yet git diff considers > them text instead of binary. Because it is based on heuristics, without a NUL in that short file, the file may not be judged to be binary. echo '* -diff' >.git/info/attributes would tell Git that they need to be treated as binary junk not suited for textual diff (you'd need to tweak the pattern to your need). ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <20200713223906.GH8360@camp.crustytoothpaste.net>]
* Re: Force git diff to create a binary patch? [not found] ` <20200713223906.GH8360@camp.crustytoothpaste.net> @ 2020-07-14 4:09 ` Jason Xu 2020-07-14 4:34 ` Junio C Hamano 2020-07-14 4:34 ` Junio C Hamano 0 siblings, 2 replies; 7+ messages in thread From: Jason Xu @ 2020-07-14 4:09 UTC (permalink / raw) To: brian m. carlson; +Cc: git So then it would not be possible to produce a diff that mixed binary and text diffs? If `--binary` is for email patches, then it should still be easy to detect non-email-safe characters (probably anything that isn't a printable ASCII character). Also if git's binary detection is really just checking for a NUL byte, then it could be improved. Example: checking if bytes form any common encoding, like UTF-8, and if not then marking it as a binary. Or more advanced `file` like heuristics. I don't know if this is worth the effort, or if others think the current binary detection is fine (it likely works fine for actual executables, but not short binary data like my example). On Mon, Jul 13, 2020, 6:39 PM brian m. carlson <sandals@crustytoothpaste.net> wrote: > > On 2020-07-13 at 04:58:23, Jason Xu wrote: > > Hello everyone, > > > > Here are two small files with non-ASCII data, yet git diff considers > > them text instead of binary. > > > > echo -n -e '\x01\xff\xdf' > bin1 > > echo -n -e '\x01\xdf\xff' > bin2 > > git diff --binary bin1 bin2 > > > > Result: > > > > diff --git a/bin1 b/bin2 > > index 802dc8e..c39b638 100644 > > --- a/bin1 > > +++ b/bin2 > > @@ -1 +1 @@ > > -^A<FF><DF> > > \ No newline at end of file > > +^A<DF><FF> > > \ No newline at end of file > > > > I think `--binary` for `git diff` should force the creation of git > > binary patches if there are any non-printable-ASCII characters, since > > my understanding is that `--binary` is for safe encoding for email. > > Otherwise `-a` can be used. > > I was the one that asked you to post this here. While Junio's comment > explains _why_ this happens, I think that --binary should, well, produce > a binary patch, regardless of whether the file is text or binary. After > all, --text does the opposite. > > The description for the --binary option reads as follows: > > In addition to --full-index, output a binary diff that can be applied > with git-apply. Implies --patch. > > So we need to fix either the documentation or the code. > > I looked into what it takes to fix the code to do this; it's fairly > straightforward, but it does cause some testsuite failures which assume > the current behavior and will likely involve a small series. So if > other folks agree, I'm happy to pick this up in the next couple of weeks > and add support for it that would hopefully hit Git 2.29. > -- > brian m. carlson: Houston, Texas, US ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Force git diff to create a binary patch? 2020-07-14 4:09 ` Jason Xu @ 2020-07-14 4:34 ` Junio C Hamano 2020-07-14 4:34 ` Junio C Hamano 1 sibling, 0 replies; 7+ messages in thread From: Junio C Hamano @ 2020-07-14 4:34 UTC (permalink / raw) To: Jason Xu; +Cc: brian m. carlson, git Jason Xu <jasonx98@gmail.com> writes: >> The description for the --binary option reads as follows: >> >> In addition to --full-index, output a binary diff that can be applied >> with git-apply. Implies --patch. >> >> So we need to fix either the documentation or the code. >> >> I looked into what it takes to fix the code to do this; it's fairly >> straightforward, but it does cause some testsuite failures which assume >> the current behavior and will likely involve a small series. So if >> other folks agree, I'm happy to pick this up in the next couple of weeks >> and add support for it that would hopefully hit Git 2.29. I am not convinced. The "--binary" option was invented as a way to tell Git to produce something that can be applied, where Git stopped at saying "binary files differ". So a commit that touches two paths, one text and the other binary, used to produce a textual patch for one and a useless "binary files differ" for the other in "git show". Such a commit can be made more useful with "git show --binary" to tell the former to still produce textual and readable patch while showing the xdelta based "binary patch" Git invented. So, no, I am less convinced "--binary" that forces "all paths are binary, so show binary patch" is a good idea. And viewed with the knowledge of that history, >> In addition to --full-index, output a binary diff that can be applied >> with git-apply. Implies --patch. this description is correct---the choice is not between "showing a binary diff and showing a useless textual diff"; the choice is between showing "binary files differ" and appliable "binary patch". ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Force git diff to create a binary patch? 2020-07-14 4:09 ` Jason Xu 2020-07-14 4:34 ` Junio C Hamano @ 2020-07-14 4:34 ` Junio C Hamano 2020-07-14 4:56 ` Jason Xu 1 sibling, 1 reply; 7+ messages in thread From: Junio C Hamano @ 2020-07-14 4:34 UTC (permalink / raw) To: brian m. carlson; +Cc: Jason Xu, git Jason Xu <jasonx98@gmail.com> writes: >> The description for the --binary option reads as follows: >> >> In addition to --full-index, output a binary diff that can be applied >> with git-apply. Implies --patch. >> >> So we need to fix either the documentation or the code. >> >> I looked into what it takes to fix the code to do this; it's fairly >> straightforward, but it does cause some testsuite failures which assume >> the current behavior and will likely involve a small series. So if >> other folks agree, I'm happy to pick this up in the next couple of weeks >> and add support for it that would hopefully hit Git 2.29. I am not convinced. The "--binary" option was invented as a way to tell Git to produce something that can be applied, where Git stopped at saying "binary files differ". So a commit that touches two paths, one text and the other binary, used to produce a textual patch for one and a useless "binary files differ" for the other in "git show". Such a commit can be made more useful with "git show --binary" to tell the former to still produce textual and readable patch while showing the xdelta based "binary patch" Git invented. So, no, I am less convinced "--binary" that forces "all paths are binary, so show binary patch" is a good idea. And viewed with the knowledge of that history, >> In addition to --full-index, output a binary diff that can be applied >> with git-apply. Implies --patch. this description is correct---the choice is not between "showing a binary diff and showing a useless textual diff"; the choice is between showing "binary files differ" and appliable "binary patch". ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Force git diff to create a binary patch? 2020-07-14 4:34 ` Junio C Hamano @ 2020-07-14 4:56 ` Jason Xu 2020-07-14 15:35 ` Junio C Hamano 0 siblings, 1 reply; 7+ messages in thread From: Jason Xu @ 2020-07-14 4:56 UTC (permalink / raw) To: Junio C Hamano; +Cc: brian m. carlson, git > The "--binary" option was invented as a way to tell Git to produce something that can be applied, where Git stopped at saying "binary files differ". Doesn't `--text` already do that? Albeit with whitespace warnings. > So, no, I am less convinced "--binary" that forces "all paths are binary, so show binary patch" is a good idea. I'm not sure exactly what this means, but if one binary and one text file changed, then there should be a binary diff and a text diff, like there is currently. That's why I propose better binary file detection, instead of (what I understand to be) "make all patches in a patch file GIT binary patches, regardless if a file is text". But there is merit to that if you don't care about readability in a "fully binary patch", or you insist on not mixing text and binary in patches. The terminology is confusing because within a file produced by diffing two directories, there are multiple diff sections for each file diffed. So while the documentation says "output a binary diff", to be clearer it should say "output a binary diff only for binary files". On Tue, Jul 14, 2020 at 12:34 AM Junio C Hamano <gitster@pobox.com> wrote: > > Jason Xu <jasonx98@gmail.com> writes: > > >> The description for the --binary option reads as follows: > >> > >> In addition to --full-index, output a binary diff that can be applied > >> with git-apply. Implies --patch. > >> > >> So we need to fix either the documentation or the code. > >> > >> I looked into what it takes to fix the code to do this; it's fairly > >> straightforward, but it does cause some testsuite failures which assume > >> the current behavior and will likely involve a small series. So if > >> other folks agree, I'm happy to pick this up in the next couple of weeks > >> and add support for it that would hopefully hit Git 2.29. > > I am not convinced. > > The "--binary" option was invented as a way to tell Git to produce > something that can be applied, where Git stopped at saying "binary > files differ". So a commit that touches two paths, one text and the > other binary, used to produce a textual patch for one and a useless > "binary files differ" for the other in "git show". Such a commit > can be made more useful with "git show --binary" to tell the former > to still produce textual and readable patch while showing the xdelta > based "binary patch" Git invented. > > So, no, I am less convinced "--binary" that forces "all paths are > binary, so show binary patch" is a good idea. > > And viewed with the knowledge of that history, > > >> In addition to --full-index, output a binary diff that can be applied > >> with git-apply. Implies --patch. > > this description is correct---the choice is not between "showing a > binary diff and showing a useless textual diff"; the choice is > between showing "binary files differ" and appliable "binary patch". ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Force git diff to create a binary patch? 2020-07-14 4:56 ` Jason Xu @ 2020-07-14 15:35 ` Junio C Hamano 0 siblings, 0 replies; 7+ messages in thread From: Junio C Hamano @ 2020-07-14 15:35 UTC (permalink / raw) To: Jason Xu; +Cc: brian m. carlson, git Jason Xu <jasonx98@gmail.com> writes: >> The "--binary" option was invented as a way to tell Git to >> produce something that can be applied, where Git stopped at >> saying "binary files differ". > > Doesn't `--text` already do that? Albeit with whitespace warnings. I do not think so. --text is about forcing everything to be treated text, so you'll not see binary patches but the patch-looking thing with binary garbage you wrote in the message that started this thread. IOW, that is the opposite of what you want. And I agree that such a "diff -a" output is prone to corruption during transfer (e.g. over e-mail) and a way to tell exactly which paths should be shown as binary patches is a good thing to have. > That's why I propose better binary file detection, instead of (what I > understand to be) "make all patches in a patch file GIT binary > patches, regardless if a file is text". Oh, I agree that "treat everything as binary and produce binary patch for all paths" is a nonsense option no sane person would want to use. The users are better off exchanging bundles at that point, as such a binary-only patches are unreadable anyway. There is no "better detection"; the only thing you could get is "better heuristics" and there will always be a limit to what heuristics can do for you. And that is why you were given gitattributes very early in the discussion ;-) That is the mechanism to tell exactly which paths should be treated as binary. ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2020-07-14 15:36 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-07-13 4:58 Force git diff to create a binary patch? Jason Xu 2020-07-13 14:51 ` Junio C Hamano [not found] ` <20200713223906.GH8360@camp.crustytoothpaste.net> 2020-07-14 4:09 ` Jason Xu 2020-07-14 4:34 ` Junio C Hamano 2020-07-14 4:34 ` Junio C Hamano 2020-07-14 4:56 ` Jason Xu 2020-07-14 15:35 ` Junio C Hamano
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).