git@vger.kernel.org mailing list mirror (one of many)
* change the filetype from binary to text after the file is commited to a git repo
@ 2017-07-24  5:11 tonka tonka
  2017-07-24 18:18 ` Jeff King
  0 siblings, 1 reply; 6+ messages in thread
From: tonka tonka @ 2017-07-24  5:11 UTC
  To: git

Hey everybody,

I have a problem with a file that is already committed to my repo. This
Git repo was converted from svn some years ago. Last week I changed some
lines in a file and saw in the diff that it is marked as binary (it's a
simple .cpp file). I think on the first commit it was detected as a
UTF-16 file (on Windows). But no matter what I do, I can't get it back
to a "normal" text file (Git does not detect it as text), even though it
is now plain UTF-8. I also replaced the whole content of the file with
just 'a', and Git still says it's binary.


Is this the only way to get it back to text mode?
* copy a UTF-8 version of the original file
* delete the file
* make a commit
* add the old file as a new one

I think that will work, but it will also break my history.

Is there a better way to get this behavior without losing history?

Best regards
Tonka


* Re: change the filetype from binary to text after the file is commited to a git repo
  2017-07-24  5:11 change the filetype from binary to text after the file is commited to a git repo tonka tonka
@ 2017-07-24 18:18 ` Jeff King
  2017-07-24 19:02   ` tonka3100
  0 siblings, 1 reply; 6+ messages in thread
From: Jeff King @ 2017-07-24 18:18 UTC
  To: tonka tonka; +Cc: git

On Mon, Jul 24, 2017 at 07:11:06AM +0200, tonka tonka wrote:

> I have a problem with a file that is already committed to my repo. This
> Git repo was converted from svn some years ago. Last week I changed some
> lines in a file and saw in the diff that it is marked as binary (it's a
> simple .cpp file). I think on the first commit it was detected as a
> UTF-16 file (on Windows). But no matter what I do, I can't get it back
> to a "normal" text file (Git does not detect it as text), even though it
> is now plain UTF-8. I also replaced the whole content of the file with
> just 'a', and Git still says it's binary.

Git doesn't store a flag for "binary-ness" on each file (though see
below). As the diffs are generated on the fly when you ask to compare
two versions, so too is the determination of "is this binary".

The default heuristic looks at file size (if the file is larger than
core.bigFileThreshold, 512MB by default, it's considered binary) and
whether it has any NUL bytes in the first few kilobytes. But note that
if _either_ side of a diff is considered binary, then Git won't show a
text diff.
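
The content half of that heuristic can be approximated in a few lines of
shell. This is only a sketch, not Git's actual code: the helper name
looks_binary and the sample file names are made up for illustration, and
the 8000-byte window is an assumption about what "the first few
kilobytes" means in practice.

```shell
# Rough approximation of the content check: a file "looks binary"
# if a NUL byte appears in its first 8000 bytes (assumed window size).
looks_binary() {
    total=$(( $(head -c 8000 "$1" | wc -c) ))
    without_nul=$(( $(head -c 8000 "$1" | tr -d '\000' | wc -c) ))
    [ "$total" -ne "$without_nul" ]
}

tmpdir=$(mktemp -d)
printf 'int main() { return 0; }\n' >"$tmpdir/utf8.cpp"
# "a" followed by a NUL byte, which is how UTF-16LE encodes ASCII text.
printf 'a\000' >"$tmpdir/utf16.cpp"

for f in "$tmpdir/utf8.cpp" "$tmpdir/utf16.cpp"; do
    if looks_binary "$f"; then
        echo "$(basename "$f"): binary"
    else
        echo "$(basename "$f"): text"
    fi
done
rm -rf "$tmpdir"
```

A UTF-16 file containing only 'a' still contains a NUL byte, which is
consistent with such a file being reported as binary.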

If you want a particular diff to show all content, even if it doesn't
look like text, add "-a" to your git invocation (e.g., "git show -a").

That said, you can also use .gitattributes (see "git help attributes")
to mark a file as binary or not-binary, skipping the heuristic check.
I'm guessing since you converted from svn that you don't have a
.gitattributes file, but it's possible that somebody later added one
that marks the file as binary (and so the solution would be to drop that
entry).
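
To see which attribute Git would apply to a path, "git check-attr" can
be used. The following is a sketch in a throwaway repository; the file
names main.cpp and logo.dat are hypothetical and the files do not even
need to exist for the lookup to work.

```shell
repo=$(mktemp -d)
git init -q "$repo"

# Hypothetical entries: always diff main.cpp as text, treat logo.dat as binary.
printf 'main.cpp diff\nlogo.dat binary\n' >"$repo/.gitattributes"

# Ask Git which value the "diff" attribute has for each path.
attr_main=$(git -C "$repo" check-attr diff -- main.cpp)
attr_logo=$(git -C "$repo" check-attr diff -- logo.dat)
echo "$attr_main"
echo "$attr_logo"
```

A path reported as "unset" is forced to binary, "set" is forced to text,
and "unspecified" falls back to the heuristic.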

-Peff


* Re: change the filetype from binary to text after the file is commited to a git repo
  2017-07-24 18:18 ` Jeff King
@ 2017-07-24 19:02   ` tonka3100
  2017-07-24 19:23     ` Jeff King
  0 siblings, 1 reply; 6+ messages in thread
From: tonka3100 @ 2017-07-24 19:02 UTC
  To: Jeff King; +Cc: git

Hey Jeff,

Thanks for your answer.

There is no .gitattributes file in the repo. I think the Git heuristic
also detects UTF-16 files as binary (on Windows), so I think that is why
my file is binary (I have to check that tomorrow). If I add a
.gitattributes file, I have the problem that git diff will treat the old
and the new blob as UTF-8, which generates garbage.

Do you have another idea?
Could it be possible to add only a space in the code (as UTF-8) and then
add the real content in a second commit, so that the old and the new one
are both UTF-8?



* Re: change the filetype from binary to text after the file is commited to a git repo
  2017-07-24 19:02   ` tonka3100
@ 2017-07-24 19:23     ` Jeff King
       [not found]       ` <DBBA7352-5276-4972-A437-F27F5F4C2641@gmail.com>
  0 siblings, 1 reply; 6+ messages in thread
From: Jeff King @ 2017-07-24 19:23 UTC
  To: tonka3100@gmail.com; +Cc: git

On Mon, Jul 24, 2017 at 09:02:12PM +0200, tonka3100@gmail.com wrote:

> There is no .gitattributes file in the repo. I think that the Git
> heuristic will also detect UTF-16 files as binary (on Windows), so I
> think that is the reason why my file is binary (I have to check that
> tomorrow).

Correct. UTF-16 _is_ binary, if you are trying to include it alongside
ASCII content (like the rest of the text diff headers). The two cannot
mix.

> If I add a .gitattributes file, I have the problem that git
> diff will treat the old and the new blob as UTF-8, which generates
> garbage.

Git's diff doesn't look at encodings at all; it does a diff of the
actual bytes without respect to any encoding. So yes, if you use "-a" or
a gitattribute to ask git to show you the bytes, the UTF-16 is likely to
look like garbage (and a commit rewriting from utf-16 to utf-8 will
basically be a rewrite of the whole file contents).
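
A quick way to see why the bytes differ so completely is to encode the
same short string both ways. This is a sketch that assumes an "iconv"
that knows the UTF-16LE encoding name; the sample string is arbitrary.

```shell
sample='int x;'

# Count the bytes of the same text in both encodings.
utf8_bytes=$(( $(printf '%s' "$sample" | wc -c) ))
utf16_bytes=$(( $(printf '%s' "$sample" | iconv -f UTF-8 -t UTF-16LE | wc -c) ))
echo "UTF-8:  $utf8_bytes bytes"
echo "UTF-16: $utf16_bytes bytes"

# Dump the UTF-16LE bytes; each ASCII character is followed by a NUL (\0).
printf '%s' "$sample" | iconv -f UTF-8 -t UTF-16LE | od -c
```

For pure ASCII text the UTF-16 form is twice the size, and every second
byte is a NUL, so no line of the old version matches a line of the new
one byte for byte.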

> Do you have another idea?  Could it be possible to add only a space in
> code (utf-8) and then add the real content in a second commit, so the
> old and the new one are both utf-8?

I'm not sure exactly what you're trying to accomplish. If you're unhappy
with the file as utf-16, then you should probably convert to utf-8 as a
single commit (since the diff will otherwise be unreadable) and then
make further changes in utf-8.
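
The conversion step itself might look like the sketch below. The helper
name to_utf8 and the file name widget.cpp are placeholders, and it
assumes "iconv" is available; the commit commands are left as comments
because they depend on the repository at hand.

```shell
# Convert a UTF-16 file to UTF-8 in place (via a temporary file).
to_utf8() {
    iconv -f UTF-16 -t UTF-8 "$1" >"$1.tmp" && mv "$1.tmp" "$1"
}

work=$(mktemp -d)
# Build a sample UTF-16 file standing in for the accidentally converted source.
printf 'int answer = 42;\n' | iconv -f UTF-8 -t UTF-16 >"$work/widget.cpp"

to_utf8 "$work/widget.cpp"
cat "$work/widget.cpp"

# In a real repository the conversion would then be committed by itself, e.g.:
#   git add widget.cpp
#   git commit -m "widget.cpp: convert from UTF-16 to UTF-8 (no content change)"
```

Keeping the re-encoding in its own commit means the one unreadable diff
contains no real code changes.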

If you need the file to remain utf-16 but you want more readable diffs
for those versions, you can ask git to convert to utf-8 before
performing the diff. Such a diff couldn't be applied, but would be
useful for reading. E.g., try:

  echo 'file diff=utf16' >.gitattributes
  git config diff.utf16.textconv 'iconv -f utf16 -t utf8'

You can read more about how this works in the "textconv" section of "git
help attributes".

Note that I'm relying on the external "iconv" tool to do the conversion
there. It's pretty standard on most Unix systems, but I don't know what
would be the best tool on Windows.
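
An end-to-end sketch of the textconv setup in a throwaway repository
follows. The file name widget.cpp, the committer identity, and the
sample contents are placeholders; it assumes both "git" and "iconv" are
installed.

```shell
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" config user.name "Example"
git -C "$repo" config user.email "example@example.invalid"

# Route widget.cpp through the "utf16" diff driver and define its textconv.
printf 'widget.cpp diff=utf16\n' >"$repo/.gitattributes"
git -C "$repo" config diff.utf16.textconv 'iconv -f UTF-16 -t UTF-8'

# Commit a UTF-16 version of the file.
printf 'int a = 1;\n' | iconv -f UTF-8 -t UTF-16 >"$repo/widget.cpp"
git -C "$repo" add .gitattributes widget.cpp
git -C "$repo" commit -q -m "add UTF-16 file"

# Change the file (still UTF-16) and look at the working-tree diff.
printf 'int a = 2;\n' | iconv -f UTF-8 -t UTF-16 >"$repo/widget.cpp"
diff_out=$(git -C "$repo" diff -- widget.cpp)
echo "$diff_out"
```

The resulting diff shows the converted text lines rather than "Binary
files differ", but it describes the UTF-8 view of the file, not the
stored bytes.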

-Peff


* Re: change the filetype from binary to text after the file is commited to a git repo
       [not found]       ` <DBBA7352-5276-4972-A437-F27F5F4C2641@gmail.com>
@ 2017-07-24 20:32         ` Jeff King
  2017-07-24 20:34           ` tonka3100
  0 siblings, 1 reply; 6+ messages in thread
From: Jeff King @ 2017-07-24 20:32 UTC
  To: tonka3100@gmail.com; +Cc: git

On Mon, Jul 24, 2017 at 10:26:22PM +0200, tonka3100@gmail.com wrote:

> > I'm not sure exactly what you're trying to accomplish. If you're unhappy
> > with the file as utf-16, then you should probably convert to utf-8 as a
> > single commit (since the diff will otherwise be unreadable) and then
> > make further changes in utf-8.

> That was exactly what I was searching for. The UTF-16 back in the day
> was by accident (thanks to Visual Studio). So if the last commit and
> the actual change are both UTF-8, the diff should work again. Just for
> my understanding: Git just takes the bytes of the whole file on every
> commit, so there is no general problem with that; the UTF-16 version is
> just twice as big as the UTF-8 one. Is that correct?

Right. The diff switching the encodings will be listed as "binary" (and
you should write a good commit message explaining what's going on!), but
then after that the changes to the utf-8 version will display as normal
text.  Git only looks at the actual bytes being diffed, not older
versions of the file.
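
This can be checked in a scratch repository. The sketch below uses
hand-written bytes in place of a real UTF-16 file, and the file name and
committer identity are placeholders; "git diff --numstat" prints "-"
instead of line counts when it considers a diff binary.

```shell
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" config user.name "Example"
git -C "$repo" config user.email "example@example.invalid"

# Commit 1: UTF-16LE-style content (every ASCII character followed by NUL).
printf 'a\000\n\000' >"$repo/widget.cpp"
git -C "$repo" add widget.cpp
git -C "$repo" commit -q -m "widget.cpp as UTF-16"

# Commit 2: the same text re-encoded as UTF-8.
printf 'a\n' >"$repo/widget.cpp"
git -C "$repo" commit -q -am "widget.cpp: convert to UTF-8"

# Commit 3: an ordinary edit to the UTF-8 file.
printf 'b\n' >"$repo/widget.cpp"
git -C "$repo" commit -q -am "widget.cpp: change a to b"

# Compare the conversion commit with the ordinary edit that follows it.
convert_stat=$(git -C "$repo" diff --numstat HEAD~2 HEAD~1)
edit_stat=$(git -C "$repo" diff --numstat HEAD~1 HEAD)
echo "$convert_stat"
echo "$edit_stat"
```

Only the conversion commit is reported as binary; the later edit is a
normal one-line text change even though the file's history contains a
UTF-16 version.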

-Peff


* Re: change the filetype from binary to text after the file is commited to a git repo
  2017-07-24 20:32         ` Jeff King
@ 2017-07-24 20:34           ` tonka3100
  0 siblings, 0 replies; 6+ messages in thread
From: tonka3100 @ 2017-07-24 20:34 UTC
  To: Jeff King; +Cc: git

Thanks Jeff, I will try it tomorrow.

