git@vger.kernel.org mailing list mirror (one of many)
* git gui blame utf-8 bugs
From: Finn Arne Gangstad @ 2007-12-12  9:17 UTC
  To: git, spearce

git gui has some utf-8 bugs:

If you do git gui blame <file>, and the file contains utf-8 text,
the lines are not parsed as utf-8, but seemingly as iso-8859-1 instead.
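The symptom matches what happens when UTF-8 bytes are decoded as
ISO-8859-1. The following minimal Python sketch is an illustration, not
git-gui's Tcl code, and the function name is made up. It reproduces the
garbling and shows that the damage is only in the display:

```python
def misdecode(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as ISO-8859-1,
    which is what a viewer assuming the wrong encoding effectively does."""
    return text.encode("utf-8").decode("iso-8859-1")


garbled = misdecode("blåbærsyltetøy")
print(garbled)  # blÃ¥bÃ¦rsyltetÃ¸y

# ISO-8859-1 maps every byte to a character, so no data is lost:
# re-encoding and decoding as UTF-8 recovers the original text.
print(garbled.encode("iso-8859-1").decode("utf-8"))  # blåbærsyltetøy
```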

Also, the hovering comment is INITIALLY shown garbled (both Author and
commit message), but if you click on a line, so that the commit
message is shown in the bottom window, the hovering message is
magically corrected to utf-8.

The text in the lower window (showing specific commits) seems to
always be handled correctly.

To reproduce: Set your author name to include some utf-8 tokens, add a
line with some utf-8 tokens to a file, commit it with a commit message
including some utf-8 tokens, and do git gui blame on the file. The
line will be garbled in the top window, and the hovering message will
be garbled until you click on the line.

Verified with git-gui.git master

- Finn Arne

* Re: git gui blame utf-8 bugs
From: Shawn O. Pearce @ 2007-12-14  6:47 UTC
  To: Finn Arne Gangstad; +Cc: git

Finn Arne Gangstad <finnag@pvv.org> wrote:
> git gui has some utf-8 bugs:

It has several.  :-)
 
> If you do git gui blame <file>, and the file contains utf-8 text,
> the lines are not parsed as utf-8, but seemingly as iso-8859-1 instead.

Right.  git-gui is keying off the environment setting for LANG, so I
guess it's set to iso-8859-1 on your system, but you are working with a
utf-8 file.  We've talked about using something like .gitattributes
to store encoding hints, or to just put a global gui setting in
~/.gitconfig but neither has had any patches written for it.

UTF-8 is seemingly the most common encoding that git-gui is mangling
so maybe we should be defaulting to utf-8 until someone codes a
more intelligent patch.
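One way to read the proposal above is as a precedence order: a per-path
hint (for example from .gitattributes), then a global GUI setting (for
example in ~/.gitconfig), then UTF-8 instead of the LANG-derived locale.
The sketch below is a hypothetical Python illustration of that order.
Both parameter names are assumptions, and none of this is git-gui code:

```python
import codecs
from typing import Optional


def choose_encoding(attr_hint: Optional[str],
                    gui_setting: Optional[str],
                    default: str = "utf-8") -> str:
    """Pick the encoding used to display file contents.

    Hypothetical precedence: per-path hint, then global GUI setting,
    then UTF-8 (rather than whatever LANG implies).
    """
    for candidate in (attr_hint, gui_setting):
        if not candidate:
            continue
        try:
            codecs.lookup(candidate)  # skip names Python cannot decode with
        except LookupError:
            continue
        return candidate
    return default


def decode_for_display(data: bytes,
                       attr_hint: Optional[str] = None,
                       gui_setting: Optional[str] = None) -> str:
    encoding = choose_encoding(attr_hint, gui_setting)
    # errors="replace" keeps the view usable if the chosen encoding is wrong
    return data.decode(encoding, errors="replace")
```

The UTF-8 fallback comes last, so an explicit hint always wins. An
unknown encoding name is ignored rather than making decoding fail.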

> Also, the hovering comment is INITIALLY shown garbled (both Author and
> commit message), but if you click on a line, so that the commit
> message is shown in the bottom window, the hovering message is
> magically corrected to utf-8.
> 
> The text in the lower window (showing specific commits) seems to
> always be handled correctly.

That's a "feature".  :-)

What's happening here is the initial hovering message is obtained
from the machine formatted output from `git blame --incremental`
and in that format there is no encoding header so I'm just ignoring
any encoding problems.

Later when you click on a line it does `git cat-file commit $sha1`
and gets the proper encoding, and corrects the strings it originally
had gotten from git-blame.  So the hovering message "fixes" itself
later on.
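The later correction works because the raw commit object can carry an
optional "encoding" header, and git treats a commit without one as
UTF-8. The following rough Python sketch parses the output of
`git cat-file commit $sha1` that way. The function name and return
shape are illustrative only:

```python
def parse_commit(raw: bytes) -> dict:
    """Decode author and message of a raw commit object, honouring the
    optional 'encoding' header (UTF-8 when the header is absent)."""
    header_blob, _, message = raw.partition(b"\n\n")
    headers = {}
    for line in header_blob.split(b"\n"):
        if line.startswith(b" "):
            continue  # continuation of a multi-line header, e.g. gpgsig
        key, _, value = line.partition(b" ")
        headers.setdefault(key.decode("ascii"), value)
    encoding = headers.get("encoding", b"UTF-8").decode("ascii")
    return {
        "encoding": encoding,
        "author": headers.get("author", b"").decode(encoding, errors="replace"),
        "message": message.decode(encoding, errors="replace"),
    }


raw = (b"tree " + b"0" * 40 + b"\n"
       b"author Finn \xf8 <f@example.org> 0 +0000\n"
       b"encoding ISO-8859-1\n\nbl\xe5b\xe6r\n")
print(parse_commit(raw)["message"])
```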

Maybe here too we should be defaulting to utf-8 instead of the
native encoding.
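The following Python sketch applies that default to the hover text. It
collects the per-commit header fields from `git blame --incremental`
output and decodes them with a fixed encoding (UTF-8 unless told
otherwise), because the stream itself says nothing about encoding. It
assumes 40-hex SHA-1 object names, and `parse_incremental` is a
hypothetical helper, not part of git or git-gui:

```python
import re

# "<sha1> <source line> <result line> <number of lines>"
SHA_LINE = re.compile(rb"^([0-9a-f]{40}) (\d+) (\d+) (\d+)$")


def parse_incremental(output: bytes, encoding: str = "utf-8") -> dict:
    """Map each commit id to its header fields (author, summary, ...),
    decoding the values with a fixed encoding."""
    commits: dict = {}
    current = None
    for line in output.split(b"\n"):
        m = SHA_LINE.match(line)
        if m:
            current = commits.setdefault(m.group(1).decode("ascii"), {})
            continue
        if current is None or not line:
            continue
        key, _, value = line.partition(b" ")
        # Headers appear only the first time a commit is seen; keep the first.
        current.setdefault(key.decode("ascii"),
                           value.decode(encoding, errors="replace"))
    return commits
```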

-- 
Shawn.

* Re: git gui blame utf-8 bugs
From: Jakub Narebski @ 2007-12-14 12:39 UTC
  To: Shawn O. Pearce; +Cc: Finn Arne Gangstad, git

"Shawn O. Pearce" <spearce@spearce.org> writes:

> Finn Arne Gangstad <finnag@pvv.org> wrote:
> > git gui has some utf-8 bugs:
> 
> It has several.  :-)
>  
> > If you do git gui blame <file>, and the file contains utf-8 text,
> > the lines are not parsed as utf-8, but seemingly as iso-8859-1 instead.
> 
> Right.  git-gui is keying off the environment setting for LANG, so I
> guess it's set to iso-8859-1 on your system, but you are working with a
> utf-8 file.  We've talked about using something like .gitattributes
> to store encoding hints, or to just put a global gui setting in
> ~/.gitconfig but neither has had any patches written for it.
> 
> UTF-8 is seemingly the most common encoding that git-gui is mangling
> so maybe we should be defaulting to utf-8 until someone codes a
> more intelligent patch.

Currently there is no config variable for default encoding of file
contents (of blobs) and of filenames (of trees) because those do not
matter for core git.  But they do matter for GUI.
 
> > Also, the hovering comment is INITIALLY shown garbled (both Author and
> > commit message), but if you click on a line, so that the commit
> > message is shown in the bottom window, the hovering message is
> > magically corrected to utf-8.
> > 
> > The text in the lower window (showing specific commits) seems to
> > always be handled correctly.
> 
> That's a "feature".  :-)
> 
> What's happening here is the initial hovering message is obtained
> from the machine formatted output from `git blame --incremental`
> and in that format there is no encoding header so I'm just ignoring
> any encoding problems.

So the correct solution would be to enhance "git blame --incremental"
to output 'encoding' header when needed (when commit has encoding
header and it is different from log output encoding).
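The following Python sketch shows the condition proposed above. An
"encoding" line is emitted only when the commit declares an encoding
and that encoding differs from the output encoding. Aliases such as
"latin-1" and "ISO-8859-1" are compared after normalisation. The
function and the exact line format are hypothetical, not an existing
git-blame feature:

```python
import codecs
from typing import Optional


def _canonical(name: str) -> str:
    """Normalise encoding aliases; fall back to the lowercased name."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return name.lower()


def encoding_line(commit_encoding: Optional[str],
                  output_encoding: str = "UTF-8") -> Optional[bytes]:
    """Return b'encoding <name>' for a blame entry, or None when not needed."""
    if not commit_encoding:
        return None  # commit has no encoding header: nothing to announce
    if _canonical(commit_encoding) == _canonical(output_encoding):
        return None
    return b"encoding " + commit_encoding.encode("ascii")
```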

> Later when you click on a line it does `git cat-file commit $sha1`
> and gets the proper encoding, and corrects the strings it originally
> had gotten from git-blame.  So the hovering message "fixes" itself
> later on.
> 
> Maybe here too we should be defaulting to utf-8 instead of the
> native encoding.

I think this is a good idea, as git repositories are meant to be
compatible across operating systems (and therefore across default
encodings).

-- 
Jakub Narebski
Poland
ShadeHawk on #git

* Re: git gui blame utf-8 bugs
From: Finn Arne Gangstad @ 2007-12-17  7:50 UTC
  To: Shawn O. Pearce; +Cc: Jakub Narebski, git

On Fri, Dec 14, 2007 at 04:39:59AM -0800, Jakub Narebski wrote:
> "Shawn O. Pearce" <spearce@spearce.org> writes:
> 
> > Finn Arne Gangstad <finnag@pvv.org> wrote:
> > > git gui has some utf-8 bugs:
> > 
> > It has several.  :-)
> >  
> > > If you do git gui blame <file>, and the file contains utf-8 text,
> > > the lines are not parsed as utf-8, but seemingly as iso-8859-1 instead.
> > 
> > Right.  git-gui is keying off the environment setting for LANG, so I
> > guess it's set to iso-8859-1 on your system, but you are working with a
> > [...]

Setting LANG does not seem to have any effect at all (neither on file
contents, nor on author names or commit messages).

LANG=en_US.UTF-8 git gui blame dir.c -> same
LANG=utf-8 git gui blame dir.c -> same

- Finn Arne
