git@vger.kernel.org mailing list mirror (one of many)
From: Shin Kojima <shin@kojima.org>
To: Jakub Narebski <jnareb@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>, Shin Kojima <shin@kojima.org>,
	git@vger.kernel.org
Subject: Re: [PATCH] gitweb: Measure offsets against UTF-8 flagged string
Date: Fri, 4 May 2018 00:16:29 +0900
Message-ID: <20180503151627.45pt2veqcjzbk44q@skmbp>
In-Reply-To: <86k1skzzc4.fsf@gmail.com>

> One solution would be to force conversion to UTF-8 on input via "open"
> pragma (e.g. "use open ':encoding(UTF-8)';").  But there is no
> UTF-8-with_fallback encoding available - we would have to write one, and
> install it as module (or fake it via Perl trickery).  This mechanism is
> almost the same as what we currently use in gitweb.

Yes, I tried using `Encode::Guess` with "open" pragma, but no luck.
https://perldoc.perl.org/Encode/Guess.html
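
To make the "UTF-8 with fallback" idea concrete: decode strictly as UTF-8
first, and only if that fails, decode with a second encoding.  Below is a
minimal sketch in Python rather than gitweb's Perl; the function name
`decode_with_fallback` and the latin-1 default are assumptions chosen for
illustration, not gitweb's actual code.

```python
def decode_with_fallback(raw: bytes, fallback: str = "latin-1") -> str:
    """Decode raw bytes as strict UTF-8, falling back to another encoding.

    `fallback` is a hypothetical parameter for this sketch; the caller
    has to know (or guess) which legacy encoding to expect.
    """
    try:
        # Strict mode raises UnicodeDecodeError on any invalid sequence.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Invalid bytes become U+FFFD instead of raising a second time.
        return raw.decode(fallback, errors="replace")


if __name__ == "__main__":
    print(decode_with_fallback(b"caf\xc3\xa9"))  # valid UTF-8 input
    print(decode_with_fallback(b"caf\xe9"))      # latin-1 input, uses fallback
```

Note that the choice is made once for the whole input, so this only helps
when the input is in a single encoding.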

I'm also afraid that the "open" pragma will not work properly with
git_blame_common().  Suppose someone has non-ASCII characters in their
name and commits content in a non-UTF-8 encoding.  git-blame will then
put both on the same line.  The following is an example:

$ git blame dummy | xxd
00000000: 3461 6464 3565 6331 2028 e585 90e5 b3b6  4add5ec1 (......
00000010: 20e6 96b0 2032 3031 382d 3035 2d30 3320   ... 2018-05-03
00000020: 3232 3a34 383a 3432 202b 3039 3030 2031  22:48:42 +0900 1
00000030: 2920 8367 8389 8343 0a                   ) .g...C.

    * e585 90e5 b3b6 20e6 96b0 : my name, encoded with UTF-8
    * 8367 8389 8343           : "トライ" encoded with Shift_JIS

This means I would need to split each line of git-blame output first,
then decode the first half as UTF-8 and the second half as Shift_JIS.
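
As a rough illustration of that split, here is a sketch in Python (not
gitweb's Perl) that works on the exact bytes from the dump above.  The
regular expression, the function name `decode_blame_line` and the
assumption that the content encoding is known in advance to be Shift_JIS
are all mine; it only handles the default human-readable `git blame`
format shown here.

```python
import re

# "<hash> (<author> <date> <time> <tz> <lineno>) " followed by the content.
# The pattern is an assumption based on the single sample line above.
BLAME_LINE = re.compile(
    rb"^(?P<meta>\S+ \(.*? \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [+-]\d{4} +\d+\) )"
    rb"(?P<content>.*)$",
    re.DOTALL,
)


def decode_blame_line(raw: bytes, content_encoding: str = "shift_jis"):
    """Split one blame line and decode each half with its own encoding."""
    m = BLAME_LINE.match(raw.rstrip(b"\n"))
    if m is None:
        raise ValueError("unrecognized blame line")
    meta = m.group("meta").decode("utf-8")  # author name is UTF-8
    content = m.group("content").decode(content_encoding, errors="replace")
    return meta, content


# The bytes from the xxd dump, rebuilt from its hex columns.
line = (
    b"4add5ec1 ("
    + bytes.fromhex("e58590e5b3b620e696b0")  # name, UTF-8
    + b" 2018-05-03 22:48:42 +0900 1) "
    + bytes.fromhex("836783898343")          # content, Shift_JIS
    + b"\n"
)

if __name__ == "__main__":
    meta, content = decode_blame_line(line)
    print(meta + content)
```

Decoding the whole line as strict UTF-8 fails on the 0x83 byte, which is
why a single per-stream encoding layer cannot handle this case.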

Sincerely,

-- 
Shin Kojima


Thread overview: 6+ messages
2018-05-01  6:40 [PATCH] gitweb: Measure offsets against UTF-8 flagged string Shin Kojima
2018-05-02  8:01 ` Junio C Hamano
2018-05-02 11:47   ` Shin Kojima
2018-05-03 12:40   ` Jakub Narebski
2018-05-03 15:16     ` Shin Kojima [this message]
2018-05-04  2:38     ` Junio C Hamano
