git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Lars Schneider <larsxschneider@gmail.com>
Cc: "Torsten Bögershausen" <tboegi@web.de>,
	lars.schneider@autodesk.com, git@vger.kernel.org, j6t@kdbg.org,
	sunshine@sunshineco.com, peff@peff.net,
	ramsay@ramsayjones.plus.com, Johannes.Schindelin@gmx.de
Subject: Re: [PATCH v7 0/7] convert: add support for different encodings
Date: Fri, 23 Feb 2018 12:11:05 -0800	[thread overview]
Message-ID: <xmqq7er3tqjq.fsf@gitster-ct.c.googlers.com> (raw)
In-Reply-To: <xmqqbmgfvf2y.fsf@gitster-ct.c.googlers.com> (Junio C. Hamano's message of "Fri, 23 Feb 2018 08:35:49 -0800")

Junio C Hamano <gitster@pobox.com> writes:

> Lars Schneider <larsxschneider@gmail.com> writes:
>
>> I still think it would be nice to see diffs for arbitrary encodings.
>> Would it be an option to read the `encoding` attribute and use it in
>> `git diff`?
>
> Reusing that gitk-only thing and suddenly start doing so would break
> gitk users, no?  The tool expects the diff to come out encoded in
> the encoding that is specified by that attribute (which is learned
> from get_path_encoding helper) and does its thing.
>
> I guess that gitk uses diff-tree plumbing and you won't be applying
> this change to the plumbing, perhaps?  If so, it might not be too
> bad, but those who decided to postprocess "git diff" output (instead
> of "git diff-tree" output) mimicking how gitk does it by thinking
> that is the safe and sane thing to do will be broken by such a
> change.  You could do "use the encoding only when a command line
> option says so", but then people will add a configuration variable
> to turn it always on and these existing scripts will be broken.
>
> I do not personally have much sympathy for the last case (i.e. those
> who scripted around 'git diff' instead of 'git diff-tree' to get
> broken), so making the new feature only work with the Porcelain "git
> diff" might be an option.  I'll need a bit more time to formulate
> the rest of my thought ;-)

So we are introducing in this series a way to say in what encoding
the things should be placed in the working tree files (i.e. the
w-t-e attribute attached to the paths).  Currently there is no
mechanism to say what encoding the in-repo contents are and UTF-8 is
assumed when conversion from/to w-t-e is required, but there is no
fundamental reason why it shouldn't be customizable (if anything, as
a piece of fact on the in-repo data, in-repo-encoding is *more*
appropriate to be an attribute than w-t-e that can merely be project
preference at best, as I mentioned earlier in this thread).  

We always use the in-repo contents when generating 'diff'.  I think
by "attribute to be used in diff", what you are reallying after is
to convert the in-repo contents to that encoding _BEFORE_ running
'diff' on it.  E.g. in-repo UTF-16 that can have NUL bytes all over
the place will not diff well with the xdiff machinery, but if you
first convert it to UTF-8 and have xdiff work on it, you can get
reasonable result out of it.  It is unclear what encoding you want
your final diff output in (it is equally valid in such a set-up to
desire your patch output in UTF-16 or UTF-8), but assuming that you
want UTF-8 in your patch output, perhaps we do not have to break
gitk users by hijacking the 'encoding' attribute.  Instead what you
want is a single bit that says between in-repo or working tree which
representation should be given to the xdiff machinery.  When that
bit is set, then we

 - First ensure that both sides of the diff input is in the working
   tree encoding by running it through convert_to_working_tree();

 - Run xdiff on it;

 - Take the xdiff result, and run it through convert_to_git(),
   before emitting (this is optional, making this a one-and-half bit
   option).

That would allow you to say "I have in-repo data in UTF-16 which I
check out as UTF-8.  xdiff machinery is unhappy.  Please do
something." perhaps?

The other way around (i.e. in-repo is UTF-8, but working tree
encoding is UTF-16) won't need xdiff issues, I would imagine.

  reply	other threads:[~2018-02-23 20:11 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-15 15:27 [PATCH v7 0/7] convert: add support for different encodings lars.schneider
2018-02-15 15:27 ` [PATCH v7 1/7] strbuf: remove unnecessary NUL assignment in xstrdup_tolower() lars.schneider
2018-02-16 12:55   ` Ævar Arnfjörð Bjarmason
2018-02-16 18:45     ` Jeff King
2018-02-16 19:30       ` Junio C Hamano
2018-02-15 15:27 ` [PATCH v7 2/7] strbuf: add xstrdup_toupper() lars.schneider
2018-02-15 15:27 ` [PATCH v7 3/7] utf8: add function to detect prohibited UTF-16/32 BOM lars.schneider
2018-02-15 15:27 ` [PATCH v7 4/7] utf8: add function to detect a missing " lars.schneider
2018-02-15 15:27 ` [PATCH v7 5/7] convert: add 'working-tree-encoding' attribute lars.schneider
2018-02-15 15:27 ` [PATCH v7 6/7] convert: add tracing for " lars.schneider
2018-02-15 15:27 ` [PATCH v7 7/7] convert: add round trip check based on 'core.checkRoundtripEncoding' lars.schneider
2018-02-15 20:03 ` [PATCH v7 0/7] convert: add support for different encodings Junio C Hamano
2018-02-15 22:09   ` Jeff King
2018-02-16 18:55     ` Junio C Hamano
2018-02-16 19:25       ` Jeff King
2018-02-16 19:27         ` Jeff King
2018-02-16 19:41           ` Junio C Hamano
2018-02-21 18:06       ` Lars Schneider
2018-02-16 14:42   ` Lars Schneider
2018-02-16 16:58     ` Torsten Bögershausen
2018-02-22 20:00       ` Lars Schneider
2018-02-22 20:12         ` Jeff King
2018-02-23 16:35         ` Junio C Hamano
2018-02-23 20:11           ` Junio C Hamano [this message]
2018-02-24 15:18             ` Lars Schneider
2018-02-26  1:44               ` Jeff King
2018-02-26 17:35                 ` Torsten Bögershausen
2018-02-26 20:46                   ` Jeff King
2018-02-27 21:05                     ` Torsten Bögershausen
2018-02-27 21:25                       ` Jeff King
2018-02-27 21:55                         ` Junio C Hamano
2018-02-27 21:58                           ` Jeff King
2018-02-27 22:10                             ` Junio C Hamano
2018-02-27 22:20                               ` Jeff King
2018-02-28  8:20                         ` Torsten Bögershausen
2018-02-28 13:21                           ` Jeff King
2018-02-28 17:42                             ` Junio C Hamano
2018-03-01  7:49                               ` Jeff King
2018-03-04 10:16                             ` Torsten Bögershausen
2018-02-28 20:46                         ` Lars Schneider
2018-02-16 19:04     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqq7er3tqjq.fsf@gitster-ct.c.googlers.com \
    --to=gitster@pobox.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=j6t@kdbg.org \
    --cc=lars.schneider@autodesk.com \
    --cc=larsxschneider@gmail.com \
    --cc=peff@peff.net \
    --cc=ramsay@ramsayjones.plus.com \
    --cc=sunshine@sunshineco.com \
    --cc=tboegi@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).