git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: lars.schneider@autodesk.com
Cc: git@vger.kernel.org, tboegi@web.de, j6t@kdbg.org,
	sunshine@sunshineco.com, peff@peff.net,
	ramsay@ramsayjones.plus.com, Johannes.Schindelin@gmx.de,
	Lars Schneider <larsxschneider@gmail.com>
Subject: Re: [PATCH v7 0/7] convert: add support for different encodings
Date: Thu, 15 Feb 2018 12:03:06 -0800	[thread overview]
Message-ID: <xmqqr2pm81hh.fsf@gitster-ct.c.googlers.com> (raw)
In-Reply-To: <20180215152711.158-1-lars.schneider@autodesk.com> (lars schneider's message of "Thu, 15 Feb 2018 16:27:04 +0100")

lars.schneider@autodesk.com writes:

> -- Git clients that do not support the `working-tree-encoding` attribute
> -  will checkout the respective files UTF-8 encoded and not in the
> -  expected encoding. Consequently, these files will appear different
> -  which typically causes trouble. This is in particular the case for
> -  older Git versions and alternative Git implementations such as JGit
> -  or libgit2 (as of February 2018).
> +- Third party Git implementations that do not support the
> +  `working-tree-encoding` attribute will checkout the respective files
> +  UTF-8 encoded and not in the expected encoding. Consequently, these
> +  files will appear different which typically causes trouble. This is
> +  in particular the case for older Git versions and alternative Git
> +  implementations such as JGit or libgit2 (as of February 2018).

I know somebody found "clients" misleading in the original, but the
ones that do not understand w-t-e do not have to be third party
reimplementations and imitations.  All existing Git implementations,
including ours, don't.

One thing I find more problematic is that the above places *too*
much stress on the UTF-8 centric worldview.  It is perfectly valid
to store your text contents encoded in ShiftJIS and check them out
as-is, with or without this patch.  It is grossly misleading to say
that older versions of Git will check them out in UTF-8.  "will
checkout these files as-is without encoding conversion" is a better
way to say it, probably.

Also notice that even in the world with w-t-e, such a project won't
benefit from w-t-e at all.  After all, they have been happy using
ShiftJIS in repository and using the same encoding on the working
tree, and because w-t-e assumes that everybody should be using UTF-8
internally, such a project cannot take advantage of the new
mechanism.

And from that point of view, perhaps w-t-e attribute is somewhat
misdesigned.

In general, an attribute is about the project's contents in the
manner independent of platform or environment.  You define "this
file is a C source" or "this file has JPEG image" there.  What exact
program you use to present diffs between the two versions of such a
file (external diff command) or what exact program you use to
extract the textual representations (textconv filter) is environment
and platform dependent and is left to the configuration mechanism
for each repository.

To be in line with the above design principle, I think the attribute
ought to be "the in-tree contents of this path is encoded in ..."
whose values could be things like UTF-8, ShiftJIS, etc.  What
external encoding the paths should be checked out is not a
project-wide matter, especially when talking about cross platform
projects.  Perhaps a project in Japanese language wants to check
out its contents in EUC-jp on Unices and in ShiftJIS on DOS derived
systems.  The participants all need to know what in-repository
encoding is used, which is a sensible use of attributes.  They also
need to know what the recommended external encoding to be used in
the working tree is for their platforms, but that is more like what
Makefile variable to set for their platforms, etc., and is not a
good match to the attributes system.


  parent reply	other threads:[~2018-02-15 20:03 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-15 15:27 [PATCH v7 0/7] convert: add support for different encodings lars.schneider
2018-02-15 15:27 ` [PATCH v7 1/7] strbuf: remove unnecessary NUL assignment in xstrdup_tolower() lars.schneider
2018-02-16 12:55   ` Ævar Arnfjörð Bjarmason
2018-02-16 18:45     ` Jeff King
2018-02-16 19:30       ` Junio C Hamano
2018-02-15 15:27 ` [PATCH v7 2/7] strbuf: add xstrdup_toupper() lars.schneider
2018-02-15 15:27 ` [PATCH v7 3/7] utf8: add function to detect prohibited UTF-16/32 BOM lars.schneider
2018-02-15 15:27 ` [PATCH v7 4/7] utf8: add function to detect a missing " lars.schneider
2018-02-15 15:27 ` [PATCH v7 5/7] convert: add 'working-tree-encoding' attribute lars.schneider
2018-02-15 15:27 ` [PATCH v7 6/7] convert: add tracing for " lars.schneider
2018-02-15 15:27 ` [PATCH v7 7/7] convert: add round trip check based on 'core.checkRoundtripEncoding' lars.schneider
2018-02-15 20:03 ` Junio C Hamano [this message]
2018-02-15 22:09   ` [PATCH v7 0/7] convert: add support for different encodings Jeff King
2018-02-16 18:55     ` Junio C Hamano
2018-02-16 19:25       ` Jeff King
2018-02-16 19:27         ` Jeff King
2018-02-16 19:41           ` Junio C Hamano
2018-02-21 18:06       ` Lars Schneider
2018-02-16 14:42   ` Lars Schneider
2018-02-16 16:58     ` Torsten Bögershausen
2018-02-22 20:00       ` Lars Schneider
2018-02-22 20:12         ` Jeff King
2018-02-23 16:35         ` Junio C Hamano
2018-02-23 20:11           ` Junio C Hamano
2018-02-24 15:18             ` Lars Schneider
2018-02-26  1:44               ` Jeff King
2018-02-26 17:35                 ` Torsten Bögershausen
2018-02-26 20:46                   ` Jeff King
2018-02-27 21:05                     ` Torsten Bögershausen
2018-02-27 21:25                       ` Jeff King
2018-02-27 21:55                         ` Junio C Hamano
2018-02-27 21:58                           ` Jeff King
2018-02-27 22:10                             ` Junio C Hamano
2018-02-27 22:20                               ` Jeff King
2018-02-28  8:20                         ` Torsten Bögershausen
2018-02-28 13:21                           ` Jeff King
2018-02-28 17:42                             ` Junio C Hamano
2018-03-01  7:49                               ` Jeff King
2018-03-04 10:16                             ` Torsten Bögershausen
2018-02-28 20:46                         ` Lars Schneider
2018-02-16 19:04     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqr2pm81hh.fsf@gitster-ct.c.googlers.com \
    --to=gitster@pobox.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=j6t@kdbg.org \
    --cc=lars.schneider@autodesk.com \
    --cc=larsxschneider@gmail.com \
    --cc=peff@peff.net \
    --cc=ramsay@ramsayjones.plus.com \
    --cc=sunshine@sunshineco.com \
    --cc=tboegi@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).