* [PATCH 0/2] Improve documentation on UTF-16
From: brian m. carlson @ 2018-12-27  2:17 UTC
To: git; +Cc: Lars Schneider, Torsten Bögershausen

We've recently fielded several reports from unhappy Windows users about
our handling of UTF-16, UTF-16LE, and UTF-16BE, none of which seem to be
suitable for certain Windows programs.

In an effort to communicate the reasons for our behavior more
effectively, explain in the documentation that the UTF-16 variant that
people have been asking for hasn't been standardized, and therefore
hasn't been implemented in iconv(3). Mention what each of the variants
do, so that people can make a decision which one meets their needs the
best.

In addition, add a comment in the code about why we must, for
correctness reasons, reject a UTF-16LE or UTF-16BE sequence that begins
with U+FEFF, namely that such a codepoint semantically represents a
ZWNBSP, not a BOM, but that that codepoint at the beginning of a UTF-8
sequence (as encoded in the object store) would be misinterpreted as a
BOM instead.

This comment is in the code because I think it needs to be somewhere,
but I'm not sure the documentation is the right place for it. If
desired, I can add it to the documentation, although I feel the lurid
details are not interesting to most users. If the wording is confusing,
I'm very open to hearing suggestions for how to improve it.

I don't use Windows, so I don't know what MSVCRT does. If it requires a
BOM but doesn't accept big-endian encoding, then perhaps we should
report that as a bug to Microsoft so it can be fixed in a future
version. That would probably make a lot more programs work right out of
the box and dramatically improve the user experience.

As a note, I'm currently on vacation through the 2nd, so my responses
may be slightly delayed.

brian m. carlson (2):
  Documentation: document UTF-16-related behavior
  utf8: add comment explaining why BOMs are rejected

 Documentation/gitattributes.txt | 5 +++++
 utf8.c                          | 7 +++++++
 2 files changed, 12 insertions(+)

* [PATCH 1/2] Documentation: document UTF-16-related behavior
From: brian m. carlson @ 2018-12-27  2:17 UTC
To: git; +Cc: Lars Schneider, Torsten Bögershausen

There are a number of broken Windows programs which want to process
files in a UTF-16 variant that is always little endian and always
contains a BOM. Git cannot produce or accept such an encoding for the
working-tree-encoding because no such encoding has been defined with
IANA or implemented in iconv(3).

Document this behavior since it is a frequent source of confusion for
users. Additionally, document that specifying "UTF-16" may produce
bytes of either endianness, but will be sure to provide a BOM to
distinguish.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Documentation/gitattributes.txt | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index b8392fc330..2b2c93afd1 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -330,6 +330,11 @@ That operation will fail and cause an error.
 - Reencoding content requires resources that might slow down certain
   Git operations (e.g 'git checkout' or 'git add').
 
+- It is not possible to specify a variant of UTF-16 with a BOM and a
+  specified endianness, because no such variants have been standardized.
+  Using "UTF-16" will produce a BOM with an unspecified endianness, and
+  using "UTF-16LE" or "UTF-16BE" will prohibit a BOM from being used.
+
 Use the `working-tree-encoding` attribute only if you cannot store a file
 in UTF-8 encoding and if you want Git to be able to process the content
 as text.
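
The three labels described in the patch above can be sketched with
Python's built-in codecs as a stand-in. This is an assumption made purely
for illustration: Git converts through iconv, and the byte order an iconv
build picks for plain "UTF-16" is implementation-dependent, whereas Python
happens to use the host's native order. The last value is assembled by
hand, since no standardized label selects "BOM plus little-endian".

```python
import codecs

text = "A\n"

# "UTF-16": a BOM is written and the encoder chooses the byte order.
# (Python picks the host's native order; an iconv build may choose differently.)
utf16 = text.encode("utf-16")

# "UTF-16LE" / "UTF-16BE": the byte order is fixed and no BOM is written.
utf16le = text.encode("utf-16-le")
utf16be = text.encode("utf-16-be")

# The variant some Windows programs expect: a BOM followed by little-endian
# data. Built by hand here because no standard label names it.
bom_plus_le = codecs.BOM_UTF16_LE + utf16le


def hexdump(data: bytes) -> str:
    """Render bytes as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in data)


for label, data in [("UTF-16", utf16), ("UTF-16LE", utf16le),
                    ("UTF-16BE", utf16be), ("BOM + LE", bom_plus_le)]:
    print(f"{label:9} {hexdump(data)}")
```

Only the "UTF-16" row can differ from machine to machine, which is the
unspecified endianness the documentation text warns about.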

* [PATCH 2/2] utf8: add comment explaining why BOMs are rejected
From: brian m. carlson @ 2018-12-27  2:17 UTC
To: git; +Cc: Lars Schneider, Torsten Bögershausen

A source of confusion for many Git users is why UTF-16LE and UTF-16BE
do not allow a BOM, instead treating it as a ZWNBSP, according to the
Unicode FAQ[0]. Explain in a comment why we cannot allow that to occur
due to our use of UTF-8 internally.

[0] https://unicode.org/faq/utf_bom.html#bom9

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 utf8.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/utf8.c b/utf8.c
index eb78587504..22af2c485a 100644
--- a/utf8.c
+++ b/utf8.c
@@ -571,6 +571,13 @@ static const char utf16_le_bom[] = {'\xFF', '\xFE'};
 static const char utf32_be_bom[] = {'\0', '\0', '\xFE', '\xFF'};
 static const char utf32_le_bom[] = {'\xFF', '\xFE', '\0', '\0'};
 
+/*
+ * We check here for a forbidden BOM. When using UTF-16BE or UTF-16LE, a BOM is
+ * not allowed by RFC 2781, and any U+FEFF would be treated as a ZWNBSP, not a
+ * BOM. However, because we encode into UTF-8 internally, we cannot allow that
+ * character to occur as a ZWNBSP, since when encoded into UTF-8 it would be
+ * interpreted as a BOM.
+ */
 int has_prohibited_utf_bom(const char *enc, const char *data, size_t len)
 {
 	return (
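
The rule stated in the new comment can be sketched as follows. This is an
illustration in Python, not the C body of has_prohibited_utf_bom (the diff
context above cuts that body off); the function name
starts_with_prohibited_bom and the label table are hypothetical, and only
the UTF-16 case named in the comment is covered.

```python
# Hypothetical mapping from the attribute value to a Python codec name.
# Only the explicit-endianness UTF-16 labels forbid a leading U+FEFF.
_EXPLICIT_ENDIAN_CODECS = {
    "UTF-16BE": "utf-16-be",
    "UTF-16LE": "utf-16-le",
}


def starts_with_prohibited_bom(enc: str, data: bytes) -> bool:
    """Return True if `data`, declared as UTF-16BE or UTF-16LE, begins with
    an encoded U+FEFF. With those labels the code point would be a ZWNBSP,
    which cannot survive conversion to UTF-8 without turning into a BOM."""
    codec = _EXPLICIT_ENDIAN_CODECS.get(enc.upper())
    if codec is None:
        # Plain "UTF-16" (and anything else) is not subject to this rule.
        return False
    return data.startswith("\ufeff".encode(codec))


print(starts_with_prohibited_bom("UTF-16LE", b"\xff\xfeA\x00"))  # leading U+FEFF
print(starts_with_prohibited_bom("UTF-16LE", b"A\x00"))          # no U+FEFF
print(starts_with_prohibited_bom("UTF-16", b"\xff\xfeA\x00"))    # BOM is expected here
```

The sketch checks only for U+FEFF encoded in the declared byte order;
whether a mark in the opposite byte order should also be rejected is left
open here.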

* Re: [PATCH 0/2] Improve documentation on UTF-16
From: Johannes Sixt @ 2018-12-27 10:06 UTC
To: brian m. carlson; +Cc: git, Lars Schneider, Torsten Bögershausen

On 27.12.18 at 03:17, brian m. carlson wrote:
> We've recently fielded several reports from unhappy Windows users about
> our handling of UTF-16, UTF-16LE, and UTF-16BE, none of which seem to be
> suitable for certain Windows programs.
>
> In an effort to communicate the reasons for our behavior more
> effectively, explain in the documentation that the UTF-16 variant that
> people have been asking for hasn't been standardized, and therefore
> hasn't been implemented in iconv(3). Mention what each of the variants
> do, so that people can make a decision which one meets their needs the
> best.
>
> In addition, add a comment in the code about why we must, for
> correctness reasons, reject a UTF-16LE or UTF-16BE sequence that begins
> with U+FEFF, namely that such a codepoint semantically represents a
> ZWNBSP, not a BOM, but that that codepoint at the beginning of a UTF-8
> sequence (as encoded in the object store) would be misinterpreted as a
> BOM instead.
>
> This comment is in the code because I think it needs to be somewhere,
> but I'm not sure the documentation is the right place for it. If
> desired, I can add it to the documentation, although I feel the lurid
> details are not interesting to most users. If the wording is confusing,
> I'm very open to hearing suggestions for how to improve it.
>
> I don't use Windows, so I don't know what MSVCRT does. If it requires a
> BOM but doesn't accept big-endian encoding, then perhaps we should
> report that as a bug to Microsoft so it can be fixed in a future
> version. That would probably make a lot more programs work right out of
> the box and dramatically improve the user experience.

It worries me that theoretical correctness is regarded higher than existing
practice. I do not care a lot what some RFC tells what programs should do if
the majority of the software does something different and that behavior has
been proven useful in practice.

My understanding is that there is no such thing as a "byte order marker". It
just so happens that when the first character in some UTF-16 text file
begins with a ZWNBSP, then it is possible to derive the endianness of the
file automatically. Other than that, that very first code point U+FEFF *is
part of the data* and must not be removed when the data is reencoded. If Git
does something different, it is bogus, IMO.

-- Hannes

* Re: [PATCH 0/2] Improve documentation on UTF-16
From: brian m. carlson @ 2018-12-27 16:43 UTC
To: Johannes Sixt; +Cc: git, Lars Schneider, Torsten Bögershausen

On Thu, Dec 27, 2018 at 11:06:17AM +0100, Johannes Sixt wrote:
> It worries me that theoretical correctness is regarded higher than existing
> practice. I do not care a lot what some RFC tells what programs should do if
> the majority of the software does something different and that behavior has
> been proven useful in practice.

The majority of OSes produce the behavior I document here, and they are
the majority of systems on the Internet. Windows is the outlier here,
although a significant one. It is a common user of UTF-16 and its
variants, but so are Java and JavaScript, and they're present on a lot
of devices. Swallowing the U+FEFF would break compatibility with those
systems.

The issue that Windows users are seeing is that libiconv always produces
big-endian data for UTF-16, and they always want little-endian. glibc
produces native-endian data, which is what Windows users want. Git for
Windows could patch libiconv to do that (and that is the simple,
five-minute solution to this problem), but we'd still want to warn
people that they're relying on unspecified behavior, hence this series.

I would even be willing to patch Git for Windows's libiconv if somebody
could point me to the repo (although I obviously cannot test it, not
being a Windows user). I feel strongly, though, that fixing this is
outside of the scope of Git proper, and it's not a thing we should be
handling here.

> My understanding is that there is no such thing as a "byte order marker". It
> just so happens that when the first character in some UTF-16 text file
> begins with a ZWNBSP, then it is possible to derive the endianness of the
> file automatically. Other than that, that very first code point U+FEFF *is
> part of the data* and must not be removed when the data is reencoded. If Git
> does something different, it is bogus, IMO.

You've got part of this. For UTF-16LE and UTF-16BE, a U+FEFF is part of
the text, as would a second one be if we had two at the beginning of a
UTF-16 or UTF-8 sequence. If someone produces UTF-16LE and places a
U+FEFF at the beginning of it, when we encode to UTF-8, we emit only one
U+FEFF, which has the wrong semantics.

To be correct here and accept a U+FEFF, we'd need to check for a U+FEFF
at the beginning of a UTF-16LE or UTF-16BE sequence and ensure we encode
an extra U+FEFF at the beginning of the UTF-8 data (one for BOM and one
for the text) and then strip it off when we decode. That's kind of ugly,
and since iconv doesn't do that itself, we'd have to.

-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204
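
The round trip described in this message can be sketched with Python's
codecs. This is an illustration under stated assumptions, not what Git or
iconv does: the "utf-8-sig" codec stands in for any reader that strips one
leading UTF-8 BOM, and all variable names are made up for the example.

```python
import codecs

# UTF-16BE input whose first code point is U+FEFF. Under that label it is
# a ZWNBSP that belongs to the text, followed by U+000A.
raw = b"\xfe\xff\x00\x0a"
text = raw.decode("utf-16-be")

# Naive conversion: one U+FEFF in, one U+FEFF out (bytes EF BB BF 0A).
naive_utf8 = text.encode("utf-8")
# A BOM-aware UTF-8 reader treats that leading U+FEFF as a BOM and drops it,
# so the ZWNBSP silently disappears from the text.
naive_read = naive_utf8.decode("utf-8-sig")

# The workaround described above: emit an extra U+FEFF to serve as the BOM
# (bytes EF BB BF EF BB BF 0A) ...
padded_utf8 = codecs.BOM_UTF8 + text.encode("utf-8")
# ... so that stripping one BOM on the way back leaves the text intact.
padded_read = padded_utf8.decode("utf-8-sig")

print(naive_read == text)   # the naive round trip does not preserve the text
print(padded_read == text)  # the padded round trip does
```

Rejecting a leading U+FEFF for UTF-16LE and UTF-16BE avoids having to
carry this extra bookkeeping outside of iconv.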

* Re: [PATCH 0/2] Improve documentation on UTF-16
From: Johannes Sixt @ 2018-12-27 19:55 UTC
To: brian m. carlson; +Cc: git, Lars Schneider, Torsten Bögershausen

On 27.12.18 at 17:43, brian m. carlson wrote:
> On Thu, Dec 27, 2018 at 11:06:17AM +0100, Johannes Sixt wrote:
>> It worries me that theoretical correctness is regarded higher than existing
>> practice. I do not care a lot what some RFC tells what programs should do if
>> the majority of the software does something different and that behavior has
>> been proven useful in practice.
>
> The majority of OSes produce the behavior I document here, and they are
> the majority of systems on the Internet. Windows is the outlier here,
> although a significant one. It is a common user of UTF-16 and its
> variants, but so are Java and JavaScript, and they're present on a lot
> of devices. Swallowing the U+FEFF would break compatibility with those
> systems.
>
> The issue that Windows users are seeing is that libiconv always produces
> big-endian data for UTF-16, and they always want little-endian. glibc
> produces native-endian data, which is what Windows users want. Git for
> Windows could patch libiconv to do that (and that is the simple,
> five-minute solution to this problem), but we'd still want to warn
> people that they're relying on unspecified behavior, hence this series.
>
> I would even be willing to patch Git for Windows's libiconv if somebody
> could point me to the repo (although I obviously cannot test it, not
> being a Windows user). I feel strongly, though, that fixing this is
> outside of the scope of Git proper, and it's not a thing we should be
> handling here.

Please excuse me for leaving the majority of what you said uncommented, as
I am not deep in the matter and don't have a firm understanding of all the
issues. I'll just trust what you said is sound.

Just one thing: Please do the count by *users* (or existing files or number
of characters exchanged or something similar); do not just count OSs; I
mean, Windows is *not* the outlier if it handles 90% of the UTF-16 data in
the world. (I'm just making up numbers here, but I think you get the point.)

>> My understanding is that there is no such thing as a "byte order marker". It
>> just so happens that when the first character in some UTF-16 text file
>> begins with a ZWNBSP, then it is possible to derive the endianness of the
>> file automatically. Other than that, that very first code point U+FEFF *is
>> part of the data* and must not be removed when the data is reencoded. If Git
>> does something different, it is bogus, IMO.
>
> You've got part of this. For UTF-16LE and UTF-16BE, a U+FEFF is part of
> the text, as would a second one be if we had two at the beginning of a
> UTF-16 or UTF-8 sequence. If someone produces UTF-16LE and places a
> U+FEFF at the beginning of it, when we encode to UTF-8, we emit only one
> U+FEFF, which has the wrong semantics.
>
> To be correct here and accept a U+FEFF, we'd need to check for a U+FEFF
> at the beginning of a UTF-16LE or UTF-16BE sequence and ensure we encode
> an extra U+FEFF at the beginning of the UTF-8 data (one for BOM and one
> for the text) and then strip it off when we decode. That's kind of ugly,
> and since iconv doesn't do that itself, we'd have to.

But why do you add another U+FEFF on the way to UTF-8? There is one in the
incoming UTF-16 data, and only *that* one must be converted. If there is no
U+FEFF in the UTF-16 data, there should not be one in UTF-8, either.
Puzzled...

-- Hannes

* Re: [PATCH 0/2] Improve documentation on UTF-16
From: brian m. carlson @ 2018-12-27 23:45 UTC
To: Johannes Sixt; +Cc: git, Lars Schneider, Torsten Bögershausen

On Thu, Dec 27, 2018 at 08:55:27PM +0100, Johannes Sixt wrote:
> On 27.12.18 at 17:43, brian m. carlson wrote:
> > You've got part of this. For UTF-16LE and UTF-16BE, a U+FEFF is part of
> > the text, as would a second one be if we had two at the beginning of a
> > UTF-16 or UTF-8 sequence. If someone produces UTF-16LE and places a
> > U+FEFF at the beginning of it, when we encode to UTF-8, we emit only one
> > U+FEFF, which has the wrong semantics.
> >
> > To be correct here and accept a U+FEFF, we'd need to check for a U+FEFF
> > at the beginning of a UTF-16LE or UTF-16BE sequence and ensure we encode
> > an extra U+FEFF at the beginning of the UTF-8 data (one for BOM and one
> > for the text) and then strip it off when we decode. That's kind of ugly,
> > and since iconv doesn't do that itself, we'd have to.
>
> But why do you add another U+FEFF on the way to UTF-8? There is one in the
> incoming UTF-16 data, and only *that* one must be converted. If there is no
> U+FEFF in the UTF-16 data, there should not be one in UTF-8, either.
> Puzzled...

So for UTF-16, there must be a BOM. For UTF-16LE and UTF-16BE, there
must not be a BOM. So if we do this:

  $ printf '\xfe\xff\x00\x0a' | iconv -f UTF-16BE -t UTF-16 | xxd -g1
  00000000: ff fe ff fe 0a 00                                ......

That U+FEFF we have in the input is part of the text as a ZWNBSP; it is
not a BOM. We end up with two U+FEFF values. The first is the BOM that's
required as part of UTF-16. The second is semantically part of the text
and has the semantics of a zero-width non-breaking space.

In UTF-8, if the sequence starts with U+FEFF, it has the semantics of a
BOM just like in UTF-16 (except that it's optional): it's not part of
the text, and should be stripped off. So when we receive a UTF-16LE or
UTF-16BE sequence and it contains a U+FEFF (which is part of the text),
we need to insert a BOM in front of the sequence that's part of the text
to keep the semantics.

Essentially, we have this situation:

  Text (in memory):   U+FEFF U+000A
  Semantics of text:  ZWNBSP NL

  UTF-16BE:           FE FF  00 0A
  Semantics:          ZWNBSP NL

  UTF-16:             FE FF  FE FF  00 0A
  Semantics:          BOM    ZWNBSP NL

  UTF-8:              EF BB BF  EF BB BF  0A
  Semantics:          BOM       ZWNBSP    NL

If you don't have a U+FEFF, then things can be simpler:

  Text (in memory):   U+0041 U+0042 U+0043
  Semantics of text:  A      B      C

  UTF-16BE:           00 41  00 42  00 43
  Semantics:          A      B      C

  UTF-16:             FE FF  00 41  00 42  00 43
  Semantics:          BOM    A      B      C

  UTF-8:              41  42  43
  Semantics:          A   B   C

  UTF-8 (optional):   EF BB BF  41  42  43
  Semantics:          BOM       A   B   C

(I have picked big-endian UTF-16 here, but little-endian is fine, too;
this is just easier for me to type.)

This is all a huge edge case involving correctly serializing code
points. By rejecting U+FEFF in UTF-16BE and UTF-16LE, we don't have to
deal with any of it.

As mentioned, I think patching Git for Windows's iconv is the smallest,
most achievable solution to this, because it means we don't have to
handle any of this edge case ourselves. Windows and WSL users can both
write "UTF-16" and get a BOM and little-endian behavior, while we can
delegate all the rest of the encoding stuff to libiconv.

-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204
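
The byte rows in the two tables above can be reproduced with a short
Python sketch. It assumes Python's codecs as a stand-in for iconv and
picks big-endian for the "UTF-16" row, as the message does; the helper
names hexdump and encodings are hypothetical.

```python
import codecs


def hexdump(data: bytes) -> str:
    """Render bytes as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


def encodings(text: str) -> dict:
    """Return the serializations discussed above for one in-memory text."""
    utf16be = text.encode("utf-16-be")
    return {
        # Explicit byte order: no BOM, every code point is text.
        "UTF-16BE": hexdump(utf16be),
        # Plain UTF-16: a BOM first, then the text (big-endian chosen here).
        "UTF-16": hexdump(codecs.BOM_UTF16_BE + utf16be),
        # UTF-8 without a BOM.
        "UTF-8": hexdump(text.encode("utf-8")),
        # UTF-8 with the optional BOM in front.
        "UTF-8 + BOM": hexdump(text.encode("utf-8-sig")),
    }


for sample in ("\ufeff\n", "ABC"):
    for label, row in encodings(sample).items():
        print(f"{label:12} {row}")
    print()
```

For the first sample, the BOM-less "UTF-8" row begins with EF BB BF, which
is the ambiguity the message describes: a reader cannot tell that leading
U+FEFF apart from a BOM.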

* Re: [PATCH 0/2] Improve documentation on UTF-16
From: Johannes Sixt @ 2018-12-28  8:59 UTC
To: brian m. carlson; +Cc: git, Lars Schneider, Torsten Bögershausen

On 28.12.18 at 00:45, brian m. carlson wrote:
> On Thu, Dec 27, 2018 at 08:55:27PM +0100, Johannes Sixt wrote:
>> But why do you add another U+FEFF on the way to UTF-8? There is one in the
>> incoming UTF-16 data, and only *that* one must be converted. If there is no
>> U+FEFF in the UTF-16 data, there should not be one in UTF-8, either.
>> Puzzled...
>
> So for UTF-16, there must be a BOM. For UTF-16LE and UTF-16BE, there
> must not be a BOM. So if we do this:
>
>   $ printf '\xfe\xff\x00\x0a' | iconv -f UTF-16BE -t UTF-16 | xxd -g1
>   00000000: ff fe ff fe 0a 00                                ......

What sort of braindamage is this? Fix iconv.

But as I said, I'm not an expert. I just vented my worries that widespread
existing practice would be ignored under the excuse "you are the outlier".

-- Hannes

* Re: [PATCH 0/2] Improve documentation on UTF-16
From: Philip Oakley @ 2018-12-28 20:31 UTC
To: Johannes Sixt, brian m. carlson; +Cc: git, Lars Schneider, Torsten Bögershausen

On 28/12/2018 08:59, Johannes Sixt wrote:
> On 28.12.18 at 00:45, brian m. carlson wrote:
>> On Thu, Dec 27, 2018 at 08:55:27PM +0100, Johannes Sixt wrote:
>>> But why do you add another U+FEFF on the way to UTF-8? There is one
>>> in the incoming UTF-16 data, and only *that* one must be converted.
>>> If there is no U+FEFF in the UTF-16 data, there should not be one in
>>> UTF-8, either.
>>> Puzzled...
>>
>> So for UTF-16, there must be a BOM. For UTF-16LE and UTF-16BE, there
>> must not be a BOM. So if we do this:
>>
>>   $ printf '\xfe\xff\x00\x0a' | iconv -f UTF-16BE -t UTF-16 | xxd -g1
>>   00000000: ff fe ff fe 0a 00                                ......
>
> What sort of braindamage is this? Fix iconv.
>
> But as I said, I'm not an expert. I just vented my worries that
> widespread existing practice would be ignored under the excuse "you
> are the outlier".
>
> -- Hannes

For ref, I dug out a Microsoft document [1] on its view of BOMs, which can
be compared to the ref [0] Brian gave.

[1] https://docs.microsoft.com/en-us/windows/desktop/intl/using-byte-order-marks
[0] https://unicode.org/faq/utf_bom.html#bom9

Maybe the documentation patch ([PATCH 1/2] Documentation: document
UTF-16-related behavior) should include the line ", because we encode into
UTF-8 internally,", and a link to ref [0], and maybe [1].

Whether the various Windows programs actually follow the Microsoft
convention is another matter altogether.

-- Philip

* Re: [PATCH 0/2] Improve documentation on UTF-16
From: Ævar Arnfjörð Bjarmason @ 2018-12-28  8:46 UTC
To: brian m. carlson; +Cc: git, Lars Schneider, Torsten Bögershausen

On Thu, Dec 27 2018, brian m. carlson wrote:

> We've recently fielded several reports from unhappy Windows users about
> our handling of UTF-16, UTF-16LE, and UTF-16BE, none of which seem to be
> suitable for certain Windows programs.

Just for context, is "we" here $DAYJOB or a reference to some previous
ML thread(s) on this list, or something else?

* Re: [PATCH 0/2] Improve documentation on UTF-16
From: Philip Oakley @ 2018-12-28 20:35 UTC
To: Ævar Arnfjörð Bjarmason, brian m. carlson; +Cc: git, Lars Schneider, Torsten Bögershausen

On 28/12/2018 08:46, Ævar Arnfjörð Bjarmason wrote:
> On Thu, Dec 27 2018, brian m. carlson wrote:
>
>> We've recently fielded several reports from unhappy Windows users about
>> our handling of UTF-16, UTF-16LE, and UTF-16BE, none of which seem to be
>> suitable for certain Windows programs.
> Just for context, is "we" here $DAYJOB or a reference to some previous
> ML thread(s) on this list, or something else?

I think
https://public-inbox.org/git/CADN+U_PUfnYWb-wW6drRANv-ZaYBEk3gWHc7oJtxohA5Vc3NEg@mail.gmail.com/
was the most recent on the Git list.

-- Philip

* Re: [PATCH 0/2] Improve documentation on UTF-16
From: brian m. carlson @ 2018-12-29 23:17 UTC
To: Ævar Arnfjörð Bjarmason; +Cc: git, Lars Schneider, Torsten Bögershausen

On Fri, Dec 28, 2018 at 09:46:18AM +0100, Ævar Arnfjörð Bjarmason wrote:
>
> On Thu, Dec 27 2018, brian m. carlson wrote:
>
> > We've recently fielded several reports from unhappy Windows users about
> > our handling of UTF-16, UTF-16LE, and UTF-16BE, none of which seem to be
> > suitable for certain Windows programs.
>
> Just for context, is "we" here $DAYJOB or a reference to some previous
> ML thread(s) on this list, or something else?

"We" in this case is the Git list. I think the list has seen at least
three threads in recent months.

-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204