git@vger.kernel.org mailing list mirror (one of many)
* t0028-working-tree-encoding.sh test #3 data
From: Jeffrey Walton @ 2019-03-09 14:36 UTC
  To: Git List

Hi Everyone,

I'm experiencing a failure in t0028-working-tree-encoding.sh. The
first failure is test #3. The source states "source (test.utf16lebom,
considered UTF-16LE-BOM)" but it looks like a UTF-16LE BOM followed by
a UTF-32LE stream.

Am I misunderstanding the data presentation?

$ ./t0028-working-tree-encoding.sh -v -i
...
ok 2 - ensure UTF-8 is stored in Git

expecting success:
        test_when_finished "rm -f test.utf16.raw" &&

        rm test.utf16 &&
        git checkout test.utf16 &&
        test_cmp_bin test.utf16.raw test.utf16

Updated 1 path from the index
source (test.utf16lebom, considered UTF-16LE-BOM):
|  0: ff   |  1: fe   |  2: 68 h |  3:  0   |  4:  0   |  5:  0   |  6: 61 a |  7:  0
|  8:  0   |  9:  0   | 10: 6c l | 11:  0   | 12:  0   | 13:  0   | 14: 6c l | 15:  0
| 16:  0   | 17:  0   | 18: 6f o | 19:  0   | 20:  0   | 21:  0   | 22: 20   | 23:  0
| 24:  0   | 25:  0   | 26: 74 t | 27:  0   | 28:  0   | 29:  0   | 30: 68 h | 31:  0
| 32:  0   | 33:  0   | 34: 65 e | 35:  0   | 36:  0   | 37:  0   | 38: 72 r | 39:  0
| 40:  0   | 41:  0   | 42: 65 e | 43:  0   | 44:  0   | 45:  0   | 46: 21 ! | 47:  0
| 48:  0   | 49:  0   | 50:  a   | 51:  0   | 52:  0   | 53:  0   | 54: 63 c | 55:  0
| 56:  0   | 57:  0   | 58: 61 a | 59:  0   | 60:  0   | 61:  0   | 62: 6e n | 63:  0
| 64:  0   | 65:  0   | 66: 20   | 67:  0   | 68:  0   | 69:  0   | 70: 79 y | 71:  0
| 72:  0   | 73:  0   | 74: 6f o | 75:  0   | 76:  0   | 77:  0   | 78: 75 u | 79:  0
| 80:  0   | 81:  0   | 82: 20   | 83:  0   | 84:  0   | 85:  0   | 86: 72 r | 87:  0
| 88:  0   | 89:  0   | 90: 65 e | 91:  0   | 92:  0   | 93:  0   | 94: 61 a | 95:  0
| 96:  0   | 97:  0   | 98: 64 d | 99:  0   | 100:  0   | 101:  0   | 102: 20   | 103:  0
| 104:  0   | 105:  0   | 106: 6d m | 107:  0   | 108:  0   | 109:  0   | 110: 65 e | 111:  0
| 112:  0   | 113:  0   | 114: 3f ? | 115:  0   | 116:  0   | 117:  0

destination (test.utf16lebom, considered UTF-8):
|  0: 68 h |  1:  0   |  2: 61 a |  3:  0   |  4: 6c l |  5:  0   |  6: 6c l |  7:  0
|  8: 6f o |  9:  0   | 10: 20   | 11:  0   | 12: 74 t | 13:  0   | 14: 68 h | 15:  0
| 16: 65 e | 17:  0   | 18: 72 r | 19:  0   | 20: 65 e | 21:  0   | 22: 21 ! | 23:  0
| 24:  a   | 25:  0   | 26: 63 c | 27:  0   | 28: 61 a | 29:  0   | 30: 6e n | 31:  0
| 32: 20   | 33:  0   | 34: 79 y | 35:  0   | 36: 6f o | 37:  0   | 38: 75 u | 39:  0
| 40: 20   | 41:  0   | 42: 72 r | 43:  0   | 44: 65 e | 45:  0   | 46: 61 a | 47:  0
| 48: 64 d | 49:  0   | 50: 20   | 51:  0   | 52: 6d m | 53:  0   | 54: 65 e | 55:  0
| 56: 3f ? | 57:  0

test.utf16.raw test.utf16 differ: char 1, line 1
not ok 3 - re-encode to UTF-16 on checkout
#
#               test_when_finished "rm -f test.utf16.raw" &&
#
#               rm test.utf16 &&
#               git checkout test.utf16 &&
#               test_cmp_bin test.utf16.raw test.utf16
#
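One way to read the source dump above: after the two-byte little-endian BOM `ff fe`, each ASCII character takes four bytes (`68 00 00 00` for "h"), which is the UTF-32LE layout; UTF-16LE would store it in two (`68 00`). A minimal shell sketch of that check, assuming the text begins with an ASCII character. `guess_lebom_layout` and the `.bin` sample files are hypothetical, not part of t0028.

```shell
#!/bin/sh
# Hypothetical helper, not part of t0028: classify a file that starts with
# the 2-byte little-endian BOM (ff fe) by looking at the first code unit
# after it. Heuristic only: assumes the text begins with an ASCII character.
guess_lebom_layout () {
	file=$1
	# Read the first 6 bytes as hex pairs, e.g. "ff fe 68 00 00 00".
	set -- $(od -An -tx1 -N6 "$file")
	bytes="$*"
	case "$bytes" in
	'ff fe '??' 00 00 00') echo utf-32le ;;  # "h" stored as 68 00 00 00
	'ff fe '??' 00 '??' 00') echo utf-16le ;; # "h" stored as 68 00
	*) echo unknown ;;
	esac
}

# Two tiny samples: "ha" after an LE BOM, in each layout.
printf '\377\376h\0a\0' >lebom16.bin
printf '\377\376h\0\0\0a\0\0\0' >lebom32.bin

guess_lebom_layout lebom16.bin   # utf-16le
guess_lebom_layout lebom32.bin   # utf-32le
```

The UTF-32 pattern is tested first because a UTF-32LE unit for an ASCII character (`68 00 00 00`) would also match the looser UTF-16 pattern.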


* Re: t0028-working-tree-encoding.sh test #3 data
From: Torsten Bögershausen @ 2019-03-09 16:10 UTC
  To: Jeffrey Walton; +Cc: Git List

On Sat, Mar 09, 2019 at 09:36:34AM -0500, Jeffrey Walton wrote:
> Hi Everyone,
>
> I'm experiencing a failure in t0028-working-tree-encoding.sh. The
> first failure is test #3. The source states "source (test.utf16lebom,
> considered UTF-16LE-BOM)" but it looks like a UTF-16LE BOM followed by
> a UTF-32LE stream.
>
> Am I misunderstanding the data presentation?

Thanks for the report.

I think you understand it right.

Maybe you can help us: which OS are you using?

And what does
echo "hallo" | iconv -f UTF-8 -t UTF-16 | xxd
give?

We may need some more debugging; maybe you can send the whole log file?
Even if there are a lot of escape sequences...



> [snip]


* Re: t0028-working-tree-encoding.sh test #3 data
From: Jeffrey Walton @ 2019-03-10  1:57 UTC
  To: Torsten Bögershausen; +Cc: Git List

On Sat, Mar 9, 2019 at 11:10 AM Torsten Bögershausen <tboegi@web.de> wrote:
>
> On Sat, Mar 09, 2019 at 09:36:34AM -0500, Jeffrey Walton wrote:
> >
> > I'm experiencing a failure in t0028-working-tree-encoding.sh. The
> > first failure is test #3. The source states "source (test.utf16lebom,
> > considered UTF-16LE-BOM)" but it looks like a UTF-16LE BOM followed by
> > a UTF-32LE stream.
> >
> > Am I misunderstanding the data presentation?
>
> Thanks for the report.
>
> I think you understand it right.
>
> Maybe you can help us: which OS are you using?

Fedora 29, x86_64, fully patched.

However, I'm building Git and all of its dependencies with additional
flags for testing. The prefix directory is /var/tmp and the lib
directory is /var/tmp/lib64.

RPATHs are set for everything being built, but I can't rule out the
library search path problems that plague Linux. In the past I have seen
grep and awk from /bin use special builds of libraries in /var/tmp/lib64,
and I have not figured out how to stop programs in /bin from using the
test libraries in /var/tmp/lib64.

> And what does
> echo "hallo" | iconv -f UTF-8 -t UTF-16 | xxd
> give?

$ PATH=/var/tmp/bin/:$PATH echo "hallo" | iconv -f UTF-8 -t UTF-16 | xxd
00000000: fffe 6800 6100 6c00 6c00 6f00 0a00       ..h.a.l.l.o...

And:

$ PATH=/var/tmp/bin/:$PATH echo "hallo" | /usr/bin/iconv -f UTF-8 -t UTF-16 | xxd
00000000: fffe 6800 6100 6c00 6c00 6f00 0a00       ..h.a.l.l.o...
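A note on the commands above: a `NAME=value` prefix applies only to the single simple command it precedes, so in these pipelines the modified PATH is seen by `echo`, not by `iconv`; the first pipeline runs whichever iconv the unmodified PATH finds. A minimal sketch that puts /var/tmp/bin in front of PATH for every command in the check, and repeats the byte count that t0028's NO_UTF16_BOM prerequisite uses (its check appears in the test log later in this thread). `check_utf16_bom` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper. The body is a subshell, so the PATH change below
# covers every command in it without leaking into the caller's shell.
check_utf16_bom () (
	PATH=/var/tmp/bin:$PATH
	export PATH
	# Which iconv will actually be executed?
	command -v iconv || echo "iconv: not found"
	# Same idea as t0028's NO_UTF16_BOM prerequisite: "abc" encoded as
	# UTF-16 is 6 bytes without a BOM and 8 bytes with a 2-byte BOM.
	n=$(printf abc | iconv -f UTF-8 -t UTF-16 2>/dev/null | wc -c | tr -d ' ')
	case $n in
	6) echo "no BOM ($n bytes)" ;;
	8) echo "2-byte BOM ($n bytes)" ;;
	*) echo "unexpected size ($n bytes)" ;;
	esac
)

check_utf16_bom
```

The `fffe` at the start of the xxd output above suggests this iconv would report the 8-byte case.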

> We may need some more debugging; maybe you can send the whole log file?
> Even if there are a lot of escape sequences...

Yes, absolutely. Which file would you like?

(The only thing I can find is config.log).

Jeff


* Re: t0028-working-tree-encoding.sh test #3 data
From: Torsten Bögershausen @ 2019-03-10  6:33 UTC
  To: Jeffrey Walton; +Cc: Git List

On Sat, Mar 09, 2019 at 08:57:07PM -0500, Jeffrey Walton wrote:
> On Sat, Mar 9, 2019 at 11:10 AM Torsten Bögershausen <tboegi@web.de> wrote:
> [snip]
> > We may need some more debugging; maybe you can send the whole log file?
> > Even if there are a lot of escape sequences...
>
> Yes, absolutely. Which file would you like?
>
> (The only thing I can find is config.log).
>
> Jeff

(Which version of Git are you using?
I assume that you use the latest master?)

Sorry for being unclear about the log file.
The idea is to run t0028 from the t/ directory like this:
 ./t0028-working-tree-encoding.sh -v 2>&1 | tee t28.txt
and send t28.txt to us.

But before doing that, it may be useful to patch convert.c to remove
the "ANSI sequences" and make the log file easier to read:
in convert.c, remove the "\033[2m%2" and "\033[2m%0" stuff in trace_encoding().
(Or use the patch I just sent out.)
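If patching convert.c is inconvenient, a possible alternative (not what was suggested above) is to strip the escape sequences from the captured log afterwards. A minimal sketch, assuming the only escapes in the log are SGR sequences of the form ESC `[` digits/semicolons `m`, such as the `\033[2m` sequences mentioned for trace_encoding(); `strip_sgr` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical filter: remove SGR escape sequences (ESC [ params m) from stdin.
strip_sgr () {
	esc=$(printf '\033')
	sed "s/${esc}\[[0-9;]*m//g"
}

# Usage, with the log captured as suggested above:
#   ./t0028-working-tree-encoding.sh -v 2>&1 | strip_sgr | tee t28.txt
printf 'source:\n|  0: \033[2mff\033[0m   |\n' | strip_sgr
```

Other escape sequences (cursor movement and the like) would pass through unchanged; the sed pattern would need widening if any show up.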

My output looks like this (shortened; you say you see a breakage in test #3?):


Initialized empty Git repository in XXX/t/trash directory.t0028-working-tree-encoding/.git/
expecting success:
	git config core.eol lf &&

	text="hallo there!\ncan you read me?" &&
	echo "*.utf16 text working-tree-encoding=utf-16" >.gitattributes &&
	echo "*.utf16lebom text working-tree-encoding=UTF-16LE-BOM" >>.gitattributes &&
	printf "$text" >test.utf8.raw &&
	printf "$text" | write_utf16 >test.utf16.raw &&
	printf "$text" | write_utf32 >test.utf32.raw &&
	printf "\377\376"                         >test.utf16lebom.raw &&
	printf "$text" | iconv -f UTF-8 -t UTF-32LE >>test.utf16lebom.raw &&

	# Line ending tests
	printf "one\ntwo\nthree\n" >lf.utf8.raw &&
	printf "one\r\ntwo\r\nthree\r\n" >crlf.utf8.raw &&

	# BOM tests
	printf "\0a\0b\0c"                         >nobom.utf16be.raw &&
	printf "a\0b\0c\0"                         >nobom.utf16le.raw &&
	printf "\376\377\0a\0b\0c"                 >bebom.utf16be.raw &&
	printf "\377\376a\0b\0c\0"                 >lebom.utf16le.raw &&
	printf "\0\0\0a\0\0\0b\0\0\0c"             >nobom.utf32be.raw &&
	printf "a\0\0\0b\0\0\0c\0\0\0"             >nobom.utf32le.raw &&
	printf "\0\0\376\377\0\0\0a\0\0\0b\0\0\0c" >bebom.utf32be.raw &&
	printf "\377\376\0\0a\0\0\0b\0\0\0c\0\0\0" >lebom.utf32le.raw &&

	# Add only UTF-16 file, we will add the UTF-32 file later
	cp test.utf16.raw test.utf16 &&
	cp test.utf32.raw test.utf32 &&
	cp test.utf16lebom.raw test.utf16lebom &&
	git add .gitattributes test.utf16 test.utf16lebom &&
	git commit -m initial

checking prerequisite: NO_UTF16_BOM

mkdir -p "$TRASH_DIRECTORY/prereq-test-dir" &&
(
	cd "$TRASH_DIRECTORY/prereq-test-dir" &&
	test $(printf abc | iconv -f UTF-8 -t UTF-16 | wc -c) = 6

)
prerequisite NO_UTF16_BOM not satisfied
checking prerequisite: NO_UTF32_BOM

mkdir -p "$TRASH_DIRECTORY/prereq-test-dir" &&
(
	cd "$TRASH_DIRECTORY/prereq-test-dir" &&
	test $(printf abc | iconv -f UTF-8 -t UTF-32 | wc -c) = 12

)
prerequisite NO_UTF32_BOM not satisfied
source (test.utf16, considered utf-16):
|  0: ff   |  1: fe   |  2: 68 h |  3:  0   |  4: 61 a |  5:  0   |  6: 6c l |  7:  0
|  8: 6c l |  9:  0   | 10: 6f o | 11:  0   | 12: 20   | 13:  0   | 14: 74 t | 15:  0
| 16: 68 h | 17:  0   | 18: 65 e | 19:  0   | 20: 72 r | 21:  0   | 22: 65 e | 23:  0
| 24: 21 ! | 25:  0   | 26:  a   | 27:  0   | 28: 63 c | 29:  0   | 30: 61 a | 31:  0
| 32: 6e n | 33:  0   | 34: 20   | 35:  0   | 36: 79 y | 37:  0   | 38: 6f o | 39:  0
| 40: 75 u | 41:  0   | 42: 20   | 43:  0   | 44: 72 r | 45:  0   | 46: 65 e | 47:  0
| 48: 61 a | 49:  0   | 50: 64 d | 51:  0   | 52: 20   | 53:  0   | 54: 6d m | 55:  0
| 56: 65 e | 57:  0   | 58: 3f ? | 59:  0

destination (test.utf16, considered UTF-8):
|  0: 68 h |  1: 61 a |  2: 6c l |  3: 6c l |  4: 6f o |  5: 20   |  6: 74 t |  7: 68 h
|  8: 65 e |  9: 72 r | 10: 65 e | 11: 21 ! | 12:  a   | 13: 63 c | 14: 61 a | 15: 6e n
| 16: 20   | 17: 79 y | 18: 6f o | 19: 75 u | 20: 20   | 21: 72 r | 22: 65 e | 23: 61 a
| 24: 64 d | 25: 20   | 26: 6d m | 27: 65 e | 28: 3f ?

source (test.utf16lebom, considered UTF-16LE-BOM):
|  0: ff   |  1: fe   |  2: 68 h |  3:  0   |  4:  0   |  5:  0   |  6: 61 a |  7:  0
|  8:  0   |  9:  0   | 10: 6c l | 11:  0   | 12:  0   | 13:  0   | 14: 6c l | 15:  0
| 16:  0   | 17:  0   | 18: 6f o | 19:  0   | 20:  0   | 21:  0   | 22: 20   | 23:  0
| 24:  0   | 25:  0   | 26: 74 t | 27:  0   | 28:  0   | 29:  0   | 30: 68 h | 31:  0
| 32:  0   | 33:  0   | 34: 65 e | 35:  0   | 36:  0   | 37:  0   | 38: 72 r | 39:  0
| 40:  0   | 41:  0   | 42: 65 e | 43:  0   | 44:  0   | 45:  0   | 46: 21 ! | 47:  0
| 48:  0   | 49:  0   | 50:  a   | 51:  0   | 52:  0   | 53:  0   | 54: 63 c | 55:  0
| 56:  0   | 57:  0   | 58: 61 a | 59:  0   | 60:  0   | 61:  0   | 62: 6e n | 63:  0
| 64:  0   | 65:  0   | 66: 20   | 67:  0   | 68:  0   | 69:  0   | 70: 79 y | 71:  0
| 72:  0   | 73:  0   | 74: 6f o | 75:  0   | 76:  0   | 77:  0   | 78: 75 u | 79:  0
| 80:  0   | 81:  0   | 82: 20   | 83:  0   | 84:  0   | 85:  0   | 86: 72 r | 87:  0
| 88:  0   | 89:  0   | 90: 65 e | 91:  0   | 92:  0   | 93:  0   | 94: 61 a | 95:  0
| 96:  0   | 97:  0   | 98: 64 d | 99:  0   | 100:  0   | 101:  0   | 102: 20   | 103:  0
| 104:  0   | 105:  0   | 106: 6d m | 107:  0   | 108:  0   | 109:  0   | 110: 65 e | 111:  0
| 112:  0   | 113:  0   | 114: 3f ? | 115:  0   | 116:  0   | 117:  0

destination (test.utf16lebom, considered UTF-8):
|  0: 68 h |  1:  0   |  2: 61 a |  3:  0   |  4: 6c l |  5:  0   |  6: 6c l |  7:  0
|  8: 6f o |  9:  0   | 10: 20   | 11:  0   | 12: 74 t | 13:  0   | 14: 68 h | 15:  0
| 16: 65 e | 17:  0   | 18: 72 r | 19:  0   | 20: 65 e | 21:  0   | 22: 21 ! | 23:  0
| 24:  a   | 25:  0   | 26: 63 c | 27:  0   | 28: 61 a | 29:  0   | 30: 6e n | 31:  0
| 32: 20   | 33:  0   | 34: 79 y | 35:  0   | 36: 6f o | 37:  0   | 38: 75 u | 39:  0
| 40: 20   | 41:  0   | 42: 72 r | 43:  0   | 44: 65 e | 45:  0   | 46: 61 a | 47:  0
| 48: 64 d | 49:  0   | 50: 20   | 51:  0   | 52: 6d m | 53:  0   | 54: 65 e | 55:  0
| 56: 3f ? | 57:  0

[master (root-commit) 275413c] initial
 Author: A U Thor <author@example.com>
 3 files changed, 4 insertions(+)
 create mode 100644 .gitattributes
 create mode 100644 test.utf16
 create mode 100644 test.utf16lebom
ok 1 - setup test files

expecting success:
	test_when_finished "rm -f test.utf16.git" &&

	git cat-file -p :test.utf16 >test.utf16.git &&
	test_cmp_bin test.utf8.raw test.utf16.git

ok 2 - ensure UTF-8 is stored in Git

expecting success:
	test_when_finished "rm -f test.utf16.raw" &&

	rm test.utf16 &&
	git checkout test.utf16 &&
	test_cmp_bin test.utf16.raw test.utf16

Updated 1 path from the index
ok 3 - re-encode to UTF-16 on checkout
[snip]
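Regarding the setup step in the log above: .gitattributes declares `*.utf16lebom` with `working-tree-encoding=UTF-16LE-BOM`, yet test.utf16lebom.raw is built from the BOM `\377\376` followed by `iconv -f UTF-8 -t UTF-32LE` output. That yields 2 + 29 × 4 = 118 bytes, matching the source dump (offsets 0 to 117) in the original report. If the intent is a UTF-16LE file with a BOM, a sketch of what that setup step would presumably produce instead (an assumption, not a patch from this thread):

```shell
#!/bin/sh
# Assumed intent: a 2-byte LE BOM followed by UTF-16LE (not UTF-32LE) text.
text="hallo there!\ncan you read me?"
printf '\377\376' >test.utf16lebom.raw &&
printf '%b' "$text" | iconv -f UTF-8 -t UTF-16LE >>test.utf16lebom.raw

# 29 characters: 2 (BOM) + 29 * 2 = 60 bytes; the UTF-32LE variant is 118.
wc -c <test.utf16lebom.raw
od -An -tx1 -N6 test.utf16lebom.raw   # starts ff fe 68 00 61 00
```

`printf '%b'` expands the `\n` in the argument; the test itself passes `$text` as the format string, which has the same effect for this particular text.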


Thread overview: 4 messages
2019-03-09 14:36 t0028-working-tree-encoding.sh test #3 data Jeffrey Walton
2019-03-09 16:10 ` Torsten Bögershausen
2019-03-10  1:57   ` Jeffrey Walton
2019-03-10  6:33     ` Torsten Bögershausen
