On Thu, Jul 04, 2013 at 09:58:08PM +0200, Torsten Bögershausen wrote: > On 2013-07-04 19.19, brian m. carlson wrote: > > The commit code already contains code for validating UTF-8, but it does not > > check for invalid values, such as guaranteed non-characters and surrogates. Fix > s/guaranteed non-characters/code points out of range/ The "such as" is meant to be illustrative, not all-inclusive, and my patch does check for U+FFFE and U+FFFF, which are guaranteed non-characters. > > this by explicitly checking for and rejecting such characters. > Do we really reject them, or do we (only) warn about them ? Well, find_invalid_utf8 rejects them as invalid, and verify_utf8 fixes them up as if they were Latin-1, and commit_tree_extended warns about them. My interpretation was from the point of view of the code that I touched (find_invalid_utf8), not the binary. It would be nice if the binary actually rejected it, too, but that isn't within the scope of this patch. > Other question: > Now that we have a check for codepoints out of range, beyond U+10FFFF, > do we want to have an additional testcase ? Sure, why not? > > +test_expect_success 'UTF-8 invalid characters refused' ' > May be: > test_expect_success 'UTF-8 invalid surrogate' ' Since I'll be adding at least one more unit test, as you requested, I'll change the name. I suppose I might as well add a test for the non-characters as well. > Does it make sense to "grep on the fly", like this: > git commit -a -F "$HOME/invalid" 2>&1 | grep "did not conform" I am interested in making sure that git commit succeeds, and using a pipe will cause any failure of git commit to be ignored. -- brian m. carlson / brian with sandals: Houston, Texas, US +1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187