On 2020-04-18 16:12:06-0700, Junio C Hamano wrote: > Martin Ågren writes: > > > This is equivalent as long as `line->len` is equal to > > `strlen(line->buf)`, which it will be (should be) because it's a > > strbuf. Ok. > > For the guarantee to hold true, line->buf[0..line->len] should not > have any '\0' byte in it. > > This helper has two callers, but in either case, it needs to be > prepared to work on output of decode_[bq]_segment(). Is there code > anywhere that guarntees that the decoding won't stuff '\0' in the > line? - First caller: `decode_header`, we don't have such guarantee, The old code will discard everything after first NUL characters. I'm _not_ really familiar with email standard, does email allow UTF-16 (albeit in [bq]-encoded) in header? If yes, the current code is likely disallow it. The new code accidentally, accept [bq]-encoded-utf-16 header and reencode to utf-8 for commit. Yes, it's very likely only UTF-8, ISO-8859-1, some variant of ISO-2022 is used nowaday in email. *If* we would like to not exclude UTF-16, the new question is should we trust the length of newly converted utf-8 string. - Second caller: `handle_commit_msg`: everything get interesting from here. Get back to the question of trusting the length of newly converted utf-8 string. I _think_ we should, because I _think_ there shouldn't be any discrimination against any encoding (be it utf-8, ISO-8859-1, etc...), the current code allows NUL character in utf-8 [bq]-encoded string in this function (early return) and its caller, and report an error later: error: a NUL byte in commit log message not allowed. meanwhile, if the email was sent in other encoding, the current code discards everything after NUL in that line, thus silently accept broken commit message. Attached is the faulty patch in ISO-8859-1, which was used to demonstrate my words. The current code will make a commit with only "Áb" in the body, while the new code rightly claims we have a faulty email. -- Danh