git@vger.kernel.org mailing list mirror (one of many)
From: Junio C Hamano <gitster@pobox.com>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Jeremy Linton <lintonrjeremy@gmail.com>,
	Git Mailing List <git@vger.kernel.org>,
	Jonathan Tan <jonathantanmy@google.com>
Subject: Re: [PATCH] packfile: Correct zlib buffer handling
Date: Mon, 28 May 2018 11:41:15 +0900
Message-ID: <xmqqk1rolcxg.fsf@gitster-ct.c.googlers.com>
In-Reply-To: <xmqqsh6dl3gg.fsf@gitster-ct.c.googlers.com>

Junio C Hamano <gitster@pobox.com> writes:

> Duy Nguyen <pclouds@gmail.com> writes:
>
>> On Sun, May 27, 2018 at 1:57 AM, Junio C Hamano <gitster@pobox.com> wrote:
>>> Duy Nguyen <pclouds@gmail.com> writes:
>>>
>>>> On Sat, May 26, 2018 at 12:56 AM, Jeremy Linton <lintonrjeremy@gmail.com> wrote:
>>>>> @@ -1416,7 +1416,7 @@ static void *unpack_compressed_entry(struct packed_git *p,
>>>>>                 return NULL;
>>>>>         memset(&stream, 0, sizeof(stream));
>>>>>         stream.next_out = buffer;
>>>>> -       stream.avail_out = size + 1;
>>>>> +       stream.avail_out = size;
>>>>
>>>> You may want to include in your commit message a reference to
>>>> 39eea7bdd9 (Fix incorrect error check while reading deflated pack data
>>>> - 2009-10-21) which adds this plus one with a fascinating story
>>>> behind.
>>>
>>> A bit puzzled---are you saying that this recent patch breaks the old
>>> fix and must be done in some other way?
>>
>> No. I actually wanted to answer that question when I tried to track
>> down the commit that adds " + 1" but I did not spend enough time to
>> understand the old problem. I guess your puzzle means you didn't think
>> it would break anything, which is good.
>
> No it merely means I am puzzled how the posted patch that goes
> directly opposite to what an earlier "fix" did is a correct solution
> to anything X-<.

Specifically, I was worried about this assertion:

    Let's rely on the fact that the source buffer will only be fully
    consumed when the destination buffer is inflated to the
    correct size.

which I think is exactly the kind of thinking that caused us trouble
in the past; isn't the explanation in 456cdf6e ("Fix loose object
uncompression check.", 2007-03-19) relevant here?

-	stream.avail_out = size + 1;
+	stream.avail_out = size;
	...
 		stream.next_in = in;
 		st = git_inflate(&stream, Z_FINISH);
 		if (!stream.avail_out)
-			break; /* the payload is larger than it should be */
+			break; /* done, st indicates if source fully consumed */
 		curpos += stream.next_in - in;
 	} while (st == Z_OK || st == Z_BUF_ERROR);
 	git_inflate_end(&stream);
 	if ((st != Z_STREAM_END) || stream.total_out != size) {
 		free(buffer);
 		return NULL;
 	}

With stream.avail_out set to exactly size, without slack, !avail_out
means we have completely filled the output buffer.  It could be that
we had correct input that inflates to the correct size, in which case
we are happy---st would say Z_STREAM_END, we would leave the loop
because it is neither OK nor BUF_ERROR, and total_out would report
the size we expected.  Or the input zlib stream may have ended with
bytes that express "this concludes the stream", the input bytes
before them were sufficient to reconstruct the original payload
fully, and we may have fed only the bytes before that "this concludes
the stream" marker to git_inflate().

In such a case, we haven't consumed all the avail_in.  We may
already have all the correct output, i.e. !avail_out, but because we
haven't consumed the "this concludes the stream" bytes, st is not
STREAM_END.

Our existing while() loop, with one-byte slack in avail_out, would
have let us continue, and the next iteration of the loop would have
consumed the remaining input without producing any more output (i.e.
avail_out would have been left at 1 in both of these final two
rounds) and we would have exited the loop.  After calling
git_inflate_end(), we would have noticed STREAM_END and the correct
size, and we would have been happy.

The updated code would handle this latter case rather badly, no?  We
leave the loop early, notice st is not STREAM_END, and are very
unhappy, because this patch did not give us a chance to consume the
very end of the input stream.
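
To make the two cases concrete, here is a toy model of the loop.  It
is only a sketch under stated assumptions: toy_inflate(), struct
toy_stream, TOY_END_MARKER and toy_unpack() are all hypothetical
stand-ins invented for illustration, not zlib or git code.  The
"compressed" stream is just the payload bytes followed by a one-byte
end marker, and the window parameter imitates input arriving in
pieces, so that the end marker can land in a later round than the
last payload byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy status codes, named after the zlib ones they imitate. */
enum toy_status { TOY_OK, TOY_BUF_ERROR, TOY_STREAM_END };

#define TOY_END_MARKER 0xFF /* hypothetical "this concludes the stream" byte */

struct toy_stream {
	const unsigned char *next_in;
	size_t avail_in;
	unsigned char *next_out;
	size_t avail_out;
	size_t total_out;
};

/* Copy payload bytes to the output until the end marker is consumed. */
static enum toy_status toy_inflate(struct toy_stream *s)
{
	while (s->avail_in) {
		unsigned char c = *s->next_in;
		if (c == TOY_END_MARKER) {
			s->next_in++;
			s->avail_in--;
			return TOY_STREAM_END;
		}
		if (!s->avail_out)
			return TOY_BUF_ERROR; /* more payload than output room */
		*s->next_out++ = c;
		s->avail_out--;
		s->total_out++;
		s->next_in++;
		s->avail_in--;
	}
	return TOY_BUF_ERROR; /* input ran out before the end marker */
}

/*
 * Same shape as the loop in the diff above.  slack is 1 for the
 * existing code and 0 for the proposed patch.  Returns 1 on success.
 */
static int toy_unpack(const unsigned char *in, size_t in_len, size_t window,
		      unsigned char *buffer, size_t size, size_t slack)
{
	struct toy_stream s;
	enum toy_status st = TOY_BUF_ERROR;
	size_t curpos = 0;

	assert(window > 0);
	memset(&s, 0, sizeof(s));
	s.next_out = buffer;
	s.avail_out = size + slack;
	do {
		size_t chunk = in_len - curpos;
		if (!chunk)
			break; /* truncated input: nothing left to feed */
		if (chunk > window)
			chunk = window;
		s.next_in = in + curpos;
		s.avail_in = chunk;
		st = toy_inflate(&s);
		if (!s.avail_out)
			break;
		curpos += chunk - s.avail_in;
	} while (st == TOY_OK || st == TOY_BUF_ERROR);
	return st == TOY_STREAM_END && s.total_out == size;
}
```

In this model, a four-byte payload fed through a four-byte window
succeeds with slack == 1, because the second round consumes the end
marker while avail_out stays at 1.  With slack == 0 the first round
fills the buffer, the loop breaks with st still at TOY_BUF_ERROR, and
the valid stream is rejected.  When the whole input fits in one
window, both variants succeed.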

>> This yields two problems, first a single byte overrun won't be detected
>> properly because the Z_STREAM_END will then be set, but the null
>> terminator will have been overwritten.

Because we compare total_out and size at the end, we would detect it
as an error in this function, no?  Then zlib overwriting NUL would
not be a problem, as we would free the buffer and return NULL, no?

>> The other problem is that
>> more recent zlib patches have been poisoning the unconsumed portions
>> of the buffers which also overwrites the null, while correctly
>> returning length and status.

Isn't that a bug in zlib, though?  Or do they do that deliberately?

I think a workaround with lower impact would be to manually restore
NUL at the end of the buffer.
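
A minimal sketch of that workaround, assuming the buffer was
allocated with size + 1 bytes as it is today.  The helper name
finish_inflated_buffer() and its parameters are hypothetical, made up
for illustration; in the real function this would simply be a few
lines after the existing total_out check rather than a separate
helper.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not an existing git function.  Validates the
 * outcome of the inflate loop and returns a NUL-terminated buffer.
 * buffer must have room for at least size + 1 bytes.
 */
static unsigned char *finish_inflated_buffer(unsigned char *buffer,
					     size_t size, size_t total_out,
					     int stream_ended)
{
	assert(buffer != NULL);
	if (!stream_ended || total_out != size) {
		/* short, oversized or unterminated stream: reject it */
		free(buffer);
		return NULL;
	}
	/* the inflate call may have clobbered the slack byte; restore it */
	buffer[size] = '\0';
	return buffer;
}
```

This keeps the one-byte slack, so the loop can still consume the end
of the stream, and an overrun is still caught by the size comparison
before the NUL is put back.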

