git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "René Scharfe" <l.s.r@web.de>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, "Rohit Ashiwal" <rohit.ashiwal265@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Jeff King" <peff@peff.net>,
	"brian m . carlson" <sandals@crustytoothpaste.net>
Subject: Re: [PATCH v3 5/5] archive-tar: use internal gzip by default
Date: Tue, 14 Jun 2022 17:47:40 +0200	[thread overview]
Message-ID: <28f6ec2a-1d94-b29a-4bfd-6a9e74c8edbf@web.de> (raw)
In-Reply-To: <nycvar.QRO.7.76.6.2206141109270.353@tvgsbejvaqbjf.bet>

Am 14.06.22 um 13:27 schrieb Johannes Schindelin:
> Hi Junio,
>
> On Mon, 13 Jun 2022, Junio C Hamano wrote:
>
>> René Scharfe <l.s.r@web.de> writes:
>>
>>> -test_expect_success GZIP 'git archive --format=tar.gz' '
>>> +test_expect_success 'git archive --format=tar.gz' '
>>>  	git archive --format=tar.gz HEAD >j1.tar.gz &&
>>>  	test_cmp_bin j.tgz j1.tar.gz
>>>  '
>>
>> Curiously, this breaks for me.  It is understandable if we are not
>> producing byte-for-byte identical output with internal gzip.

Makes sense in retrospect, there's no reason the output of gzip(1) and
zlib would have to be the same exactly.  It just happened to be so on my
platform, so the tests deceptively passed for me.  I think we simply
have to drop those that try to compare compressed files made by
different tools -- we can still check if their content can be extracted
and matches.

> Indeed, I can reproduce this, too. In particular, `j.tgz` and `j1.tar.gz`
> differ like this in my test run:
>
> -00000000  1f 8b 08 1a 00 2e ca 09  00 03 04 00 89 45 fc 83 |.............E..|
> +00000000  1f 8b 08 1a 00 35 2a 10  00 03 04 00 89 45 fc 83 |.....5*......E..|
>
> and
>
> -00000010  7d fc 00 f1 d0 ec b7 63  8c 30 cc 9b e6 db b6 6d |}......c.0.....m|
> +00000010  7d fc 00 54 ff ec b7 63  8c 30 cc 9b e6 db b6 6d |}..T...c.0.....m|
>
> According to https://datatracker.ietf.org/doc/html/rfc1952#page-5, the
> difference in the first line is the mtime. For reference, this is the
> version with `git -c tar.tgz.command="gzip -cn" archive --format=tgz
> HEAD`:
>
> 00000000  1f 8b 08 00 00 00 00 00  00 03 ec b7 63 8c 30 cc |............c.0.|
>
> In other words, `gzip` forces the `mtim` member to all zeros, which makes
> sense.

And that's what zlib does as well for me:

   $ ./git-archive --format=tgz HEAD | hexdump -C | head -1
   00000000  1f 8b 08 00 00 00 00 00  00 03 ec bd 59 73 1c 49  |..........?Ys.I|

>
> The recorded mtimes are a bit funny, according to
> https://wolf-tungsten.github.io/gzip-analyzer/, they are 1975-03-17
> 00:36:32 and 1978-08-05 22:45:36, respectively...
>
> And the mtime actually changes all the time.
>
> What's even more funny: if I comment out the `deflateSetHeader()`, the
> mtime header field is left at all-zeros. This is on Ubuntu 18.04 with
> zlib1g 1:1.2.11.dfsg-0ubuntu2.
>
> So I dug in a bit deeper and what do you know, the `deflateHeader()`
> function is implemented like this
> (https://github.com/madler/zlib/blob/21767c654d31/deflate.c#L557-L565):
>
> 	int ZEXPORT deflateSetHeader (strm, head)
> 	    z_streamp strm;
> 	    gz_headerp head;
> 	{
> 	    if (deflateStateCheck(strm) || strm->state->wrap != 2)
> 		return Z_STREAM_ERROR;
> 	    strm->state->gzhead = head;
> 	    return Z_OK;
> 	}
>
> Now, the caller is implemented like this:
>
> 	static void tgz_set_os(git_zstream *strm, int os)
> 	{
> 	#if ZLIB_VERNUM >= 0x1221
> 		struct gz_header_s gzhead = { .os = os };
> 		deflateSetHeader(&strm->z, &gzhead);
> 	#endif
> 	}
>
> The biggest problem is not that the return value of `deflateSetHeader()`
> is ignored. The biggest problem is that it passes the address of a heap
> variable to the `deflateSetHeader()` function, which then stores it away
> in another struct that lives beyond the point when we return from
> `tgz_set_os()`.

Ah, you mean the address of an automatic variable on the stack, but I
get it.  D'oh!

>
> In other words, this is the very issue I pointed out as GCC not catching:
> https://lore.kernel.org/git/nycvar.QRO.7.76.6.2205272235220.349@tvgsbejvaqbjf.bet/
>
> The solution is to move the heap variable back into a scope that matches
> the lifetime of the compression:
>
> -- snip --
> diff --git a/archive-tar.c b/archive-tar.c
> index 60669eb7b9c..3d77e0f7509 100644
> --- a/archive-tar.c
> +++ b/archive-tar.c
> @@ -460,17 +460,12 @@ static void tgz_write_block(const void *data)
>
>  static const char internal_gzip_command[] = "git archive gzip";
>
> -static void tgz_set_os(git_zstream *strm, int os)
> -{
> -#if ZLIB_VERNUM >= 0x1221
> -	struct gz_header_s gzhead = { .os = os };
> -	deflateSetHeader(&strm->z, &gzhead);
> -#endif
> -}
> -
>  static int write_tar_filter_archive(const struct archiver *ar,
>  				    struct archiver_args *args)
>  {
> +#if ZLIB_VERNUM >= 0x1221
> +	struct gz_header_s gzhead = { .os = 3 }; /* Unix, for reproducibility */
> +#endif
>  	struct strbuf cmd = STRBUF_INIT;
>  	struct child_process filter = CHILD_PROCESS_INIT;
>  	int r;
> @@ -481,7 +476,10 @@ static int write_tar_filter_archive(const struct archiver *ar,
>  	if (!strcmp(ar->filter_command, internal_gzip_command)) {
>  		write_block = tgz_write_block;
>  		git_deflate_init_gzip(&gzstream, args->compression_level);
> -		tgz_set_os(&gzstream, 3); /* Unix, for reproducibility */
> +#if ZLIB_VERNUM >= 0x1221
> +		if (deflateSetHeader(&gzstream.z, &gzhead) != Z_OK)
> +			BUG("deflateSetHeader() called too late");
> +#endif
>  		gzstream.next_out = outbuf;
>  		gzstream.avail_out = sizeof(outbuf);
>
> -- snap --

Good find, thank you!  A shorter solution would be to make gzhead static.

>
> With this, the test passes for me.
>
> René, would you mind squashing this into your patch series?
>
> Thank you,
> Dscho

  reply	other threads:[~2022-06-14 15:48 UTC|newest]

Thread overview: 74+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-12 23:04 [PATCH 0/2] Avoid spawning gzip in git archive Johannes Schindelin via GitGitGadget
2019-04-12 23:04 ` [PATCH 1/2] archive: replace write_or_die() calls with write_block_or_die() Rohit Ashiwal via GitGitGadget
2019-04-13  1:34   ` Jeff King
2019-04-13  5:51     ` Junio C Hamano
2019-04-14  4:36       ` Rohit Ashiwal
2019-04-26 14:29       ` Johannes Schindelin
2019-04-26 23:44         ` Junio C Hamano
2019-04-29 21:32           ` Johannes Schindelin
2019-05-01 18:09             ` Jeff King
2019-05-02 20:29               ` René Scharfe
2019-05-05  5:25               ` Junio C Hamano
2019-05-06  5:07                 ` Jeff King
2019-04-14  4:34     ` Rohit Ashiwal
2019-04-14 10:33       ` Junio C Hamano
2019-04-26 14:28     ` Johannes Schindelin
2019-05-01 18:07       ` Jeff King
2019-04-12 23:04 ` [PATCH 2/2] archive: avoid spawning `gzip` Rohit Ashiwal via GitGitGadget
2019-04-13  1:51   ` Jeff King
2019-04-13 22:01     ` René Scharfe
2019-04-15 21:35       ` Jeff King
2019-04-26 14:51         ` Johannes Schindelin
2019-04-27  9:59           ` René Scharfe
2019-04-27 17:39             ` René Scharfe
2019-04-29 21:25               ` Johannes Schindelin
2019-05-01 17:45                 ` René Scharfe
2019-05-01 18:18                   ` Jeff King
2019-06-10 10:44                     ` René Scharfe
2019-06-13 19:16                       ` Jeff King
2019-04-13 22:16     ` brian m. carlson
2019-04-15 21:36       ` Jeff King
2019-04-26 14:54       ` Johannes Schindelin
2019-05-02 20:20         ` Ævar Arnfjörð Bjarmason
2019-05-03 20:49           ` Johannes Schindelin
2019-05-03 20:52             ` Jeff King
2019-04-26 14:47     ` Johannes Schindelin
     [not found] ` <pull.145.v2.git.gitgitgadget@gmail.com>
     [not found]   ` <4ea94a8784876c3a19e387537edd81a957fc692c.1556321244.git.gitgitgadget@gmail.com>
2019-05-02 20:29     ` [PATCH v2 3/4] archive: optionally use zlib directly for gzip compression René Scharfe
     [not found]   ` <ac2b2488a1b42b3caf8a84594c48eca796748e59.1556321244.git.gitgitgadget@gmail.com>
2019-05-02 20:30     ` [PATCH v2 2/4] archive-tar: mark RECORDSIZE/BLOCKSIZE as unsigned René Scharfe
2019-05-08 11:45       ` Johannes Schindelin
2019-05-08 23:04         ` Jeff King
2019-05-09 14:06           ` Johannes Schindelin
2019-05-09 18:38             ` Jeff King
2019-05-10 17:18               ` René Scharfe
2019-05-10 21:20                 ` Jeff King
2022-06-12  6:00 ` [PATCH v3 0/5] Avoid spawning gzip in git archive René Scharfe
2022-06-12  6:03   ` [PATCH v3 1/5] archive: rename archiver data field to filter_command René Scharfe
2022-06-12  6:05   ` [PATCH v3 2/5] archive-tar: factor out write_block() René Scharfe
2022-06-12  6:08   ` [PATCH v3 3/5] archive-tar: add internal gzip implementation René Scharfe
2022-06-13 19:10     ` Junio C Hamano
2022-06-12  6:18   ` [PATCH v3 4/5] archive-tar: use OS_CODE 3 (Unix) for internal gzip René Scharfe
2022-06-12  6:19   ` [PATCH v3 5/5] archive-tar: use internal gzip by default René Scharfe
2022-06-13 21:55     ` Junio C Hamano
2022-06-14 11:27       ` Johannes Schindelin
2022-06-14 15:47         ` René Scharfe [this message]
2022-06-14 15:56           ` René Scharfe
2022-06-14 16:29           ` Johannes Schindelin
2022-06-14 20:04             ` René Scharfe
2022-06-15 16:41               ` Junio C Hamano
2022-06-14 11:28   ` [PATCH v3 0/5] Avoid spawning gzip in git archive Johannes Schindelin
2022-06-14 20:05     ` René Scharfe
2022-06-30 18:55       ` Johannes Schindelin
2022-07-01 16:05         ` Johannes Schindelin
2022-07-01 16:27           ` Jeff King
2022-07-01 17:47             ` Junio C Hamano
2022-06-15 16:53 ` [PATCH v4 0/6] " René Scharfe
2022-06-15 16:58   ` [PATCH v4 1/6] archive: update format documentation René Scharfe
2022-06-15 16:59   ` [PATCH v4 2/6] archive: rename archiver data field to filter_command René Scharfe
2022-06-15 17:01   ` [PATCH v4 3/6] archive-tar: factor out write_block() René Scharfe
2022-06-15 17:02   ` [PATCH v4 4/6] archive-tar: add internal gzip implementation René Scharfe
2022-06-15 20:32     ` Ævar Arnfjörð Bjarmason
2022-06-16 18:55       ` René Scharfe
2022-06-24 11:13         ` Ævar Arnfjörð Bjarmason
2022-06-24 20:24           ` René Scharfe
2022-06-15 17:04   ` [PATCH v4 5/6] archive-tar: use OS_CODE 3 (Unix) for internal gzip René Scharfe
2022-06-15 17:05   ` [PATCH v4 6/6] archive-tar: use internal gzip by default René Scharfe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=28f6ec2a-1d94-b29a-4bfd-6a9e74c8edbf@web.de \
    --to=l.s.r@web.de \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=rohit.ashiwal265@gmail.com \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).