Hi Junio, On Mon, 13 Jun 2022, Junio C Hamano wrote: > René Scharfe writes: > > > -test_expect_success GZIP 'git archive --format=tar.gz' ' > > +test_expect_success 'git archive --format=tar.gz' ' > > git archive --format=tar.gz HEAD >j1.tar.gz && > > test_cmp_bin j.tgz j1.tar.gz > > ' > > Curiously, this breaks for me. It is understandable if we are not > producing byte-for-byte identical output with internal gzip. Indeed, I can reproduce this, too. In particular, `j.tgz` and `j1.tar.gz` differ like this in my test run: -00000000 1f 8b 08 1a 00 2e ca 09 00 03 04 00 89 45 fc 83 |.............E..| +00000000 1f 8b 08 1a 00 35 2a 10 00 03 04 00 89 45 fc 83 |.....5*......E..| and -00000010 7d fc 00 f1 d0 ec b7 63 8c 30 cc 9b e6 db b6 6d |}......c.0.....m| +00000010 7d fc 00 54 ff ec b7 63 8c 30 cc 9b e6 db b6 6d |}..T...c.0.....m| According to https://datatracker.ietf.org/doc/html/rfc1952#page-5, the difference in the first line is the mtime. For reference, this is the version with `git -c tar.tgz.command="gzip -cn" archive --format=tgz HEAD`: 00000000 1f 8b 08 00 00 00 00 00 00 03 ec b7 63 8c 30 cc |............c.0.| In other words, `gzip` forces the `mtim` member to all zeros, which makes sense. The recorded mtimes are a bit funny, according to https://wolf-tungsten.github.io/gzip-analyzer/, they are 1975-03-17 00:36:32 and 1978-08-05 22:45:36, respectively... And the mtime actually changes all the time. What's even more funny: if I comment out the `deflateSetHeader()`, the mtime header field is left at all-zeros. This is on Ubuntu 18.04 with zlib1g 1:1.2.11.dfsg-0ubuntu2. So I dug in a bit deeper and what do you know, the `deflateHeader()` function is implemented like this (https://github.com/madler/zlib/blob/21767c654d31/deflate.c#L557-L565): int ZEXPORT deflateSetHeader (strm, head) z_streamp strm; gz_headerp head; { if (deflateStateCheck(strm) || strm->state->wrap != 2) return Z_STREAM_ERROR; strm->state->gzhead = head; return Z_OK; } Now, the caller is implemented like this: static void tgz_set_os(git_zstream *strm, int os) { #if ZLIB_VERNUM >= 0x1221 struct gz_header_s gzhead = { .os = os }; deflateSetHeader(&strm->z, &gzhead); #endif } The biggest problem is not that the return value of `deflateSetHeader()` is ignored. The biggest problem is that it passes the address of a heap variable to the `deflateSetHeader()` function, which then stores it away in another struct that lives beyond the point when we return from `tgz_set_os()`. In other words, this is the very issue I pointed out as GCC not catching: https://lore.kernel.org/git/nycvar.QRO.7.76.6.2205272235220.349@tvgsbejvaqbjf.bet/ The solution is to move the heap variable back into a scope that matches the lifetime of the compression: -- snip -- diff --git a/archive-tar.c b/archive-tar.c index 60669eb7b9c..3d77e0f7509 100644 --- a/archive-tar.c +++ b/archive-tar.c @@ -460,17 +460,12 @@ static void tgz_write_block(const void *data) static const char internal_gzip_command[] = "git archive gzip"; -static void tgz_set_os(git_zstream *strm, int os) -{ -#if ZLIB_VERNUM >= 0x1221 - struct gz_header_s gzhead = { .os = os }; - deflateSetHeader(&strm->z, &gzhead); -#endif -} - static int write_tar_filter_archive(const struct archiver *ar, struct archiver_args *args) { +#if ZLIB_VERNUM >= 0x1221 + struct gz_header_s gzhead = { .os = 3 }; /* Unix, for reproducibility */ +#endif struct strbuf cmd = STRBUF_INIT; struct child_process filter = CHILD_PROCESS_INIT; int r; @@ -481,7 +476,10 @@ static int write_tar_filter_archive(const struct archiver *ar, if (!strcmp(ar->filter_command, internal_gzip_command)) { write_block = tgz_write_block; git_deflate_init_gzip(&gzstream, args->compression_level); - tgz_set_os(&gzstream, 3); /* Unix, for reproducibility */ +#if ZLIB_VERNUM >= 0x1221 + if (deflateSetHeader(&gzstream.z, &gzhead) != Z_OK) + BUG("deflateSetHeader() called too late"); +#endif gzstream.next_out = outbuf; gzstream.avail_out = sizeof(outbuf); -- snap -- With this, the test passes for me. René, would you mind squashing this into your patch series? Thank you, Dscho