git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Junio C Hamano <gitster@pobox.com>
Cc: "René Scharfe" <l.s.r@web.de>,
	git@vger.kernel.org, "Rohit Ashiwal" <rohit.ashiwal265@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Jeff King" <peff@peff.net>,
	"brian m . carlson" <sandals@crustytoothpaste.net>
Subject: Re: [PATCH v3 5/5] archive-tar: use internal gzip by default
Date: Tue, 14 Jun 2022 13:27:37 +0200 (CEST)	[thread overview]
Message-ID: <nycvar.QRO.7.76.6.2206141109270.353@tvgsbejvaqbjf.bet> (raw)
In-Reply-To: <xmqqk09k449y.fsf@gitster.g>

[-- Attachment #1: Type: text/plain, Size: 4352 bytes --]

Hi Junio,

On Mon, 13 Jun 2022, Junio C Hamano wrote:

> René Scharfe <l.s.r@web.de> writes:
>
> > -test_expect_success GZIP 'git archive --format=tar.gz' '
> > +test_expect_success 'git archive --format=tar.gz' '
> >  	git archive --format=tar.gz HEAD >j1.tar.gz &&
> >  	test_cmp_bin j.tgz j1.tar.gz
> >  '
>
> Curiously, this breaks for me.  It is understandable if we are not
> producing byte-for-byte identical output with internal gzip.

Indeed, I can reproduce this, too. In particular, `j.tgz` and `j1.tar.gz`
differ like this in my test run:

-00000000  1f 8b 08 1a 00 2e ca 09  00 03 04 00 89 45 fc 83 |.............E..|
+00000000  1f 8b 08 1a 00 35 2a 10  00 03 04 00 89 45 fc 83 |.....5*......E..|

and

-00000010  7d fc 00 f1 d0 ec b7 63  8c 30 cc 9b e6 db b6 6d |}......c.0.....m|
+00000010  7d fc 00 54 ff ec b7 63  8c 30 cc 9b e6 db b6 6d |}..T...c.0.....m|

According to https://datatracker.ietf.org/doc/html/rfc1952#page-5, the
difference in the first line is the mtime. For reference, this is the
version with `git -c tar.tgz.command="gzip -cn" archive --format=tgz
HEAD`:

00000000  1f 8b 08 00 00 00 00 00  00 03 ec b7 63 8c 30 cc |............c.0.|

In other words, `gzip` forces the `mtim` member to all zeros, which makes
sense.

The recorded mtimes are a bit funny, according to
https://wolf-tungsten.github.io/gzip-analyzer/, they are 1975-03-17
00:36:32 and 1978-08-05 22:45:36, respectively...

And the mtime actually changes all the time.

What's even more funny: if I comment out the `deflateSetHeader()`, the
mtime header field is left at all-zeros. This is on Ubuntu 18.04 with
zlib1g 1:1.2.11.dfsg-0ubuntu2.

So I dug in a bit deeper and what do you know, the `deflateHeader()`
function is implemented like this
(https://github.com/madler/zlib/blob/21767c654d31/deflate.c#L557-L565):

	int ZEXPORT deflateSetHeader (strm, head)
	    z_streamp strm;
	    gz_headerp head;
	{
	    if (deflateStateCheck(strm) || strm->state->wrap != 2)
		return Z_STREAM_ERROR;
	    strm->state->gzhead = head;
	    return Z_OK;
	}

Now, the caller is implemented like this:

	static void tgz_set_os(git_zstream *strm, int os)
	{
	#if ZLIB_VERNUM >= 0x1221
		struct gz_header_s gzhead = { .os = os };
		deflateSetHeader(&strm->z, &gzhead);
	#endif
	}

The biggest problem is not that the return value of `deflateSetHeader()`
is ignored. The biggest problem is that it passes the address of a heap
variable to the `deflateSetHeader()` function, which then stores it away
in another struct that lives beyond the point when we return from
`tgz_set_os()`.

In other words, this is the very issue I pointed out as GCC not catching:
https://lore.kernel.org/git/nycvar.QRO.7.76.6.2205272235220.349@tvgsbejvaqbjf.bet/

The solution is to move the heap variable back into a scope that matches
the lifetime of the compression:

-- snip --
diff --git a/archive-tar.c b/archive-tar.c
index 60669eb7b9c..3d77e0f7509 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -460,17 +460,12 @@ static void tgz_write_block(const void *data)

 static const char internal_gzip_command[] = "git archive gzip";

-static void tgz_set_os(git_zstream *strm, int os)
-{
-#if ZLIB_VERNUM >= 0x1221
-	struct gz_header_s gzhead = { .os = os };
-	deflateSetHeader(&strm->z, &gzhead);
-#endif
-}
-
 static int write_tar_filter_archive(const struct archiver *ar,
 				    struct archiver_args *args)
 {
+#if ZLIB_VERNUM >= 0x1221
+	struct gz_header_s gzhead = { .os = 3 }; /* Unix, for reproducibility */
+#endif
 	struct strbuf cmd = STRBUF_INIT;
 	struct child_process filter = CHILD_PROCESS_INIT;
 	int r;
@@ -481,7 +476,10 @@ static int write_tar_filter_archive(const struct archiver *ar,
 	if (!strcmp(ar->filter_command, internal_gzip_command)) {
 		write_block = tgz_write_block;
 		git_deflate_init_gzip(&gzstream, args->compression_level);
-		tgz_set_os(&gzstream, 3); /* Unix, for reproducibility */
+#if ZLIB_VERNUM >= 0x1221
+		if (deflateSetHeader(&gzstream.z, &gzhead) != Z_OK)
+			BUG("deflateSetHeader() called too late");
+#endif
 		gzstream.next_out = outbuf;
 		gzstream.avail_out = sizeof(outbuf);

-- snap --

With this, the test passes for me.

René, would you mind squashing this into your patch series?

Thank you,
Dscho

  reply	other threads:[~2022-06-14 11:28 UTC|newest]

Thread overview: 74+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-12 23:04 [PATCH 0/2] Avoid spawning gzip in git archive Johannes Schindelin via GitGitGadget
2019-04-12 23:04 ` [PATCH 1/2] archive: replace write_or_die() calls with write_block_or_die() Rohit Ashiwal via GitGitGadget
2019-04-13  1:34   ` Jeff King
2019-04-13  5:51     ` Junio C Hamano
2019-04-14  4:36       ` Rohit Ashiwal
2019-04-26 14:29       ` Johannes Schindelin
2019-04-26 23:44         ` Junio C Hamano
2019-04-29 21:32           ` Johannes Schindelin
2019-05-01 18:09             ` Jeff King
2019-05-02 20:29               ` René Scharfe
2019-05-05  5:25               ` Junio C Hamano
2019-05-06  5:07                 ` Jeff King
2019-04-14  4:34     ` Rohit Ashiwal
2019-04-14 10:33       ` Junio C Hamano
2019-04-26 14:28     ` Johannes Schindelin
2019-05-01 18:07       ` Jeff King
2019-04-12 23:04 ` [PATCH 2/2] archive: avoid spawning `gzip` Rohit Ashiwal via GitGitGadget
2019-04-13  1:51   ` Jeff King
2019-04-13 22:01     ` René Scharfe
2019-04-15 21:35       ` Jeff King
2019-04-26 14:51         ` Johannes Schindelin
2019-04-27  9:59           ` René Scharfe
2019-04-27 17:39             ` René Scharfe
2019-04-29 21:25               ` Johannes Schindelin
2019-05-01 17:45                 ` René Scharfe
2019-05-01 18:18                   ` Jeff King
2019-06-10 10:44                     ` René Scharfe
2019-06-13 19:16                       ` Jeff King
2019-04-13 22:16     ` brian m. carlson
2019-04-15 21:36       ` Jeff King
2019-04-26 14:54       ` Johannes Schindelin
2019-05-02 20:20         ` Ævar Arnfjörð Bjarmason
2019-05-03 20:49           ` Johannes Schindelin
2019-05-03 20:52             ` Jeff King
2019-04-26 14:47     ` Johannes Schindelin
     [not found] ` <pull.145.v2.git.gitgitgadget@gmail.com>
     [not found]   ` <4ea94a8784876c3a19e387537edd81a957fc692c.1556321244.git.gitgitgadget@gmail.com>
2019-05-02 20:29     ` [PATCH v2 3/4] archive: optionally use zlib directly for gzip compression René Scharfe
     [not found]   ` <ac2b2488a1b42b3caf8a84594c48eca796748e59.1556321244.git.gitgitgadget@gmail.com>
2019-05-02 20:30     ` [PATCH v2 2/4] archive-tar: mark RECORDSIZE/BLOCKSIZE as unsigned René Scharfe
2019-05-08 11:45       ` Johannes Schindelin
2019-05-08 23:04         ` Jeff King
2019-05-09 14:06           ` Johannes Schindelin
2019-05-09 18:38             ` Jeff King
2019-05-10 17:18               ` René Scharfe
2019-05-10 21:20                 ` Jeff King
2022-06-12  6:00 ` [PATCH v3 0/5] Avoid spawning gzip in git archive René Scharfe
2022-06-12  6:03   ` [PATCH v3 1/5] archive: rename archiver data field to filter_command René Scharfe
2022-06-12  6:05   ` [PATCH v3 2/5] archive-tar: factor out write_block() René Scharfe
2022-06-12  6:08   ` [PATCH v3 3/5] archive-tar: add internal gzip implementation René Scharfe
2022-06-13 19:10     ` Junio C Hamano
2022-06-12  6:18   ` [PATCH v3 4/5] archive-tar: use OS_CODE 3 (Unix) for internal gzip René Scharfe
2022-06-12  6:19   ` [PATCH v3 5/5] archive-tar: use internal gzip by default René Scharfe
2022-06-13 21:55     ` Junio C Hamano
2022-06-14 11:27       ` Johannes Schindelin [this message]
2022-06-14 15:47         ` René Scharfe
2022-06-14 15:56           ` René Scharfe
2022-06-14 16:29           ` Johannes Schindelin
2022-06-14 20:04             ` René Scharfe
2022-06-15 16:41               ` Junio C Hamano
2022-06-14 11:28   ` [PATCH v3 0/5] Avoid spawning gzip in git archive Johannes Schindelin
2022-06-14 20:05     ` René Scharfe
2022-06-30 18:55       ` Johannes Schindelin
2022-07-01 16:05         ` Johannes Schindelin
2022-07-01 16:27           ` Jeff King
2022-07-01 17:47             ` Junio C Hamano
2022-06-15 16:53 ` [PATCH v4 0/6] " René Scharfe
2022-06-15 16:58   ` [PATCH v4 1/6] archive: update format documentation René Scharfe
2022-06-15 16:59   ` [PATCH v4 2/6] archive: rename archiver data field to filter_command René Scharfe
2022-06-15 17:01   ` [PATCH v4 3/6] archive-tar: factor out write_block() René Scharfe
2022-06-15 17:02   ` [PATCH v4 4/6] archive-tar: add internal gzip implementation René Scharfe
2022-06-15 20:32     ` Ævar Arnfjörð Bjarmason
2022-06-16 18:55       ` René Scharfe
2022-06-24 11:13         ` Ævar Arnfjörð Bjarmason
2022-06-24 20:24           ` René Scharfe
2022-06-15 17:04   ` [PATCH v4 5/6] archive-tar: use OS_CODE 3 (Unix) for internal gzip René Scharfe
2022-06-15 17:05   ` [PATCH v4 6/6] archive-tar: use internal gzip by default René Scharfe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=nycvar.QRO.7.76.6.2206141109270.353@tvgsbejvaqbjf.bet \
    --to=johannes.schindelin@gmx.de \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=l.s.r@web.de \
    --cc=peff@peff.net \
    --cc=rohit.ashiwal265@gmail.com \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).