From: "René Scharfe" <l.s.r@web.de>
To: Jeff King <peff@peff.net>,
Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: Junio C Hamano <gitster@pobox.com>,
Rohit Ashiwal via GitGitGadget <gitgitgadget@gmail.com>,
git@vger.kernel.org, Rohit Ashiwal <rohit.ashiwal265@gmail.com>
Subject: Re: [PATCH 1/2] archive: replace write_or_die() calls with write_block_or_die()
Date: Thu, 2 May 2019 22:29:55 +0200
Message-ID: <be339f04-33e0-ede1-dbc2-340d7fb6694f@web.de>
In-Reply-To: <20190501180936.GB4109@sigill.intra.peff.net>
On 01.05.19 at 20:09, Jeff King wrote:
> On Mon, Apr 29, 2019 at 05:32:50PM -0400, Johannes Schindelin wrote:
>
>>> Another is that I am not sure how your "fixed format" argument
>>> meshes with the "-b blocksize" parameter to affect the tar/pax
>>> output. The format may be fixed, but it is parameterized. If
>>> we ever need to grow the ability to take "-b", having the knowledge
>>> that our current code is limited to the fixed BLOCKSIZE in a single
>>> function (i.e. the caller of this function, not the callee) would
>>> be less error prone.
>>
>> This argument would hold a lot more water if the following lines were not
>> part of archive-tar.c:
>>
>> #define RECORDSIZE (512)
>> #define BLOCKSIZE (RECORDSIZE * 20)
>>
>> static char block[BLOCKSIZE];
>>
>> If you can tell me how the `-b` (run-time) parameter can affect the
>> (compile-time) `BLOCKSIZE` constant, maybe I can start to understand your
>> concern.
>
> FWIW, I agree with you here. These patches are not making anything worse
> (and may even make them better, since we'd probably need to swap out the
> BLOCKSIZE constant for a run-time "blocksize" variable in fewer places).

The block size is mostly relevant for writing tar archives to magnetic
tapes. You can do that with git archive and a tape drive that supports
the blocking factor 20 (20 records of 512 bytes, i.e. 10240-byte blocks),
which is the default for GNU tar and thus should be quite common. You
may get higher performance with a higher blocking factor, if supported.

But so far this hasn't come up on the mailing list, and I'd be surprised
if people really wrote snapshots of git archives directly to tape. So
I'm not too worried about this define ever becoming a user-settable
option. Sealing the constant into a function feels a bit dirty, though:
mixing code and data makes the code more brittle.

Another example of that is the hard-coded file descriptor in the same
function, by the way. It's a lot of busywork to undo in order to gain
the ability to write to some other fd, for the questionable convenience
of not having to pass that parameter along the call chain. My bad.

But anyway, I worry more about the fact that blocking is not needed when
gzip'ing: gzwrite can be fed pieces of any size, not just 10 KB chunks.
The tar writer just needs to round up the archive size to a multiple of
10 KB and pad with NUL bytes at the end, in order to produce the same
uncompressed output as non-compressing tar.

If we wanted to be tape-friendly, then we'd have to block the gzip'ed
output instead of the uncompressed tar file, but I'm not suggesting
doing that.

Note to self: I wonder if moving the blocking part out into an
asynchronous function could simplify the code.

René