From: "brian m. carlson" <>
To: "Scheffenegger, Richard" <>
Cc: Junio C Hamano <>,
	"" <>
Subject: Re: git --archive
Date: Fri, 23 Sep 2022 00:49:08 +0000
Message-ID: <>
In-Reply-To: <>

On 2022-09-22 at 20:35:08, Scheffenegger, Richard wrote:
> Also, at least for ZIP (not so much for TAR), objects residing in
> different subdirectories can be stored in any order - and only need to
> be referenced properly in the central directory. Thus whenever a
> subthread has completed the reading of a (sufficiently small) object
> to be in (git program) memory, it should be sent immediately to the
> ZIP writer thread. The result would be that small and hot files (which
> can be read in quickly) end up at the beginning of the zip file, but
> the parallel threads can already, in parallel, read-in larger and
> colder object - the absolute wait time within the worker thread
> reading those objects may be slightly higher, but as many objects are
> read in in parallel, the absolute time to create the archive would be
> minimized.

Maybe they can technically be stored in any order, but people don't want
git archive to produce non-deterministic archives.  I'm one of the folks
responsible for the service at GitHub that serves archives (which uses
git archive under the hood) and people become very unhappy when the
archives are not bit-for-bit identical, even though neither Git nor
GitHub guarantees that.  That's because people want to verify those
archives with cryptographic hashes like SHA-256, and if even one byte
of the file changes, the hash no longer matches.  (We tell them to
generate a tarball as part of the release process and upload it as a
release asset instead.)
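
To make the hash point concrete, here is a minimal Python sketch (not
part of Git; the helper name build_zip and the sample files are made up
for illustration).  It builds two ZIP archives with the same members in
a different order and compares their SHA-256 digests:

```python
import hashlib
import io
import zipfile


def build_zip(entries):
    """Build an in-memory ZIP from (name, data) pairs, in the given order."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for name, data in entries:
            # Fixed timestamp, so that only the entry order differs.
            info = zipfile.ZipInfo(name, date_time=(1980, 1, 1, 0, 0, 0))
            zf.writestr(info, data)
    return buf.getvalue()


# Hypothetical sample content standing in for repository files.
files = [("a.txt", b"alpha\n"), ("b.txt", b"beta\n")]

in_order = build_zip(files)
reversed_order = build_zip(list(reversed(files)))

digest_a = hashlib.sha256(in_order).hexdigest()
digest_b = hashlib.sha256(reversed_order).hexdigest()

print(digest_a == digest_b)
```

Both archives unpack to the same files, yet the digests differ, which is
what breaks a pinned checksum.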

What Git does implicitly guarantee is that the result is deterministic:
that is, given the same repository and the same version of Git, the
archive is identical.  The encoding may change across versions, but not
within a version.  I feel like it would be very difficult to achieve the
speedups you want and still produce a deterministic archive.
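
As a rough illustration of the trade-off, the following Python sketch
(a toy under stated assumptions, not how git archive is implemented;
read_blob and archive_entries are hypothetical names) reads entries in
parallel but still emits them in sorted path order.  The output is
deterministic, but the writer has to wait for the next entry in order
even if later ones are already in memory:

```python
from concurrent.futures import ThreadPoolExecutor


def read_blob(path):
    # Hypothetical stand-in for reading one object from the repository.
    return path, ("contents of " + path).encode()


def archive_entries(paths, workers=4):
    """Read entries in parallel, but yield them in sorted path order."""
    ordered = sorted(paths)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order, not completion
        # order, so a slow early entry holds back every later one.
        yield from pool.map(read_blob, ordered)


entries = list(archive_entries(["b/large.bin", "a/small.txt", "c/readme"]))
for name, data in entries:
    print(name, len(data))
```

Sending each object to the writer as soon as it is read would avoid
that wait, but the entry order would then depend on timing.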

brian m. carlson (he/him or they/them)
Toronto, Ontario, CA
