From: "Scheffenegger, Richard" <Richard.Scheffenegger@netapp.com>
To: Junio C Hamano <gitster@pobox.com>,
"brian m. carlson" <sandals@crustytoothpaste.net>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>
Subject: RE: git --archive
Date: Fri, 23 Sep 2022 16:51:27 +0000
Message-ID: <PH0PR06MB7639A83F08BF3B286E5FEB3886519@PH0PR06MB7639.namprd06.prod.outlook.com>
In-Reply-To: <xmqqedw2vysc.fsf@gitster.g>
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>
>> Maybe they can technically be stored in any order, but people don't
>> want git archive to produce non-deterministic archives...
>> ... I feel like it would be very difficult to achieve the speedups
>> you want and still produce a deterministic archive.
>
> I am not going to work on it myself, but I think the only possible parallelism would come from making the reading of F(n+1) and subsequent objects overlap the writing of F(n), given a deterministic order of files in the resulting archive. When we decide which file should come first, and learn that it is F(0), it probably comes from the tree object of the root level, and it is very likely that we would already know what F(1) and F(2) are by that time, so it should be possible to dispatch reading and applying content filtering on F(1) and keeping the result in core, while we are still writing F(0) out.
>
> Thanks.
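To make the quoted idea concrete, here is a minimal Python sketch of that kind of pipelining. It is not how git archive is implemented; the names archive_in_order, read_and_filter, write_entry and lookahead are hypothetical and chosen only for illustration. The output order is fixed by the input list, and only the reading and filtering of later entries runs ahead in the background.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def archive_in_order(paths, read_and_filter, write_entry, lookahead=2, workers=2):
    """Write entries strictly in the order of `paths`, while up to
    `lookahead` later entries are read and filtered in the background.

    `read_and_filter(path)` and `write_entry(path, data)` are hypothetical
    callbacks supplied by the caller; they are not git functions.
    """
    it = iter(paths)
    pending = deque()  # (path, future) pairs, oldest first
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Prime the window: F(0) plus `lookahead` successors.
        for path in islice(it, lookahead + 1):
            pending.append((path, pool.submit(read_and_filter, path)))
        while pending:
            path, future = pending.popleft()
            data = future.result()  # blocks only if F(n) is not ready yet
            # Top up the window before writing, so reads overlap this write.
            for nxt in islice(it, 1):
                pending.append((nxt, pool.submit(read_and_filter, nxt)))
            write_entry(path, data)  # always called in the original order
```

Because write_entry is only ever called from the main thread and in list order, the resulting archive stays deterministic regardless of which read finishes first. The lookahead bound keeps the amount of data held in core limited.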
Yes. But even before any changes to the actual tree traversal, which collects the objects one by one as it does currently, a "simple" parallelized, recursive walk over all objects should help. Such a walk would pseudo-randomly read a fraction of the data (mostly directories, but also files) in order to update all the externally cached inode metadata. As long as this stage is highly parallelized, its cost in time would be recovered by a much faster single-threaded tree recursion, just as it exists currently.
That is not to say that the above method wouldn't be a further significant improvement 😊
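A rough Python sketch of the pre-warming walk I have in mind, purely as an illustration: it assumes the data sits under a plain directory tree (root), and the helper names prewarm and _scan as well as the worker count are made up for this example. It only lists directories and stats files so that the metadata ends up in the cache; it does not read file contents and does not touch git itself.

```python
import os
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def _scan(directory):
    """List one directory, stat its files, and return (subdirs, entries seen)."""
    subdirs, touched = [], 0
    try:
        with os.scandir(directory) as entries:
            for entry in entries:
                try:
                    if entry.is_dir(follow_symlinks=False):
                        subdirs.append(entry.path)
                    else:
                        # Fetching the metadata is the whole point of the walk.
                        entry.stat(follow_symlinks=False)
                    touched += 1
                except OSError:
                    pass  # entry vanished or is unreadable; skip it
    except OSError:
        pass  # directory vanished or is unreadable; skip it
    return subdirs, touched


def prewarm(root, workers=16):
    """Walk `root` with `workers` threads; return the number of entries seen."""
    total = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        running = {pool.submit(_scan, root)}
        while running:
            done, running = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                subdirs, touched = future.result()
                total += touched
                # Each subdirectory becomes its own task, so the walk fans out.
                running |= {pool.submit(_scan, d) for d in subdirs}
    return total
```

The order in which directories are visited is not deterministic here, which is fine, since this stage produces no output. The single-threaded traversal that follows would still decide the order of files in the archive.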