* How to speedup git clone for big binary files (disable delta compression)
@ 2018-07-18 22:05 René Scheibe
2018-07-19 5:33 ` Jeff King
From: René Scheibe @ 2018-07-18 22:05 UTC
To: git
Hi,
I was wondering why "git clone" does not seem to respect "-delta" in .gitattributes.
*Reproduction*
I prepared a test repository with:
- git v2.17.1
- .gitattributes containing "*.bin binary -delta"
- 10 commits with a 10 MB random binary file
Code:
---------------------------------------------------------------------
#!/bin/bash

# setup repository
git init --quiet repo
cd repo

echo '*.bin binary -delta' > .gitattributes
git add .gitattributes
git commit --quiet -m 'attributes'

for i in $(seq 10); do
    dd if=/dev/urandom of=data.bin bs=1MB count=10 status=none
    git add data.bin
    git commit --quiet -m "data $i"
done
cd ..

# create clone repository
time git clone --no-local repo clone

# repack original repository
cd repo
time git repack -a -d
---------------------------------------------------------------------
Output:
---------------------------------------------------------------------
Cloning into 'clone'...
remote: Counting objects: 33, done.
remote: Compressing objects: 100% (31/31), done.
remote: Total 33 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (33/33), 95.40 MiB | 19.94 MiB/s, done.
real 0m25,085s
user 0m22,749s
sys 0m0,948s
Counting objects: 33, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (21/21), done.
Writing objects: 100% (33/33), done.
Total 33 (delta 0), reused 0 (delta 0)
real 0m5,652s
user 0m4,173s
sys 0m0,178s
---------------------------------------------------------------------
*Observations*
_time_
- Cloning: "clone" always takes 25s
- Optimizing: "repack" takes 25s with and 5s without delta compression
_compressed objects_
- Cloning: "clone" always compresses 31 objects
- Optimizing: "repack" compresses 31 objects with and 21 objects without delta compression
*Expectations*
Both operations ("repack" and "clone") are using "pack-objects".
Therefore my expectation is that "clone" should respect "-delta" and be about as fast as "repack".
Cheers,
René
* Re: How to speedup git clone for big binary files (disable delta compression)
2018-07-18 22:05 How to speedup git clone for big binary files (disable delta compression) René Scheibe
@ 2018-07-19 5:33 ` Jeff King
From: Jeff King @ 2018-07-19 5:33 UTC
To: René Scheibe; +Cc: git
On Thu, Jul 19, 2018 at 12:05:00AM +0200, René Scheibe wrote:
> Code:
> ---------------------------------------------------------------------
> #!/bin/bash
>
> # setup repository
> git init --quiet repo
> cd repo
>
> echo '*.bin binary -delta' > .gitattributes
> git add .gitattributes
> git commit --quiet -m 'attributes'
>
> for i in $(seq 10); do
> dd if=/dev/urandom of=data.bin bs=1MB count=10 status=none
> git add data.bin
> git commit --quiet -m "data $i"
> done
> cd ..
>
> # create clone repository
> time git clone --no-local repo clone
This clone won't respect those attributes, because we don't dig into
in-repo attributes. There's actually some inconsistency in how Git
handles attribute locations. Usually they're just read from the top of
the working tree, but in some instances we read them from the tree
itself (e.g., git-archive respects some attributes from the tree it's
archiving).
If you do:

  echo "*.bin binary -delta" >repo/.git/info/attributes

then that does work (we always respect repo-level attributes like that).
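
One way to confirm that the repo-level file is picked up is to ask Git how it resolves the "delta" attribute for a matching path. This is only a minimal sketch: the scratch directory and the file name data.bin are assumptions for illustration, not part of the original test setup.

```shell
#!/bin/sh
set -e

# Scratch repository; the temporary path is a throwaway assumption.
tmp=$(mktemp -d)
git init --quiet "$tmp/repo"
cd "$tmp/repo"

# Repo-level attributes live in $GIT_DIR/info/attributes, not in the tree.
mkdir -p .git/info
echo '*.bin binary -delta' > .git/info/attributes

# Ask Git how it resolves the "delta" attribute for a matching path.
# The path does not need to exist for the lookup to work.
result=$(git check-attr delta -- data.bin)
echo "$result"
```

For a path matched by "-delta" this should report the attribute as "unset", which is the state in which pack-objects skips delta compression for that path.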
> # repack original repository
> cd repo
> time git repack -a -d
In this case we're reading the attributes from the working tree, and it
does work. In theory the clone case could do so, too, but git-upload-pack,
the server side of the clone, avoids looking at the working tree at all.
That's something we _could_ address, but it doesn't really fix the
general case, since most clones will be from a bare repository anyway.
So in summary:
  1. Depending on what you're trying to do, the .git/info/attributes
     trick might be enough for you.

  2. I do think it would be nice for more places to respect in-tree
     attributes. There's a question of which tree, but I think in
     general reading them from HEAD in a bare repository would do what
     people want (it's a little funny if you're fetching branch "foo",
     but HEAD points to "bar", but it's at least consistent with the
     non-bare case). There's some prior art in the way we treat mailmaps
     (in a bare repo, we read HEAD:.mailmap).

     I suspect the patch may not be trivial, as I don't know how ready
     the attributes code is to handle in-tree lookups (remember that it
     is not just HEAD:.gitattributes we must care about, but other files
     sprinkled through the repository, like "HEAD:subdir/.gitattributes").
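
Until something like that exists, the two points can be combined by hand on the serving side: snapshot the top-level attributes file from HEAD into the bare repository's info/attributes. The following is a rough sketch under stated assumptions, not an existing Git feature. The scratch paths, the repo.git name, and the demo committer identity are all made up for illustration.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d)

# Source repository with an in-tree .gitattributes (stand-in for "repo").
git init --quiet "$tmp/repo"
cd "$tmp/repo"
echo '*.bin binary -delta' > .gitattributes
git add .gitattributes
# Hypothetical identity so the commit works without global configuration.
git -c user.name=demo -c user.email=demo@example.com \
    commit --quiet -m 'attributes'
cd "$tmp"

# Bare copy, as a server would typically host it.
git clone --quiet --bare repo repo.git

# Snapshot the top-level attributes file from HEAD into the repo-level file.
# Only HEAD:.gitattributes is copied; nested files such as
# HEAD:subdir/.gitattributes are not handled by this sketch.
mkdir -p repo.git/info
git --git-dir=repo.git show HEAD:.gitattributes > repo.git/info/attributes
snapshot=$(cat repo.git/info/attributes)
echo "$snapshot"
```

The snapshot goes stale whenever .gitattributes changes on HEAD, so it would have to be refreshed (for example from a hook), and it overrides nothing in subdirectories, which is exactly the part that makes a real in-tree lookup non-trivial.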
-Peff