git@vger.kernel.org mailing list mirror (one of many)
* How to speedup git clone for big binary files (disable delta compression)
@ 2018-07-18 22:05 René Scheibe
  2018-07-19  5:33 ` Jeff King
From: René Scheibe @ 2018-07-18 22:05 UTC
  To: git

Hi,

I was wondering why "git clone" does not seem to respect "-delta" in .gitattributes.


*Reproduction*

I prepared a test repository with:

- git v2.17.1
- .gitattributes containing "*.bin binary -delta"
- 10 commits with a 10 MB random binary file

Code:
---------------------------------------------------------------------
#!/bin/bash

# setup repository
git init --quiet repo
cd repo

echo '*.bin binary -delta' > .gitattributes
git add .gitattributes
git commit --quiet -m 'attributes'

for i in $(seq 10); do
    dd if=/dev/urandom of=data.bin bs=1MB count=10 status=none
    git add data.bin
    git commit --quiet -m "data $i"
done
cd ..

# create clone repository
time git clone --no-local repo clone

# repack original repository
cd repo
time git repack -a -d
---------------------------------------------------------------------

Output:
---------------------------------------------------------------------
Cloning into 'clone'...
remote: Counting objects: 33, done.
remote: Compressing objects: 100% (31/31), done.
remote: Total 33 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (33/33), 95.40 MiB | 19.94 MiB/s, done.

real    0m25,085s
user    0m22,749s
sys     0m0,948s

Counting objects: 33, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (21/21), done.
Writing objects: 100% (33/33), done.
Total 33 (delta 0), reused 0 (delta 0)

real    0m5,652s
user    0m4,173s
sys     0m0,178s
---------------------------------------------------------------------


*Observations*

_time_

- Cloning: "clone" always takes 25s
- Optimizing: "repack" takes 25s with delta compression and 5s without it

_compressed objects_

- Cloning: "clone" always compresses 31 objects
- Optimizing: "repack" compresses 31 objects with delta compression and 21 objects without it


*Expectations*

Both operations ("repack" and "clone") are using "pack-objects".

Therefore my expectation is that "clone" should respect "-delta" and be about as fast as "repack".


Cheers,
  René


* Re: How to speedup git clone for big binary files (disable delta compression)
  2018-07-18 22:05 How to speedup git clone for big binary files (disable delta compression) René Scheibe
@ 2018-07-19  5:33 ` Jeff King
From: Jeff King @ 2018-07-19  5:33 UTC
  To: René Scheibe; +Cc: git

On Thu, Jul 19, 2018 at 12:05:00AM +0200, René Scheibe wrote:

> Code:
> ---------------------------------------------------------------------
> #!/bin/bash
> 
> # setup repository
> git init --quiet repo
> cd repo
> 
> echo '*.bin binary -delta' > .gitattributes
> git add .gitattributes
> git commit --quiet -m 'attributes'
> 
> for i in $(seq 10); do
>     dd if=/dev/urandom of=data.bin bs=1MB count=10 status=none
>     git add data.bin
>     git commit --quiet -m "data $i"
> done
> cd ..
> 
> # create clone repository
> time git clone --no-local repo clone

This clone won't respect those attributes, because we don't dig into
in-repo attributes. There's actually some inconsistency in how Git
handles attribute locations. Usually they're just read from the top of
the working tree, but in some instances we read them from the tree
itself (e.g., git-archive respects some attributes from the tree it's
archiving).

If you do:

  echo "*.bin binary -delta" >repo/.git/info/attributes

then that does work (we always respect repo-level attributes like that).
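A minimal sketch of that workaround, with a check that Git picks the attribute up. The scratch directory and the file name "data.bin" are assumptions made for the demo; "git check-attr" only reports what Git resolves for a path and does not repack anything.

```shell
#!/bin/bash
# Sketch: set a repo-level "-delta" attribute and ask Git how it resolves.
# The scratch directory and "data.bin" are placeholders for this demo.
set -e
scratch=$(mktemp -d)
cd "$scratch"

git init --quiet repo

# Repo-level attributes live under .git/info/ and are never checked in.
mkdir -p repo/.git/info
echo '*.bin binary -delta' > repo/.git/info/attributes

# Query the value of the "delta" attribute for a matching path.
delta_state=$(git -C repo check-attr delta -- data.bin)
echo "$delta_state"
```

If the attribute is in effect, the reported state for "delta" should be "unset".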

> # repack original repository
> cd repo
> time git repack -a -d

In this case we're reading the attributes from the working tree, and it
does work. In theory the clone case could do so, too, but git-upload-pack,
the server side of the clone, avoids looking at the working tree at all.
That's something we _could_ address, but it doesn't really fix the
general case, since most clones will be from a bare repository anyway.
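For the bare-repository case mentioned above, the same repo-level file can be used, since a bare repository has no working tree to read .gitattributes from. This is only a sketch; the repository name "server.git" and the file name "data.bin" are made up for illustration.

```shell
#!/bin/bash
# Sketch: repo-level attributes in a bare repository (no working tree).
# "server.git" and "data.bin" are placeholder names for this demo.
set -e
scratch=$(mktemp -d)
cd "$scratch"

git init --quiet --bare server.git

# In a bare repository the info/ directory sits at the top level.
mkdir -p server.git/info
echo '*.bin binary -delta' > server.git/info/attributes

# Confirm that Git resolves the attribute without any checked-out files.
bare_delta_state=$(git -C server.git check-attr delta -- data.bin)
echo "$bare_delta_state"
```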

So in summary:

  1. Depending on what you're trying to do, the .git/info/attributes
     trick might be enough for you.

  2. I do think it would be nice for more places to respect attributes
     stored in trees. There's a question of which tree, but I think in
     general reading them from HEAD in a bare repository would do what
     people want (it's a little funny if you're fetching branch "foo",
     but HEAD points to "bar", but it's at least consistent with the
     non-bare case). There's some prior art in the way we treat mailmaps
     (in a bare repo, we read HEAD:.mailmap).

     I suspect the patch may not be trivial, as I don't know how ready
     the attributes code is to handle in-tree lookups (remember that it
     is not just HEAD:.gitattributes we must care about, but other files
     sprinkled through the repository, like "HEAD:subdir/.gitattributes").
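To illustrate the last point, here is a rough sketch of how the attribute files recorded in HEAD can be listed and read without a working tree. It only shows where such files live in a tree; it is not how the attributes code performs lookups. The paths, patterns and committer identity are invented for the demo.

```shell
#!/bin/bash
# Sketch: find every .gitattributes file recorded in the HEAD tree.
# Paths, patterns and the committer identity are placeholders.
set -e
scratch=$(mktemp -d)
cd "$scratch"

git init --quiet repo
cd repo
mkdir subdir
echo '*.bin binary -delta' > .gitattributes
echo '*.dat -delta' > subdir/.gitattributes
git add .gitattributes subdir/.gitattributes
git -c user.name=demo -c user.email=demo@example.com \
    commit --quiet -m 'attributes'

# List all in-tree attribute files, not just the top-level one.
in_tree_attrs=$(git ls-tree -r --name-only HEAD | grep -E '(^|/)\.gitattributes$')
echo "$in_tree_attrs"

# Read one of them straight from the tree object.
subdir_attrs=$(git show HEAD:subdir/.gitattributes)
echo "$subdir_attrs"
```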

-Peff
