git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jakub Narebski <jnareb@gmail.com>
To: Nicolas Pitre <nico@cam.org>
Cc: Tomasz Kontusz <roverorna@gmail.com>, git <git@vger.kernel.org>
Subject: Re: Continue git clone after interruption
Date: Wed, 19 Aug 2009 17:19:50 +0200	[thread overview]
Message-ID: <200908191719.52974.jnareb@gmail.com> (raw)
In-Reply-To: <alpine.LFD.2.00.0908181711350.6044@xanadu.home>

On Tue, 18 Aug 2009, Nicolas Pitre wrote:
> On Tue, 18 Aug 2009, Jakub Narebski wrote:
> 
>> You can probably get number and size taken by delta and non-delta (base)
>> objects in the packfile somehow.  Neither "git verify-pack -v <packfile>"
>> nor contrib/stats/packinfo.pl did help me arrive at this data.
> 
> Documentation for verify-pack says:
> 
> |When specifying the -v option the format used is:
> |
> |        SHA1 type size size-in-pack-file offset-in-packfile
> |
> |for objects that are not deltified in the pack, and
> |
> |        SHA1 type size size-in-packfile offset-in-packfile depth base-SHA1
> |
> |for objects that are deltified.
> 
> So a simple script should be able to give you the answer.

Thanks.

There are 114937 objects in this packfile, including 56249 objects
used as base (can be deltified or not).  git-verify-pack -v shows
that all objects have total size-in-packfile of 33 MB (which agrees
with packfile size of 33 MB), with 17 MB size-in-packfile taken by
deltaified objects, and 16 MB taken by base objects.

  git verify-pack -v | 
    grep -v "^chain" | 
    grep -v "objects/pack/pack-" > verify-pack.out

  sum=0; bsum=0; dsum=0; 
  while read sha1 type size packsize off depth base; do
    echo "$sha1" >> verify-pack.sha1.out
    sum=$(( $sum + $packsize ))
    if [ -n "$base" ]; then 
       echo "$sha1" >> verify-pack.delta.out
       dsum=$(( $dsum + $packsize ))
    else
       echo "$sha1" >> verify-pack.base.out
       bsum=$(( $bsum + $packsize ))
    fi
  done < verify-pack.out
  echo "sum=$sum; bsum=$bsum; dsum=$dsum"
 
>>>> (BTW what happens if this pack is larger than file size limit for 
>>>> given filesystem?).
[...]

>> If I remember correctly FAT28^W FAT32 has maximum file size of 2 GB.
>> FAT is often used on SSD, on USB drive.  Although if you have  2 GB
>> packfile, you are doing something wrong, or UGFWIINI (Using Git For
>> What It Is Not Intended).
> 
> Hopefully you're not performing a 'git clone' off of a FAT filesystem.  
> For physical transport you may repack with the appropriate switches.

Not off a FAT filesystem, but into a FAT filesystem.
 
[...]

>>> I think it is better to "prime" the repository with the content of the 
>>> top commit in the most straight forward manner using git-archive which 
>>> has the potential to be fully restartable at any point with little 
>>> complexity on the server side.
>> 
>> But didn't it make fully restartable 2.5 MB part out of 37 MB packfile?
> 
> The front of the pack is the critical point.  If you get enough to 
> create the top commit then further transfers can be done incrementally 
> with only the deltas between each commits.

How?  You have some objects that can be used as base; how to tell 
git-daemon that we have them (but not theirs prerequisites), and how
to generate incrementals?

>> A question about pack protocol negotiation.  If clients presents some
>> objects as "have", server can and does assume that client has all 
>> prerequisites for such objects, e.g. for tree objects that it has
>> all objects for files and directories inside tree; for commit it means
>> all ancestors and all objects in snapshot (have top tree, and its 
>> prerequisites).  Do I understand this correctly?
> 
> That works only for commits.

Hmmmm... how do you intent for "prefetch top objects restartable-y first"
to work, then?
 
>> BTW. because of compression it might be more difficult to resume 
>> archive creation in the middle, I think...
> 
> Why so?  the tar+gzip format is streamable.

gzip format uses sliding window in compression.  "cat a b | gzip"
is different from "cat <(gzip a) <(gzip b)".

But that doesn't matter.  If we are interrupted in the middle, we can
uncompress what we have to check how far did we get, and tell server
to send the rest; this way server wouldn't have to even generate 
(but not send) what we get as partial transfer.

P.S. What do you think about 'bundle' capability extension mentioned
     in a side sub-thread?
-- 
Jakub Narebski
Poland

  reply	other threads:[~2009-08-19 15:20 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-08-17 11:42 Continue git clone after interruption Tomasz Kontusz
2009-08-17 12:31 ` Johannes Schindelin
2009-08-17 15:23   ` Shawn O. Pearce
2009-08-18  5:43   ` Matthieu Moy
2009-08-18  6:58     ` Tomasz Kontusz
2009-08-18 17:56       ` Nicolas Pitre
2009-08-18 18:45         ` Jakub Narebski
2009-08-18 20:01           ` Nicolas Pitre
2009-08-18 21:02             ` Jakub Narebski
2009-08-18 21:32               ` Nicolas Pitre
2009-08-19 15:19                 ` Jakub Narebski [this message]
2009-08-19 19:04                   ` Nicolas Pitre
2009-08-19 19:42                     ` Jakub Narebski
2009-08-19 21:13                       ` Nicolas Pitre
2009-08-20  0:26                         ` Sam Vilain
2009-08-20  7:37                         ` Jakub Narebski
2009-08-20  7:48                           ` Nguyen Thai Ngoc Duy
2009-08-20  8:23                             ` Jakub Narebski
2009-08-20 18:41                           ` Nicolas Pitre
2009-08-21 10:07                             ` Jakub Narebski
2009-08-21 10:26                               ` Matthieu Moy
2009-08-21 21:07                               ` Nicolas Pitre
2009-08-21 21:41                                 ` Jakub Narebski
2009-08-22  0:59                                   ` Nicolas Pitre
2009-08-21 23:07                                 ` Sam Vilain
2009-08-22  3:37                                   ` Nicolas Pitre
2009-08-22  5:50                                     ` Sam Vilain
2009-08-22  8:13                                       ` Nicolas Pitre
2009-08-23 10:37                                         ` Sam Vilain
2009-08-20 22:57                           ` Sam Vilain
2009-08-18 22:28             ` Johannes Schindelin
2009-08-18 23:40               ` Nicolas Pitre
2009-08-19  7:35                 ` Johannes Schindelin
2009-08-19  8:25                   ` Nguyen Thai Ngoc Duy
2009-08-19  9:52                     ` Johannes Schindelin
2009-08-19 17:21                   ` Nicolas Pitre
2009-08-19 22:23                     ` René Scharfe
2009-08-19  4:42           ` Sitaram Chamarty
2009-08-19  9:53             ` Jakub Narebski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200908191719.52974.jnareb@gmail.com \
    --to=jnareb@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=nico@cam.org \
    --cc=roverorna@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).