git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Nicolas Pitre <nico@cam.org>
To: Jakub Narebski <jnareb@gmail.com>
Cc: Tomasz Kontusz <roverorna@gmail.com>, git <git@vger.kernel.org>
Subject: Re: Continue git clone after interruption
Date: Tue, 18 Aug 2009 17:32:35 -0400 (EDT)	[thread overview]
Message-ID: <alpine.LFD.2.00.0908181711350.6044@xanadu.home> (raw)
In-Reply-To: <200908182302.10619.jnareb@gmail.com>

On Tue, 18 Aug 2009, Jakub Narebski wrote:

> You can probably get number and size taken by delta and non-delta (base)
> objects in the packfile somehow.  Neither "git verify-pack -v <packfile>"
> nor contrib/stats/packinfo.pl did help me arrive at this data.

Documentation for verify-pack says:

|When specifying the -v option the format used is:
|
|        SHA1 type size size-in-pack-file offset-in-packfile
|
|for objects that are not deltified in the pack, and
|
|        SHA1 type size size-in-packfile offset-in-packfile depth base-SHA1
|
|for objects that are deltified.

So a simple script should be able to give you the answer.

> >> (BTW what happens if this pack is larger than file size limit for 
> >> given filesystem?).
> > 
> > We currently fail.  Seems that no one ever had a problem with that so 
> > far. We'd have to split the pack stream into multiple packs on the 
> > receiving end.  But frankly, if you have a repository large enough to 
> > bust your filesystem's file size limit then maybe you should seriously 
> > reconsider your choice of development environment.
> 
> Do we fail gracefully (with an error message), or does git crash then?

If the filesystem is imposing the limit, it will likely return an error 
on the write() call and we'll die().  If the machine has a too small 
off_t for the received pack then we also die("pack too large for current 
definition of off_t").

> If I remember correctly FAT28^W FAT32 has maximum file size of 2 GB.
> FAT is often used on SSD, on USB drive.  Although if you have  2 GB
> packfile, you are doing something wrong, or UGFWIINI (Using Git For
> What It Is Not Intended).

Hopefully you're not performing a 'git clone' off of a FAT filesystem.  
For physical transport you may repack with the appropriate switches.

> >> If it fails, client ask first for first half of of
> >> repository (half as in bisect, but it is server that has to calculate
> >> it).  If it downloads, it will ask server for the rest of repository.
> >> If it fails, it would reduce size in half again, and ask about 1/4 of
> >> repository in packfile first.
> > 
> > Problem people with slow links have won't be helped at all with this.  
> > What if the network connection gets broken only after 49% of the 
> > transfer and that took 3 hours to download?  You'll attempt a 25% size 
> > transfer which would take 1.5 hour despite the fact that you already 
> > spent that much time downloading that first 1/4 of the repository 
> > already.  And yet what if you're unlucky and now the network craps on 
> > you after 23% of that second attempt?
> 
> A modification then.
> 
> First try ordinary clone.  If it fails because network is unreliable,
> check how much we did download, and ask server for packfile of slightly
> smaller size; this means that we are asking server for approximate pack
> size limit, not for bisect-like partitioning revision list.

If the download didn't reach past the critical point (75 MB in my linux 
repo example) then you cannot validate the received data and you've 
wasted that much bandwidth.

> > I think it is better to "prime" the repository with the content of the 
> > top commit in the most straight forward manner using git-archive which 
> > has the potential to be fully restartable at any point with little 
> > complexity on the server side.
> 
> But didn't it make fully restartable 2.5 MB part out of 37 MB packfile?

The front of the pack is the critical point.  If you get enough to 
create the top commit then further transfers can be done incrementally 
with only the deltas between each commits.

> A question about pack protocol negotiation.  If clients presents some
> objects as "have", server can and does assume that client has all 
> prerequisites for such objects, e.g. for tree objects that it has
> all objects for files and directories inside tree; for commit it means
> all ancestors and all objects in snapshot (have top tree, and its 
> prerequisites).  Do I understand this correctly?

That works only for commits.

> If we have partial packfile which crashed during downloading, can we
> extract from it some full objects (including blobs)?  Can we pass
> tree and blob objects as "have" to server, and is it taken into account?

No.

> Perhaps instead of separate step of resumable-downloading of top commit
> objects (in snapshot), we can pass to server what we did download in
> full?

See above.

> BTW. because of compression it might be more difficult to resume 
> archive creation in the middle, I think...

Why so?  the tar+gzip format is streamable.


Nicolas

  reply	other threads:[~2009-08-18 21:32 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-08-17 11:42 Continue git clone after interruption Tomasz Kontusz
2009-08-17 12:31 ` Johannes Schindelin
2009-08-17 15:23   ` Shawn O. Pearce
2009-08-18  5:43   ` Matthieu Moy
2009-08-18  6:58     ` Tomasz Kontusz
2009-08-18 17:56       ` Nicolas Pitre
2009-08-18 18:45         ` Jakub Narebski
2009-08-18 20:01           ` Nicolas Pitre
2009-08-18 21:02             ` Jakub Narebski
2009-08-18 21:32               ` Nicolas Pitre [this message]
2009-08-19 15:19                 ` Jakub Narebski
2009-08-19 19:04                   ` Nicolas Pitre
2009-08-19 19:42                     ` Jakub Narebski
2009-08-19 21:13                       ` Nicolas Pitre
2009-08-20  0:26                         ` Sam Vilain
2009-08-20  7:37                         ` Jakub Narebski
2009-08-20  7:48                           ` Nguyen Thai Ngoc Duy
2009-08-20  8:23                             ` Jakub Narebski
2009-08-20 18:41                           ` Nicolas Pitre
2009-08-21 10:07                             ` Jakub Narebski
2009-08-21 10:26                               ` Matthieu Moy
2009-08-21 21:07                               ` Nicolas Pitre
2009-08-21 21:41                                 ` Jakub Narebski
2009-08-22  0:59                                   ` Nicolas Pitre
2009-08-21 23:07                                 ` Sam Vilain
2009-08-22  3:37                                   ` Nicolas Pitre
2009-08-22  5:50                                     ` Sam Vilain
2009-08-22  8:13                                       ` Nicolas Pitre
2009-08-23 10:37                                         ` Sam Vilain
2009-08-20 22:57                           ` Sam Vilain
2009-08-18 22:28             ` Johannes Schindelin
2009-08-18 23:40               ` Nicolas Pitre
2009-08-19  7:35                 ` Johannes Schindelin
2009-08-19  8:25                   ` Nguyen Thai Ngoc Duy
2009-08-19  9:52                     ` Johannes Schindelin
2009-08-19 17:21                   ` Nicolas Pitre
2009-08-19 22:23                     ` René Scharfe
2009-08-19  4:42           ` Sitaram Chamarty
2009-08-19  9:53             ` Jakub Narebski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LFD.2.00.0908181711350.6044@xanadu.home \
    --to=nico@cam.org \
    --cc=git@vger.kernel.org \
    --cc=jnareb@gmail.com \
    --cc=roverorna@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).