git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 14/14] clone: give advice on how to resume a failed clone
Date: Fri, 11 Nov 2011 15:52:56 -0500	[thread overview]
Message-ID: <20111111205256.GB20515@sigill.intra.peff.net> (raw)
In-Reply-To: <7vvcqrsmfx.fsf@alter.siamese.dyndns.org>

On Thu, Nov 10, 2011 at 01:21:38PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > We could make "git clone ..." automatically resume, but I'm a little
> > nervous about that. I wrote a patch that did so, and it did work, but
> > there are a lot of little hiccups as we violate the assumption that the
> > directory didn't already exist (e.g., it writes multiple fetch refspec
> > lines to the config).
> 
> Sorry, but I do not think the worry is quite justified.
> 
> The "assumption that directory didn't already exist" becomes an issue only
> if you implement your "git clone" that automatically resumes as a thin
> wrapper around the current "git clone" in the form of
> 
>     until git clone ...
>     do
> 	echo retrying...
>     done

That was sort of what my patch looked like. It didn't do the wrapper
bit; you would have to run "git clone" again to resume. I.e.:

  $ git clone http://...
  Downloading bundle: ...
  ^C
  $ git clone http://...
  Resuming bundle: ...

and the implementation was very minimal. Basically, instead of checking
that the directory exists and dying, it said "the directory exists, but
there is a resumable bundle in it, so let's keep going anyway".

You _could_ do that right, with clone saying "I got to the fetch stage",
and picking up there. But that means picking apart clone into its
various stages, and not repeating earlier stages (or making sure they're
idempotent).  And making sure that internally, later stages don't depend
on in-memory actions from the earlier stages.

And then what happens when you have different parameters? If I say "git
clone foo", and then resume with "git clone --mirror foo", what should
happen?

None of these problems are insurmountable. But it means looking over the
clone code carefully, figuring out what should happen, and then probably
breaking apart the various stages to see where we can resume from.

I wanted to start simply, and just tell the user "this is approximately
what clone would have done from here". And then the fancier automatic
bits can come later.

> Stepping back a bit, I think there are two different situations where
> resumable clone is beneficial. The "git clone" process died either by the
> machine crashing or the user hitting a \C-c, or the connection between the
> server and the "git clone" got severed for some reason.
> 
> Right now, the "got disconnected" case results in "git clone" voluntarily
> dying and as the result of this, the symptom appears the same.  But it
> does not have to be that way.  If we know the underlying transport is
> resumable, e.g. when you are fetching a prepared bundle over the wire.
>
> I have this suspicion that in practice the "got disconnected" case is the
> majority. If "git clone" does not die upon disconnect while fetching a
> bundle, but instead the fetching of the bundle is resumed internally by
> reconnecting to the server and requesting a range transfer, there is no
> risk of "writes multiple fetch refspec lines" etc, no?

I don't think it is the majority. And there are even variants of "git
disconnected" that don't work. Here are just a few cases it wouldn't
handle that I think are common:

  1. the client machine crashes or loses power; you'd like to resume
     after rebooting.

  2. the network or server goes out, but is not immediately available.
     The followup attempt to fetch fails, but you could succeed if you
     restarted the fetch a few minutes (or hours, or days) later.

  3. the user hits ^C not because they want to abort the clone entirely,
     but because they know they cannot complete the clone right now
     (e.g., they are taking their laptop off the network, or it is
     consuming too much bandwidth, or they would rather wait until later
     when they are on a faster network).

All of those mean we have to have some on-disk state that shows how far
we got, and that a totally separate process can figure out from the
state where to resume.

> Of course, it is _also_ beneficial if we made "git clone" resumable after
> you purposefully kill it (maybe you thought it will clone within minutes,
> but it turns out that it may take hours and you have to turn off the
> machine in the next five minutes before leaving the work, or something).
> A solution for that case _could_ be used for the "got disconnected" case
> by letting it voluntarily die as we currently do, but I do not think that
> is an optimal solution to the "got disconnected" case.

I see the "got disconnected, and we can immediately resume" as a
minority subset of the larger problem. If we really want to, we can
implement that while waiting for a larger solution, but I don't think it
will serve most people's needs, and then will eventually become obsolete
anyway.

-Peff

  reply	other threads:[~2011-11-11 20:53 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-11-10  7:43 [PATCH 0/14] resumable network bundles Jeff King
2011-11-10  7:45 ` Jeff King
2011-11-10  7:46 ` [PATCH 01/14] t/lib-httpd: check for NO_CURL Jeff King
2011-11-10  7:48 ` [PATCH 02/14] http: turn off curl signals Jeff King
2011-11-10  8:43   ` Daniel Stenberg
2011-11-11 20:54     ` Jeff King
2011-11-10  7:48 ` [PATCH 03/14] http: refactor http_request function Jeff King
2011-11-10  7:49 ` [PATCH 04/14] http: add a public function for arbitrary-callback request Jeff King
2011-11-10  7:49 ` [PATCH 05/14] remote-curl: use http callback for requesting refs Jeff King
2011-11-10  7:49 ` [PATCH 06/14] transport: factor out bundle to ref list conversion Jeff King
2011-11-10  7:50 ` [PATCH 07/14] bundle: add is_bundle_buf helper Jeff King
2011-11-10  7:50 ` [PATCH 08/14] remote-curl: free "discovery" object Jeff King
2011-11-10  7:50 ` [PATCH 09/14] remote-curl: auto-detect bundles when fetching refs Jeff King
2011-11-10  7:51 ` [PATCH 10/14] remote-curl: try base $URL after $URL/info/refs Jeff King
2011-11-10  7:53 ` [PATCH 11/14] progress: allow pure-throughput progress meters Jeff King
2011-11-10  7:53 ` [PATCH 12/14] remote-curl: show progress for bundle downloads Jeff King
2011-11-10  7:53 ` [PATCH 13/14] remote-curl: resume interrupted bundle transfers Jeff King
2011-11-10  7:56 ` [PATCH 14/14] clone: give advice on how to resume a failed clone Jeff King
2011-11-10 21:21   ` Junio C Hamano
2011-11-11 20:52     ` Jeff King [this message]
2011-11-11 13:13 ` [PATCH 0/14] resumable network bundles David Michael Barr
2011-11-12 16:11 ` Tay Ray Chuan
2011-11-12 17:58   ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20111111205256.GB20515@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).