git@vger.kernel.org mailing list mirror (one of many)
From: Derrick Stolee <derrickstolee@github.com>
To: Khalid Masum <khalid.masum.92@gmail.com>, Git List <git@vger.kernel.org>
Subject: Re: [GSoC23] Working on project Idea from SOC 2011
Date: Thu, 9 Mar 2023 09:26:04 -0500
Message-ID: <7097d1d6-00a1-2a82-1923-610d41f4053f@github.com>
In-Reply-To: <CAABMjtGXGZtUnU+8KgEccNeLXRJmWnE5f24BMG8ysbZKfT-ctQ@mail.gmail.com>

On 3/9/23 7:07 AM, Khalid Masum wrote:
> There is this SOC 2011 idea named "Resumable clone" here:
> 
> https://archive.kernel.org/oldwiki/git.wiki.kernel.org/index.php/SoC2011Ideas.html
> 
> ...
> Currently cloning a remote repository has to be done in one session.
> If the process fails or is aborted for any reason any already downloaded
> data is lost and one has to start from scratch.

> Goal: Allow Git to resume a cloning process that
> has been aborted for any reason.
> Languages: C

"for any reason" is going to be pretty difficult.
One direction that is relatively new in the Git project
(much newer than that project idea) is the bundle URI
standard, available via "git clone --bundle-uri=<X>". It
helps bootstrap clones by fetching bundle files and using
them to populate the object directory before finishing
the clone with an incremental fetch to the origin server.

Since the bundles are expected to be precomputed files,
it is much easier to use standard HTTP range requests to
download only the "missing" portion of the file from the
bundle server.
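
As a rough illustration of the range-request idea, here
is a minimal C sketch, assuming the partial bundle is kept
on disk under a known path. The helper name
resume_range_header() and the file names are hypothetical;
this is not how Git's bundle URI code is written.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not part of Git): given the partially
 * downloaded bundle file, compute the byte offset to resume from
 * and write the matching HTTP request header into buf.
 *
 * - stat() fails or the file is empty: returns 0 and leaves buf
 *   as "", since the download simply starts from the beginning.
 * - Otherwise: returns the file size N and writes
 *   "Range: bytes=N-" (an open-ended range from byte N to EOF).
 * - Returns -1 (and leaves buf as "") if buf is too small.
 */
static long long resume_range_header(const char *partial_path,
				     char *buf, size_t len)
{
	struct stat st;
	int n;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	if (stat(partial_path, &st) != 0 || st.st_size == 0)
		return 0;

	n = snprintf(buf, len, "Range: bytes=%lld-", (long long)st.st_size);
	if (n < 0 || (size_t)n >= len) {
		buf[0] = '\0';
		return -1;
	}
	return (long long)st.st_size;
}
```

If the server honors the range it answers "206 Partial
Content" and the client appends to the file; a plain
"200 OK" means the range was ignored, so the file should be
truncated and rewritten from the start. Git's HTTP transport
is built on libcurl, where the same offset could instead be
passed through CURLOPT_RESUME_FROM_LARGE.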

I think one thing that would need to change on the Git
client is the location of the temporary file being used
to store the bundle as it is downloaded. It currently
uses a random name, but if the name were a hash of the
URL, then it would be predictable, and the download could
be resumed if the 'git clone' process was halted for any
reason. (Resuming a download after a network error
noticed in-process is possibly simpler.)
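
A minimal sketch of a predictable temporary name, assuming
a hypothetical "<dir>/bundle-<hash>.partial" layout where
<dir> is somewhere that survives an aborted clone. FNV-1a
is a stand-in chosen only to keep the example
self-contained; Git would more naturally reuse its own
object hash. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * 64-bit FNV-1a: a small non-cryptographic hash, used here only so
 * the example is self-contained (a stand-in for SHA-1/SHA-256).
 */
static uint64_t fnv1a64(const char *s)
{
	uint64_t h = 0xcbf29ce484222325ULL;	/* FNV offset basis */

	for (; *s; s++) {
		h ^= (unsigned char)*s;
		h *= 0x100000001b3ULL;		/* FNV 64-bit prime */
	}
	return h;
}

/*
 * Hypothetical naming scheme: "<dir>/bundle-<16 hex digits>.partial",
 * where the hex digits are the hash of the bundle URL.  The same URL
 * always yields the same path, so a later attempt can find the
 * partial file that an interrupted run left behind.
 * Returns 0 on success, -1 if buf is too small.
 */
static int bundle_temp_path(const char *dir, const char *url,
			    char *buf, size_t len)
{
	int n;

	assert(dir != NULL && url != NULL && buf != NULL);
	n = snprintf(buf, len, "%s/bundle-%016llx.partial", dir,
		     (unsigned long long)fnv1a64(url));
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A name derived only from the URL cannot tell whether the
bundle behind it changed between attempts; one option is to
send the Range header together with If-Range and the
server's ETag, so a changed bundle is sent again in full.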

This might be a more focused approach that is more
likely to make progress in a GSoC project.

That said, I don't have the capacity to be a mentor,
but I thought it worth mentioning this variant of the
project.

Thanks,
-Stolee


Thread overview: 5+ messages
2023-03-09 12:07 [GSoC23] Working on project Idea from SOC 2011 Khalid Masum
2023-03-09 13:23 ` Christian Couder
2023-03-10  5:39   ` Khalid Masum
2023-03-09 14:26 ` Derrick Stolee [this message]
2023-03-10  5:46   ` Khalid Masum
