git@vger.kernel.org mailing list mirror (one of many)
* [GSoC23] Working on project Idea from SOC 2011
@ 2023-03-09 12:07 Khalid Masum
  2023-03-09 13:23 ` Christian Couder
  2023-03-09 14:26 ` Derrick Stolee
  0 siblings, 2 replies; 5+ messages in thread
From: Khalid Masum @ 2023-03-09 12:07 UTC (permalink / raw)
  To: Git List

There is this SOC 2011 idea named "Resumable clone" here:

https://archive.kernel.org/oldwiki/git.wiki.kernel.org/index.php/SoC2011Ideas.html

...
Currently cloning a remote repository has to be done in one session.
If the process fails or is aborted for any reason any already downloaded
data is lost and one has to start from scratch.


There is also currently a bug where, after successfully loading
all data during cloning, a failure in applying the data
to the working directory leaves the repository in some
unusable state. In this case a normal clone behaves differently
than a clone --no-checkout followed by checkout.
Fixing this bug would also be part of this project.


While not necessarily being part of this project, fetch
might also benefit from a resume mechanism.

Goal: Allow Git to resume a cloning process that
has been aborted for any reason.
Languages: C
...

Can I work on this idea for GSoC23? If so how should I get started?
I have completed one of the microprojects by the way.

Thanks,
-- Khalid Masum

* Re: [GSoC23] Working on project Idea from SOC 2011
  2023-03-09 12:07 [GSoC23] Working on project Idea from SOC 2011 Khalid Masum
@ 2023-03-09 13:23 ` Christian Couder
  2023-03-10  5:39   ` Khalid Masum
  2023-03-09 14:26 ` Derrick Stolee
  1 sibling, 1 reply; 5+ messages in thread
From: Christian Couder @ 2023-03-09 13:23 UTC (permalink / raw)
  To: Khalid Masum; +Cc: Git List

Hi,

On Thu, Mar 9, 2023 at 1:15 PM Khalid Masum <khalid.masum.92@gmail.com> wrote:
>
> There is this SOC 2011 idea named "Resumable clone" here:
>
> https://archive.kernel.org/oldwiki/git.wiki.kernel.org/index.php/SoC2011Ideas.html

[...]

> Goal: Allow Git to resume a cloning process that
> has been aborted for any reason.
> Languages: C

> Can I work on this idea for GSoC23?

You would need to find (co-)mentors willing to mentor you on this project.

I think we don't propose this kind of project anymore as we think they
are too difficult. Some reasons are explained in the "Note about
refactoring projects versus projects that implement new features" in
https://git.github.io/General-Application-Information/

> If so how should I get started?

See the section I just mentioned. There is "an applicant proposing
something original must engage with the community strongly before and
during the application period to get feedback and guidance to improve
the proposal and avoid the above potential issues".

> I have completed one of the microprojects by the way.

Great, thanks for your interest in working on Git!

* Re: [GSoC23] Working on project Idea from SOC 2011
  2023-03-09 12:07 [GSoC23] Working on project Idea from SOC 2011 Khalid Masum
  2023-03-09 13:23 ` Christian Couder
@ 2023-03-09 14:26 ` Derrick Stolee
  2023-03-10  5:46   ` Khalid Masum
  1 sibling, 1 reply; 5+ messages in thread
From: Derrick Stolee @ 2023-03-09 14:26 UTC (permalink / raw)
  To: Khalid Masum, Git List

On 3/9/23 7:07 AM, Khalid Masum wrote:
> There is this SOC 2011 idea named "Resumable clone" here:
> 
> https://archive.kernel.org/oldwiki/git.wiki.kernel.org/index.php/SoC2011Ideas.html
> 
> ...
> Currently cloning a remote repository has to be done in one session.
> If the process fails or is aborted for any reason any already downloaded
> data is lost and one has to start from scratch.

> Goal: Allow Git to resume a cloning process that
> has been aborted for any reason.
> Languages: C

"for any reason" is going to be pretty difficult.
 
One direction that is relatively new in the Git project
(much newer than that project idea) is the bundle URI
standard, allowed by "git clone --bundle-uri=<X>". It
helps bootstrap clones by fetching bundle files and using
them to populate the object directory before finishing
the clone with an incremental fetch to the origin server.

Since the bundles are expected to be precomputed files,
it is much easier to use standard HTTP range queries to
download only the "missing" portion of the file from the
bundle server.
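
As a rough illustration of the range-query idea, the sketch below builds
an HTTP Range header from the size of a partially downloaded file. The
helper name resume_range_header() and its return convention are
hypothetical and are not part of Git; a real client would also have to
check that the server answers with 206 Partial Content and restart from
scratch if it sends the whole file instead.

```c
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not a Git function): write an HTTP Range header
 * asking for everything after the bytes already present in `path`.
 *
 * Returns 1 if a header was written, 0 if there is nothing to resume
 * (file missing or empty, so the whole file should be requested),
 * and -1 if `buf` is too small to hold the header.
 */
int resume_range_header(const char *path, char *buf, size_t len)
{
	struct stat st;
	int n;

	if (stat(path, &st) != 0 || st.st_size <= 0)
		return 0;

	/* "bytes=N-" means: from offset N through the end of the file. */
	n = snprintf(buf, len, "Range: bytes=%lld-", (long long)st.st_size);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return 1;
}
```

The open-ended form "bytes=N-" is used so that the client does not need
to know the total size of the bundle in advance.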

I think one thing that would need to change on the Git
client is the location of the temporary file being used
to store the bundle as it is downloaded. It currently
uses a random name, but if the name was a hash of the
URL, then it would be predictable and could restart the
download if the 'git clone' process was halted for any
reason. (Resuming a download due to a network error
noticed in-process is possibly simpler.)
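
A minimal sketch of the predictable-name idea follows, assuming a
simple FNV-1a hash as a stand-in for whatever hash Git would actually
use. The function names, the "bundle-<hex>.tmp" naming scheme, and the
directory argument are all hypothetical and chosen only for
illustration.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* FNV-1a, 64-bit: a stand-in hash used here for illustration only. */
uint64_t fnv1a64(const char *s)
{
	uint64_t h = UINT64_C(0xcbf29ce484222325); /* FNV offset basis */

	for (; *s; s++) {
		h ^= (unsigned char)*s;
		h *= UINT64_C(0x100000001b3); /* FNV prime */
	}
	return h;
}

/*
 * Hypothetical helper (not a Git function): derive the temporary file
 * path for a bundle from its URL, so that a restarted process computes
 * the same path and can find the partial download.
 *
 * Returns 0 on success, -1 if `buf` is too small.
 */
int bundle_tmp_path(const char *dir, const char *url, char *buf, size_t len)
{
	int n = snprintf(buf, len, "%s/bundle-%016" PRIx64 ".tmp",
			 dir, fnv1a64(url));

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Because the path depends only on the directory and the URL, a second
'git clone' attempt against the same bundle URL would look in the same
place, which a random name cannot offer.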

This might be a more focused approach that is more
likely to have progress in a GSoC project.

That said, I don't have the capacity to be a mentor,
but I thought it worth mentioning this variant of the
project.

Thanks,
-Stolee

* Re: [GSoC23] Working on project Idea from SOC 2011
  2023-03-09 13:23 ` Christian Couder
@ 2023-03-10  5:39   ` Khalid Masum
  0 siblings, 0 replies; 5+ messages in thread
From: Khalid Masum @ 2023-03-10  5:39 UTC (permalink / raw)
  To: Christian Couder; +Cc: Git List

On Thu, Mar 9, 2023 at 7:23 PM Christian Couder
<christian.couder@gmail.com> wrote:
>
> Hi,
>
> On Thu, Mar 9, 2023 at 1:15 PM Khalid Masum <khalid.masum.92@gmail.com> wrote:
> >
> > There is this SOC 2011 idea named "Resumable clone" here:
> >
> > https://archive.kernel.org/oldwiki/git.wiki.kernel.org/index.php/SoC2011Ideas.html
>
> [...]
>
> > Goal: Allow Git to resume a cloning process that
> > has been aborted for any reason.
> > Languages: C
>
> > Can I work on this idea for GSoC23?
>
> You would need to find (co-)mentors willing to mentor you on this project.
>
> I think we don't propose this kind of project anymore as we think they
> are too difficult. Some reasons are explained in the "Note about
> refactoring projects versus projects that implement new features" in
> https://git.github.io/General-Application-Information/
>
> > If so how should I get started?
>
> See the section I just mentioned. There is "an applicant proposing
> something original must engage with the community strongly before and
> during the application period to get feedback and guidance to improve
> the proposal and avoid the above potential issues".
>
> > I have completed one of the microprojects by the way.
>
> Great, thanks for your interest in working on Git!
Thanks, I shall try it.
  -- Khalid Masum

* Re: [GSoC23] Working on project Idea from SOC 2011
  2023-03-09 14:26 ` Derrick Stolee
@ 2023-03-10  5:46   ` Khalid Masum
  0 siblings, 0 replies; 5+ messages in thread
From: Khalid Masum @ 2023-03-10  5:46 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Git List

On Thu, Mar 9, 2023 at 8:26 PM Derrick Stolee <derrickstolee@github.com> wrote:
>
> On 3/9/23 7:07 AM, Khalid Masum wrote:
> > There is this SOC 2011 idea named "Resumable clone" here:
> >
> > https://archive.kernel.org/oldwiki/git.wiki.kernel.org/index.php/SoC2011Ideas.html
> >
> > ...
> > Currently cloning a remote repository has to be done in one session.
> > If the process fails or is aborted for any reason any already downloaded
> > data is lost and one has to start from scratch.
>
> > Goal: Allow Git to resume a cloning process that
> > has been aborted for any reason.
> > Languages: C
>
> "for any reason" is going to be pretty difficult.
>
> One direction that is relatively new in the Git project
> (much newer than that project idea) is the bundle URI
> standard, allowed by "git clone --bundle-uri=<X>". It
> helps bootstrap clones by fetching bundle files and using
> them to populate the object directory before finishing
> the clone with an incremental fetch to the origin server.
>
> Since the bundles are expected to be precomputed files,
> it is much easier to use standard HTTP range queries to
> download only the "missing" portion of the file from the
> bundle server.
>
> I think one thing that would need to change on the Git
> client is the location of the temporary file being used
> to store the bundle as it is downloaded. It currently
> uses a random name, but if the name was a hash of the
> URL, then it would be predictable and could restart the
> download if the 'git clone' process was halted for any
> reason. (Resuming a download due to a network error
> noticed in-process is possibly simpler.)
>
> This might be a more focused approach that is more
> likely to have progress in a GSoC project.
>
> That said, I don't have the capacity to be a mentor,
> but I thought it worth mentioning this variant of the
> project.
>
> Thanks,
> -Stolee

Thanks for your insight. I will write a proposal based on this and
hopefully get a mentor, while trying to understand how git bundle
cloning works.

thanks,
  -- Khalid Masum
