git@vger.kernel.org mailing list mirror (one of many)
* Shallow submodule efficiency
@ 2016-06-28  5:39 Martin von Gagern
  2016-06-28 17:20 ` Stefan Beller
  0 siblings, 1 reply; 4+ messages in thread
From: Martin von Gagern @ 2016-06-28  5:39 UTC
  To: git

Hi!

I have the feeling that “git submodule update --depth 1” is less clever
than it could be. Here is one example I observed with git 2.0.0:

  git init foo
  cd foo
  git clone --single-branch \
            -b v0.99 https://github.com/git/git.git git-scm
  git submodule add https://github.com/git/git.git git-scm
  git commit -m Submod
  git clone --dissociate . ../bar
  cd ../bar
  git submodule update --init --depth 1 git-scm

This will download quite a bit of history, then result in an error message:

  error: no such remote ref a3eb250f996bf5e12376ec88622c4ccaabf20ea8
  Fetched in submodule path 'git-scm', but it did not contain
  a3eb250f996bf5e12376ec88622c4ccaabf20ea8. Direct fetching of that
  commit failed.

That seems so avoidable, since the commit in question is a tag, so it
would be perfectly possible to fetch that specific commit from the
server directly. Something like the following commands would do the trick:

  git fetch $url $(git ls-remote $url | \
                   awk /$sha1/'{print $2}' | sed 's/\^{}//')
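A slightly stricter version of that pipeline, as a minimal POSIX sh sketch. The helper name `refs_for_sha1` is hypothetical and not part of git. It compares the object-id column exactly rather than matching the SHA-1 anywhere on the line. It also strips the `^{}` suffix that `git ls-remote` prints for the peeled target of an annotated tag, and it removes duplicates.

```shell
#!/bin/sh
# refs_for_sha1 SHA1  (hypothetical helper, not part of git)
# Reads `git ls-remote` output ("<object-id><TAB><refname>" per line) on
# stdin and prints each ref whose object id, or peeled id for an annotated
# tag ("refs/tags/v1^{}"), equals SHA1 exactly, once, with "^{}" removed.
refs_for_sha1 () {
	awk -v want="$1" '
		$1 == want {
			ref = $2
			sub(/\^[{][}]$/, "", ref)   # drop the peeled-tag suffix
			print ref
		}' |
	sort -u
}
```

Possible usage, with `$url` and `$sha1` set as above. If nothing matches, `refs` is empty and the fetch is skipped:

  refs=$(git ls-remote "$url" | refs_for_sha1 "$sha1")
  test -n "$refs" && git fetch --depth 1 "$url" $refs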

If the commit in question is NOT a ref, then whether asking for it by
unlisted SHA1 is supported will probably depend on the server's
uploadpack.allowReachableSHA1InWant setting. I guess this is a reason
why fb43e31 made the fetch for a specific SHA1 a fallback after the
fetch for the default branch. Nevertheless, in case of “--depth 1” I
think it would make sense to abort early: if none of the listed refs
matches the requested one, and asking by SHA1 isn't supported by the
server, then there is no point in fetching anything, since we won't be
able to satisfy the submodule requirement either way.
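The early-abort rule above comes down to a three-way decision. The following is a minimal POSIX sh sketch of it. The helper `plan_shallow_fetch` and its inputs are hypothetical. It assumes the caller already knows two things: which advertised refs point at the wanted SHA-1, and which capabilities the server advertised. For the second, it relies on the protocol v0 capability list: with `uploadpack.allowReachableSHA1InWant` enabled, upload-pack advertises `allow-reachable-sha1-in-want`.

```shell
#!/bin/sh
# plan_shallow_fetch MATCHING_REFS SERVER_CAPS  (hypothetical helper)
#   MATCHING_REFS: refs whose advertised object is the wanted SHA-1
#                  (empty if none)
#   SERVER_CAPS:   space-separated capabilities advertised by the server
# Prints what a --depth 1 fetch should do and returns 1 when nothing can
# satisfy the request, so the caller can stop before downloading history.
plan_shallow_fetch () {
	if [ -n "$1" ]; then
		echo refs      # ask for the matching ref(s) by name
		return 0
	fi
	case " $2 " in
	*" allow-reachable-sha1-in-want "*)
		echo sha1      # server accepts an unadvertised SHA-1 in "want"
		return 0
		;;
	esac
	echo abort         # no ref matches and the server won't take the SHA-1
	return 1
}
```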

For the case of “--depth n” with n > 1, I was wondering whether it would
make sense to prefer the branch listed in submodule.‹name›.branch over
the default branch.
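Looking up that preference is a single `git config` query against `.gitmodules`. Below is a minimal sketch. The helper name is hypothetical, and an empty result is taken to mean "nothing configured, keep using the default branch".

```shell
#!/bin/sh
# submodule_fetch_branch GITMODULES NAME  (hypothetical helper)
# Prints submodule.NAME.branch from the given .gitmodules file, or nothing
# if it is not set, in which case the caller keeps using the default branch.
submodule_fetch_branch () {
	git config -f "$1" --get "submodule.$2.branch" || true
}
```

A caller cloning with `--depth n` could then pass the branch only when one is configured:

  branch=$(submodule_fetch_branch .gitmodules git-scm)
  git clone --depth 3 ${branch:+--branch "$branch"} "$url" git-scm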

I think shallow submodules would be very useful to embed libraries into
projects, without too much care for history (and without the download
times getting it entails), but with efficient updates to affected files
only in case of a change in library version. But not being able to get a
specific tag as a shallow submodule is a major showstopper here, I think.

Greetings,
 Martin von Gagern


* Re: Shallow submodule efficiency
  2016-06-28  5:39 Shallow submodule efficiency Martin von Gagern
@ 2016-06-28 17:20 ` Stefan Beller
  2016-06-28 19:08   ` Martin von Gagern
  0 siblings, 1 reply; 4+ messages in thread
From: Stefan Beller @ 2016-06-28 17:20 UTC
  To: Martin von Gagern; +Cc: git@vger.kernel.org

On Mon, Jun 27, 2016 at 10:39 PM, Martin von Gagern
<Martin.vGagern@gmx.net> wrote:
> Hi!
>
> I have the feeling that “git submodule update --depth 1” is less clever
> than it could be. Here is one example I observed with git 2.0.0:

2.9.0 (as "Direct fetching of " is not part of 2.0.0 IIRC) ?

>
>   git init foo
>   cd foo
>   git clone --single-branch \
>             -b v0.99 https://github.com/git/git.git git-scm
>   git submodule add https://github.com/git/git.git git-scm
>   git commit -m Submod
>   git clone --dissociate . ../bar
>   cd ../bar
>   git submodule update --init --depth 1 git-scm
>
> This will download quite a bit of history, then result in an error message:
>
>   error: no such remote ref a3eb250f996bf5e12376ec88622c4ccaabf20ea8
>   Fetched in submodule path 'git-scm', but it did not contain
>   a3eb250f996bf5e12376ec88622c4ccaabf20ea8. Direct fetching of that
>   commit failed.

Yeah there are a few things going on, which try to cover up an error
in design IMO.

* The depth is measured from the tip of a branch in the submodule,
  not from the sha1 that the superproject points to.
* Shallowness is treated separately in the superproject and submodules, as
  they have a strong notion of being independent. It would be cool to have
  something like
  `git clone --recurse-submodules --depth=15 --submodule-depth-as-reachable-from-superproject`
  which would obtain the submodules as shallow as possible, but including
  all versions that the 15 commits in the superproject point to (anywhere
  from 1 up to 15 different, non-sequential versions).
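The set of versions such an option would have to fetch can already be listed with existing plumbing. Every superproject commit records the submodule as a gitlink entry (mode 160000, type `commit`) in its tree. Below is a minimal sketch, assuming the submodule path `git-scm` from the example above. The helper name is hypothetical, and it only lists the commits; it does not fetch them.

```shell
#!/bin/sh
# submodule_commits  (hypothetical helper)
# Reads `git ls-tree` output ("<mode> <type> <object><TAB><path>") on stdin
# and prints each distinct gitlink object id, i.e. each submodule commit
# that the listed superproject trees point to.
submodule_commits () {
	awk '$1 == "160000" && $2 == "commit" { print $3 }' | sort -u
}
```

Possible usage over the last 15 superproject commits:

  for c in $(git rev-list -15 HEAD); do git ls-tree "$c" git-scm; done |
  submodule_commits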


>
> That seems so avoidable, since the commit in question is a tag, so it
> would be perfectly possible to fetch that specific commit from the
> server directly. Something like the following commands would do the trick:
>
>   git fetch $url $(git ls-remote $url | \
>                    awk /$sha1/'{print $2}' | sed 's/\^{}//')
>

* `git submodule update --init --depth 1` currently uses clone instead of
  fetch when the submodule doesn't exist yet. The clone is buried in
  `submodule--helper update-clone`, which is a mixture of listing the
  submodules and cloning multiple submodules in parallel if possible. So I
  would assume it is easier to teach git clone to behave correctly and then
  stop retrying in git-submodule.sh if `just_cloned` is set in `cmd_update()`.

> If the commit in question is NOT a ref, then whether asking for it by
> unlisted SHA1 is supported will probably depend on the server's
> uploadpack.allowReachableSHA1InWant setting. I guess this is a reason
> why fb43e31 made the fetch for a specific SHA1 a fallback after the
> fetch for the default branch. Nevertheless, in case of “--depth 1” I
> think it would make sense to abort early: if none of the listed refs
> matches the requested one, and asking by SHA1 isn't supported by the
> server, then there is no point in fetching anything, since we won't be
> able to satisfy the submodule requirement either way.

Makes sense! I think the easiest way forward to implement this will be:

* `git clone` learns a (maybe undocumented internal) option `--get-sha1`.
  `--branch` looks similar to what we want, but doesn't quite fit, as we do
  not know whether we're on a tag or not. The submodule tells us just the
  recorded sha1, not the branch/tag. So maybe we'd end up calling it
  `--detach-at=<sha1>`, that will
  -> inspect the ls-remote for the sha1 being there
  -> if the sha1 is there (at least once) clone as if --branch <tag> was given
  -> if not found and the server advertised allowReachableSHA1InWant,
     try again inside the clone

* `submodule--helper update-clone` passes the `--get-sha1` to the clones
  of the submodules

* cmd_update() in git-submodule.sh will only checkout submodules and not
  try again to fetch them if `just_cloned` is set, as the cloning did the
  best it could.
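Until something like the proposed `--detach-at=<sha1>` exists, its three steps can be roughly approximated with existing commands. This is a sketch under stated assumptions:

* `clone_detached_at` is a hypothetical name, not a git option.
* It uses the first advertised branch or tag that points at the SHA-1, ignoring `HEAD`.
* The fallback only succeeds if the server accepts an unadvertised SHA-1, e.g. with `uploadpack.allowReachableSHA1InWant` enabled.

```shell
#!/bin/sh
# clone_detached_at URL SHA1 DIR  (hypothetical stand-in for the proposed
# `git clone --detach-at=<sha1>`; not an existing git option)
clone_detached_at () {
	url=$1 sha1=$2 dir=$3
	# Step 1: look for an advertised branch or tag (possibly a peeled
	# annotated tag) whose object is SHA1; keep only the first match.
	ref=$(git ls-remote "$url" |
		awk -v want="$sha1" '$1 == want && $2 != "HEAD" && !seen++ {
			sub(/\^[{][}]$/, "", $2); print $2 }')
	if [ -n "$ref" ]; then
		# Step 2: found; shallow-clone that branch or tag, then detach.
		name=${ref#refs/heads/}
		name=${name#refs/tags/}
		git clone -q --depth 1 --branch "$name" "$url" "$dir" &&
		git -C "$dir" checkout -q --detach "$sha1"
	else
		# Step 3: not advertised; ask for the SHA-1 itself. This fails
		# unless the server allows unadvertised SHA-1s in "want".
		git init -q "$dir" &&
		git -C "$dir" fetch -q --depth 1 "$url" "$sha1" &&
		git -C "$dir" checkout -q --detach FETCH_HEAD
	fi
}
```

Doing this inside `git clone` itself, as proposed, would presumably avoid the separate `ls-remote` round trip, since clone already receives the ref advertisement.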


>
> For the case of “--depth n” with n > 1, I was wondering whether it would
> make sense to prefer the branch listed in submodule.‹name›.branch over
> the default branch.

Makes sense to me.

>
> I think shallow submodules would be very useful to embed libraries into
> projects, without too much care for history (and without the download
> times getting it entails), but with efficient updates to affected files
> only in case of a change in library version. But not being able to get a
> specific tag as a shallow submodule is a major showstopper here, I think.

Thanks for taking your time to point this out and start this discussion!

Thanks,
Stefan

>
> Greetings,
>  Martin von Gagern


* Re: Shallow submodule efficiency
  2016-06-28 17:20 ` Stefan Beller
@ 2016-06-28 19:08   ` Martin von Gagern
  2016-06-28 19:56     ` Stefan Beller
  0 siblings, 1 reply; 4+ messages in thread
From: Martin von Gagern @ 2016-06-28 19:08 UTC
  To: Stefan Beller; +Cc: git@vger.kernel.org

Hi Stefan,

On 28.06.2016 19:20, Stefan Beller wrote:
>> I have the feeling that “git submodule update --depth 1” is less clever
>> than it could be. Here is one example I observed with git 2.0.0:
> 
> 2.9.0 (as "Direct fetching of " is not part of 2.0.0 IIRC) ?

Yes, sorry. I had this tested with 2.8.3 at first, then waited for my
update to 2.9.0 to reproduce, and garbled the text while adjusting it.

> Makes sense! I think the easiest way forward to implement this will be:
> 
> * `git clone` learns a (maybe undocumented internal) option `--get-sha1`
>   `--branch` looks similar to what we want, but doesn't quite fit as we do not
>   know, whether we're on a tag or not. The submodule tells us just the
>   recorded sha1, not the branch/tag. So maybe we'd end up calling it
>   `--detach-at=<sha1>`,

That name makes a lot of sense to me.

>   that will
>   -> inspect the ls-remote for the sha1 being there
>   -> if the sha1 is there (at least once) clone as if --branch <tag> was given

Clone but detach, to be consistent. Yes.

>   -> if not found and the server advertised allowReachableSHA1InWant,
>      try again inside the clone

All of this has to pass through transport and get-pack, right?

> * `submodule--helper update-clone` passes the `--get-sha1` to the
>   clones of the submodules
> 
> * cmd_update() in git-submodule.sh will only checkout submodules and
>   not try again to fetch them if `just_cloned` is set as the cloning
>   did the best it could.

Sounds like a very reasonable roadmap to me.

Do you think there will be someone volunteering to tackle this?

Greetings,
  Martin


* Re: Shallow submodule efficiency
  2016-06-28 19:08   ` Martin von Gagern
@ 2016-06-28 19:56     ` Stefan Beller
  0 siblings, 0 replies; 4+ messages in thread
From: Stefan Beller @ 2016-06-28 19:56 UTC
  To: Martin von Gagern, Duy Nguyen; +Cc: git@vger.kernel.org

On Tue, Jun 28, 2016 at 12:08 PM, Martin von Gagern
<Martin.vGagern@gmx.net> wrote:
> Hi Stefan,
>
> On 28.06.2016 19:20, Stefan Beller wrote:
>>> I have the feeling that “git submodule update --depth 1” is less clever
>>> than it could be. Here is one example I observed with git 2.0.0:
>>
>> 2.9.0 (as "Direct fetching of " is not part of 2.0.0 IIRC) ?
>
> Yes, sorry. I had this tested with 2.8.3 at first, then waited for my
> update to 2.9.0 to reproduce, and garbled the text while adjusting it.
>
>> Makes sense! I think the easiest way forward to implement this will be:
>>
>> * `git clone` learns a (maybe undocumented internal) option `--get-sha1`
>>   `--branch` looks similar to what we want, but doesn't quite fit as we do not
>>   know, whether we're on a tag or not. The submodule tells us just the
>>   recorded sha1, not the branch/tag. So maybe we'd end up calling it
>>   `--detach-at=<sha1>`,
>
> That name makes a lot of sense to me.
>
>>   that will
>>   -> inspect the ls-remote for the sha1 being there
>>   -> if the sha1 is there (at least once) clone as if --branch <tag> was given
>
> Clone but detach, to be consistent. Yes.

Oh, right.

>
>>   -> if not found and the server advertised allowReachableSHA1InWant,
>>      try again inside the clone
>
> All of this has to pass through transport and get-pack, right?

Yeah, we have to go through the transport layer, i.e. from builtin/clone.c
we manipulate the transport object as defined in transport.h and the code
in transport.c. What do you mean by get-pack, though?

>
>> * `submodule--helper update-clone` passes the `--get-sha1` to the
>>   clones of the submodules
>>
>> * cmd_update() in git-submodule.sh will only checkout submodules and
>>   not try again to fetch them if `just_cloned` is set as the cloning
>>   did the best it could.
>
> Sounds like a very reasonable roadmap to me.
>
> Do you think there will be someone volunteering to tackle this?

I have it on my (low priority) TODO list, so if you want it sooner rather
than later, go for it yourself. Otherwise just wait. Maybe Duy has some
interest as well.

Thanks,
Stefan

>
> Greetings,
>   Martin


Thread overview: 4+ messages
2016-06-28  5:39 Shallow submodule efficiency Martin von Gagern
2016-06-28 17:20 ` Stefan Beller
2016-06-28 19:08   ` Martin von Gagern
2016-06-28 19:56     ` Stefan Beller

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git
