git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: "SZEDER Gábor" <szeder.dev@gmail.com>
Cc: Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>,
	"Robin H. Johnson" <robbat2@gentoo.org>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: git-clone --config order & fetching extra refs during initial clone
Date: Thu, 4 May 2017 09:28:12 +0200	[thread overview]
Message-ID: <CACBZZX5706NELxAOWRVAx-QFPtZ_rAsRkTX811+jfrN4u47XfA@mail.gmail.com> (raw)
In-Reply-To: <CAM0VKjnjMEThuMvLEQByxWvxVvdzMSVsFKKstKLMweEx5UwTcg@mail.gmail.com>

On Wed, May 3, 2017 at 4:42 PM, SZEDER Gábor <szeder.dev@gmail.com> wrote:
> Cc'ing Ævar because of his work on 'clone --no-tags'.
>
> On Wed, Mar 15, 2017 at 6:08 PM, Jeff King <peff@peff.net> wrote:
>> On Sat, Mar 11, 2017 at 01:41:34AM +0100, SZEDER Gábor wrote:
>>> > Though if I'm bikeshedding, I'd probably have written the whole thing
>>> > with an argv_array and avoided counting at all.
>>>
>>> Yeah, I did notice that you love argv_array :)  I had to raise an
>>> eyebrow recently while looking at send-pack and how it uses argv_array
>>> to read refspecs from stdin into an array.  I think argv_array feels a
>>> bit out of place in both cases.  Yes, it does exactly what's needed.
>>> However, it's called *argv*_array, indicating that its contents is
>>> destined to become the options of some command.  But that's not the
>>> case in these two cases, we are not dealing with arguments to a
>>> command, these are just arrays of strings.
>>
>> In my mind, "argv" is synonymous with "NULL-terminated array of
>> strings". If the name is the only thing keeping it from wider use, I'd
>> much prefer us to give it a more generic name. All I really care about
>> is simplifying memory management. :)
>
> Whether its name is the _only_ thing keeping it from wider use, I
> don't know :)
>
> All I can say is that I was aware of argv_array, but because of
> its name it didn't even occur to me.  And even if I had considered it,
> I still wouldn't have used it here.  Had it been called string_array,
> I think I would have used it.
>
> On a related note, we have string_list, which is not a list but an
> ALLOC_GROW()-able array, and not that of strings (i.e. plan char*),
> but of structs with a string and an additional data field.
> Oh, well :)
>
>
>>> > I do also notice that right _after_ this parsing, we use remote_get(),
>>> > which is supposed to give us this config anyway. Which makes me wonder
>>> > if we could just reorder this to put remote_get() first, and then read
>>> > the resulting refspecs from remote->fetch.
>>>
>>> Though get_remote() does give us this config, at this point the
>>> default fetch refspec has not been configured yet, therefore it's not
>>> included in the resulting remote->fetch array.  The default refspec is
>>> set in write_refspec_config(), but that's called only about two
>>> screenfulls later.  So there is a bit of extra work to do in order to
>>> leverage get_remote()'s parsing.
>>>
>>> I think the simplest way is to keep parsing the default fetch refspec
>>> manually, and then append it to the remote->fetch array.  Definitely
>>> shorter and simpler than that parsing in the current patch.
>>> Alternatively, we could set the default fetch refspec in the
>>> configuration temporarily, only for the duration of the get_remote()
>>> call, but it feels a bit too hackish...
>>
>> Yeah, I think manually combining the two here is fine. Though I won't
>> complain if you want to look into setting the config earlier. If the
>> refactoring isn't too bad, it would probably provide the nicest outcome.
>
> I did actually look into that, but don't think it's a good idea.
>
> write_refspec_config() nicely encapsulates writing the proper fetch
> refspec configuration according to the given command line options.  Of
> course these options are already known right at the start, so solely
> in this regard we could call this function earlier.  However, in some
> cases, e.g. '--single-branch', the refspec to be written to the config
> depends not only on the given options but on the refs in the remote
> repository, too, so it can only be written after we got the remote's
> refs.
>
>
> Unfortunately, there is more to this issue.  Earlier I though that the
> fetch refspec is the only config that is ignored during a clone.
> However, Ævar's 'clone --no-tags' patches[1] drew my attention to the
> 'remote.<name>.tagOpt' config variable, that I overlooked earlier...
> and apparently 'git clone' overlooks it as well, grabbing all tags
> even when it's set to '--no-tags'.
>
> The root issue is that 'git clone' calls directly into the fetch
> machinery instead of running 'git fetch' (either via run_command() or
> cmd_fetch()), and some of these "higher-level" config variables are
> only handled in 'builtin/fetch.c' but not in 'git clone'.  By
> "handle" I mean "parse and act accordingly": as it happens, these
> config values are parsed alright when 'git clone' calls remote_get(),
> but it never looks at the relevant fields in the resulting 'struct
> remote'.
>
> Luckily, many "lower-level" config variables are working properly even
> during 'git clone', because they are handled in the transport layer,
> e.g. 'git clone -c url.https://github.com/.insteadof=gh: gh:git/git'
> does the right thing.
>
>
> I'm not sure what the right way forward would be.
>
> My patch deals with 'remote.<name>.refspec', i.e. 'remote->fetch'.
> Apparently some extra care is necessary for 'remote.<name>.tagOpt' and
> 'remote->fetch_tags', too.  Perhaps there are more, I haven't checked
> again, and maybe we'll add similar config variables in the future.  So
> I don't think that dealing with such config variables one by one in
> 'git clone', too, is the right long-term solution...  but perhaps it's
> sufficient for the time being?
>
> Running a fully-fledged 'git fetch' seems to be simpler and safer
> conceptually, as it would naturally handle all fetch-related config
> variables, present and future.  However, it's not without drawbacks:
> 'git clone' must set the proper config before running 'git fetch' (or
> at least set equivalent cmdline options), which in some cases requires
> the refs in the remote repository, making an additional "list remote
> refs" step necessary (i.e. both 'clone' and 'fetch' would have to
> retrieve the refs from the remote, resulting in more network I/O).
>
> Or we should libify more of 'builtin/fetch.c', but with all those
> static variables and functions in there...  Ugh :)

Yes from my (limited) understanding of the code after hacking in
--no-tags this all seems correct. I.e. that a large part of my patch
wouldn't be needed if we were able to just set tagOpt earlier & then
call fetch.

But as you point out there's a big chicken & egg problem with the
likes of --single-branch where we'd either need to run upload-pack on
the remote side twice, once to get the branch and once to fetch (ew!).

The way to get around that that I can see would be to have some deep
hooking into the fetch machinery where first we set our config like
tagOpt=--no-tags for --no-tags, then we call `fetch`, but `fetch`
would need to have some hook where in the case of --single-branch it
would immediately write the branch name to the fetch spec in the
config, then do the actual fetching.

      parent reply	other threads:[~2017-05-04  7:28 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-02-25 19:12 git-clone --config order & fetching extra refs during initial clone Robin H. Johnson
2017-02-25 20:21 ` Jeff King
2017-02-25 20:50 ` Jeff King
2017-02-27 19:16   ` Junio C Hamano
2017-02-27 21:12     ` Jeff King
2017-03-11  0:41       ` SZEDER Gábor
2017-03-15 17:08         ` Jeff King
2017-05-03 14:42           ` SZEDER Gábor
2017-05-03 20:22             ` Jeff King
2017-05-04  6:57               ` Sebastian Schuberth
2017-05-09  1:33               ` Junio C Hamano
2017-05-09  2:10                 ` Jeff King
2017-05-09  2:26                   ` Jeff King
2017-05-09  2:50                     ` Junio C Hamano
2017-05-04  7:28             ` Ævar Arnfjörð Bjarmason [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CACBZZX5706NELxAOWRVAx-QFPtZ_rAsRkTX811+jfrN4u47XfA@mail.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=robbat2@gentoo.org \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).