git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Kirill Smelkov <kirr@nexedi.com>
To: Jeff King <peff@peff.net>
Cc: Eric Sunshine <sunshine@sunshineco.com>,
	Junio C Hamano <gitster@pobox.com>,
	Jonathan Tan <jonathantanmy@google.com>,
	Brandon Williams <bmwill@google.com>,
	Takuto Ikuta <tikuta@chromium.org>,
	Jeff Hostetler <jeffhost@microsoft.com>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	Michael Haggerty <mhagger@alum.mit.edu>,
	Git List <git@vger.kernel.org>
Subject: Re: [PATCH v2] fetch-pack: don't try to fetch peel values with --all
Date: Tue, 12 Jun 2018 18:54:17 +0000	[thread overview]
Message-ID: <20180612185413.GA21856@deco.navytux.spb.ru> (raw)
In-Reply-To: <20180612094849.GB26123@sigill.intra.peff.net>

On Tue, Jun 12, 2018 at 05:48:49AM -0400, Jeff King wrote:
> On Mon, Jun 11, 2018 at 09:43:02AM +0000, Kirill Smelkov wrote:
> 
> > > Looking deeper, we do not need these trees and blobs at all. The problem
> > > is really just a tag that peels to an object that is not otherwise a ref
> > > tip, regardless of its type.
> > 
> > Thanks for feedback and for coming up with the fix. Sure, I'm ok with
> > moving the test into your patch. However, even if a test becomes
> > different - narrowing down root of _current_ problem, I suggest to also
> > keep explicitly testing tag-to-blob and tag-to-tree (and if we really
> > also want tag-to-commit and tag-to-tag) behaviour. Reason is: if we skip
> > those now, they can potentially break in the future.
> 
> Yeah, I have no problem testing these cases separately. There's no bug
> with them now, but it is a slightly uncommon case. My suggestion would
> be to submit a patch that goes on top of mine that covers these cases.

Ok, I will try to do it.


> > I would also suggest to fix upload-pack, as it is just not consistent to
> > reject sending objects that were advertised, and so can strike again
> > some way in the future. After all git.git's fetch-pack is not the only
> > git client that should be possible to interact with git.git's
> > upload-pack on remote side, right?
> 
> No, it's not the only client. At the same time, I am on the fence over
> whether upload-pack's behavior is wrong or not. It depends what you take
> a peeled advertisement line to mean. Does it mean: this object has been
> advertised and clients should be able to fetch it? Or does it mean: by
> the way, you may be interested to know the peeled value of this tag in
> case you want to do tag-following?
> 
> So far I think it has only meant the latter. I could see an argument for
> the former, but any client depending on that would never have worked,
> AFAICT. We could _make_ it work, but how would a client know which
> server version it's talking to (and therefore whether it is safe to make
> the request?). I think you'd have to add a capability to negotiate.

I see. I don't know the details of the exchange, just it was surprising
for outside observer that fetching what was advertised is rejected. For
the reference there is no strong need for me for this to work anymore
(please see below).


> > I'm not sure, but I would say that `fetch-pack --all` from an empty
> > repository should not fail and should just give empty output as fetch
> > does.
> 
> Yeah, that seems reasonable to me. The die() that catches this dates
> back to 2005-era, and we later taught the "fetch" porcelain to handle
> this. I don't _think_ anybody would be upset that the plumbing learned
> to treat this as a noop. It's probably a one-liner change in
> fetch_pack() to return early instead of dying.

Ok, I will try to send related testcase, and it is indeed easy to find
- the fix itself.


> > For the reference all the cases presented here are real - they appear in
> > our repositories on lab.nexedi.com for which I maintain the backup, and
> > I've noticed them in the process of switching git-backup from using
> > fetch to fetch-pack here:
> > 
> > https://lab.nexedi.com/kirr/git-backup/blob/0ab7bbb6/git-backup.go#L436
> 
> I applaud you using the porcelain for your scripts, but I suspect that
> fetch-pack by itself is not at all well-used or well-tested these days
> (certainly this --all bug has been around for almost 6 years and is not
> very hard to trigger in practice).

I see; thanks for the warning.


> If an extra connection isn't a problem, you might be better off with
> "git ls-remote", and then picking through the results for refs of
> interest, and then "git fetch-pack" to actually get the pack. That's how
> git-fetch worked when it was a shell script (e.g., see c3a200120d, the
> last shell version).

Yes, this is what I ended up doing:

https://lab.nexedi.com/kirr/git-backup/commit/899103bf

but for another reason - to avoid repeating for every fetched repository
slow (in case of my "big" destination backup repository) quickfetch()
checking in every spawned `git fetch`: git-backup can build index of
objects we already have ourselves only once at startup, and then in
fetch, after checking lsremote output, consult that index, and if we see
we already have everything for an advertised reference - just avoid
giving it to fetch-pack to process. It turns out for many pulled
repositories there is usually no references changed at all and this way
fetch-pack can be skipped completely:

https://lab.nexedi.com/kirr/git-backup/commit/3efed898

> It may also be sane to just use "git fetch", which I'd say is _fairly_
> safe to script. Of course I have no problem if you want to fix all of
> the corner cases in fetch-pack. Just giving you fair warning. :)

Thanks again for the warning. I'm happy the switch to fetch plumbing
happenned on my side, and so far it is working well. Like I said above I
cannot use `git fetch` as is, because quickfetch() overhead for my case
became dominant and very slow, taking ~ 30 seconds of the time just to
check whether we have everything from one fetched repository, which, if
there are 100x or 1000x of such, adds up to hours.

If ls-remote has to be used anyway switching to plumbing seems natural.
Let's see if I hit any more corner place or not - I will be keeping your
warning in mind.

Thanks again,
Kirill

  reply	other threads:[~2018-06-12 19:09 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-10 14:32 [PATCH] fetch-pack: demonstrate --all breakage when remote have tags to non-commit objects Kirill Smelkov
2018-06-11  4:20 ` Jeff King
2018-06-11  4:47   ` [PATCH] fetch-pack: don't try to fetch peeled values with --all Jeff King
2018-06-11  5:28     ` Eric Sunshine
2018-06-11  5:53       ` [PATCH v2] fetch-pack: don't try to fetch peel " Jeff King
2018-06-11  9:43         ` Kirill Smelkov
2018-06-12  9:48           ` Jeff King
2018-06-12 18:54             ` Kirill Smelkov [this message]
2018-06-13 11:18               ` [PATCH] fetch-pack: test explicitly that --all can fetch tag references pointing to non-commits Kirill Smelkov
2018-06-13 17:42                 ` Junio C Hamano
2018-06-13 18:43                   ` Kirill Smelkov
2018-06-13 21:05                     ` Jeff King
2018-06-13 23:11                       ` Jeff King
2018-06-14  5:25                         ` Kirill Smelkov
2018-06-14 16:07                           ` Junio C Hamano
2018-06-14 17:51                             ` Kirill Smelkov
2018-06-13 12:55               ` [PATCH] fetch-pack: demonstrate --all failure when remote is empty Kirill Smelkov
2018-06-13 17:13                 ` Junio C Hamano
2018-06-13 18:21                   ` Kirill Smelkov
2018-06-13 21:13               ` [PATCH v2] fetch-pack: don't try to fetch peel values with --all Jeff King
2018-06-14  5:29                 ` Kirill Smelkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180612185413.GA21856@deco.navytux.spb.ru \
    --to=kirr@nexedi.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=bmwill@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jeffhost@microsoft.com \
    --cc=jonathantanmy@google.com \
    --cc=mhagger@alum.mit.edu \
    --cc=peff@peff.net \
    --cc=sunshine@sunshineco.com \
    --cc=tikuta@chromium.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).