git@vger.kernel.org mailing list mirror (one of many)
From: Jonathan Tan <jonathantanmy@google.com>
To: Stefan Beller <sbeller@google.com>
Cc: git <git@vger.kernel.org>
Subject: Re: [RFC PATCH] fetch-pack: space out sent "haves" in negotiation
Date: Tue, 22 May 2018 11:44:40 -0700
Message-ID: <20180522114440.0b1aa23a780efe6a68770d9b@google.com>
In-Reply-To: <CAGZ79kb96Fxf1OBbnqmAtAm_EA5y9+0NKcNqhKjXhavWe6WzWA@mail.gmail.com>

On Mon, 21 May 2018 15:57:18 -0700
Stefan Beller <sbeller@google.com> wrote:

> In an ideal world, the server and client would both estimate the potential
> reduction of the packfile to send, and base the decision whether to continue
> negotiating on the trade-off: are the packfile reduction savings greater
> than the cost of negotiation (in terms of bandwidth or time)?
> (E.g. the server could keep track of the "potential largest packfile to
> send" as well as the "potential smallest packfile to send" given the
> state of negotiation. As soon as the difference between those
> two packs is smaller than the size of one round of negotiation,
> it is better to stop and just send the large file.)
> 
> You state that you do not want to change the server side, and stick to
> the current protocol, which makes this ideal world scenario moot, but
> shifts the problem to "picking haves more intelligently".

Thanks for thinking about this!

This requires a modification on the server side, as you said, but this
sounds like a good idea that can be combined with the approach in my
patch - once we've found a match, instead of the client restarting the
fine walk, the server could then send the hashes of the commits between
its tip and the match (or some subset thereof) and the client can send
the specific hashes it has.
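
To make the quoted stopping rule concrete, here is a minimal sketch in C.
It is not part of the patch: the function name and the three byte
estimates are assumptions, and how a server would actually compute such
bounds is left open.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of Git: decide whether another round of
 * negotiation can still pay for itself. All three values are estimates in
 * bytes that the server would have to maintain somehow (an assumption).
 */
int negotiation_worth_continuing(uint64_t largest_pack,
				 uint64_t smallest_pack,
				 uint64_t round_cost)
{
	/* The most another round could save is the gap between the bounds. */
	uint64_t max_saving = largest_pack > smallest_pack ?
			      largest_pack - smallest_pack : 0;

	/* Stop as soon as the gap is smaller than one round of negotiation. */
	return max_saving >= round_cost;
}
```

With bounds of 1000 and 900 bytes and a 500-byte round, this returns 0,
that is, "stop and just send the large file".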

> > I'm not sure if this is the best way,
> 
> I think it is the best for a short term gain, as the picking algorithm is
> not part of the protocol, so it can be easily extended/reverted/improved
> as we go. So I would continue this way.

That's true in that this can be subsequently modified without breaking
backwards compatibility - the only issue is the opportunity cost of the
author's and reviewers' time and effort.

> > (1) The implementation that I have
> >
> > This patch contains some drop-in code that passes all existing tests,
> > but the new negotiation algorithm is not tested.
> >
> > To mitigate the effect of skipping, I included functionality wherein
> > the client will retry the commits in a skip if the server ACKs the
> > destination of the skip, but this is currently imperfect - in
> > particular, the server might end the negotiation early, and the commits
> > retried in my current implementation are a superset due to the fact that
> > I didn't store the commits in the skip.
> 
> So we start with exponential hops, fall back to linear probing and then
> "make off by one errors" in the linear probes?

I wouldn't characterize the errors as "off by one errors". They are
more like...let me use a diagram:

A
|\
B D
| |
C E

Suppose we know that the server does not have A, has C, and may or may
not have E (we sent "have E" but didn't get a response yet). My method
restarts the walk at all the parents of A (that is, B and D), but D is
irrelevant to the situation (and should not be walked over - this is the
error).

> > (3) Other ways of improving negotiation
> >
> > If we're prepared to commit-walk a significant part of the entire local
> > repo (as we are, in the situation I described in the first paragraph),
> 
> > and if we have access to corresponding remote-tracking information,
> 
> This is a dangerous assumption, as not everyone has a 1:1 relationship
> with their remote server (e.g. for code review); there are triangle
> workflows, in the kernel community for example, where you push in one
> remote direction and (re-)obtain the history merged into the bigger picture
> from another remote. And these two remotes are not special to each other
> on the client side.

Precisely for this reason (where the local repo could have obtained a
remote's commits through means other than through the remote - in this
case, written by the local repo's user themself) I wanted to include
both ancestors and descendants of the remote tracking tip.

> This patch is moving the algorithm driving the selection of new commits
> to pick to a new file, but there is no new algorithm, yet?
> As hinted at from (1), this is smarter than what we did before by picking
> commits non-linearly but with some sort of exponential back off, how does
> it end the exponential phase?

In this patch, I wrote the new algorithm and deleted the old one. I
think the answer to your question is that the exponential phase never
ends. If what you mean is what happens when we reach a parentless commit
- we will emit it (that is, send "have X") regardless of how many
commits have been skipped.
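
As a rough illustration of that behavior on a purely linear history, the
sketch below picks which commits would get a "have" line. It is not the
code from the patch: the function name, the index-based history (0 is the
tip, n - 1 is the parentless commit) and the doubling of the step are all
assumptions made for the example.

```c
#include <stddef.h>

/*
 * Hypothetical illustration, not code from the patch. Writes the indices
 * of the commits that would be sent as "have" lines into out[] and
 * returns how many were written. Index 0 is the tip; index n - 1 is the
 * parentless commit. The step doubles after each emitted commit, which
 * is an assumed growth rate.
 */
size_t pick_haves(size_t n, size_t *out, size_t out_cap)
{
	size_t count = 0, pos = 0, step = 1;

	if (!n)
		return 0;
	while (count < out_cap) {
		out[count++] = pos;
		if (pos == n - 1)
			break;		/* the parentless commit was emitted */
		if (step >= n - 1 - pos)
			pos = n - 1;	/* never skip past the parentless commit */
		else
			pos += step;
		step *= 2;		/* the exponential phase never ends */
	}
	return count;
}
```

For a history of 20 commits this emits indices 0, 1, 3, 7, 15 and 19: the
last hop is cut short so that the parentless commit is still sent.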

> The way forward out of RFC state, might be to separate the introduction of
> a new improved algorithm and the refactoring. So first move code literally
> into the fetch-negotiator file, and then add improvements in there, or is
> it just not worth the refactoring and directly put in the new algorithm?

You're proposing that if I proceed with this, I split the patch into 2 -
one to move the negotiation algorithm, and one to update it? If yes,
normally I would agree, but the current negotiation algorithm is not
very sophisticated (and does not take up much code), so I think it's not
worth it.

> Another use case we discussed was "open-ended bisection", where you know
> you are in a bad state and "once upon a time it worked", and now you are
> tasked to find the offending commit. To find such a commit, you probably
> would also start with such an exponential back off until you run into a
> "good frontier" of commits and then use conventional bisect to narrow down
> the exact commit.

That's true.
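
A minimal sketch of that two-phase search, under simplifying assumptions
that are mine and not from the thread: the history is linear, index 0 is
the known-bad tip, larger indices are older, and every commit older than
the offending one is good. The names find_first_bad and good_at_or_after
are hypothetical.

```c
#include <stddef.h>

typedef int (*is_good_fn)(size_t idx, void *ctx);

/* Example predicate (hypothetical): commits at or after *ctx are good. */
int good_at_or_after(size_t idx, void *ctx)
{
	return idx >= *(size_t *)ctx;
}

/*
 * Returns the index of the oldest bad commit, or n if no good commit was
 * found. Requires n >= 1 and assumes index 0 is bad.
 */
size_t find_first_bad(size_t n, is_good_fn is_good, void *ctx)
{
	size_t bad = 0, good, step = 1;

	/* Phase 1: exponential back-off until a good commit is hit. */
	for (;;) {
		if (bad == n - 1)
			return n;	/* reached the root, nothing is good */
		good = step < n - 1 - bad ? bad + step : n - 1;
		if (is_good(good, ctx))
			break;
		bad = good;
		step *= 2;
	}

	/* Phase 2: conventional bisection between bad and good. */
	while (good - bad > 1) {
		size_t mid = bad + (good - bad) / 2;

		if (is_good(mid, ctx))
			good = mid;
		else
			bad = mid;
	}
	return bad;
}
```

With 100 commits and everything from index 37 onward good, phase 1 probes
1, 3, 7, 15, 31 and 63, and phase 2 then narrows the range down to 36.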

> > +struct fetch_negotiator {
> > +       struct sent_commit **sent_commits;
> > +       size_t sent_commit_nr, sent_commit_alloc;
> > +       struct prio_queue candidates;
> > +};
> 
> Maybe we can just declare the struct fetch_negotiator here and not
> define it, such that nobody outside the actual implementation tries
> to access its internals?

That's possible - I wanted to allow allocation of this on the stack (to
save a malloc), but perhaps that's over-optimization.
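
For reference, the trade-off looks roughly like this. The sketch uses a
stand-in struct with hypothetical names (demo_negotiator and its three
functions) rather than the real fetch_negotiator, and shows both halves in
one file: once callers only see the declaration, they can no longer put
the struct on the stack and have to go through an allocating constructor.

```c
#include <stdlib.h>

/* What a header could expose: the struct is declared, not defined. */
struct demo_negotiator;
struct demo_negotiator *demo_negotiator_new(void);
size_t demo_negotiator_sent_nr(const struct demo_negotiator *n);
void demo_negotiator_free(struct demo_negotiator *n);

/* What only the implementation file would see. */
struct demo_negotiator {
	size_t sent_commit_nr;
};

struct demo_negotiator *demo_negotiator_new(void)
{
	/* This is the malloc that stack allocation would have saved. */
	return calloc(1, sizeof(struct demo_negotiator));
}

size_t demo_negotiator_sent_nr(const struct demo_negotiator *n)
{
	return n->sent_commit_nr;
}

void demo_negotiator_free(struct demo_negotiator *n)
{
	free(n);
}
```

Code outside the implementation can hold a pointer and call the accessors,
but sizeof(struct demo_negotiator) and direct field access do not compile
there.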

> > +/*
> > + * Iterate through the commits invoked with fetch_negotiator_ack. The negotiator
> > + * makes an effort to remove redundant commits from the list.
> > + *
> > + * This is useful for stateless connections, in which information about what the
> > + * client knows needs to be replayed in every request.
> 
> So even if we do not use the skip commit logic, this would be a benefit for any
> http(-v0) and v2 users of the protocol?

It would conserve bandwidth, yes, but storing all the commits sent with
additional metadata for each would require more memory.
