git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [RFC 00/14] Allow fetch-pack to send ref names (globs allowed)
@ 2017-01-25 22:02 Jonathan Tan
  2017-01-26 22:15 ` Stefan Beller
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Jonathan Tan @ 2017-01-25 22:02 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Hello everyone - this is a proposal for a protocol change to allow the
fetch-pack/upload-pack to converse in terms of ref names (globs
allowed), and also an implementation of the server (upload-pack) and
fetch-from-HTTP client (fetch-pack invoked through fetch).

Negotiation currently happens by upload-pack initially sending a list of
refs with names and SHA-1 hashes, and then several request/response
pairs in which the request from fetch-pack consists of SHA-1 hashes
(selected from the initial list). Allowing the request to consist of
names instead of SHA-1 hashes increases tolerance to refs changing
(due to time, and due to having load-balanced servers without strong
consistency), and is a step towards eliminating the need for the server
to send the list of refs first (possibly improving performance).

The protocol is extended by allowing fetch-pack to send
"want-ref <name>", where <name> is a full name (refs/...) [1], possibly
including glob characters. Upload-pack will inform the client of the
refs actually matched by sending "wanted-ref <SHA-1> <name>" before
sending the last ACK or NAK.

This patch set is laid out in this way:
 1-2:
  Upload-pack, protocol documentation, tests that test upload-pack
  independently. A configuration option is added to control if the
  "ref-in-want" capability is advertised. (It is always supported even
  if not advertised - this is so that this feature in multiple
  load-balanced servers can be switched on or off without needing any
  atomic switching.)
 3:
  Mechanism to test a repo that changes over the negotiation (currently,
  only with the existing mechanism).
 4-9:
  The current transport mechanism takes in an array of refs which is
  used as both input and output. Since we are planning to extend the
  transport mechanism to also allow the specification of ref names
  (which may include globs, and thus do not have a 1-1 correspondence to
  refs), refactor to make this parameter to be solely an input
  parameter.
 10-11:
  Changes to fetch-pack (which is used by remote-curl) to support
  "want-ref".
 12-13:
  Changes to the rest (fetch -> transport -> transport-helper ->
  remote-curl) to support "want-ref" when fetching from HTTP. The
  transport fetch function signature has been widened to allow passing
  in ref names - transports may use those ref names instead of or in
  addition to refs if they support it. (I chose to preserve refs in the
  function signature because many parts of Git, including remote
  helpers, only understand the old functionality anyway, and because
  precalculating the refs allows some optimizations.)
 14:
  This is not meant for submission - this is just to show that the tests
  pass if ref-in-want was advertised by default (except for some newly
  added tests that explicitly check for the old behavior).

[1] There has been some discussion about whether the server should
accept partial ref names, e.g. [2]. In this patch set, I have made the
server only accept full names, and it is the responsibility of the
client to send the multiple patterns which it wants to match. Quoting
from the commit message of the second patch:

  For example, a client could reasonably expand an abbreviated
  name "foo" to "want-ref foo", "want-ref refs/heads/foo", "want-ref
  refs/tags/foo", among others, and ensure that at least one such ref has
  been fetched.

[2] <20161024132932.i42rqn2vlpocqmkq@sigill.intra.peff.net>

Jonathan Tan (14):
  upload-pack: move parsing of "want" line
  upload-pack: allow ref name and glob requests
  upload-pack: test negotiation with changing repo
  fetch: refactor the population of hashes
  fetch: refactor fetch_refs into two functions
  fetch: refactor to make function args narrower
  fetch-pack: put shallow info in out param
  fetch-pack: check returned refs for matches
  transport: put ref oid in out param
  fetch-pack: support partial names and globs
  fetch-pack: support want-ref
  fetch-pack: do not printf after closing stdout
  fetch: send want-ref and receive fetched refs
  DONT USE advertise_ref_in_want=1

 Documentation/technical/http-protocol.txt         |  20 +-
 Documentation/technical/pack-protocol.txt         |  24 +-
 Documentation/technical/protocol-capabilities.txt |   6 +
 builtin/clone.c                                   |  16 +-
 builtin/fetch-pack.c                              |  64 ++--
 builtin/fetch.c                                   | 178 +++++++---
 fetch-pack.c                                      | 226 +++++++++----
 fetch-pack.h                                      |   6 +-
 remote-curl.c                                     |  91 +++--
 remote.c                                          |  67 +++-
 remote.h                                          |  20 +-
 t/lib-httpd.sh                                    |   1 +
 t/lib-httpd/apache.conf                           |   8 +
 t/lib-httpd/one-time-sed.sh                       |   8 +
 t/t5500-fetch-pack.sh                             |  82 +++++
 t/t5536-fetch-conflicts.sh                        |   2 +
 t/t5552-upload-pack-ref-in-want.sh                | 386 ++++++++++++++++++++++
 transport-helper.c                                | 105 ++++--
 transport.c                                       |  40 ++-
 transport.h                                       |  23 +-
 upload-pack.c                                     | 117 +++++--
 21 files changed, 1232 insertions(+), 258 deletions(-)
 create mode 100644 t/lib-httpd/one-time-sed.sh
 create mode 100755 t/t5552-upload-pack-ref-in-want.sh

-- 
2.11.0.483.g087da7b7c-goog


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC 00/14] Allow fetch-pack to send ref names (globs allowed)
  2017-01-25 22:02 Jonathan Tan
@ 2017-01-26 22:15 ` Stefan Beller
  2017-01-26 23:00 ` Jeff King
  2017-02-07 23:53 ` Jonathan Tan
  2 siblings, 0 replies; 7+ messages in thread
From: Stefan Beller @ 2017-01-26 22:15 UTC (permalink / raw)
  To: Jonathan Tan, Jeff King; +Cc: git@vger.kernel.org

On Wed, Jan 25, 2017 at 2:02 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> Hello everyone - this is a proposal for a protocol change to allow the
> fetch-pack/upload-pack to converse in terms of ref names (globs
> allowed), and also an implementation of the server (upload-pack) and
> fetch-from-HTTP client (fetch-pack invoked through fetch).
>
> Negotiation currently happens by upload-pack initially sending a list of
> refs with names and SHA-1 hashes, and then several request/response
> pairs in which the request from fetch-pack consists of SHA-1 hashes
> (selected from the initial list). Allowing the request to consist of
> names instead of SHA-1 hashes increases tolerance to refs changing
> (due to time, and due to having load-balanced servers without strong
> consistency), and is a step towards eliminating the need for the server
> to send the list of refs first (possibly improving performance).
>
> The protocol is extended by allowing fetch-pack to send
> "want-ref <name>", where <name> is a full name (refs/...) [1], possibly
> including glob characters. Upload-pack will inform the client of the
> refs actually matched by sending "wanted-ref <SHA-1> <name>" before
> sending the last ACK or NAK.

I have reviewed the patches and think they are a good idea,
cc'ing Jeff who you linked to and who had some ideas about the protocol as well.

Stefan

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC 00/14] Allow fetch-pack to send ref names (globs allowed)
  2017-01-25 22:02 Jonathan Tan
  2017-01-26 22:15 ` Stefan Beller
@ 2017-01-26 23:00 ` Jeff King
  2017-01-27  0:26   ` Jonathan Tan
  2017-02-07 23:53 ` Jonathan Tan
  2 siblings, 1 reply; 7+ messages in thread
From: Jeff King @ 2017-01-26 23:00 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Wed, Jan 25, 2017 at 02:02:53PM -0800, Jonathan Tan wrote:

> Negotiation currently happens by upload-pack initially sending a list of
> refs with names and SHA-1 hashes, and then several request/response
> pairs in which the request from fetch-pack consists of SHA-1 hashes
> (selected from the initial list). Allowing the request to consist of
> names instead of SHA-1 hashes increases tolerance to refs changing
> (due to time, and due to having load-balanced servers without strong
> consistency),

Interesting. My big question is: what happens when a ref _does_ change?
How does the client handle this?

The existing uploadpack.allowReachableSHA1InWant is there to work around
the problem that an http client may get a ref advertisement in one step,
and then come back later to do the want/have negotiation, at which point
the server has moved on (or maybe it's even a different server). There
the client says "I want sha1 X", and the server needs to say "well, X
isn't my tip now, but it's still acceptable for you to fetch".

But this seems to go in the opposite direction. After the advertisement,
the client decides "OK, I want to fetch refs/heads/master which is at
SHA1 X". It connects to the server and says "I want refs/heads/master".
Let's say the server has moved its version of the ref to SHA1 Y.

What happens? I think the server will say "wanted-ref master Y". Does
the client just decide to use "Y" then?  How does that interact with any
decisions the client might have made about X? I guess things like
fast-forwards have to be decided after we fetch the objects anyway
(since we cannot compute them until we get the pack), so maybe there
aren't any such decisions. I haven't checked.

> and is a step towards eliminating the need for the server
> to send the list of refs first (possibly improving performance).

I'm not sure it is all that useful towards that end. You still have to
break compatibility so that the client tells the server to suppress the
ref advertisement. After that, it is just a question of asking for the
refs. And you have two options:

  1. Ask the server to tell you the values of some subset of the refs,
     pick what you want, and then do the want/have as normal.

  2. Go straight to the want/have, but tell the server the refs you want
     instead of their sha1s.

I think your approach here would lead to (2).

But (1), besides being closer to how the protocol works now, seems like
it's more flexible. I can ask about the ref state without necessarily
having to retrieve the objects. How would you write git-ls-remote with
such a system?

> [1] There has been some discussion about whether the server should
> accept partial ref names, e.g. [2]. In this patch set, I have made the
> server only accept full names, and it is the responsibility of the
> client to send the multiple patterns which it wants to match. Quoting
> from the commit message of the second patch:
> 
>   For example, a client could reasonably expand an abbreviated
>   name "foo" to "want-ref foo", "want-ref refs/heads/foo", "want-ref
>   refs/tags/foo", among others, and ensure that at least one such ref has
>   been fetched.

That has a cost that scales linearly with the number of refs, because
you have to ask for each name 6 times.  After the discussion you linked,
I think my preference is more like:

  1. Teach servers to accept a list of patterns from the client
     which will be resolved in order. Unlike your system, the client
     only needs to specify the list once per session, rather than once
     per ref.

  2. (Optional) Give a shorthand for the stock patterns that git has had
     in place for years. That saves some bytes over specifying the
     patterns completely (though it's really not _that_ many bytes, so
     perhaps the complication isn't a big deal).

-Peff

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC 00/14] Allow fetch-pack to send ref names (globs allowed)
  2017-01-26 23:00 ` Jeff King
@ 2017-01-27  0:26   ` Jonathan Tan
  0 siblings, 0 replies; 7+ messages in thread
From: Jonathan Tan @ 2017-01-27  0:26 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Thanks for your comments.

On 01/26/2017 03:00 PM, Jeff King wrote:
> On Wed, Jan 25, 2017 at 02:02:53PM -0800, Jonathan Tan wrote:
>
>> Negotiation currently happens by upload-pack initially sending a list of
>> refs with names and SHA-1 hashes, and then several request/response
>> pairs in which the request from fetch-pack consists of SHA-1 hashes
>> (selected from the initial list). Allowing the request to consist of
>> names instead of SHA-1 hashes increases tolerance to refs changing
>> (due to time, and due to having load-balanced servers without strong
>> consistency),
>
> Interesting. My big question is: what happens when a ref _does_ change?
> How does the client handle this?
>
> The existing uploadpack.allowReachableSHA1InWant is there to work around
> the problem that an http client may get a ref advertisement in one step,
> and then come back later to do the want/have negotiation, at which point
> the server has moved on (or maybe it's even a different server). There
> the client says "I want sha1 X", and the server needs to say "well, X
> isn't my tip now, but it's still acceptable for you to fetch".
>
> But this seems to go in the opposite direction. After the advertisement,
> the client decides "OK, I want to fetch refs/heads/master which is at
> SHA1 X". It connects to the server and says "I want refs/heads/master".
> Let's say the server has moved its version of the ref to SHA1 Y.
>
> What happens? I think the server will say "wanted-ref master Y". Does
> the client just decide to use "Y" then?  How does that interact with any
> decisions the client might have made about X? I guess things like
> fast-forwards have to be decided after we fetch the objects anyway
> (since we cannot compute them until we get the pack), so maybe there
> aren't any such decisions. I haven't checked.

Yes, the server will say "wanted-ref master Y". The relevant code 
regarding the decisions the client makes regarding X or Y is in do_fetch 
in builtin/fetch.c.

There, I can see these decisions done using X:
  - check_not_current_branch (forbidding fetching that modifies the
    current branch) (I just noticed that this has to be done for Y too,
    and will do so)
  - prune refs [*]
  - automatic tag following [*]

[*] X and Y may differ in that one relevant ref appears in one set but 
not in the other (because a ref was added or removed in the meantime), 
causing a different result if these decisions were to be done using Y, 
but I think that it is OK either way.

Fetch optimizations (for example, everything_local in fetch-pack.c) that 
check if the client really needs to fetch are also done using X, of 
course (and if the optimization succeeds, there is no Y).

Fast-forwards (and everything else in store_updated_refs) are decided 
using Y.

>> and is a step towards eliminating the need for the server
>> to send the list of refs first (possibly improving performance).
>
> I'm not sure it is all that useful towards that end. You still have to
> break compatibility so that the client tells the server to suppress the
> ref advertisement. After that, it is just a question of asking for the
> refs. And you have two options:
>
>   1. Ask the server to tell you the values of some subset of the refs,
>      pick what you want, and then do the want/have as normal.
>
>   2. Go straight to the want/have, but tell the server the refs you want
>      instead of their sha1s.
>
> I think your approach here would lead to (2).
>
> But (1), besides being closer to how the protocol works now, seems like
> it's more flexible. I can ask about the ref state without necessarily
> having to retrieve the objects. How would you write git-ls-remote with
> such a system?

Assuming a new protocol with the appropriate backwards compatibility 
(which would have to be done for both options), (2) does save a 
request/response pair (because we can send the ref names directly as 
"want-ref" in the initial request instead of sending ref names in the 
initial request and then confirming them using "want <SHA-1>" in the 
subsequent request). Also, (2) is more tolerant towards refs changing 
over time. (1) might be closer to the current protocol, but I think that 
the difference is not so significant (only in "want-ref" vs "want" 
request and the "wanted-ref" response).

As for git-ls-remote, I currently think that it would have to use the 
existing protocol.

>> [1] There has been some discussion about whether the server should
>> accept partial ref names, e.g. [2]. In this patch set, I have made the
>> server only accept full names, and it is the responsibility of the
>> client to send the multiple patterns which it wants to match. Quoting
>> from the commit message of the second patch:
>>
>>   For example, a client could reasonably expand an abbreviated
>>   name "foo" to "want-ref foo", "want-ref refs/heads/foo", "want-ref
>>   refs/tags/foo", among others, and ensure that at least one such ref has
>>   been fetched.
>
> That has a cost that scales linearly with the number of refs, because
> you have to ask for each name 6 times.  After the discussion you linked,
> I think my preference is more like:
>
>   1. Teach servers to accept a list of patterns from the client
>      which will be resolved in order. Unlike your system, the client
>      only needs to specify the list once per session, rather than once
>      per ref.
>
>   2. (Optional) Give a shorthand for the stock patterns that git has had
>      in place for years. That saves some bytes over specifying the
>      patterns completely (though it's really not _that_ many bytes, so
>      perhaps the complication isn't a big deal).

I envision that most fetches would be done on a single name (with or 
without globs), that expands to 5 (so it is not so crucial to factor out 
the list of patterns). Having said that, I am open to changing this.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC 00/14] Allow fetch-pack to send ref names (globs allowed)
  2017-01-25 22:02 Jonathan Tan
  2017-01-26 22:15 ` Stefan Beller
  2017-01-26 23:00 ` Jeff King
@ 2017-02-07 23:53 ` Jonathan Tan
  2017-02-09  0:26   ` Junio C Hamano
  2 siblings, 1 reply; 7+ messages in thread
From: Jonathan Tan @ 2017-02-07 23:53 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, gitster, peff

Looking back at the comments I have received in reply, I think that
there were two major concerns: (i) the case where a server ACKs a client
"have" line and the client forever thinks that the server has it, but it
may not be the case for future servers (or future invocations of the
same server), and (ii) what the client does with 2 "versions" of remote
refs.

For (i), the issue already exists and as far as I can tell, this patch
set does not directly impact it, positively or negatively. The
"have"/"ACK" part of negotiation is kept the same - the only difference
in this patch set is that wants can be specified by name instead of by
SHA-1 hash. This patch set does not help the "have"/"ACK" part of
negotiation, but it helps the "want" part.

For (ii), I have prepared a patch to be squashed, and extended the
commit message with an explanation of what is happening. (The commit
message and the patch are appended to this e-mail).

(There was also some discussion of the client being required to send
exact matches in its "want-ref" lines.)

Please let me know if you have any other opinions or thoughts. It does
seem to me like such a protocol update (or something similar) would help
for large repositories with many ever-changing refs (like refs/changes
in Gerrit or refs/pull in GitHub) - and the creation of a ref would look
like a deletion depending on the order in which we access the servers in
a load-balancing arrangement and the order in which those servers
synchronize themselves. And deletion of refs does not work with the
current protocol, but would work with a protocol that supports wildcards
(like this one).

-- 8< --

fetch: send want-ref and receive fetched refs

Teach fetch to send refspecs to the underlying transport, and teach all
components used by the HTTP transport (remote-curl, transport-helper,
fetch-pack) to understand and propagate the names and SHA-1s of the refs
fetched.

The do_fetch method in builtin/fetch.c originally had only one
remote-local ref map, generated from the already-fetched list of remote
refs. This patch introduces a new ref map generated from the list of
fetched refs. Usages of the list of remote refs or the remote-local ref
map are updated as follows:
 - check_not_current_branch (which checks that the current branch is not
   affected by the fetch) is performed both on the original ref map (to
   die before the fetch if we can, as an optimization) and on the new
   ref map (since the new ref map is the one actually applied).
 - Pruning is done based on the new ref map.
 - backfill_tags (for tag following) is performed using the original
   list of remote refs because the new list of fetched refs is not
   guaranteed to contain tag information. (Since backfill_tags performs
   another fetch, it does not need to be fully consistent with the
   just-returned packfile.)
The list of remote refs and the remote-local ref map are not otherwise
used by do_fetch or any function in its invocation chain (fetch_one and
cmd_fetch).
---
 builtin/fetch.c | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 87de00e49..b8432394c 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1177,6 +1177,20 @@ static int do_fetch(struct transport *transport,
 
 	if (tags == TAGS_DEFAULT && autotags)
 		transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, "1");
+	if (fetch_refs(transport, e_rs, e_rs_nr, ref_map, &new_remote_refs)) {
+		free_refs(ref_map);
+		retcode = 1;
+		goto cleanup;
+	}
+	if (new_remote_refs) {
+		free_refs(ref_map);
+		ref_map = get_ref_map(transport->remote, new_remote_refs,
+				      refs, ref_count, tags, autotags);
+		if (!update_head_ok)
+			check_not_current_branch(ref_map);
+		free_refs(new_remote_refs);
+	}
+
 	if (prune) {
 		/*
 		 * We only prune based on refspecs specified
@@ -1192,18 +1206,6 @@ static int do_fetch(struct transport *transport,
 				   transport->url);
 		}
 	}
-	if (fetch_refs(transport, e_rs, e_rs_nr, ref_map, &new_remote_refs)) {
-		free_refs(ref_map);
-		retcode = 1;
-		goto cleanup;
-	}
-	if (new_remote_refs) {
-		free_refs(ref_map);
-		ref_map = get_ref_map(transport->remote, new_remote_refs,
-				      refs, ref_count, tags, autotags);
-		free_refs(new_remote_refs);
-	}
-
 	if (consume_refs(transport, ref_map)) {
 		free_refs(ref_map);
 		retcode = 1;
-- 
2.11.0.483.g087da7b7c-goog


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC 00/14] Allow fetch-pack to send ref names (globs allowed)
  2017-02-07 23:53 ` Jonathan Tan
@ 2017-02-09  0:26   ` Junio C Hamano
  0 siblings, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2017-02-09  0:26 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, peff

Jonathan Tan <jonathantanmy@google.com> writes:

> Usages of the list of remote refs or the remote-local ref
> map are updated as follows:
>  - check_not_current_branch (which checks that the current branch is not
>    affected by the fetch) is performed both on the original ref map (to
>    die before the fetch if we can, as an optimization) and on the new
>    ref map (since the new ref map is the one actually applied).

OK.

>  - Pruning is done based on the new ref map.

OK.  As that is what eventually gets "installed" on the local side,
it makes sense to become consistent with that set, not the set the
original server gave you.

>  - backfill_tags (for tag following) is performed using the original
>    list of remote refs because the new list of fetched refs is not
>    guaranteed to contain tag information. (Since backfill_tags performs
>    another fetch, it does not need to be fully consistent with the
>    just-returned packfile.)

This smells correct but I need to think about this one a bit more.

Overall I think the strategy is agreeable.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC 00/14] Allow fetch-pack to send ref names (globs allowed)
@ 2018-05-20  8:24 Apinan Ponchan
  0 siblings, 0 replies; 7+ messages in thread
From: Apinan Ponchan @ 2018-05-20  8:24 UTC (permalink / raw)
  To: peff; +Cc: git, jonathantanmy



ส่งจาก iPhone ของฉัน

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-05-20  9:26 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-20  8:24 [RFC 00/14] Allow fetch-pack to send ref names (globs allowed) Apinan Ponchan
  -- strict thread matches above, loose matches on Subject: below --
2017-01-25 22:02 Jonathan Tan
2017-01-26 22:15 ` Stefan Beller
2017-01-26 23:00 ` Jeff King
2017-01-27  0:26   ` Jonathan Tan
2017-02-07 23:53 ` Jonathan Tan
2017-02-09  0:26   ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).