[RFC/PATCH 0/3] protocol v2

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

* [RFC/PATCH 0/3] protocol v2
@ 2015-02-24  3:12 Stefan Beller
  2015-02-24  3:12 ` [PATCH 1/3] Document protocol capabilities extension Stefan Beller
                   ` (3 more replies)
  0 siblings, 4 replies; 80+ messages in thread
From: Stefan Beller @ 2015-02-24  3:12 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

Inspired by a discusson on the scaling of git in the last days,
I thought about starting the adventure to teach git a new transport
protocol.

One of the biggest problems of a new protocol would be deployment
as the users probably would not care too deeply. It should just 
work in the sense that the user should not even sense that the
protocol changed. To do so we need to make sure the protocol
is backwards compatible and works if and old client talks to 
a new server as well as the other way round.

A later incarnation of the patch series will eventually add the 
possibility to add new versions of the transport protocols easily
without harming the user. For now in the first revision of the 
series it just documents an approach of how I'd start this problem 
of compatibility issues.

I realize this will be a bigger change to git, so I'd rather
just make a small step now. The actual discussion on how to
do the next protocol(s) may be started on the gitmerge
conference? (bloomfilter! client speaks first!, rsyncing
the refs changes!)

Any thoughts on how to make it easy to teach git new protocols
are very welcome.

Thanks,
Stefan

Stefan Beller (3):
  Document protocol capabilities extension
  receive-pack: add advertisement of different protocol options
  receive-pack: enable protocol v2

 Documentation/technical/protocol-capabilities.txt | 11 +++++++++++
 builtin/receive-pack.c                            |  7 +++++++
 2 files changed, 18 insertions(+)

-- 
2.3.0.81.gc37f363

^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCH 1/3] Document protocol capabilities extension
  2015-02-24  3:12 [RFC/PATCH 0/3] protocol v2 Stefan Beller
@ 2015-02-24  3:12 ` Stefan Beller
  2015-02-24  3:12 ` [PATCH 2/3] receive-pack: add advertisement of different protocol options Stefan Beller
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 80+ messages in thread
From: Stefan Beller @ 2015-02-24  3:12 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

In the future we may want to have multiple different versions for
the transport protocol each optimized for a certain aspect of the
transport such as latency, bandwidth, CPU load etc. To make this
future proof document a possible way how to advertise the versions
such that the client can store it and use try to use such advertised
protocols later if the clients supports them.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/technical/protocol-capabilities.txt | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index 4f8a7bf..6829668 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -268,3 +268,14 @@ to accept a signed push certificate, and asks the <nonce> to be
 included in the push certificate.  A send-pack client MUST NOT
 send a push-cert packet unless the receive-pack server advertises
 this capability.
+
+versions
+--------
+
+If the server supports more than the standard protocol, this capability
+advertises the additional versions. It is a comma separated list of
+the names of extensions for the binaries (i.e. "v2" would result in
+the assumption that git-receive-pack-v2 as well as git-upload-pack-v2
+is available). The items are desired in desceding order by the server,
+i.e. the server would like the client to use the first advertised
+version most.
-- 
2.3.0.81.gc37f363

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 2/3] receive-pack: add advertisement of different protocol options
  2015-02-24  3:12 [RFC/PATCH 0/3] protocol v2 Stefan Beller
  2015-02-24  3:12 ` [PATCH 1/3] Document protocol capabilities extension Stefan Beller
@ 2015-02-24  3:12 ` Stefan Beller
  2015-02-24  3:12 ` [PATCH 3/3] receive-pack: enable protocol v2 Stefan Beller
  2015-02-24  4:02 ` [RFC/PATCH 0/3] " Duy Nguyen
  3 siblings, 0 replies; 80+ messages in thread
From: Stefan Beller @ 2015-02-24  3:12 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/receive-pack.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index e0ce78e..a077b1d 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -71,6 +71,8 @@ static long nonce_stamp_slop;
 static unsigned long nonce_stamp_slop_limit;
 static struct ref_transaction *transaction;
 
+static const char *advertised_versions = NULL;
+
 static enum deny_action parse_deny_action(const char *var, const char *value)
 {
 	if (value) {
@@ -168,6 +170,9 @@ static int receive_pack_config(const char *var, const char *value, void *cb)
 		return 0;
 	}
 
+	if (strcmp(var, "receive.advertisedversions") == 0)
+		return git_config_string(&advertised_versions, var, value);
+
 	return git_default_config(var, value, cb);
 }
 
@@ -189,6 +194,8 @@ static void show_ref(const char *path, const unsigned char *sha1)
 			strbuf_addstr(&cap, " ofs-delta");
 		if (push_cert_nonce)
 			strbuf_addf(&cap, " push-cert=%s", push_cert_nonce);
+		if (advertise_versions)
+			strbuf_addstr(&cap, " versions=%s", advertise_versions);
 		strbuf_addf(&cap, " agent=%s", git_user_agent_sanitized());
 		packet_write(1, "%s %s%c%s\n",
 			     sha1_to_hex(sha1), path, 0, cap.buf);
-- 
2.3.0.81.gc37f363

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 3/3] receive-pack: enable protocol v2
  2015-02-24  3:12 [RFC/PATCH 0/3] protocol v2 Stefan Beller
  2015-02-24  3:12 ` [PATCH 1/3] Document protocol capabilities extension Stefan Beller
  2015-02-24  3:12 ` [PATCH 2/3] receive-pack: add advertisement of different protocol options Stefan Beller
@ 2015-02-24  3:12 ` Stefan Beller
  2015-02-24  4:02 ` [RFC/PATCH 0/3] " Duy Nguyen
  3 siblings, 0 replies; 80+ messages in thread
From: Stefan Beller @ 2015-02-24  3:12 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

This string would tell the client that git-receive-packv2 as well as
git-receive-pack would be supported.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/receive-pack.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index a077b1d..dcb231b 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -71,7 +71,7 @@ static long nonce_stamp_slop;
 static unsigned long nonce_stamp_slop_limit;
 static struct ref_transaction *transaction;
 
-static const char *advertised_versions = NULL;
+static const char *advertised_versions =  "v2,";
 
 static enum deny_action parse_deny_action(const char *var, const char *value)
 {
-- 
2.3.0.81.gc37f363

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
  2015-02-24  3:12 [RFC/PATCH 0/3] protocol v2 Stefan Beller
                   ` (2 preceding siblings ...)
  2015-02-24  3:12 ` [PATCH 3/3] receive-pack: enable protocol v2 Stefan Beller
@ 2015-02-24  4:02 ` Duy Nguyen
  2015-02-24  5:40   ` Stefan Beller
  2015-02-24  6:15   ` Junio C Hamano
  3 siblings, 2 replies; 80+ messages in thread
From: Duy Nguyen @ 2015-02-24  4:02 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Git Mailing List

On Tue, Feb 24, 2015 at 10:12 AM, Stefan Beller <sbeller@google.com> wrote:
> One of the biggest problems of a new protocol would be deployment
> as the users probably would not care too deeply. It should just
> work in the sense that the user should not even sense that the
> protocol changed.

Agreed.

> To do so we need to make sure the protocol
> is backwards compatible and works if an old client talks to
> a new server as well as the other way round.

It's very hard to keep backward compatibility if you want to stop the
initial ref adverstisement, costly when there are lots of refs. But we
can let both protocols run in parallel, with the old one advertise the
presence of the new one. Then the client could switch to new protocol
gradually. This way new protocol could forget about backward
compatibility. See

http://thread.gmane.org/gmane.comp.version-control.git/215054/focus=244325
-- 
Duy

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
  2015-02-24  4:02 ` [RFC/PATCH 0/3] " Duy Nguyen
@ 2015-02-24  5:40   ` Stefan Beller
  2015-02-24  6:15   ` Junio C Hamano
  1 sibling, 0 replies; 80+ messages in thread
From: Stefan Beller @ 2015-02-24  5:40 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List

On Mon, Feb 23, 2015 at 8:02 PM, Duy Nguyen <pclouds@gmail.com> wrote:
> On Tue, Feb 24, 2015 at 10:12 AM, Stefan Beller <sbeller@google.com> wrote:
>> One of the biggest problems of a new protocol would be deployment
>> as the users probably would not care too deeply. It should just
>> work in the sense that the user should not even sense that the
>> protocol changed.
>
> Agreed.
>
>> To do so we need to make sure the protocol
>> is backwards compatible and works if an old client talks to
>> a new server as well as the other way round.
>
> It's very hard to keep backward compatibility if you want to stop the
> initial ref adverstisement, costly when there are lots of refs. But we
> can let both protocols run in parallel, with the old one advertise the
> presence of the new one.

That's what I actually meant, to have different versions out there,
but maybe having the version as of now as the least common denominator
such that it always works (albeit inefficient for many refs).

> Then the client could switch to new protocol
> gradually. This way new protocol could forget about backward
> compatibility. See
>
> http://thread.gmane.org/gmane.comp.version-control.git/215054/focus=244325
> --
> Duy

> I would add that upload-pack also advertises about the availability of
> upload-pack2 and the client may set the remote.*.useUploadPack2 to
> either yes or auto so next time upload-pack2 will be used.

I had a similar thought, though I would not just restrict it to v2
this time, but
I'd aim to make it possible to plug whatever protocol you want to.
(Comparable to the SSL or ssh, it will always work, but as a proficient user
you can spend lot's of time tweaking what you actually want, looking
at tradeoffs
of efficiency, security, convenience).

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
  2015-02-24  4:02 ` [RFC/PATCH 0/3] " Duy Nguyen
  2015-02-24  5:40   ` Stefan Beller
@ 2015-02-24  6:15   ` Junio C Hamano
  2015-02-24 23:37     ` Stefan Beller
  2015-03-01  8:41     ` Junio C Hamano
  1 sibling, 2 replies; 80+ messages in thread
From: Junio C Hamano @ 2015-02-24  6:15 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Stefan Beller, Git Mailing List

On Mon, Feb 23, 2015 at 8:02 PM, Duy Nguyen <pclouds@gmail.com> wrote:
>
> It's very hard to keep backward compatibility if you want to stop the
> initial ref adverstisement, costly when there are lots of refs. But we
> can let both protocols run in parallel, with the old one advertise the
> presence of the new one. Then the client could switch to new protocol
> gradually. This way new protocol could forget about backward
> compatibility. See
>
> http://thread.gmane.org/gmane.comp.version-control.git/215054/focus=244325

Yes, the whole thread is worth a read, but the approach suggested by
that article $gmane/244325 is very good for its simplicity. The server
end programs, upload-pack and receive-pack, need to only learn to
advertise the availability of upload-pack-v2 and receive-pack-v2
services and the client side programs, fetch-pack and push-pack,
need to only notice the advertisement and record the availability of
v2 counterparts for the current remote *and* continue the exchange
in v1 protocol. That way, there is very little risk for breaking anything.

And the programs for new protocol exchange do not have to worry
about having to talk with older counterparts and downgrading the
protocol inline at all. As long as we learn from our past mistakes
and make sure that the very initial exchange will be kept short (one
of the items in the list of limitations, $gmane/264000), future servers
and clients can upgrade the protocol they talk inline by probing
capabilities, just like the current protocol allows them to choose
extensions. The biggest issue in the current protocol is not "who
speaks first" (that is merely one aspect) but "what is spoken first",
iow, "one side blinly gives a large message as the first thing", which
cannot be squelched by capability exchange.

So if we are going to discuss a new protocol, I'd prefer to see the
discussion without worrying too much about how to inter-operate
with the current vintage of Git. It is no longer an interesting problem,
as we know how to solve it with minimum risk. Instead, I'd like to
see us design the new protocol in such a way that it is in-line
upgradable without repeating our past mistakes.

I am *not* convinced that we want multiple suite of protocols that
must be chosen from to suit the use pattern, as mentioned somewhere
upthread, by the way.

Thanks.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
  2015-02-24  6:15   ` Junio C Hamano
@ 2015-02-24 23:37     ` Stefan Beller
  2015-02-25 12:44       ` Duy Nguyen
  2015-03-01  8:41     ` Junio C Hamano
  1 sibling, 1 reply; 80+ messages in thread
From: Stefan Beller @ 2015-02-24 23:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Duy Nguyen, Git Mailing List

On Mon, Feb 23, 2015 at 10:15 PM, Junio C Hamano <gitster@pobox.com> wrote:
> On Mon, Feb 23, 2015 at 8:02 PM, Duy Nguyen <pclouds@gmail.com> wrote:
>>
>> It's very hard to keep backward compatibility if you want to stop the
>> initial ref adverstisement, costly when there are lots of refs. But we
>> can let both protocols run in parallel, with the old one advertise the
>> presence of the new one. Then the client could switch to new protocol
>> gradually. This way new protocol could forget about backward
>> compatibility. See
>>
>> http://thread.gmane.org/gmane.comp.version-control.git/215054/focus=244325
>
> Yes, the whole thread is worth a read, but the approach suggested by
> that article $gmane/244325 is very good for its simplicity. The server
> end programs, upload-pack and receive-pack, need to only learn to
> advertise the availability of upload-pack-v2 and receive-pack-v2
> services and the client side programs, fetch-pack and push-pack,
> need to only notice the advertisement and record the availability of
> v2 counterparts for the current remote *and* continue the exchange
> in v1 protocol. That way, there is very little risk for breaking anything.

Right, I want to add this "learn about v2 on the fly, continue as always"
to the protocol.

>
> So if we are going to discuss a new protocol, I'd prefer to see the
> discussion without worrying too much about how to inter-operate
> with the current vintage of Git. It is no longer an interesting problem,
> as we know how to solve it with minimum risk. Instead, I'd like to
> see us design the new protocol in such a way that it is in-line
> upgradable without repeating our past mistakes.
>
> I am *not* convinced that we want multiple suite of protocols that
> must be chosen from to suit the use pattern, as mentioned somewhere
> upthread, by the way.

I do think it makes sense to have different protocols or different tunings
of one protocol, because there are many different situations in which different
metrics are the key metric.

If you are on mobile, you'd possibly be billed by the bytes on the wire, so
you want to have a protocol with as actual transport as possible and would
maybe trade off transported bytes to lots of computational overhead.

If you are in Australia (sorry downunder ;) or on satellite internet,
you may care a lot about latency and roundtrip times.

If you are in a corporate environment and just cloning from next door,
you may want to have the overall process (compute+network+
local reconstruction) just be fast overall.

I can understand, that we maybe want to just provide one generic
"version 2" of the protocol which is an allrounder not doing bad in
all of these aspects, but I can see usecases of having the desire to
replace the wire protocol by your own implementation. To do so
we could try to offer an API which makes implementing a new
protocol somewhat easy. The current state of affairs is not providing
this flexibility.

I think it would be not much overhead to have such
flexibility when writing the actual code for the "very little risk" v2
update. So instead of advertising a boolean flag meaning
"This server/client speaks version2", we would rather send a list
"This server speaks v2,v1 and v-custom-optimized-for-high-latency".

I started looking for academic literature if there are generic solutions
to finding graph differences, but no real luck for adapting to our
problem yet.

Thanks for your input,
Stefan

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
  2015-02-24 23:37     ` Stefan Beller
@ 2015-02-25 12:44       ` Duy Nguyen
  2015-02-25 18:04         ` Junio C Hamano
  0 siblings, 1 reply; 80+ messages in thread
From: Duy Nguyen @ 2015-02-25 12:44 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Junio C Hamano, Git Mailing List

On Wed, Feb 25, 2015 at 6:37 AM, Stefan Beller <sbeller@google.com> wrote:
> I can understand, that we maybe want to just provide one generic
> "version 2" of the protocol which is an allrounder not doing bad in
> all of these aspects, but I can see usecases of having the desire to
> replace the wire protocol by your own implementation. To do so
> we could try to offer an API which makes implementing a new
> protocol somewhat easy. The current state of affairs is not providing
> this flexibility.

I think we are quite flexible after initial ref advertisement. After
that point the client tells the server its capabilities and the server
does the same for the client. Only shared features can be used. So if
you want to add a new micro protocol for mobile, just add "mobile"
capability to both client and server. A new implementation can support
no capabililities and it should work fine with C Git (less efficient
though, of course). And we have freedom to mix capabilities any way we
want (it's harder to do when you have to follow v2, v2.1, v2.2...)
-- 
Duy

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
  2015-02-25 12:44       ` Duy Nguyen
@ 2015-02-25 18:04         ` Junio C Hamano
  2015-02-26  7:31           ` Stefan Beller
  0 siblings, 1 reply; 80+ messages in thread
From: Junio C Hamano @ 2015-02-25 18:04 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Stefan Beller, Git Mailing List

Duy Nguyen <pclouds@gmail.com> writes:

> On Wed, Feb 25, 2015 at 6:37 AM, Stefan Beller <sbeller@google.com> wrote:
>> I can understand, that we maybe want to just provide one generic
>> "version 2" of the protocol which is an allrounder not doing bad in
>> all of these aspects, but I can see usecases of having the desire to
>> replace the wire protocol by your own implementation. To do so
>> we could try to offer an API which makes implementing a new
>> protocol somewhat easy. The current state of affairs is not providing
>> this flexibility.
>
> I think we are quite flexible after initial ref advertisement.

Yes, that is exactly where my "I am not convinced" comes from.

> After
> that point the client tells the server its capabilities and the server
> does the same for the client. Only shared features can be used. So if
> you want to add a new micro protocol for mobile, just add "mobile"
> capability to both client and server. A new implementation can support
> no capabililities and it should work fine with C Git (less efficient
> though, of course). And we have freedom to mix capabilities any way we
> want (it's harder to do when you have to follow v2, v2.1, v2.2...)

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
  2015-02-25 18:04         ` Junio C Hamano
@ 2015-02-26  7:31           ` Stefan Beller
  2015-02-26 10:15             ` Duy Nguyen
  0 siblings, 1 reply; 80+ messages in thread
From: Stefan Beller @ 2015-02-26  7:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Duy Nguyen, Git Mailing List

On Wed, Feb 25, 2015 at 10:04 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Duy Nguyen <pclouds@gmail.com> writes:
>
>> On Wed, Feb 25, 2015 at 6:37 AM, Stefan Beller <sbeller@google.com> wrote:
>>> I can understand, that we maybe want to just provide one generic
>>> "version 2" of the protocol which is an allrounder not doing bad in
>>> all of these aspects, but I can see usecases of having the desire to
>>> replace the wire protocol by your own implementation. To do so
>>> we could try to offer an API which makes implementing a new
>>> protocol somewhat easy. The current state of affairs is not providing
>>> this flexibility.
>>
>> I think we are quite flexible after initial ref advertisement.
>
> Yes, that is exactly where my "I am not convinced" comes from.
>

We are not. (not really at least). We can tune some parameters or
change the behavior slightly,
but we cannot fix core assumptions made when creating v2 protocol.
This you can see when when talking about v1 as well: we cannot fix any
wrongdoings of v1 now by adding another capability. So from my point
of view we don't waste resources when having an advertisement of
possible protocols instead of a boolean flag indicating v2 is
supported. There is really not much overhead in coding nor bytes
exchanged on the wire, so why not accept stuff that comes for free
(nearly) ?

I mean how do we know all the core assumptions made for v2 hold in the
future? We don't. That's why I'd propose a plain and easy exchange at
first stating the version to talk.

Anyway what is the cost of a round trip time compared to the bytes on
the wire? Usually the cost of bytes on the wire correlate with the
latency anyway. (think mobile metered compared to corporate setting
with low latency). That's why I'd rather optimize for used bandwidth
than round trip times, but that may be just my personal perception of
the internet. That's why I'd propose different protocols.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
  2015-02-26  7:31           ` Stefan Beller
@ 2015-02-26 10:15             ` Duy Nguyen
  2015-02-26 20:08               ` Stefan Beller
  2015-02-26 20:13               ` Junio C Hamano
  0 siblings, 2 replies; 80+ messages in thread
From: Duy Nguyen @ 2015-02-26 10:15 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Junio C Hamano, Git Mailing List

On Thu, Feb 26, 2015 at 2:31 PM, Stefan Beller <sbeller@google.com> wrote:
> On Wed, Feb 25, 2015 at 10:04 AM, Junio C Hamano <gitster@pobox.com> wrote:
>> Duy Nguyen <pclouds@gmail.com> writes:
>>
>>> On Wed, Feb 25, 2015 at 6:37 AM, Stefan Beller <sbeller@google.com> wrote:
>>>> I can understand, that we maybe want to just provide one generic
>>>> "version 2" of the protocol which is an allrounder not doing bad in
>>>> all of these aspects, but I can see usecases of having the desire to
>>>> replace the wire protocol by your own implementation. To do so
>>>> we could try to offer an API which makes implementing a new
>>>> protocol somewhat easy. The current state of affairs is not providing
>>>> this flexibility.
>>>
>>> I think we are quite flexible after initial ref advertisement.
>>
>> Yes, that is exactly where my "I am not convinced" comes from.
>>
>
> We are not. (not really at least). We can tune some parameters or
> change the behavior slightly,
> but we cannot fix core assumptions made when creating v2 protocol.
> This you can see when when talking about v1 as well: we cannot fix any
> wrongdoings of v1 now by adding another capability.

Step 1 then should be identifying these wrongdoings and assumptions.

We can really go wild with these capabilities. The only thing that
can't be changed is perhaps sending the first ref. I don't know
whether we can accept a dummy first ref... After that point, you can
turn the protocol upside down because both client and server know what
it would be.

> So from my point
> of view we don't waste resources when having an advertisement of
> possible protocols instead of a boolean flag indicating v2 is
> supported. There is really not much overhead in coding nor bytes
> exchanged on the wire, so why not accept stuff that comes for free
> (nearly) ?

You realize you're advertising v2 as a new capability, right? Instead
of defining v2 feature set then advertise v2, people could simply add
new features directly. I don't see v2 (at least with these patches)
adds any value.

> I mean how do we know all the core assumptions made for v2 hold in the
> future? We don't. That's why I'd propose a plain and easy exchange at
> first stating the version to talk.

And we already does that, except that we don't state what version (as
a number) exactly, but what feature that version supports. The focus
should be the new protocol at daemon.c and maybe remote-curl.c where
we do know the current protocol is not flexible enough.
-- 
Duy

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
  2015-02-26 10:15             ` Duy Nguyen
@ 2015-02-26 20:08               ` Stefan Beller
       [not found]                 ` <CACsJy8DOS_999ZgW7TqsH-dkrUFpjZf0TFQeFUt9s0bNhHY0Bw@mail.gmail.com>
  2015-02-26 20:13               ` Junio C Hamano
  1 sibling, 1 reply; 80+ messages in thread
From: Stefan Beller @ 2015-02-26 20:08 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Junio C Hamano, Git Mailing List

On Thu, Feb 26, 2015 at 2:15 AM, Duy Nguyen <pclouds@gmail.com> wrote:
> On Thu, Feb 26, 2015 at 2:31 PM, Stefan Beller <sbeller@google.com> wrote:
>> On Wed, Feb 25, 2015 at 10:04 AM, Junio C Hamano <gitster@pobox.com> wrote:
>>> Duy Nguyen <pclouds@gmail.com> writes:
>>>
>>>> On Wed, Feb 25, 2015 at 6:37 AM, Stefan Beller <sbeller@google.com> wrote:
>>>>> I can understand, that we maybe want to just provide one generic
>>>>> "version 2" of the protocol which is an allrounder not doing bad in
>>>>> all of these aspects, but I can see usecases of having the desire to
>>>>> replace the wire protocol by your own implementation. To do so
>>>>> we could try to offer an API which makes implementing a new
>>>>> protocol somewhat easy. The current state of affairs is not providing
>>>>> this flexibility.
>>>>
>>>> I think we are quite flexible after initial ref advertisement.
>>>
>>> Yes, that is exactly where my "I am not convinced" comes from.
>>>
>>
>> We are not. (not really at least). We can tune some parameters or
>> change the behavior slightly,
>> but we cannot fix core assumptions made when creating v2 protocol.
>> This you can see when when talking about v1 as well: we cannot fix any
>> wrongdoings of v1 now by adding another capability.
>
> Step 1 then should be identifying these wrongdoings and assumptions.

So I think one of the key assumption was to not have many refs to advertise,
and advertising the refs is fine under that assumption.
So from my point of view it is hard to change the general

>
> We can really go wild with these capabilities. The only thing that
> can't be changed is perhaps sending the first ref. I don't know
> whether we can accept a dummy first ref... After that point, you can
> turn the protocol upside down because both client and server know what
> it would be.

So the way I currently envision (the transition to and) version 2 of
the protocol:

First connection (using the protocol as of now):

Server: Here are all the refs and capabilities I can offer. The capabilities
    include "not-send-refs-first" (aka version2)
Client: Ok, I'll store not-send-refs-first for next time. Now we will
continue with
    these options: <...>. For now we continue using the current
protocol and I want
    to update the "master" branch.
Server: Ok here is a pack file, and then master advances $SHA1..$SHA1
Client: ok, thanks, bye

For the next connection I have different ideas:

Client thinks v2 is supported, so it talks first: Last time we talked
your capabilities
hashed to "$SHA1", is that still correct?
Server: yes it is
# In the above roundtrip we would have a new key assumption that the
capabilities
# don't change often. Having push-certs enabled, this is invalid of
today. Hover this
# could be implemented with very low bandwidth usage
# The alternative path would be:
# Server: No my new capabilities are: <....>
Client: Ok I want to update all of refs/heads/{master,next,pu}. My
last fetch was
    yesterday at noon.
Server: Let me check the ref logs for these refs. Here is a packfile of length
    1000 bytes: <binary gibberish>
    {master, next} did not update since yesterday noon, pu updates from A..B
Client: ok, thanks, bye


Another approach would be this:
Client thinks v2 is supported, so it talks first: Last time we talked
you sent me
    refs advertisement including capabilities which hash to $SHA1.
Server: I see, I have stored that. Now that time has advanced there are a few
    differences, here is a diff of the refs advertisement:
* b3a551adf53c224b04c40f05b72a8790807b3138 HEAD\0 <capabilities>
* b3a551adf53c224b04c40f05b72a8790807b3138 refs/heads/master
- 24ca137a384aa1ac5a776eddaf35bb820fc6f6e6 refs/heads/tmp-fix
+ 1da8335ad5d0e46062a929ba6481bbbe35c8eef0 refs/pull/123/head

    Note that I do not include changed lines as +one line and - one
line as you know
    what the line was by your given $SHA1, so changed lines are marked
with *, while
    lines starting with '-' indicate deleted refs and '+' indicate new refs.
Client: I see, I can reconstruct the refs advertisement. Now we can
continue talking
as we always talked using v1 protocol.

>
>> So from my point
>> of view we don't waste resources when having an advertisement of
>> possible protocols instead of a boolean flag indicating v2 is
>> supported. There is really not much overhead in coding nor bytes
>> exchanged on the wire, so why not accept stuff that comes for free
>> (nearly) ?
>
> You realize you're advertising v2 as a new capability, right? Instead
> of defining v2 feature set then advertise v2, people could simply add
> new features directly. I don't see v2 (at least with these patches)
> adds any value.

Yes, we can go wild after the refs advertisement, but that is not the
critical problem as it works ok-ish? The problem I see for now is the huge
refs advertisement even before the capabilities are exchanged. So maybe
I do not want to talk about v2 but about changing the current protocol to first
talk about the capabilities in the first round trip, not sure if we
ever want to attach
data into the first RT as it may explode as soon as that information grows in
the future. The disadvantage would be one added round trip time, though I'd
guess that's ok for most compared to the huge refs advertisement.

>
>> I mean how do we know all the core assumptions made for v2 hold in the
>> future? We don't. That's why I'd propose a plain and easy exchange at
>> first stating the version to talk.
>
> And we already does that, except that we don't state what version (as
> a number) exactly, but what feature that version supports. The focus
> should be the new protocol at daemon.c and maybe remote-curl.c where
> we do know the current protocol is not flexible enough.

Most features are pretty non-invasive and at a few points in the code, you
can add
     if (featureX)
        doFeatureXRelatedStuff();

For changing the initial talking we'd need to make a more drastic change
to the code, which is why I thought to call it v2.

Thanks for your input,
Stefan

> --
> Duy

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
  2015-02-26 10:15             ` Duy Nguyen
  2015-02-26 20:08               ` Stefan Beller
@ 2015-02-26 20:13               ` Junio C Hamano
  2015-02-27  1:26                 ` Stefan Beller
  2015-02-27 23:05                 ` Junio C Hamano
  1 sibling, 2 replies; 80+ messages in thread
From: Junio C Hamano @ 2015-02-26 20:13 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Stefan Beller, Git Mailing List

Duy Nguyen <pclouds@gmail.com> writes:

> Step 1 then should be identifying these wrongdoings and assumptions.
>
> We can really go wild with these capabilities. The only thing that
> can't be changed is perhaps sending the first ref. I don't know
> whether we can accept a dummy first ref... After that point, you can
> turn the protocol upside down because both client and server know what
> it would be.

Yes, exactly.  To up/down/side-grade from v1 is technically
possible, but being technically possible is different from being
sensible.  The capability-based sidegrade does not solve the problem
when the problem to be solved is that the server side needs to spend
a lot of cycles and the network needs to carry megabytes of data
before capability exchange happens.  Yes, the newer server and the
newer client can notice that the counterparty is new and start
talking in new protocol (which may or may not benefit from already
knowing the result of ref advertisement), but by the time that
happens, the resource has already been spent and wasted.

I do not think v1 can be fixed by "send one ref with capability,
newer client may respond immediately so we can stop enumerating
remaining refs and older one will get stuck so we can have a timeout
to see if the connection is from the newer one, and send the rest
for the older client", because anything that involves such a timeout
would not reliably work over WAN.

> You realize you're advertising v2 as a new capability, right? Instead
> of defining v2 feature set then advertise v2, people could simply add
> new features directly. I don't see v2 (at least with these patches)
> adds any value.

I agree with the value assessment of these patches 98%, but these
bits can be taken as the "we have v2 server availble for you on the
side, by the way" hint you mentioned in the older thread, I think.

> And we already does that, except that we don't state what version (as
> a number) exactly, but what feature that version supports. The focus
> should be the new protocol at daemon.c and maybe remote-curl.c where
> we do know the current protocol is not flexible enough.

The "first" thing the client tells the server is what service it
requests.  A request over git:// protocol is read by "git daemon" to
choose which service to run, and it is read directly by the login
shell if it comes over ssh:// protocol.

There is nothing that prevents us from defining that service to be a
generic "git service", not "upload-pack", "archive", "receive-pack".
And the early protocol exchange, once "git service" is spawned, with
the client can be "what real services does the server end support?"
capability list responded with "wow, you are new enough to support
the 'trickle-pack' service---please connect me to it" request.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
  2015-02-26 20:13               ` Junio C Hamano
@ 2015-02-27  1:26                 ` Stefan Beller
  2015-02-27  2:15                   ` Stefan Beller
  2015-02-27 23:05                 ` Junio C Hamano
  1 sibling, 1 reply; 80+ messages in thread
From: Stefan Beller @ 2015-02-27  1:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Duy Nguyen, Git Mailing List

On Thu, Feb 26, 2015 at 12:13 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Duy Nguyen <pclouds@gmail.com> writes:
>
>> Step 1 then should be identifying these wrongdoings and assumptions.
>>
>> We can really go wild with these capabilities. The only thing that
>> can't be changed is perhaps sending the first ref. I don't know
>> whether we can accept a dummy first ref... After that point, you can
>> turn the protocol upside down because both client and server know what
>> it would be.
>
> Yes, exactly.  To up/down/side-grade from v1 is technically
> possible, but being technically possible is different from being
> sensible.  The capability-based sidegrade does not solve the problem
> when the problem to be solved is that the server side needs to spend
> a lot of cycles and the network needs to carry megabytes of data
> before capability exchange happens.  Yes, the newer server and the
> newer client can notice that the counterparty is new and start
> talking in new protocol (which may or may not benefit from already
> knowing the result of ref advertisement), but by the time that
> happens, the resource has already been spent and wasted.
>
> I do not think v1 can be fixed by "send one ref with capability,
> newer client may respond immediately so we can stop enumerating
> remaining refs and older one will get stuck so we can have a timeout
> to see if the connection is from the newer one, and send the rest
> for the older client", because anything that involves such a timeout
> would not reliably work over WAN.
>
>> You realize you're advertising v2 as a new capability, right? Instead
>> of defining v2 feature set then advertise v2, people could simply add
>> new features directly. I don't see v2 (at least with these patches)
>> adds any value.
>
> I agree with the value assessment of these patches 98%, but these
> bits can be taken as the "we have v2 server availble for you on the
> side, by the way" hint you mentioned in the older thread, I think.
>
>> And we already does that, except that we don't state what version (as
>> a number) exactly, but what feature that version supports. The focus
>> should be the new protocol at daemon.c and maybe remote-curl.c where
>> we do know the current protocol is not flexible enough.
>
> The "first" thing the client tells the server is what service it
> requests.  A request over git:// protocol is read by "git daemon" to
> choose which service to run, and it is read directly by the login
> shell if it comes over ssh:// protocol.
>
> There is nothing that prevents us from defining that service to be a
> generic "git service", not "upload-pack", "archive", "receive-pack".
> And the early protocol exchange, once "git service" is spawned, with
> the client can be "what real services does the server end support?"
> capability list responded with "wow, you are new enough to support
> the 'trickle-pack' service---please connect me to it" request.
>

So I am not quite sure how to understand this input.

I wonder if a high level test could look like the following,
which just tests the workflow with git fetch, but not the
internals.

(Note: patch formatting may be broken as it's send via gmail web ui)
---8<---
From: Stefan Beller <sbeller@google.com>
Date: Thu, 26 Feb 2015 17:19:30 -0800
Subject: [PATCH] Propose new tests for transitioning to the new option
transport.capabilitiesfirst

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 t/t5544-capability-handshake.sh | 81 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)
 create mode 100755 t/t5544-capability-handshake.sh

diff --git a/t/t5544-capability-handshake.sh b/t/t5544-capability-handshake.sh
new file mode 100755
index 0000000..aa2b52d
--- /dev/null
+++ b/t/t5544-capability-handshake.sh
@@ -0,0 +1,81 @@
+#!/bin/sh
+
+test_description='fetching from a repository using the "capabilities
first" push option'
+
+. ./test-lib.sh
+
+mk_repo_pair () {
+ rm -rf workbench upstream &&
+ test_create_repo upstream &&
+ test_create_repo workbench &&
+ (
+ cd upstream &&
+ git config receive.denyCurrentBranch warn
+ ) &&
+ (
+ cd workbench &&
+ git remote add origin ../upstream
+ )
+}
+
+generate_commits_upstream () {
+ (
+ cd upstream &&
+ echo "more content" >>file &&
+ git add file &&
+ git commit -a -m "create a commit"
+ )
+}
+
+# Compare the ref ($1) in upstream with a ref value from workbench ($2)
+# i.e. test_refs second HEAD@{2}
+test_refs () {
+ test $# = 2 &&
+ git -C upstream rev-parse --verify "$1" >expect &&
+ git -C workbench rev-parse --verify "$2" >actual &&
+ test_cmp expect actual
+}
+
+test_expect_success 'transport.capabilitiesfirst is not overridden
when set already' '
+ mk_repo_pair &&
+ (
+ cd workbench &&
+ git config transport.capabilitiesfirst 0
+ git config --get transport.capabilitiesfirst 0 >expected
+ )
+ generate_commits_upstream &&
+ (
+ cd workbench &&
+ git fetch --all
+ git config --get transport.capabilitiesfirst >actual
+ test_cmp expected actual
+ )
+'
+
+test_expect_success 'enable transport by fetching from new server' '
+ mk_repo_pair &&
+ (
+ cd workbench &&
+ git fetch origin
+ ) &&
+ generate_commits_upstream &&
+ (
+ cd workbench &&
+ git config --get transport.capabilitiesfirst >actual &&
+ echo "1" >expected &&
+ test_cmp expected actual
+ )
+'
+
+test_expect_success 'test new transport still works' '
+ # continue from last test
+ generate_commits_upstream &&
+ (
+ cd workbench &&
+ git fetch origin
+ )
+ test_refs HEAD refs/remotes/origin/HEAD
+ test_refs master refs/remotes/origin/master
+'
+
+test_done
-- 
2.3.0.81.gc37f363

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
  2015-02-27  1:26                 ` Stefan Beller
@ 2015-02-27  2:15                   ` Stefan Beller
  0 siblings, 0 replies; 80+ messages in thread
From: Stefan Beller @ 2015-02-27  2:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Duy Nguyen, Git Mailing List

 On Thu, Feb 26, 2015 at 12:13 PM, Junio C Hamano <gitster@pobox.com> wrote:
>
> I agree with the value assessment of these patches 98%, but these
> bits can be taken as the "we have v2 server availble for you on the
> side, by the way" hint you mentioned in the older thread, I think.

The patches are not well polished (In fact they don't even compile :/),
but I think they may demonstrate the ideas and though process. And
as it turns out we'd not be following that spirit of ideas but rather want
to have a dedicated v2.

That said I did not want to spend lots of time to polish the patch for
inclusion but rather to demonstrate ideas, which can be done with
substantial less quality IMHO. Correct me if I am wrong here!

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
       [not found]                 ` <CACsJy8DOS_999ZgW7TqsH-dkrUFpjZf0TFQeFUt9s0bNhHY0Bw@mail.gmail.com>
@ 2015-02-27 22:20                   ` Stefan Beller
  0 siblings, 0 replies; 80+ messages in thread
From: Stefan Beller @ 2015-02-27 22:20 UTC (permalink / raw)
  To: Duy Nguyen, git@vger.kernel.org

+git@vger.kernel.org

On Thu, Feb 26, 2015 at 5:42 PM, Duy Nguyen <pclouds@gmail.com> wrote:
> https://github.com/pclouds/git/commits/uploadpack2

I rebased your branch, changed the order of commits slightly and
started to add some.
they are found at https://github.com/stefanbeller/git/commits/uploadpack2

I think the very first patch series which I try to polish now will
just try to move the
capabilities negotiation into the beginning of the exchange.

Any 'real' changes such as adding capabilities to the protocol to not
have all the refs
advertised will come in a later series.

Thanks for your help!
Stefan

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
  2015-02-26 20:13               ` Junio C Hamano
  2015-02-27  1:26                 ` Stefan Beller
@ 2015-02-27 23:05                 ` Junio C Hamano
  2015-02-27 23:44                   ` Stefan Beller
  2015-02-28  0:07                   ` [RFC/PATCH 0/3] protocol v2 Duy Nguyen
  1 sibling, 2 replies; 80+ messages in thread
From: Junio C Hamano @ 2015-02-27 23:05 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Stefan Beller, Git Mailing List

Junio C Hamano <gitster@pobox.com> writes:

> I do not think v1 can be fixed by "send one ref with capability,
> newer client may respond immediately so we can stop enumerating
> remaining refs and older one will get stuck so we can have a timeout
> to see if the connection is from the newer one, and send the rest
> for the older client", because anything that involves such a timeout
> would not reliably work over WAN.

Just for fun, I was trying to see if there is a hole in the current
protocol that allows a new client to talk a valid v1 protocol
exchange with existing, deployed servers without breaking, while
letting it to know a new server that it is a new client and it does
not want to get blasted by megabytes of ref advertisement.

The idea is to find a request that can be sent as the first
utterance by the client to an old server that is interpreted as a
no-op and can be recognised by a new server as such a "no-op probe".
If there is such a request, then the exchange can go like this with
(new client, old server) pair:

    - new client connects and sends that no-op.

    - old server starts blasting the ref advertisement

    - new client monitors and notices that the other side
      started speaking, and the ref advertisement lacks the
      capability bit for new protocol.

    - new client accepts the ref advertisement and does the v1
      protocol thing as a follow-up to what it already sent.

As long as the first one turns out to be no-op for old server, we
would be OK.  On the other hand, (new client, new server) pair
would go like this:

    - new client connects and sends that no-op.

    - new server notices that there is already a data from the
      client, and recognises the "no-op probe".

    - new server gives the first v2 protocol message with
      capability.

    - new client notices thqat the other side started speaking, and
      it is the first v2 protocol message.

    - both sides happily speak v2.

and (old client, new server) pair would go like this:

    - old client connects and waits.

    - new server notices that there is *no* data sent from the
      client and decides that the other side is a v1 client.  It
      starts blasting the ref advertisement.

    - both sides happily speak v1 from here on.

A misdetected case between (new client, new server) pair might go
like this:

    - new client connects and sends that no-op.

    - new server accepts the connection, but that no-op probe has
      not arrived yet.".  It misdetects the other side as a v1
      client and it starts blasting the ref advertisement.

    - new client notices that the ref advertisement has the
      capability bit and the server is capable of v2 protocol.  it
      waits until the server sends "sorry, I misdetected" message.

    - new server eventually notices the "no-op probe" while blasting
      the ref advertisement and it can stop in the middle.
      hopefully this can happen after only sending a few kilobytes
      among megabytes of ref advertisement data ;-).  The server
      sends "sorry, I misdetected" message to synchronise.

    - both sides happily speak v2 from here on.

So the topic of this exercise ("just for fun") is to see if there is
such a no-op request the client side can send as the first thing for
probing.

On the fetch side, the first response upload-pack expects are one
of:

  - "want " followed by an object name.
  - "shallow " followed by an object name.
  - "deepen " followed by a positive integer.

And there _is_ a hole ;-).  The parsing of "shallow " object name is
done in such a way that an object name that passes get_sha1_hex()
that results in a NULL return from parse_object() is _ignored_.  So
a new client can use "shallow 0{40}" as a no-op probe.

It appears that on the push side, there is a similar hole that can
be used. receive-pack expects either "shallow ", "push-cert" or the
refname updates (i.e. two "[0-9a-f]{40}" followed by a refname); the
parsing of "shallow " is not as loose as the fetch side in that
using a "shallow 0{40}" as a no-op probe will end up causing
prepare_shallow_info() sift the "0{40}" object name into "theirs",
but I think it will be ignored at the end as "unreachable cruft"
without causing harm.

I am _not_ proposing that we should go this route, at least not yet.
I am merely pointing out that an in-place sidegrade from v1 to a
protocol that avoids the megabyte-advertisement-at-the-beginning
seems to be possible, as a food for thought.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
  2015-02-27 23:05                 ` Junio C Hamano
@ 2015-02-27 23:44                   ` Stefan Beller
  2015-02-28  0:33                     ` Junio C Hamano
  2015-02-28  0:07                   ` [RFC/PATCH 0/3] protocol v2 Duy Nguyen
  1 sibling, 1 reply; 80+ messages in thread
From: Stefan Beller @ 2015-02-27 23:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Duy Nguyen, Git Mailing List

On Fri, Feb 27, 2015 at 3:05 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
>> I do not think v1 can be fixed by "send one ref with capability,
>> newer client may respond immediately so we can stop enumerating
>> remaining refs and older one will get stuck so we can have a timeout
>> to see if the connection is from the newer one, and send the rest
>> for the older client", because anything that involves such a timeout
>> would not reliably work over WAN.
>
> Just for fun, I was trying to see if there is a hole in the current
> protocol that allows a new client to talk a valid v1 protocol
> exchange with existing, deployed servers without breaking, while
> letting it to know a new server that it is a new client and it does
> not want to get blasted by megabytes of ref advertisement.
>
> The idea is to find a request that can be sent as the first
> utterance by the client to an old server that is interpreted as a
> no-op and can be recognised by a new server as such a "no-op probe".
> If there is such a request, then the exchange can go like this with
> (new client, old server) pair:
>
>     - new client connects and sends that no-op.
>
>     - old server starts blasting the ref advertisement
>
>     - new client monitors and notices that the other side
>       started speaking, and the ref advertisement lacks the
>       capability bit for new protocol.
>
>     - new client accepts the ref advertisement and does the v1
>       protocol thing as a follow-up to what it already sent.
>
> As long as the first one turns out to be no-op for old server, we
> would be OK.  On the other hand, (new client, new server) pair
> would go like this:
>
>     - new client connects and sends that no-op.
>
>     - new server notices that there is already a data from the
>       client, and recognises the "no-op probe".
>
>     - new server gives the first v2 protocol message with
>       capability.
>
>     - new client notices thqat the other side started speaking, and
>       it is the first v2 protocol message.
>
>     - both sides happily speak v2.
>
> and (old client, new server) pair would go like this:
>
>     - old client connects and waits.
>
>     - new server notices that there is *no* data sent from the
>       client and decides that the other side is a v1 client.  It
>       starts blasting the ref advertisement.
>
>     - both sides happily speak v1 from here on.
>
> A misdetected case between (new client, new server) pair might go
> like this:
>
>     - new client connects and sends that no-op.
>
>     - new server accepts the connection, but that no-op probe has
>       not arrived yet.".  It misdetects the other side as a v1
>       client and it starts blasting the ref advertisement.
>
>     - new client notices that the ref advertisement has the
>       capability bit and the server is capable of v2 protocol.  it
>       waits until the server sends "sorry, I misdetected" message.
>
>     - new server eventually notices the "no-op probe" while blasting
>       the ref advertisement and it can stop in the middle.
>       hopefully this can happen after only sending a few kilobytes
>       among megabytes of ref advertisement data ;-).  The server
>       sends "sorry, I misdetected" message to synchronise.
>
>     - both sides happily speak v2 from here on.
>
> So the topic of this exercise ("just for fun") is to see if there is
> such a no-op request the client side can send as the first thing for
> probing.
>
> On the fetch side, the first response upload-pack expects are one
> of:
>
>   - "want " followed by an object name.
>   - "shallow " followed by an object name.
>   - "deepen " followed by a positive integer.
>
> And there _is_ a hole ;-).  The parsing of "shallow " object name is
> done in such a way that an object name that passes get_sha1_hex()
> that results in a NULL return from parse_object() is _ignored_.  So
> a new client can use "shallow 0{40}" as a no-op probe.
>
> It appears that on the push side, there is a similar hole that can
> be used. receive-pack expects either "shallow ", "push-cert" or the
> refname updates (i.e. two "[0-9a-f]{40}" followed by a refname); the
> parsing of "shallow " is not as loose as the fetch side in that
> using a "shallow 0{40}" as a no-op probe will end up causing
> prepare_shallow_info() sift the "0{40}" object name into "theirs",
> but I think it will be ignored at the end as "unreachable cruft"
> without causing harm.
>
> I am _not_ proposing that we should go this route, at least not yet.
> I am merely pointing out that an in-place sidegrade from v1 to a
> protocol that avoids the megabyte-advertisement-at-the-beginning
> seems to be possible, as a food for thought.
>


This is a fun thing indeed, though I'd personally feel uneasy with
such a probe as
a serious proposal. (Remember somebody 10 years from now wants to enjoy
reading the source code). So let's keep the idea around if we don't find another
solution.

As far as I can tell we have
* native git protocol (git daemon)
* ssh
* http(s)
* ftp (deprecated!)
* rsync(deprecated)

For both native git as well as ssh, Duy presented a solution at [1, 2]
a year ago,
which essentially presents the desired client capabilites 'out of
band' to the server
via an argument to the server.  So we'd only need to examine the
http(s) path how to
pass in arguments there.

Since 5 years an additional argument in the git protocol would cause
no harm, so it's
pretty safe to extend it with caution. We'd advertise the
client-may-ask-for-capabilities-out-of-band
per server and the client would eventually learn about it and only use
this for servers
which advertised they can do so. And nobody sane out there will
downgrade a version
of today to a git of 5 years ago.

I did rebase Duys' patches and I am trying to send out a patch series
today going that route
asking for comments.

Thanks a lot for the digging!
Stefan

[1] https://github.com/pclouds/git/commit/e26fa77c4d9ace06b9f2c80091af9eb7b63a1c95
[2] https://github.com/pclouds/git/commit/20d048e5fc650b20fdc7dd8bbe35cb8510ac9c50

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
  2015-02-27 23:05                 ` Junio C Hamano
  2015-02-27 23:44                   ` Stefan Beller
@ 2015-02-28  0:07                   ` Duy Nguyen
  2015-02-28  0:26                     ` Junio C Hamano
  1 sibling, 1 reply; 80+ messages in thread
From: Duy Nguyen @ 2015-02-28  0:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, Git Mailing List

On Sat, Feb 28, 2015 at 6:05 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Just for fun, I was trying to see if there is a hole in the current
> protocol that allows a new client to talk a valid v1 protocol
> exchange with existing, deployed servers without breaking, while
> letting it to know a new server that it is a new client and it does
> not want to get blasted by megabytes of ref advertisement.
> ...
> The idea is to find a request that can be sent as the first
> utterance by the client to an old server that is interpreted as a
> no-op and can be recognised by a new server as such a "no-op probe".
> ...
> And there _is_ a hole ;-).  The parsing of "shallow " object name is
> done in such a way that an object name that passes get_sha1_hex()
> that results in a NULL return from parse_object() is _ignored_.  So
> a new client can use "shallow 0{40}" as a no-op probe.
> ...
> I am _not_ proposing that we should go this route, at least not yet.
> I am merely pointing out that an in-place sidegrade from v1 to a
> protocol that avoids the megabyte-advertisement-at-the-beginning
> seems to be possible, as a food for thought.

There may be another hole, if we send "want <empty-tree>", it looks
like it will go through without causing errors. It's not exactly no-op
because an empty tree object will be bundled in result pack. But that
makes no difference in pratice. I didn't verify this though.

In the spirit of fun, I looked at how jgit handles this shallow line
(because this is more like an implementation hole than protocol hole).
I don't think jgit would ignore 0{40} the way C Git does. This SHA-1
will end up in shallowCommits set in upload-pack, then will be parsed
as a commit. But even if the parsing is through, a non-empty
shallowCommits set would disable pack bitmap. Fun is usually short..

PS. heh my "want empty-tree" hole is probably impl-specific too. Not
sure if jgit also keeps empty tree available even if it does not
exist.
-- 
Duy

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
  2015-02-28  0:07                   ` [RFC/PATCH 0/3] protocol v2 Duy Nguyen
@ 2015-02-28  0:26                     ` Junio C Hamano
  0 siblings, 0 replies; 80+ messages in thread
From: Junio C Hamano @ 2015-02-28  0:26 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Stefan Beller, Git Mailing List

On Fri, Feb 27, 2015 at 4:07 PM, Duy Nguyen <pclouds@gmail.com> wrote:
>
> There may be another hole, if we send "want <empty-tree>", it looks
> like it will go through without causing errors. It's not exactly no-op
> because an empty tree object will be bundled in result pack. But that
> makes no difference in pratice. I didn't verify this though.

In addition to "that's not a no-op" problem, unless the old server has a
ref that has an emtpy tree at its tip, such a fetch request will be rejected,
unless the server is configured to serve any object, no?

If your new server does have a ref that points at an empty tree, a client
may request you to send that, but this is not a problem, because the
new server can tell if the client is sending it as a no-op probe or a serious
request by looking at its capability request. A serious old client will not
tell you that he is new, a probing new client does, and a serious new
client does. So your new server can tell and will not be confused.

> as a commit. But even if the parsing is through, a non-empty
> shallowCommits set would disable pack bitmap.

Performance penalty is fine. Over time we would upgrade and the
point of the exercise is not to cause the old-new or new-old pair to
die but keep talking the old protocol and getting correct results.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
  2015-02-27 23:44                   ` Stefan Beller
@ 2015-02-28  0:33                     ` Junio C Hamano
  2015-02-28  0:46                       ` Stefan Beller
  0 siblings, 1 reply; 80+ messages in thread
From: Junio C Hamano @ 2015-02-28  0:33 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Duy Nguyen, Git Mailing List

On Fri, Feb 27, 2015 at 3:44 PM, Stefan Beller <sbeller@google.com> wrote:
> On Fri, Feb 27, 2015 at 3:05 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>
>> I am _not_ proposing that we should go this route, at least not yet.
>> I am merely pointing out that an in-place sidegrade from v1 to a
>> protocol that avoids the megabyte-advertisement-at-the-beginning
>> seems to be possible, as a food for thought.
>
> This is a fun thing indeed, though I'd personally feel uneasy with
> such a probe as
> a serious proposal. (Remember somebody 10 years from now wants to enjoy
> reading the source code).

That cannot be a serious objection, once you realize that NUL + capability
was exactly the same kind of "yes, we have a hole to allow up customize
the protocol". The code to do so may not be pretty, but the code to implement
ended up being reasonably clean with parse_feature_request() and friends.
After all we live in a real world ;-)

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
  2015-02-28  0:33                     ` Junio C Hamano
@ 2015-02-28  0:46                       ` Stefan Beller
  2015-02-28  1:01                         ` [RFC/PATCH 0/5] protocol v2 for upload-pack Stefan Beller
  2015-03-02  3:47                         ` [RFC/PATCH 0/3] protocol v2 Junio C Hamano
  0 siblings, 2 replies; 80+ messages in thread
From: Stefan Beller @ 2015-02-28  0:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Duy Nguyen, Git Mailing List

On Fri, Feb 27, 2015 at 4:33 PM, Junio C Hamano <gitster@pobox.com> wrote:
> On Fri, Feb 27, 2015 at 3:44 PM, Stefan Beller <sbeller@google.com> wrote:
>> On Fri, Feb 27, 2015 at 3:05 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>>
>>> I am _not_ proposing that we should go this route, at least not yet.
>>> I am merely pointing out that an in-place sidegrade from v1 to a
>>> protocol that avoids the megabyte-advertisement-at-the-beginning
>>> seems to be possible, as a food for thought.
>>
>> This is a fun thing indeed, though I'd personally feel uneasy with
>> such a probe as
>> a serious proposal. (Remember somebody 10 years from now wants to enjoy
>> reading the source code).
>
> That cannot be a serious objection, once you realize that NUL + capability
> was exactly the same kind of "yes, we have a hole to allow up customize
> the protocol". The code to do so may not be pretty, but the code to implement
> ended up being reasonably clean with parse_feature_request() and friends.
> After all we live in a real world ;-)

    - new server accepts the connection, but that no-op probe has
      not arrived yet.".  It misdetects the other side as a v1
      client and it starts blasting the ref advertisement.

A race condition may be a serious objection then? Once people believe the
refs can scale fairly well they will use it, which means blasting the ref
advertisement will become very worse over time.
I'll try to present a 'client asks for options first out of band' instead of the
way you describe.

Also we should not rely on having holes here and there. (We might run out of
holes over time), so I'd rather have the capabilities presented at first
which rather opens new holes instead of closing old ones.

(assuming we'll never run into megabytes of capabilities
over time to have the same trouble again ;)

^ permalink raw reply	[flat|nested] 80+ messages in thread

* [RFC/PATCH 0/5] protocol v2 for upload-pack
  2015-02-28  0:46                       ` Stefan Beller
@ 2015-02-28  1:01                         ` Stefan Beller
  2015-02-28  1:01                           ` [RFC/PATCH 1/5] upload-pack: only accept capabilities on the first "want" line Stefan Beller
                                             ` (5 more replies)
  2015-03-02  3:47                         ` [RFC/PATCH 0/3] protocol v2 Junio C Hamano
  1 sibling, 6 replies; 80+ messages in thread
From: Stefan Beller @ 2015-02-28  1:01 UTC (permalink / raw)
  To: git; +Cc: pclouds, gitster, Stefan Beller

Heavily inspired by the ideas of Duy, who wrote the first
patches nearly a year ago.

Nguyễn Thái Ngọc Duy (2):
  upload-pack: only accept capabilities on the first "want" line
  upload-pack: support out of band client capability requests

Stefan Beller (3):
  connect.c: connect to a remote service with some flags
  daemon.c: accept extra service arguments
  WIP/Document the http protocol change

 Documentation/git-upload-pack.txt                 | 10 +++++-
 Documentation/technical/http-protocol.txt         |  4 +--
 Documentation/technical/protocol-capabilities.txt |  4 +++
 builtin/fetch-pack.c                              |  2 +-
 builtin/send-pack.c                               |  2 +-
 connect.c                                         | 21 ++++++------
 connect.h                                         |  2 +-
 daemon.c                                          | 37 +++++++++++++--------
 transport.c                                       |  3 +-
 upload-pack.c                                     | 39 +++++++++++++++++------
 10 files changed, 86 insertions(+), 38 deletions(-)

-- 
2.3.0.81.gc37f363

^ permalink raw reply	[flat|nested] 80+ messages in thread

* [RFC/PATCH 1/5] upload-pack: only accept capabilities on the first "want" line
  2015-02-28  1:01                         ` [RFC/PATCH 0/5] protocol v2 for upload-pack Stefan Beller
@ 2015-02-28  1:01                           ` Stefan Beller
  2015-02-28  1:01                           ` [RFC/PATCH 2/5] upload-pack: support out of band client capability requests Stefan Beller
                                             ` (4 subsequent siblings)
  5 siblings, 0 replies; 80+ messages in thread
From: Stefan Beller @ 2015-02-28  1:01 UTC (permalink / raw)
  To: git; +Cc: pclouds, gitster, Stefan Beller

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

pack-protocol.txt says so and fetch-pack also follows it even though
upload-pack is a bit lax. Fix it.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 upload-pack.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/upload-pack.c b/upload-pack.c
index e0ce2bf..d9230ba 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -558,6 +558,7 @@ static void receive_needs(void)
 	struct object_array shallows = OBJECT_ARRAY_INIT;
 	int depth = 0;
 	int has_non_tip = 0;
+	int first_want = 1;
 
 	shallow_nr = 0;
 	for (;;) {
@@ -596,7 +597,11 @@ static void receive_needs(void)
 			die("git upload-pack: protocol error, "
 			    "expected to get sha, not '%s'", line);
 
-		parse_features(line + 45);
+		if (first_want) {
+			parse_features(line + 45);
+			first_want = 0;
+		} else if (line[45])
+			die("garbage at the end of 'want' line %s", line + 45);
 
 		o = parse_object(sha1_buf);
 		if (!o)
-- 
2.3.0.81.gc37f363

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [RFC/PATCH 2/5] upload-pack: support out of band client capability requests
  2015-02-28  1:01                         ` [RFC/PATCH 0/5] protocol v2 for upload-pack Stefan Beller
  2015-02-28  1:01                           ` [RFC/PATCH 1/5] upload-pack: only accept capabilities on the first "want" line Stefan Beller
@ 2015-02-28  1:01                           ` Stefan Beller
  2015-02-28  7:47                             ` Kyle J. McKay
  2015-02-28 11:36                             ` Duy Nguyen
  2015-02-28  1:01                           ` [RFC/PATCH 3/5] connect.c: connect to a remote service with some flags Stefan Beller
                                             ` (3 subsequent siblings)
  5 siblings, 2 replies; 80+ messages in thread
From: Stefan Beller @ 2015-02-28  1:01 UTC (permalink / raw)
  To: git; +Cc: pclouds, gitster, Stefan Beller

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

The only difference from the original protocol client capabilities are
negotiated before initial refs advertisment.

Client capabilities are sent out of band (upload-pack receives it as
the second command line argument). The server sends one pkt-line back
advertising its capabilities.

Signed-off-by: Stefan Beller <sbeller@google.com>
---

Notes:
    v1:
    I am still undecided if the client should then accept/resend
    the capabilities to confirm them, which would make the client the
    ultimate decider which capabilities are used.
    
    My gut feeling is to rather let the server make the final decision
    for the capabilities, as it will use some requested capabilities
    already to not send out all the refs.

 Documentation/git-upload-pack.txt | 10 +++++++++-
 upload-pack.c                     | 42 +++++++++++++++++++++++++++------------
 2 files changed, 38 insertions(+), 14 deletions(-)

diff --git a/Documentation/git-upload-pack.txt b/Documentation/git-upload-pack.txt
index 0abc806..ad3a89d 100644
--- a/Documentation/git-upload-pack.txt
+++ b/Documentation/git-upload-pack.txt
@@ -9,7 +9,7 @@ git-upload-pack - Send objects packed back to git-fetch-pack
 SYNOPSIS
 --------
 [verse]
-'git-upload-pack' [--strict] [--timeout=<n>] <directory>
+'git-upload-pack' [--strict] [--timeout=<n>] <directory> [<capabilities>]
 
 DESCRIPTION
 -----------
@@ -34,6 +34,14 @@ OPTIONS
 <directory>::
 	The repository to sync from.
 
+capabilities::
+	Historically the capabilities were exchanged inside the protocol of
+	'git-upload-pack' talking to 'git-fetch-pack'. It turned out this was
+	too late as 'git-upload-pack' already did work, which may have been
+	avoided. This allows to pass in the capabilities the client wants to
+	use as one argument. The capabilites are separated by space.
+	See technical/protocol-capabilities.txt (TODO: how to make it a link?)
+
 SEE ALSO
 --------
 linkgit:gitnamespaces[7]
diff --git a/upload-pack.c b/upload-pack.c
index d9230ba..2e62c3f 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -31,6 +31,11 @@ static const char upload_pack_usage[] = "git upload-pack [--strict] [--timeout=<
 
 static unsigned long oldest_have;
 
+/**
+ * client capabilities presented as program arguments, dissallow further
+ * capabilities sent by client
+ */
+static int capabilities_first;
 static int multi_ack;
 static int no_done;
 static int use_thin_pack, use_ofs_delta, use_include_tag;
@@ -597,11 +602,14 @@ static void receive_needs(void)
 			die("git upload-pack: protocol error, "
 			    "expected to get sha, not '%s'", line);
 
-		if (first_want) {
-			parse_features(line + 45);
-			first_want = 0;
-		} else if (line[45])
-			die("garbage at the end of 'want' line %s", line + 45);
+		if (!capabilities_first) {
+			if (first_want) {
+				parse_features(line + 45);
+				first_want = 0;
+			} else if (line[45]) {
+				die("garbage at the end of 'want' line %s", line + 45);
+			}
+		}
 
 		o = parse_object(sha1_buf);
 		if (!o)
@@ -840,17 +848,25 @@ int main(int argc, char **argv)
 		}
 	}
 
-	if (i != argc-1)
-		usage(upload_pack_usage);
+	switch (argc - i) {
+		case 2:
+			capabilities_first = 1;
+			parse_features(argv[i + 1]);
+			/* fall through*/
+		case 1:
+			setup_path();
 
-	setup_path();
+			dir = argv[i];
 
-	dir = argv[i];
+			if (!enter_repo(dir, strict))
+				die("'%s' does not appear to be a git repository", dir);
 
-	if (!enter_repo(dir, strict))
-		die("'%s' does not appear to be a git repository", dir);
+			git_config(upload_pack_config, NULL);
+			upload_pack();
+			break;
+		default:
+			usage(upload_pack_usage);
+	}
 
-	git_config(upload_pack_config, NULL);
-	upload_pack();
 	return 0;
 }
-- 
2.3.0.81.gc37f363

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [RFC/PATCH 3/5] connect.c: connect to a remote service with some flags
  2015-02-28  1:01                         ` [RFC/PATCH 0/5] protocol v2 for upload-pack Stefan Beller
  2015-02-28  1:01                           ` [RFC/PATCH 1/5] upload-pack: only accept capabilities on the first "want" line Stefan Beller
  2015-02-28  1:01                           ` [RFC/PATCH 2/5] upload-pack: support out of band client capability requests Stefan Beller
@ 2015-02-28  1:01                           ` Stefan Beller
  2015-02-28 11:11                             ` Torsten Bögershausen
  2015-03-01  3:22                             ` Junio C Hamano
  2015-02-28  1:01                           ` [RFC/PATCH 4/5] daemon.c: accept extra service arguments Stefan Beller
                                             ` (2 subsequent siblings)
  5 siblings, 2 replies; 80+ messages in thread
From: Stefan Beller @ 2015-02-28  1:01 UTC (permalink / raw)
  To: git; +Cc: pclouds, gitster, Stefan Beller

If this is over git protocol, the flags is appended as the next
parameter after host=. If it's ssh, a new argument is appended to the
command line.

None of the callers use this now though.

[sb: originally by pclouds, rebased as jk implemented 1823bea10,
(git_connect: use argv_array), so any error is mine]

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/fetch-pack.c |  2 +-
 builtin/send-pack.c  |  2 +-
 connect.c            | 18 ++++++++++++------
 connect.h            |  2 +-
 transport.c          |  3 ++-
 5 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 4a6b340..7e5b5fd 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -171,7 +171,7 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 		if (args.diag_url)
 			flags |= CONNECT_DIAG_URL;
 		conn = git_connect(fd, dest, args.uploadpack,
-				   flags);
+				   NULL, flags);
 		if (!conn)
 			return args.diag_url ? 0 : 1;
 	}
diff --git a/builtin/send-pack.c b/builtin/send-pack.c
index b961e5a..c2a066a 100644
--- a/builtin/send-pack.c
+++ b/builtin/send-pack.c
@@ -261,7 +261,7 @@ int cmd_send_pack(int argc, const char **argv, const char *prefix)
 		fd[0] = 0;
 		fd[1] = 1;
 	} else {
-		conn = git_connect(fd, dest, receivepack,
+		conn = git_connect(fd, dest, receivepack, NULL,
 			args.verbose ? CONNECT_VERBOSE : 0);
 	}
 
diff --git a/connect.c b/connect.c
index 062e133..7b6b241 100644
--- a/connect.c
+++ b/connect.c
@@ -650,7 +650,9 @@ static struct child_process no_fork = CHILD_PROCESS_INIT;
  * the connection failed).
  */
 struct child_process *git_connect(int fd[2], const char *url,
-				  const char *prog, int flags)
+				  const char *prog,
+				  const char *service_flags,
+				  int flags)
 {
 	char *hostandport, *path;
 	struct child_process *conn = &no_fork;
@@ -685,10 +687,13 @@ struct child_process *git_connect(int fd[2], const char *url,
 		 * Note: Do not add any other headers here!  Doing so
 		 * will cause older git-daemon servers to crash.
 		 */
-		packet_write(fd[1],
-			     "%s %s%chost=%s%c",
-			     prog, path, 0,
-			     target_host, 0);
+		if (!service_flags)
+			packet_write(fd[1], "%s %s%chost=%s%c",
+				     prog, path, 0, target_host, 0);
+		else
+			packet_write(fd[1], "%s %s%chost=%s%c%s%c",
+				     prog, path, 0, target_host, 0,
+				     service_flags, 0);
 		free(target_host);
 	} else {
 		conn = xmalloc(sizeof(*conn));
@@ -733,7 +738,8 @@ struct child_process *git_connect(int fd[2], const char *url,
 			conn->use_shell = 1;
 		}
 		argv_array_push(&conn->args, cmd.buf);
-
+		if (service_flags)
+			argv_array_push(&conn->args, service_flags);
 		if (start_command(conn))
 			die("unable to fork");
 
diff --git a/connect.h b/connect.h
index c41a685..c4fa8a1 100644
--- a/connect.h
+++ b/connect.h
@@ -3,7 +3,7 @@
 
 #define CONNECT_VERBOSE       (1u << 0)
 #define CONNECT_DIAG_URL      (1u << 1)
-extern struct child_process *git_connect(int fd[2], const char *url, const char *prog, int flags);
+extern struct child_process *git_connect(int fd[2], const char *url, const char *prog, const char *service_flags, int flags);
 extern int finish_connect(struct child_process *conn);
 extern int git_connection_is_socket(struct child_process *conn);
 extern int server_supports(const char *feature);
diff --git a/transport.c b/transport.c
index 0694a7c..626fd92 100644
--- a/transport.c
+++ b/transport.c
@@ -495,6 +495,7 @@ static int connect_setup(struct transport *transport, int for_push, int verbose)
 	data->conn = git_connect(data->fd, transport->url,
 				 for_push ? data->options.receivepack :
 				 data->options.uploadpack,
+				 NULL,
 				 verbose ? CONNECT_VERBOSE : 0);
 
 	return 0;
@@ -850,7 +851,7 @@ static int connect_git(struct transport *transport, const char *name,
 {
 	struct git_transport_data *data = transport->data;
 	data->conn = git_connect(data->fd, transport->url,
-				 executable, 0);
+				 executable, NULL, 0);
 	fd[0] = data->fd[0];
 	fd[1] = data->fd[1];
 	return 0;
-- 
2.3.0.81.gc37f363

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [RFC/PATCH 4/5] daemon.c: accept extra service arguments
  2015-02-28  1:01                         ` [RFC/PATCH 0/5] protocol v2 for upload-pack Stefan Beller
                                             ` (2 preceding siblings ...)
  2015-02-28  1:01                           ` [RFC/PATCH 3/5] connect.c: connect to a remote service with some flags Stefan Beller
@ 2015-02-28  1:01                           ` Stefan Beller
  2015-03-01  3:39                             ` Junio C Hamano
  2015-02-28  1:01                           ` [RFC/PATCH 5/5] WIP/Document the http protocol change Stefan Beller
  2015-03-01  9:11                           ` [RFC/PATCH 0/5] protocol v2 for upload-pack Johannes Sixt
  5 siblings, 1 reply; 80+ messages in thread
From: Stefan Beller @ 2015-02-28  1:01 UTC (permalink / raw)
  To: git; +Cc: pclouds, gitster, Stefan Beller

Before 73bb33a (daemon: Strictly parse the "extra arg" part of the
command - 2009-06-04) a client sending extra arguments could DoS
git-daemon. 73bb33a fixed it by forbidding extra arguments.

Allow arguments other than "host=" again as a preparation step for
upload-pack2. "host=" if present must be the first argument
though. The remaining arguments are concatenated by whitespace.

So far none of supported services support extra arguments. Attempting
to do will abort the service, just like how it is before. We might
want to make them silently ignore extra arguments though.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 connect.c |  3 ---
 daemon.c  | 37 ++++++++++++++++++++++++-------------
 2 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/connect.c b/connect.c
index 7b6b241..97dd732 100644
--- a/connect.c
+++ b/connect.c
@@ -683,9 +683,6 @@ struct child_process *git_connect(int fd[2], const char *url,
 		/*
 		 * Separate original protocol components prog and path
 		 * from extended host header with a NUL byte.
-		 *
-		 * Note: Do not add any other headers here!  Doing so
-		 * will cause older git-daemon servers to crash.
 		 */
 		if (!service_flags)
 			packet_write(fd[1], "%s %s%chost=%s%c",
diff --git a/daemon.c b/daemon.c
index 54a03bd..c45d0d6 100644
--- a/daemon.c
+++ b/daemon.c
@@ -221,7 +221,7 @@ static const char *path_ok(const char *directory)
 	return NULL;		/* Fallthrough. Deny by default */
 }
 
-typedef int (*daemon_service_fn)(void);
+typedef int (*daemon_service_fn)(const char *);
 struct daemon_service {
 	const char *name;
 	const char *config_name;
@@ -302,7 +302,8 @@ error_return:
 	return -1;
 }
 
-static int run_service(const char *dir, struct daemon_service *service)
+static int run_service(const char *dir, struct daemon_service *service,
+		       const char *args)
 {
 	const char *path;
 	int enabled = service->enabled;
@@ -361,7 +362,7 @@ static int run_service(const char *dir, struct daemon_service *service)
 	 */
 	signal(SIGTERM, SIG_IGN);
 
-	return service->fn();
+	return service->fn(args);
 }
 
 static void copy_to_log(int fd)
@@ -403,27 +404,31 @@ static int run_service_command(const char **argv)
 	return finish_command(&cld);
 }
 
-static int upload_pack(void)
+static int upload_pack(const char *args)
 {
 	/* Timeout as string */
 	char timeout_buf[64];
-	const char *argv[] = { "upload-pack", "--strict", NULL, ".", NULL };
+	const char *argv[] = { "upload-pack", "--strict", NULL, ".", NULL, NULL };
 
 	argv[2] = timeout_buf;
+	argv[4] = args;
 
 	snprintf(timeout_buf, sizeof timeout_buf, "--timeout=%u", timeout);
 	return run_service_command(argv);
 }
 
-static int upload_archive(void)
+static int upload_archive(const char *args)
 {
 	static const char *argv[] = { "upload-archive", ".", NULL };
+	if (args)
+		die("invalid request");
 	return run_service_command(argv);
 }
 
-static int receive_pack(void)
+static int receive_pack(const char *args)
 {
-	static const char *argv[] = { "receive-pack", ".", NULL };
+	static const char *argv[] = { "receive-pack", ".", NULL, NULL };
+	argv[2] = args;
 	return run_service_command(argv);
 }
 
@@ -487,7 +492,7 @@ static void parse_host_and_port(char *hostport, char **host,
 /*
  * Read the host as supplied by the client connection.
  */
-static void parse_host_arg(char *extra_args, int buflen)
+static void parse_host_arg(char *extra_args, char **remaining_args, int buflen)
 {
 	char *val;
 	int vallen;
@@ -514,8 +519,13 @@ static void parse_host_arg(char *extra_args, int buflen)
 			/* On to the next one */
 			extra_args = val + vallen;
 		}
-		if (extra_args < end && *extra_args)
-			die("Invalid request");
+	}
+
+	if (remaining_args) {
+		for (val = extra_args; val < end; val++)
+			if (!*val)
+				*val = ' ';
+		*remaining_args = extra_args;
 	}
 
 	/*
@@ -577,6 +587,7 @@ static int execute(void)
 {
 	char *line = packet_buffer;
 	int pktlen, len, i;
+	char *args = NULL;
 	char *addr = getenv("REMOTE_ADDR"), *port = getenv("REMOTE_PORT");
 
 	if (addr)
@@ -603,7 +614,7 @@ static int execute(void)
 	hostname = canon_hostname = ip_address = tcp_port = NULL;
 
 	if (len != pktlen)
-		parse_host_arg(line + len + 1, pktlen - len - 1);
+		parse_host_arg(line + len + 1, &args, pktlen - len - 1);
 
 	for (i = 0; i < ARRAY_SIZE(daemon_service); i++) {
 		struct daemon_service *s = &(daemon_service[i]);
@@ -616,7 +627,7 @@ static int execute(void)
 			 * Note: The directory here is probably context sensitive,
 			 * and might depend on the actual service being performed.
 			 */
-			return run_service(arg, s);
+			return run_service(arg, s, args);
 		}
 	}
 
-- 
2.3.0.81.gc37f363

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [RFC/PATCH 5/5] WIP/Document the http protocol change
  2015-02-28  1:01                         ` [RFC/PATCH 0/5] protocol v2 for upload-pack Stefan Beller
                                             ` (3 preceding siblings ...)
  2015-02-28  1:01                           ` [RFC/PATCH 4/5] daemon.c: accept extra service arguments Stefan Beller
@ 2015-02-28  1:01                           ` Stefan Beller
  2015-02-28 12:26                             ` Duy Nguyen
  2015-03-01  9:11                           ` [RFC/PATCH 0/5] protocol v2 for upload-pack Johannes Sixt
  5 siblings, 1 reply; 80+ messages in thread
From: Stefan Beller @ 2015-02-28  1:01 UTC (permalink / raw)
  To: git; +Cc: pclouds, gitster, Stefan Beller

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/technical/http-protocol.txt         | 4 ++--
 Documentation/technical/protocol-capabilities.txt | 4 ++++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt
index 229f845..638819d 100644
--- a/Documentation/technical/http-protocol.txt
+++ b/Documentation/technical/http-protocol.txt
@@ -191,10 +191,10 @@ HTTP clients that support the "smart" protocol (or both the
 "smart" and "dumb" protocols) MUST discover references by making
 a parameterized request for the info/refs file of the repository.
 
-The request MUST contain exactly one query parameter,
+The request MAY contain parameters. Supported parameters includes
 `service=$servicename`, where `$servicename` MUST be the service
 name the client wishes to contact to complete the operation.
-The request MUST NOT contain additional query parameters.
+Further parameters are as described in protocol-capabilities.txt
 
    C: GET $GIT_URL/info/refs?service=git-upload-pack HTTP/1.0
 
diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index 4f8a7bf..40ddb37 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -3,6 +3,10 @@ Git Protocol Capabilities
 
 Servers SHOULD support all capabilities defined in this document.
 
+The client MAY ask for capabilities first out of band to the server.
+If so the server MUST NOT advertise any capabilities the client did
+not ask for.
+
 On the very first line of the initial server response of either
 receive-pack and upload-pack the first reference is followed by
 a NUL byte and then a list of space delimited server capabilities.
-- 
2.3.0.81.gc37f363

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 2/5] upload-pack: support out of band client capability requests
  2015-02-28  1:01                           ` [RFC/PATCH 2/5] upload-pack: support out of band client capability requests Stefan Beller
@ 2015-02-28  7:47                             ` Kyle J. McKay
  2015-02-28 11:22                               ` Duy Nguyen
  2015-02-28 11:36                             ` Duy Nguyen
  1 sibling, 1 reply; 80+ messages in thread
From: Kyle J. McKay @ 2015-02-28  7:47 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, pclouds, gitster

On Feb 27, 2015, at 17:01, Stefan Beller wrote:
> From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
>
> The only difference from the original protocol client capabilities are
> negotiated before initial refs advertisment.
>
> Client capabilities are sent out of band (upload-pack receives it as
> the second command line argument). The server sends one pkt-line back
> advertising its capabilities.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>
> Notes:
>    v1:
>    I am still undecided if the client should then accept/resend
>    the capabilities to confirm them, which would make the client the
>    ultimate decider which capabilities are used.
>
>    My gut feeling is to rather let the server make the final decision
>    for the capabilities, as it will use some requested capabilities
>    already to not send out all the refs.
>
> Documentation/git-upload-pack.txt | 10 +++++++++-
> upload-pack.c                     | 42 ++++++++++++++++++++++++++ 
> +------------
> 2 files changed, 38 insertions(+), 14 deletions(-)
>
> diff --git a/Documentation/git-upload-pack.txt b/Documentation/git- 
> upload-pack.txt
> index 0abc806..ad3a89d 100644
> --- a/Documentation/git-upload-pack.txt
> +++ b/Documentation/git-upload-pack.txt
> @@ -9,7 +9,7 @@ git-upload-pack - Send objects packed back to git- 
> fetch-pack
> SYNOPSIS
> --------
> [verse]
> -'git-upload-pack' [--strict] [--timeout=<n>] <directory>
> +'git-upload-pack' [--strict] [--timeout=<n>] <directory>  
> [<capabilities>]

Isn't the problem with this that passing the extra argument to ssh  
servers will cause them to fail?

Having just looked at the upload-pack.c source it looks to me like  
trying to send "git-upload-pack 'dir' 'capabilities'" to an ssh git  
server running a current version of the code will just end up  
failing.  I realize the extra argument is optional, so does that mean  
there's no out-of-band support for ssh connections since the extra  
argument would have to be omitted to remain compatible?

On Feb 26, 2015, at 12:13, Junio C Hamano wrote:
> The capability-based sidegrade does not solve the problem
> when the problem to be solved is that the server side needs to spend
> a lot of cycles and the network needs to carry megabytes of data
> before capability exchange happens.

On Feb 27, 2015, at 16:46, Stefan Beller wrote:
> I'll try to present a 'client asks for options first out of band'  
> instead of the
> way you describe.

On Feb 27, 2015, at 15:44, Stefan Beller wrote:
> For both native git as well as ssh, Duy presented a solution at [1, 2]
> a year ago, which essentially presents the desired client capabilites
> 'out of band' to the server via an argument to the server.  So we'd
> only need to examine the http(s) path how to pass in arguments there.

I've looked at those links and it's unclear to me how they support an  
out-of-band option for ssh, they seem to be targeted at git-daemon.   
Maybe there's another reference?

But there does seem to be a way to pass the protocol information out- 
of-band for each of the HTTP, git and ssh connections to stop the  
initial ref advertisement.

As already suggested [1, 2], for git: another "Extended attribute" can  
be added after the host=...\0 to become host=...\0protocol=extended\0  
or similar:

For HTTP, just add a second parameter in the query string .../info/ 
refs?service=git-upload-pack&protocol=extended.  Alternatively an "X- 
Git-Protocol: extended" or similar header can be added by the client.   
It looks to me like the current http-backend.c already ignores any  
extra parameters/headers.

That leaves ssh.  A bit more problematic, but if the server side adds  
"AcceptEnv GIT_PROTOCOL" to its sshd_config and then the client does  
setenv("GIT_PROTOCOL","extended") and adds a "-o SendEnv=GIT_PROTOCOL"  
option to the ssh command line the GIT_PROTOCOL variable will be  
passed along.  This one might need a config option to always disable  
adding the "-o ..." option to the command line in case connect.c  
guesses wrongly about it being OpenSSH and such is not likely to be  
supported other than with OpenSSH on both ends.  I'm not seeing any  
other way to pass out-of-band information to an ssh server configured  
to run git-shell that is safely ignored by current versions of the code.

Worst case with SSH is the initial ref advertisement is not suppressed  
even though the server does support protocol v2 and a capability-based  
sidegrade is required which is unfortunate.

-Kyle

[1] https://github.com/pclouds/git/commit/e26fa77c4d9ace06b9f2c80091af9eb7b63a1c95
[2] https://github.com/pclouds/git/commit/20d048e5fc650b20fdc7dd8bbe35cb8510ac9c50

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 3/5] connect.c: connect to a remote service with some flags
  2015-02-28  1:01                           ` [RFC/PATCH 3/5] connect.c: connect to a remote service with some flags Stefan Beller
@ 2015-02-28 11:11                             ` Torsten Bögershausen
  2015-03-01  3:22                             ` Junio C Hamano
  1 sibling, 0 replies; 80+ messages in thread
From: Torsten Bögershausen @ 2015-02-28 11:11 UTC (permalink / raw)
  To: Stefan Beller, git; +Cc: pclouds, gitster

On 2015-02-28 02.01, Stefan Beller wrote:

> If this is over git protocol, the flags is appended as the next
> parameter after host=. If it's ssh, a new argument is appended to the
> command line.
> 
> None of the callers use this now though.
> 
> [sb: originally by pclouds, rebased as jk implemented 1823bea10,
> (git_connect: use argv_array), so any error is mine]
> 
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  builtin/fetch-pack.c |  2 +-
>  builtin/send-pack.c  |  2 +-
>  connect.c            | 18 ++++++++++++------
>  connect.h            |  2 +-
>  transport.c          |  3 ++-
>  5 files changed, 17 insertions(+), 10 deletions(-)
> 
[]

> diff --git a/connect.c b/connect.c
> index 062e133..7b6b241 100644
> --- a/connect.c
> +++ b/connect.c
> @@ -650,7 +650,9 @@ static struct child_process no_fork = CHILD_PROCESS_INIT;
>   * the connection failed).
>   */
>  struct child_process *git_connect(int fd[2], const char *url,
> -				  const char *prog, int flags)
> +				  const char *prog,
> +				  const char *service_flags,
The name "service_flags" reminds me on a service:
We connect to a service, but we are not a service.

And "flags" is often used for different bits, collected in an int, but not a "string"

Options are used at the command line, or arguments.
How about "extra_arg", or "extra_option", or "option_str", or simply "options" (or "option") ? 
> +				  int flags)
>  {
>  	char *hostandport, *path;
>  	struct child_process *conn = &no_fork;
> @@ -685,10 +687,13 @@ struct child_process *git_connect(int fd[2], const char *url,
>  		 * Note: Do not add any other headers here!  Doing so
>  		 * will cause older git-daemon servers to crash.
>  		 */
> -		packet_write(fd[1],
> -			     "%s %s%chost=%s%c",
> -			     prog, path, 0,
> -			     target_host, 0);
> +		if (!service_flags)
> +			packet_write(fd[1], "%s %s%chost=%s%c",
> +				     prog, path, 0, target_host, 0);
> +		else
> +			packet_write(fd[1], "%s %s%chost=%s%c%s%c",
> +				     prog, path, 0, target_host, 0,
> +				     service_flags, 0);
We don't need this "big if" here, a simple "?" will do:
 
			packet_write(fd[1], "%s %s%chost=%s%c%s%c",
				     prog, path, 0, target_host, 0,
				     options ? options : "", /* old service_flasgs */ 
                                     0);

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 2/5] upload-pack: support out of band client capability requests
  2015-02-28  7:47                             ` Kyle J. McKay
@ 2015-02-28 11:22                               ` Duy Nguyen
  2015-02-28 22:36                                 ` Kyle J. McKay
  0 siblings, 1 reply; 80+ messages in thread
From: Duy Nguyen @ 2015-02-28 11:22 UTC (permalink / raw)
  To: Kyle J. McKay; +Cc: Stefan Beller, Git Mailing List, Junio C Hamano

On Sat, Feb 28, 2015 at 2:47 PM, Kyle J. McKay <mackyle@gmail.com> wrote:
> On Feb 27, 2015, at 17:01, Stefan Beller wrote:
>>
>> From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
>>
>> The only difference from the original protocol client capabilities are
>> negotiated before initial refs advertisment.
>>
>> Client capabilities are sent out of band (upload-pack receives it as
>> the second command line argument). The server sends one pkt-line back
>> advertising its capabilities.
>>
>> Signed-off-by: Stefan Beller <sbeller@google.com>
>> ---
>>
>> Notes:
>>    v1:
>>    I am still undecided if the client should then accept/resend
>>    the capabilities to confirm them, which would make the client the
>>    ultimate decider which capabilities are used.
>>
>>    My gut feeling is to rather let the server make the final decision
>>    for the capabilities, as it will use some requested capabilities
>>    already to not send out all the refs.
>>
>> Documentation/git-upload-pack.txt | 10 +++++++++-
>> upload-pack.c                     | 42
>> +++++++++++++++++++++++++++------------
>> 2 files changed, 38 insertions(+), 14 deletions(-)
>>
>> diff --git a/Documentation/git-upload-pack.txt
>> b/Documentation/git-upload-pack.txt
>> index 0abc806..ad3a89d 100644
>> --- a/Documentation/git-upload-pack.txt
>> +++ b/Documentation/git-upload-pack.txt
>> @@ -9,7 +9,7 @@ git-upload-pack - Send objects packed back to
>> git-fetch-pack
>> SYNOPSIS
>> --------
>> [verse]
>> -'git-upload-pack' [--strict] [--timeout=<n>] <directory>
>> +'git-upload-pack' [--strict] [--timeout=<n>] <directory> [<capabilities>]
>
>
> Isn't the problem with this that passing the extra argument to ssh servers
> will cause them to fail?
>
> Having just looked at the upload-pack.c source it looks to me like trying to
> send "git-upload-pack 'dir' 'capabilities'" to an ssh git server running a
> current version of the code will just end up failing.  I realize the extra
> argument is optional, so does that mean there's no out-of-band support for
> ssh connections since the extra argument would have to be omitted to remain
> compatible?

The client should only trigger this behavior when it knows the server
can deal with it. And that is possible because in the last fetch, the
server has told the client that it's capable of receiving this
capabilities argument. Backward compatibility is a concern at client
side, not server side.

> I've looked at those links and it's unclear to me how they support an
> out-of-band option for ssh, they seem to be targeted at git-daemon.  Maybe
> there's another reference?

For ssh, I think connect.c is the one that constructs and executes ssh command.
-- 
Duy

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 2/5] upload-pack: support out of band client capability requests
  2015-02-28  1:01                           ` [RFC/PATCH 2/5] upload-pack: support out of band client capability requests Stefan Beller
  2015-02-28  7:47                             ` Kyle J. McKay
@ 2015-02-28 11:36                             ` Duy Nguyen
  1 sibling, 0 replies; 80+ messages in thread
From: Duy Nguyen @ 2015-02-28 11:36 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Git Mailing List, Junio C Hamano

On Sat, Feb 28, 2015 at 8:01 AM, Stefan Beller <sbeller@google.com> wrote:
> Notes:
>     v1:
>     I am still undecided if the client should then accept/resend
>     the capabilities to confirm them, which would make the client the
>     ultimate decider which capabilities are used.
>
>     My gut feeling is to rather let the server make the final decision
>     for the capabilities, as it will use some requested capabilities
>     already to not send out all the refs.

pack-capabilities.txt says

"Client will then send a space separated list of capabilities it wants
to be in effect. The client MUST NOT ask for capabilities the server
did not say it supports."

What was sent out of band is what the client can support, not what it
wants. So perhaps drop this patch and let the client decide exactly
what it wants.
-- 
Duy

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 5/5] WIP/Document the http protocol change
  2015-02-28  1:01                           ` [RFC/PATCH 5/5] WIP/Document the http protocol change Stefan Beller
@ 2015-02-28 12:26                             ` Duy Nguyen
  0 siblings, 0 replies; 80+ messages in thread
From: Duy Nguyen @ 2015-02-28 12:26 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Git Mailing List, Junio C Hamano

On Sat, Feb 28, 2015 at 8:01 AM, Stefan Beller <sbeller@google.com> wrote:
> diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
> index 4f8a7bf..40ddb37 100644
> --- a/Documentation/technical/protocol-capabilities.txt
> +++ b/Documentation/technical/protocol-capabilities.txt
> @@ -3,6 +3,10 @@ Git Protocol Capabilities
>
>  Servers SHOULD support all capabilities defined in this document.
>
> +The client MAY ask for capabilities first out of band to the server.
> +If so the server MUST NOT advertise any capabilities the client did
> +not ask for.

I think what the current patches do is "For v2, the client must _tell_
the server its capabilities". The unwritten statement is "the server
must not use any capabilities not advertised by the client". v2 will
not send initial refs by default, and the client will need to send
"wantrefs refs/heads/* refs/tags/*" or similar to retrieve ref list.
So we don't have any client capability yet. But there are two caps
that come to mind:

 - ref list compression (for sending full ref list in clone),
compression ratio may be 50%
 - setting locale in upload-pack and receive-pack to support l10n.

>  On the very first line of the initial server response of either
>  receive-pack and upload-pack the first reference is followed by
>  a NUL byte and then a list of space delimited server capabilities.

This one should be prefixed with "in version 1, "
-- 
Duy

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 2/5] upload-pack: support out of band client capability requests
  2015-02-28 11:22                               ` Duy Nguyen
@ 2015-02-28 22:36                                 ` Kyle J. McKay
  2015-03-01  0:11                                   ` Duy Nguyen
  0 siblings, 1 reply; 80+ messages in thread
From: Kyle J. McKay @ 2015-02-28 22:36 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Stefan Beller, Git Mailing List, Junio C Hamano


On Feb 28, 2015, at 03:22, Duy Nguyen wrote:
> The client should only trigger this behavior when it knows the server
> can deal with it. And that is possible because in the last fetch, the
> server has told the client that it's capable of receiving this
> capabilities argument. Backward compatibility is a concern at client
> side, not server side.
>
>> I've looked at those links and it's unclear to me how they support an
>> out-of-band option for ssh, they seem to be targeted at git- 
>> daemon.  Maybe
>> there's another reference?
>
> For ssh, I think connect.c is the one that constructs and executes  
> ssh command.

This I assume you're referring to this change in connect.c from [1]:

@@ -729,6 +734,8 @@ struct child_process *git_connect(int fd[2], const  
char *url,
                         conn->use_shell = 1;
                 }
                 argv_array_push(&argv, cmd.buf);
+               if (service_flags)
+                       argv_array_push(&argv, service_flags);
                 conn->argv = argv.argv;
                 if (start_command(conn))
                         die("unable to fork");

That's not going to work for ssh servers running a stock git-shell and  
I haven't seen any updates to shell.c to match.  git-shell does not  
allow anything other than one argument to be passed to git-upload-pack/ 
git-receive-pack.

When shell.c calls do_generic_cmd and it calls sq_dequote on its  
argument that contains "'dir' 'service-flags'" it's going to return  
NULL and shell.c will die("bad argument").  So I don't see how this  
supports ssh as-is even if you know in advance the server supports the  
new protocol.  I don't see any changes to shell.c in that uploadpack2  
branch nor in this patch series.

-Kyle

[1] https://github.com/pclouds/git/commit/20d048e5fc650b20fdc7dd8bbe35cb8510ac9c50

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 2/5] upload-pack: support out of band client capability requests
  2015-02-28 22:36                                 ` Kyle J. McKay
@ 2015-03-01  0:11                                   ` Duy Nguyen
  0 siblings, 0 replies; 80+ messages in thread
From: Duy Nguyen @ 2015-03-01  0:11 UTC (permalink / raw)
  To: Kyle J. McKay; +Cc: Stefan Beller, Git Mailing List, Junio C Hamano

On Sun, Mar 1, 2015 at 5:36 AM, Kyle J. McKay <mackyle@gmail.com> wrote:
>
> On Feb 28, 2015, at 03:22, Duy Nguyen wrote:
>>
>> The client should only trigger this behavior when it knows the server
>> can deal with it. And that is possible because in the last fetch, the
>> server has told the client that it's capable of receiving this
>> capabilities argument. Backward compatibility is a concern at client
>> side, not server side.
>>
>>> I've looked at those links and it's unclear to me how they support an
>>> out-of-band option for ssh, they seem to be targeted at git-daemon.
>>> Maybe
>>> there's another reference?
>>
>>
>> For ssh, I think connect.c is the one that constructs and executes ssh
>> command.
>
>
> This I assume you're referring to this change in connect.c from [1]:
>
> @@ -729,6 +734,8 @@ struct child_process *git_connect(int fd[2], const char
> *url,
>                         conn->use_shell = 1;
>                 }
>                 argv_array_push(&argv, cmd.buf);
> +               if (service_flags)
> +                       argv_array_push(&argv, service_flags);
>                 conn->argv = argv.argv;
>                 if (start_command(conn))
>                         die("unable to fork");
>
> That's not going to work for ssh servers running a stock git-shell and I
> haven't seen any updates to shell.c to match.  git-shell does not allow
> anything other than one argument to be passed to
> git-upload-pack/git-receive-pack.
>
> When shell.c calls do_generic_cmd and it calls sq_dequote on its argument
> that contains "'dir' 'service-flags'" it's going to return NULL and shell.c
> will die("bad argument").  So I don't see how this supports ssh as-is even
> if you know in advance the server supports the new protocol.  I don't see
> any changes to shell.c in that uploadpack2 branch nor in this patch series.

You're right. If we continue to pass client capabilities as an extra
argument, then git-shell needs updates.
-- 
Duy

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 3/5] connect.c: connect to a remote service with some flags
  2015-02-28  1:01                           ` [RFC/PATCH 3/5] connect.c: connect to a remote service with some flags Stefan Beller
  2015-02-28 11:11                             ` Torsten Bögershausen
@ 2015-03-01  3:22                             ` Junio C Hamano
  1 sibling, 0 replies; 80+ messages in thread
From: Junio C Hamano @ 2015-03-01  3:22 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, pclouds

Stefan Beller <sbeller@google.com> writes:

> If this is over git protocol, the flags is appended as the next
> parameter after host=. If it's ssh, a new argument is appended to the
> command line.
>
> None of the callers use this now though.

Replace "some flags" with something more meaningful, so that the
reader can imagine what purpose this change is meant to serve.

I unfortunately cannot offer a good suggestion by guessing what it
is at this point, as none of the callers use this at this step in
the series.

> +		if (!service_flags)
> +			packet_write(fd[1], "%s %s%chost=%s%c",
> +				     prog, path, 0, target_host, 0);
> +		else
> +			packet_write(fd[1], "%s %s%chost=%s%c%s%c",
> +				     prog, path, 0, target_host, 0,
> +				     service_flags, 0);

Is this meant to be consumed by daemon.c:parse_host_arg()?  Doesn't
it detect extra cruft after "host=<hostname>\0" and die with an
"Invalid request"?

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 4/5] daemon.c: accept extra service arguments
  2015-02-28  1:01                           ` [RFC/PATCH 4/5] daemon.c: accept extra service arguments Stefan Beller
@ 2015-03-01  3:39                             ` Junio C Hamano
  0 siblings, 0 replies; 80+ messages in thread
From: Junio C Hamano @ 2015-03-01  3:39 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, pclouds

Stefan Beller <sbeller@google.com> writes:

> Before 73bb33a (daemon: Strictly parse the "extra arg" part of the
> command - 2009-06-04) a client sending extra arguments could DoS
> git-daemon. 73bb33a fixed it by forbidding extra arguments.
>
> Allow arguments other than "host=" again as a preparation step for
> upload-pack2. "host=" if present must be the first argument
> though. The remaining arguments are concatenated by whitespace.
>
> So far none of supported services support extra arguments. Attempting
> to do will abort the service, just like how it is before. We might
> want to make them silently ignore extra arguments though.

The necessity of this means 3/5, hiding new stuff after
"host=<name>\0", is not a viable strategy to extend the
communication to "git daemon" in a backward compatible way so that
new client can transparently talk with older daemon.

And if you accept that limitation and declare that new client MUST
talk with a new git daemon, then there is not much point in hiding
the extra arguments after "host=<name>\0".  You can just declare
that the new git daemon will accept arguments, including
"host=<name>\0", as a series of "var=<val>\0" in separate packets,
not as a part of the initial "<program> <path>" packet, for example.

One thing 3/5 and 4/5 will give you is to allow old clients to talk
to the new git daemon if the way new programs spawned by git daemon
must use that "hiding" trick, but I am not convinced that it is
necessary to resort to that "hiding" trick in the first place.

Under the same "new client must talk with a new daemon, they cannot
talk with existing daemon" limitation, I think you can arrange
things this way by not touching daemon at all:

 * Keep the daemon protocol as "<program> <path>\0host=<name>\0".

 * New client, as it has to know that it is talking to a host with
   new enough Git, can ask "upload-pack-2" as <program>.

 * Design the protocol that the protocol backend programs such as
   upload-pack-2 (and other updated programs like receive-pack-2,
   trickle-pack, ...) talks in such a way that these "extra"
   parameters sent when client connects are sent as the first
   packet(s).

That way, the new daemon will take "<program> <path>\0host=<name>\0"
and cannot take extra arguments at the end of that packet, but the
new protocol backend programs do not need extra arguments given by
the daemon to them anyway.  Old clients will not ask to be connected
to new upload-pack-2 and friends, but that is to be expected.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
  2015-02-24  6:15   ` Junio C Hamano
  2015-02-24 23:37     ` Stefan Beller
@ 2015-03-01  8:41     ` Junio C Hamano
  2015-03-01 11:32       ` Duy Nguyen
  2015-03-01 23:06       ` Philip Oakley
  1 sibling, 2 replies; 80+ messages in thread
From: Junio C Hamano @ 2015-03-01  8:41 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Stefan Beller, Duy Nguyen

I earlier said:

> So if we are going to discuss a new protocol, I'd prefer to see the
> discussion without worrying too much about how to inter-operate
> with the current vintage of Git. It is no longer an interesting problem,
> as we know how to solve it with minimum risk. Instead, I'd like to
> see us design the new protocol in such a way that it is in-line
> upgradable without repeating our past mistakes.

And I am happy to see that people are interested in discussing the
design of new protocols.

But after seeing the patches Stefan sent out, I think we are risking
of losing sight of what we are trying to accomplish.  We do not want
something that is merely new.

That is why I wanted people to think about, discuss and agree on
what limitation of the current protocol has that are problematic
(limitations that are not problematic are not something we do not
need to address [*1*]), so that we can design the new thing without
reintroducing the same limitation.

To remind people, here is a reprint of the draft I sent out earlier
in $gmane/264000.

> The current protocol has the following problems that limit us:
> 
>  - It is not easy to make it resumable, because we recompute every
>    time.  This is especially problematic for the initial fetch aka
>    "clone" as we will be talking about a large transfer [*1*].
> 
>  - The protocol extension has a fairly low length limit [*2*].
> 
>  - Because the protocol exchange starts by the server side
>    advertising all its refs, even when the fetcher is interested in
>    a single ref, the initial overhead is nontrivial, especially when
>    you are doing a small incremental update.  The worst case is an
>    auto-builder that polls every five minutes, even when there is no
>    new commits to be fetched [*3*].
> 
>  - Because we recompute every time, taking into account of what the
>    fetcher has, in addition to what the fetcher obtained earlier
>    from us in order to reduce the transferred bytes, the payload for
>    incremental updates become tailor-made for each fetch and cannot
>    be easily reused [*4*].
> 
> I'd like to see a new protocol that lets us overcome the above
> limitations (did I miss others? I am sure people can help here)
> sometime this year.

Unfortunately, nobody seems to want to help us by responding to "did
I miss others?" RFH, here are a few more from me.

 - The semantics of the side-bands are unclear.

   - Is band #2 meant only for progress output (I think the current
     protocol handlers assume that and unconditionally squelch it
     under --quiet)?  Do we rather want a dedicated "progress" and
     "error message" sidebands instead?

   - Is band #2 meant for human consumption, or do we expect the
     other end to interpret and act on it?  If the former, would it
     make sense to send locale information from the client side and
     ask the server side to produce its output with _("message")?

 - The semantics of packet_flush() is suboptimal, and this
   shortcoming seeps through to the protocol mapped to the
   smart-HTTP transport.

   Originally, packet_flush() was meant as "Here is an end of one
   logical section of what I am going to speak.", hinting that it
   might be a good idea for the underlying implementation to hold
   the packets up to that point in-core and then write(2) them all
   out (i.e. "flush") to the file descriptor only when we handle
   packet_flush().  It never meant "Now I am finished speaking for
   now and it is your turn to speak."

   But because HTTP is inherently a ping-pong protocol where the
   requestor at one point stops talking and lets the responder
   speak, the code to map our protocol to the smart HTTP transport
   made the packet_flush() boundary as "Now I am done talking, it is
   my turn to listen."

   We probably need two kinds of packet_flush().  When a requestor
   needs to say two or more logical groups of things before telling
   the other side "Now I am done talking; it is your turn.", we need
   some marker (i.e. the original meaning of packet_flush()) at the
   end of these logical groups.  And in order to be able to say "Now
   I am done saying everything I need to say at this point for you
   to respond to me.  It is your turn.", we need another kind of
   marker.

[Footnote]

*1* For example, if we were working off of "what mistakes do we want
to correct?" list, I do not think we would have seen "capabilities
have to be only on the first packet" or "lets allow new daemon to
read extra cruft at the end of the first request".  I do not think I
heard why it is a problem that the daemon cannot pass extra info to
invoked program in the first place.  There might be a valid reason,
but then that needs to be explained, understood and agreed upon and
should be part of an updated "what are we fixing?" list.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/5] protocol v2 for upload-pack
  2015-02-28  1:01                         ` [RFC/PATCH 0/5] protocol v2 for upload-pack Stefan Beller
                                             ` (4 preceding siblings ...)
  2015-02-28  1:01                           ` [RFC/PATCH 5/5] WIP/Document the http protocol change Stefan Beller
@ 2015-03-01  9:11                           ` Johannes Sixt
  2015-03-02  2:58                             ` Junio C Hamano
  5 siblings, 1 reply; 80+ messages in thread
From: Johannes Sixt @ 2015-03-01  9:11 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, pclouds, gitster

Am 28.02.2015 um 02:01 schrieb Stefan Beller:
> Heavily inspired by the ideas of Duy, who wrote the first
> patches nearly a year ago.
>
> Nguyễn Thái Ngọc Duy (2):
>    upload-pack: only accept capabilities on the first "want" line
>    upload-pack: support out of band client capability requests
>
> Stefan Beller (3):
>    connect.c: connect to a remote service with some flags
>    daemon.c: accept extra service arguments
>    WIP/Document the http protocol change
>
>   Documentation/git-upload-pack.txt                 | 10 +++++-
>   Documentation/technical/http-protocol.txt         |  4 +--
>   Documentation/technical/protocol-capabilities.txt |  4 +++
>   builtin/fetch-pack.c                              |  2 +-
>   builtin/send-pack.c                               |  2 +-
>   connect.c                                         | 21 ++++++------
>   connect.h                                         |  2 +-
>   daemon.c                                          | 37 +++++++++++++--------
>   transport.c                                       |  3 +-
>   upload-pack.c                                     | 39 +++++++++++++++++------
>   10 files changed, 86 insertions(+), 38 deletions(-)
>

You may also consider an idea I proposed here:

http://thread.gmane.org/gmane.comp.version-control.git/206886/focus=207342

The idea is that the exchange begins as usual, but when the v2 client 
sees that the server also supports v2, then it begins sending its 
desired refs. When the server notices that the client spoke while it was 
still sending out its megabytes of ref advertisments, it stops the v1 
advertisements and continues with v2 protocol.

But Shawn pointed out that this would be difficult to implement in a 
JGit server because there is no way to poll the incoming channel while 
it is sending out refs.

-- Hannes

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
  2015-03-01  8:41     ` Junio C Hamano
@ 2015-03-01 11:32       ` Duy Nguyen
  2015-03-01 19:56         ` Stefan Beller
  2015-03-01 23:06         ` Junio C Hamano
  2015-03-01 23:06       ` Philip Oakley
  1 sibling, 2 replies; 80+ messages in thread
From: Duy Nguyen @ 2015-03-01 11:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Stefan Beller

On Sun, Mar 1, 2015 at 3:41 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>  - Because the protocol exchange starts by the server side
>>    advertising all its refs, even when the fetcher is interested in
>>    a single ref, the initial overhead is nontrivial, especially when
>>    you are doing a small incremental update.  The worst case is an
>>    auto-builder that polls every five minutes, even when there is no
>>    new commits to be fetched [*3*].

Maybe you can elaborate about how to handle states X, Y... in your
footnote 3. I just don't see how it's actually implemented. Or is it
optional feature that will be provided (via hooks, maybe) by admin? Do
we need to worry about load balancers? Is it meant to address the
excessive state transfer due to stateless nature of smart-http?

>> I'd like to see a new protocol that lets us overcome the above
>> limitations (did I miss others? I am sure people can help here)
>> sometime this year.
>
> Unfortunately, nobody seems to want to help us by responding to "did
> I miss others?" RFH, here are a few more from me.

Heh.. I did think about it, but I didn't see anything worth mentioning..

>  - The semantics of the side-bands are unclear.
>
>    - Is band #2 meant only for progress output (I think the current
>      protocol handlers assume that and unconditionally squelch it
>      under --quiet)?  Do we rather want a dedicated "progress" and
>      "error message" sidebands instead?
>
>    - Is band #2 meant for human consumption, or do we expect the
>      other end to interpret and act on it?  If the former, would it
>      make sense to send locale information from the client side and
>      ask the server side to produce its output with _("message")?

No producing _(...) is a bad idea. First the client has to verify
placeholders and stuff, we can't just feed data from server straight
to printf(). Producing _() could complicate server code a lot. And I
don't like the idea of using client .po files to translate server
strings. There could be custom strings added by admin, which are not
available in client .po. String translation should happen at server
side.

If we want error messages to be handled by machine as well, just add a
result code at the beginning, like ftp, http, ... do. Hmm.. this could
be the reason to separate progress and error messages.
-- 
Duy

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
  2015-03-01 11:32       ` Duy Nguyen
@ 2015-03-01 19:56         ` Stefan Beller
  2015-03-02  1:05           ` David Lang
  2015-03-01 23:06         ` Junio C Hamano
  1 sibling, 1 reply; 80+ messages in thread
From: Stefan Beller @ 2015-03-01 19:56 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Junio C Hamano, Git Mailing List

On Sun, Mar 1, 2015 at 3:32 AM, Duy Nguyen <pclouds@gmail.com> wrote:
> On Sun, Mar 1, 2015 at 3:41 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>>  - Because the protocol exchange starts by the server side
>>>    advertising all its refs, even when the fetcher is interested in
>>>    a single ref, the initial overhead is nontrivial, especially when
>>>    you are doing a small incremental update.  The worst case is an
>>>    auto-builder that polls every five minutes, even when there is no
>>>    new commits to be fetched [*3*].
>
> Maybe you can elaborate about how to handle states X, Y... in your
> footnote 3. I just don't see how it's actually implemented. Or is it
> optional feature that will be provided (via hooks, maybe) by admin? Do
> we need to worry about load balancers? Is it meant to address the
> excessive state transfer due to stateless nature of smart-http?

The way I understand Junio here is to have predefined points which
makes it easier to communicate. There are lots of clients and they usually
want to catch up a different amount of commits, so we need to recompute it
all the time. The idea is then to compute a small pack from the original point
to one of these predefined points.
So a conversion might look like:
Client: My newest commit is dated 2014-11-17.
Server: ok here is a pack from 2014-11-17 until 2014-12-01 and then
I have prepared packs I sent out all the time of 2014-12 and 2015-01
and 2015-02 and then there will be another custom pack for you describing
changes of 2015-02-01 until now.

Mind that I choose dates instead of arbitrary sha1 values as I feel
that explains the
point better, the packs in between are precomputed because many
clients need them.

Personally I don't buy that idea, because it produces a lot of question, like
how large should these packs be? Depending on time or commit counts?

The idea I'd rather favor (I am repeating myself from another post,
but maybe a bit clearer now):

    Client: The last time I asked for "refs/heads/*" and I got a refs
        advertisement hashing to $SHA1
    Server: Ok, here is the diff from that old ref advertisement to the
        current refs advertisement.

I realize that these two ideas are not contradicting each other, but
could rather
help each other as they are orthogonal to each other. One is about
refs advertising
while the other is about object transmission.

>
>>> I'd like to see a new protocol that lets us overcome the above
>>> limitations (did I miss others? I am sure people can help here)
>>> sometime this year.
>>
>> Unfortunately, nobody seems to want to help us by responding to "did
>> I miss others?" RFH, here are a few more from me.
>
> Heh.. I did think about it, but I didn't see anything worth mentioning..
>
>>  - The semantics of the side-bands are unclear.
>>
>>    - Is band #2 meant only for progress output (I think the current
>>      protocol handlers assume that and unconditionally squelch it
>>      under --quiet)?  Do we rather want a dedicated "progress" and
>>      "error message" sidebands instead?
>>
>>    - Is band #2 meant for human consumption, or do we expect the
>>      other end to interpret and act on it?  If the former, would it
>>      make sense to send locale information from the client side and
>>      ask the server side to produce its output with _("message")?
>
> No producing _(...) is a bad idea. First the client has to verify
> placeholders and stuff, we can't just feed data from server straight
> to printf(). Producing _() could complicate server code a lot. And I
> don't like the idea of using client .po files to translate server
> strings. There could be custom strings added by admin, which are not
> available in client .po. String translation should happen at server
> side.
>
> If we want error messages to be handled by machine as well, just add a
> result code at the beginning, like ftp, http, ... do. Hmm.. this could
> be the reason to separate progress and error messages.
> --
> Duy

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
  2015-03-01  8:41     ` Junio C Hamano
  2015-03-01 11:32       ` Duy Nguyen
@ 2015-03-01 23:06       ` Philip Oakley
  2015-03-02  9:32         ` Duy Nguyen
  1 sibling, 1 reply; 80+ messages in thread
From: Philip Oakley @ 2015-03-01 23:06 UTC (permalink / raw)
  To: Junio C Hamano, Git Mailing List; +Cc: Stefan Beller, Duy Nguyen

From: "Junio C Hamano" <gitster@pobox.com>
>I earlier said:
>
>> So if we are going to discuss a new protocol, I'd prefer to see the
>> discussion without worrying too much about how to inter-operate
>> with the current vintage of Git. It is no longer an interesting 
>> problem,
>> as we know how to solve it with minimum risk. Instead, I'd like to
>> see us design the new protocol in such a way that it is in-line
>> upgradable without repeating our past mistakes.
>
> And I am happy to see that people are interested in discussing the
> design of new protocols.
>
> But after seeing the patches Stefan sent out, I think we are risking
> of losing sight of what we are trying to accomplish.  We do not want
> something that is merely new.
>
> That is why I wanted people to think about, discuss and agree on
> what limitation of the current protocol has that are problematic
> (limitations that are not problematic are not something we do not
> need to address [*1*]), so that we can design the new thing without
> reintroducing the same limitation.
>
> To remind people, here is a reprint of the draft I sent out earlier
> in $gmane/264000.
>
>> The current protocol has the following problems that limit us:
>>
>>  - It is not easy to make it resumable, because we recompute every
>>    time.  This is especially problematic for the initial fetch aka
>>    "clone" as we will be talking about a large transfer [*1*].
>>
>>  - The protocol extension has a fairly low length limit [*2*].
>>
>>  - Because the protocol exchange starts by the server side
>>    advertising all its refs, even when the fetcher is interested in
>>    a single ref, the initial overhead is nontrivial, especially when
>>    you are doing a small incremental update.  The worst case is an
>>    auto-builder that polls every five minutes, even when there is no
>>    new commits to be fetched [*3*].
>>
>>  - Because we recompute every time, taking into account of what the
>>    fetcher has, in addition to what the fetcher obtained earlier
>>    from us in order to reduce the transferred bytes, the payload for
>>    incremental updates become tailor-made for each fetch and cannot
>>    be easily reused [*4*].
>>
>> I'd like to see a new protocol that lets us overcome the above
>> limitations (did I miss others? I am sure people can help here)
>> sometime this year.
>
> Unfortunately, nobody seems to want to help us by responding to "did
> I miss others?" RFH, here are a few more from me.

OK, maybe not exactly about protocol, but a possible option would be the 
ability to send the data as a bundle or multi-bundles; Or perhasps as an 
archive, zip, or tar.

Data can then be exchanged across an airgap or pigeon mail. The airgap 
scenario is likely a real case that's not directly prominent at the 
moment, just because it's not tha direct.

There has been discussion about servers having bundles available for 
clones, but with a multi-bundle, one could package up a large bundle 
(months) and an increment (weeks, and then days), before an final easy 
to pack last few hours. That would be a server work trade-off, and 
support a CDN view if needed.

If such an approach was reasonable would the protocol support it? etc.

Just a thought while reading...
>
> - The semantics of the side-bands are unclear.
>
>   - Is band #2 meant only for progress output (I think the current
>     protocol handlers assume that and unconditionally squelch it
>     under --quiet)?  Do we rather want a dedicated "progress" and
>     "error message" sidebands instead?
>
>   - Is band #2 meant for human consumption, or do we expect the
>     other end to interpret and act on it?  If the former, would it
>     make sense to send locale information from the client side and
>     ask the server side to produce its output with _("message")?
>
> - The semantics of packet_flush() is suboptimal, and this
>   shortcoming seeps through to the protocol mapped to the
>   smart-HTTP transport.
>
>   Originally, packet_flush() was meant as "Here is an end of one
>   logical section of what I am going to speak.", hinting that it
>   might be a good idea for the underlying implementation to hold
>   the packets up to that point in-core and then write(2) them all
>   out (i.e. "flush") to the file descriptor only when we handle
>   packet_flush().  It never meant "Now I am finished speaking for
>   now and it is your turn to speak."
>
>   But because HTTP is inherently a ping-pong protocol where the
>   requestor at one point stops talking and lets the responder
>   speak, the code to map our protocol to the smart HTTP transport
>   made the packet_flush() boundary as "Now I am done talking, it is
>   my turn to listen."
>
>   We probably need two kinds of packet_flush().  When a requestor
>   needs to say two or more logical groups of things before telling
>   the other side "Now I am done talking; it is your turn.", we need
>   some marker (i.e. the original meaning of packet_flush()) at the
>   end of these logical groups.  And in order to be able to say "Now
>   I am done saying everything I need to say at this point for you
>   to respond to me.  It is your turn.", we need another kind of
>   marker.
>
>
> [Footnote]
>
> *1* For example, if we were working off of "what mistakes do we want
> to correct?" list, I do not think we would have seen "capabilities
> have to be only on the first packet" or "lets allow new daemon to
> read extra cruft at the end of the first request".  I do not think I
> heard why it is a problem that the daemon cannot pass extra info to
> invoked program in the first place.  There might be a valid reason,
> but then that needs to be explained, understood and agreed upon and
> should be part of an updated "what are we fixing?" list.
>
--
Philip 

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
  2015-03-01 11:32       ` Duy Nguyen
  2015-03-01 19:56         ` Stefan Beller
@ 2015-03-01 23:06         ` Junio C Hamano
  2015-03-02  1:09           ` David Lang
  1 sibling, 1 reply; 80+ messages in thread
From: Junio C Hamano @ 2015-03-01 23:06 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List, Stefan Beller

Duy Nguyen <pclouds@gmail.com> writes:

> On Sun, Mar 1, 2015 at 3:41 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>>  - Because the protocol exchange starts by the server side
>>>    advertising all its refs, even when the fetcher is interested in
>>>    a single ref, the initial overhead is nontrivial, especially when
>>>    you are doing a small incremental update.  The worst case is an
>>>    auto-builder that polls every five minutes, even when there is no
>>>    new commits to be fetched [*3*].
>
> Maybe you can elaborate about how to handle states X, Y... in your
> footnote 3. I just don't see how it's actually implemented.  Or is
> it optional feature that will be provided (via hooks, maybe) by
> admin?

These footnotes are not the important part (I wanted us to agree on
the problem, and the ideas outlined with the footnotes are only an
example that illustrates how a potential solution and the problem to
be solved are described in relation to each other), but I'll give a
shot anyway ;-)

I am actually torn on how the names X, Y, etc. should be defined.

One side of me wants to leave its computation entirely up to the
server side.  The client says "Last time I talked with you asking
for refs/heads/* and successfully updated from you, you told me to
call that state X" without knowing how X is computed, and then the
server will update you and then tell you your state is now Y.  That
way, large hosting sites and server implementations can choose to
implement it any way they like.

On the other hand, we could rigidly define it, perhaps like this:

 - Imagine that you saved the output from ls-remote that is run
   against that server, limited to the refs hierarchy you are
   requesting, the last time you talked with it.

 - Concatenate the above to the list of patterns the client used to
   ask the refs.  This step is optional.

 - E.g. if you are asking it for "refs/heads/*", then we are talking
   something like this (illustrated with optional pattern in front):

        refs/heads/*
        8004647...      refs/heads/maint
        7f4ba4b...      refs/heads/master

 - Run SHA-1 hash over that.  And that is the state name.

I.e. if you as a client are doing

    [remote "origin"]
        fetch = refs/heads/*:refs/remotes/origin/*

and if the only time your refs/remotes/origin/* hierarchy changes is
when you fetch from there (which should be the norm), you can look
into remote.origin.fetch refspec (to learn that "refs/heads*" is
what you are asking) and your refs/remotes/origin/* refs (and
reverse the mapping you make when you fetch to make them talk about
refs/heads/* hierarchy on the server side), you can compute it
locally.

The latter will have one benefit over "opaque thing the client does
not know how to compute".  Because I want us avoid sending unchanged
refs over connection, but I do want to see the protocol has some
validation mechanism built in, even if we go the latter "client can
compute what the state name ought to be" route, I want the servrer
to tell the client what to call that state.  That way, the client
side can tell when it goes out of sync for any reason and attempt to
recover.

> Do we need to worry about load balancers?

Unless you are allowing multiple backend servers to serve a same
repository behind a set of load balancers in an inconsistent way
(e.g. you push to one while I push to two and you fetch from one and
you temporarily see my push but then my push will be rejected as
conflicting and you fetch from one and now you see your push), I do
not think there is anything you need to worry about them more than
what you should be worrying about already.

There would be a point where all backend servers would agree "This
is the set of values of these refs" at some point (e.g. a majority
of surviving servers vote to decide, laggers that later join the
party will update to the concensus value before serving the end-user
traffic), and they would not be showing half-updated values that
haven't ratified by other servers to end users (otherwise they may
end up showing reversion).

>>    - Is band #2 meant for human consumption, or do we expect the
>>      other end to interpret and act on it?  If the former, would it
>>      make sense to send locale information from the client side and
>>      ask the server side to produce its output with _("message")?
>
> No producing _(...) is a bad idea. First the client has to verify
> placeholders and stuff, we can't just feed data from server straight
> to printf(). Producing _() could complicate server code a lot. And I
> don't like the idea of using client .po files to translate server
> strings. There could be custom strings added by admin, which are not
> available in client .po. String translation should happen at server
> side.

What I meant to say was (1) the client says "I want the human
readable message in Vietnamese" and (2) the server uses .po on
_("message") in its code and send the result to sideband #2.  There
is no parsing, interpolation, or anything of that sort necessary on
the server end.

Because potential problem you mention looks to me about a different
design where the server talks "message" and client side applies
_($msg) to it, I suspect that you misread what I meant to say.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
  2015-03-01 19:56         ` Stefan Beller
@ 2015-03-02  1:05           ` David Lang
  0 siblings, 0 replies; 80+ messages in thread
From: David Lang @ 2015-03-02  1:05 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Duy Nguyen, Junio C Hamano, Git Mailing List

On Sun, 1 Mar 2015, Stefan Beller wrote:

> The way I understand Junio here is to have predefined points which
> makes it easier to communicate. There are lots of clients and they usually
> want to catch up a different amount of commits, so we need to recompute it
> all the time. The idea is then to compute a small pack from the original point
> to one of these predefined points.
> So a conversion might look like:
> Client: My newest commit is dated 2014-11-17.
> Server: ok here is a pack from 2014-11-17 until 2014-12-01 and then
> I have prepared packs I sent out all the time of 2014-12 and 2015-01
> and 2015-02 and then there will be another custom pack for you describing
> changes of 2015-02-01 until now.
>
> Mind that I choose dates instead of arbitrary sha1 values as I feel
> that explains the
> point better, the packs in between are precomputed because many
> clients need them.
>
> Personally I don't buy that idea, because it produces a lot of question, like
> how large should these packs be? Depending on time or commit counts?

I think this is going to depend on the project in question. I think that doing 
this based on public tags makes lots of sense. The precomputed packs should also 
change over time.

For example, with the linux kernel, as each -rc is released, there will be a lot 
of people wanting to upgrade from a prior -rc, so having a pack for each of 
these is probably worthwhile. You probably also want a precomputed pack to move 
from some of the -rc releases to the final release. And then a single pack to 
move from the prior final release to the newest one. There may also me a resons 
to make a pack that jumps several releases to go from one LTS kernel to the 
next.

Exactly what precomputed packs make sense, and how large the packs should be is 
going to be _very_ dependent on the update patterns of users. The only people 
who can decide exactly what packs they should use are the admins of the systems, 
and should be based on their logs of what requests are being made. I can see the 
git project creating scripts to analyze the logs of client connections to make 
recommendations on what packs would be useful to have pre-generated, ideally 
ordered by how much computation they would save (and the amount of disk space 
required to hold the packs, and then the admin of the site can indicate where 
they want the cutoff to be. Some extremely busy sites may have a lot of disk 
space compared to CPU and be willing to have lots of packs around, others are 
less busy and will only want to keep a few around.

David Lang

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
  2015-03-01 23:06         ` Junio C Hamano
@ 2015-03-02  1:09           ` David Lang
  2015-03-02  3:10             ` Junio C Hamano
  0 siblings, 1 reply; 80+ messages in thread
From: David Lang @ 2015-03-02  1:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Duy Nguyen, Git Mailing List, Stefan Beller

On Sun, 1 Mar 2015, Junio C Hamano wrote:

> and if the only time your refs/remotes/origin/* hierarchy changes is
> when you fetch from there (which should be the norm), you can look
> into remote.origin.fetch refspec (to learn that "refs/heads*" is
> what you are asking) and your refs/remotes/origin/* refs (and
> reverse the mapping you make when you fetch to make them talk about
> refs/heads/* hierarchy on the server side), you can compute it
> locally.
>
> The latter will have one benefit over "opaque thing the client does
> not know how to compute".  Because I want us avoid sending unchanged
> refs over connection, but I do want to see the protocol has some
> validation mechanism built in, even if we go the latter "client can
> compute what the state name ought to be" route, I want the servrer
> to tell the client what to call that state.  That way, the client
> side can tell when it goes out of sync for any reason and attempt to
> recover.

how would these approaches be affected by a client that is pulling from 
different remotes into one local repository? For example, pulling from the main 
kernel repo and from the -stable repo.

David Lang

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/5] protocol v2 for upload-pack
  2015-03-01  9:11                           ` [RFC/PATCH 0/5] protocol v2 for upload-pack Johannes Sixt
@ 2015-03-02  2:58                             ` Junio C Hamano
  0 siblings, 0 replies; 80+ messages in thread
From: Junio C Hamano @ 2015-03-02  2:58 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Stefan Beller, git, pclouds

Johannes Sixt <j6t@kdbg.org> writes:

> You may also consider an idea I proposed here:
>
> http://thread.gmane.org/gmane.comp.version-control.git/206886/focus=207342
>
> The idea is that the exchange begins as usual, but when the v2 client
> sees that the server also supports v2, then it begins sending its
> desired refs. When the server notices that the client spoke while it
> was still sending out its megabytes of ref advertisments, it stops the
> v1 advertisements and continues with v2 protocol.

Yes, it would be possible to design the protocol in such a way
provided if we can find a v1 protocol message that we can use so
that a new client will send it as "I am a client capable of talking
the new protocol" and the message appears as a no-op to older
servers.  And that would allow us to transparently upgrade the
protocol in-line.  That would be a big plus.

And there seems to be such a message for both fetch and push
protoocol, as discussed in $gmane/264512 a few days ago.

But I think the current thinking is that it is OK to avoid the
ugliness and complexity of having to say "oops, sorry but I did not
hear your 'I am new enough' soon enough and sent a lot of v1
advertisement, let's resync with v2 protocol" and instead upgrade
the old client upon the _next_ exchange (see $gmane/264309).

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
  2015-03-02  1:09           ` David Lang
@ 2015-03-02  3:10             ` Junio C Hamano
  0 siblings, 0 replies; 80+ messages in thread
From: Junio C Hamano @ 2015-03-02  3:10 UTC (permalink / raw)
  To: David Lang; +Cc: Duy Nguyen, Git Mailing List, Stefan Beller

David Lang <david@lang.hm> writes:

> how would these approaches be affected by a client that is pulling
> from different remotes into one local repository? For example, pulling
> from the main kernel repo and from the -stable repo.
>
> David Lang

As I said in $gmane/264000, which the above came from:

    Note that the above would work if and only if we accept that it
    is OK to send objects between the remote tracking branches the
    fetcher has (i.e. the objects it last fetched from the server)
    and the current tips of branches the server has, without
    optimizing by taking into account that some commits in that set
    may have already been obtained by the fetcher from a
    third-party.

The scheme tries go gain by reducing the ref advertisement cost by
sacrificing the optimization opportunity when you fetch from updated
Linus's tree after having fetched from a recent next tree, the
latter of which may have contained a lot of objects that went to the
Linus's tree since you fetched from Linus's the last time.  The
current protocol, by negotiating what you have (including the
objects you obtained from sideways via 'next') with the Linus's
tree, allows the server to compute a minimum packfile customized
just for you.  By trading that off with "everybody that follow this
repository will get the same set of packfiles in sequence trickled
into his repository" model, it would instead allow the server to
prepare these packfiles thousands of clients that follow Linus's
tree will want only once.

The client-server pair may want to have a negociation mechanism
(e.g. "I may have many objects I fetched from sideways, give me
minimum pack that is customized for me by spending cycles---I am
willing to wait until you finish computing it" vs "I am just
following along and not doing anything fancy, just give me the same
thing as everybody else") to select what optimization they want to
use.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
  2015-02-28  0:46                       ` Stefan Beller
  2015-02-28  1:01                         ` [RFC/PATCH 0/5] protocol v2 for upload-pack Stefan Beller
@ 2015-03-02  3:47                         ` Junio C Hamano
  2015-03-02  9:21                           ` Duy Nguyen
  1 sibling, 1 reply; 80+ messages in thread
From: Junio C Hamano @ 2015-03-02  3:47 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Duy Nguyen, Git Mailing List

Stefan Beller <sbeller@google.com> writes:

> A race condition may be a serious objection then? Once people believe the
> refs can scale fairly well they will use it, which means blasting the ref
> advertisement will become very worse over time.

I think we are already in agreement about that case:

    A misdetected case between (new client, new server) pair might go
    like this:

        - new client connects and sends that no-op.

        - new server accepts the connection, but that no-op probe has
          not arrived yet..  It misdetects the other side as a v1
          client and it starts blasting the ref advertisement.

        - new client notices that the ref advertisement has the
          capability bit and the server is capable of v2 protocol.  it
          waits until the server sends "sorry, I misdetected" message.

        - new server eventually notices the "no-op probe" while blasting
          the ref advertisement and it can stop in the middle.
          hopefully this can happen after only sending a few kilobytes
          among megabytes of ref advertisement data ;-).  The server
          sends "sorry, I misdetected" message to synchronise.

        - both sides happily speak v2 from here on.

However, I do not think it needs to become worse over time, because
we can change and adjust as the user population and their use
patterns evolve.  For example, you can introduce a small delay
before the new versions of server starts the v1 advertisement, and
make that delay longer and longer over time, as the population of
v1-only clients go down, for example.

Difficulty (see J6t's comment) in other implementations may be a
more important roadblocks.  It seems, however, that our current
thinking is that it is OK to do the "allow new v1 clients to notice
the availabilty of v2 servers, so that they can talk v2 the next
time" thing, so my preference is to throw this "client first and let
server notice" into "maybe doable but not our first choice" bin, at
least for now.

Thanks.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
  2015-03-02  3:47                         ` [RFC/PATCH 0/3] protocol v2 Junio C Hamano
@ 2015-03-02  9:21                           ` Duy Nguyen
  2015-03-02  9:24                             ` Duy Nguyen
                                               ` (2 more replies)
  0 siblings, 3 replies; 80+ messages in thread
From: Duy Nguyen @ 2015-03-02  9:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, Git Mailing List

On Sun, Mar 01, 2015 at 07:47:40PM -0800, Junio C Hamano wrote:
> It seems, however, that our current thinking is that it is OK to do
> the "allow new v1 clients to notice the availabilty of v2 servers,
> so that they can talk v2 the next time" thing, so my preference is
> to throw this "client first and let server notice" into "maybe
> doable but not our first choice" bin, at least for now.

OK let's see if first choice like this could work. Very draft but it
should give some idea how to make a prototype to test it out. Note
that the server still speaks first in this proposal.

-- 8< --
diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt
index 462e206..32a1186 100644
--- a/Documentation/technical/pack-protocol.txt
+++ b/Documentation/technical/pack-protocol.txt
@@ -1,11 +1,11 @@
 Packfile transfer protocols
 ===========================
 
-Git supports transferring data in packfiles over the ssh://, git:// and
+Git supports transferring data in packfiles over the ssh://, git://, http:// and
 file:// transports.  There exist two sets of protocols, one for pushing
 data from a client to a server and another for fetching data from a
 server to a client.  All three transports (ssh, git, file) use the same
-protocol to transfer data.
+protocol to transfer data. http is documented in http-protocol.txt.
 
 The processes invoked in the canonical Git implementation are 'upload-pack'
 on the server side and 'fetch-pack' on the client side for fetching data;
@@ -14,6 +14,12 @@ data.  The protocol functions to have a server tell a client what is
 currently on the server, then for the two to negotiate the smallest amount
 of data to send in order to fully update one or the other.
 
+"upload-pack-2" and "receive-pack-2" are the next generation of
+"upload-pack" and "receive-pack" respectively. The first two are
+referred as "version 2" in this document and pack-capabilities.txt
+while the last two are "version 1". Unless stated otherwise, "version 1"
+is implied.
+
 Transports
 ----------
 There are three transports over which the packfile protocol is
@@ -42,7 +48,8 @@ hostname parameter, terminated by a NUL byte.
 
 --
    git-proto-request = request-command SP pathname NUL [ host-parameter NUL ]
-   request-command   = "git-upload-pack" / "git-receive-pack" /
+   request-command   = "git-upload-pack" / "git-upload-pack-2" /
+		       "git-receive-pack" / "git-receive-pack-2" /
 		       "git-upload-archive"   ; case sensitive
    pathname          = *( %x01-ff ) ; exclude NUL
    host-parameter    = "host=" hostname [ ":" port ]
@@ -67,7 +74,6 @@ gracefully with an error message.
   error-line     =  PKT-LINE("ERR" SP explanation-text)
 ----
 
-
 SSH Transport
 -------------
 
@@ -124,9 +130,58 @@ has, the first can 'fetch' from the second.  This operation determines
 what data the server has that the client does not then streams that
 data down to the client in packfile format.
 
+Capability discovery (v2)
+-------------------------
 
-Reference Discovery
--------------------
+In version 1, capability discovery is part of reference discovery and
+covered in reference discovery section.
+
+In versino 2, when the client initially connects, the server
+immediately sends its capabilities to the client. Then the client must
+send the list of server capabilities it wants to use to the server.
+
+   S: 00XXcapabilities multi_ack thin-pack ofs-delta lang\n
+   C: 00XXcapabilities thin-pack ofs-delta lang=en\n
+
+----
+  cap              =  PKT-LINE("capabilities" SP capability-list LF)
+  capability-list  =  capability *(SP capability)
+  capability       =  1*(LC_ALPHA / DIGIT / "-" / "_" / "=")
+  LC_ALPHA         =  %x61-7A
+----
+
+The client MUST NOT ask for capabilities the server did not say it
+supports.
+
+Server MUST diagnose and abort if capabilities it does not understand
+was sent.  Server MUST NOT ignore capabilities that client requested
+and server advertised.  As a consequence of these rules, server MUST
+NOT advertise capabilities it does not understand.
+
+See protocol-capabilities.txt for a list of allowed server and client
+capabilities and descriptions.
+
+XXX: this approach wastes one round trip in smart-http because the
+client would speak first. Perhaps we could allow client speculation.
+It can assume what server caps will send and send commands based on that
+assumption. If it turns out true, we save one round trip. E.g. fast
+path:
+
+   C: You are supposed to send caps A, B. I would respond with cap B.
+      Then I would send "want-refs refs/heads/foo".
+   S: (yes we are sending caps A and B), validate client caps,
+      execute "want-refs" and return ref list
+
+and slow path:
+
+   C: You are supposed to send caps A, B. I would respond with cap B.
+      Then I would send "want-refs refs/heads/foo".
+   S: Send caps A, B and C. ignore the rest from client
+   C: Want caps A and C. Send "want-refs foo"
+   S: return ref foo
+
+Reference Discovery (v1)
+------------------------
 
 When the client initially connects the server will immediately respond
 with a listing of each reference it has (all branches and tags) along
@@ -178,16 +233,69 @@ MUST peel the ref if it's an annotated tag.
   shallow          =  PKT-LINE("shallow" SP obj-id)
 
   capability-list  =  capability *(SP capability)
-  capability       =  1*(LC_ALPHA / DIGIT / "-" / "_")
+  capability       =  1*(LC_ALPHA / DIGIT / "-" / "_" / "=")
   LC_ALPHA         =  %x61-7A
 ----
 
 Server and client MUST use lowercase for obj-id, both MUST treat obj-id
 as case-insensitive.
 
+On the very first line of the initial server response of either
+receive-pack and upload-pack the first reference is followed by
+a NUL byte and then a list of space delimited server capabilities.
+These allow the server to declare what it can and cannot support
+to the client.
+
+Client will then send a space separated list of capabilities it wants
+to be in effect. The client MUST NOT ask for capabilities the server
+did not say it supports.
+
+Server MUST diagnose and abort if capabilities it does not understand
+was sent.  Server MUST NOT ignore capabilities that client requested
+and server advertised.  As a consequence of these rules, server MUST
+NOT advertise capabilities it does not understand.
+
 See protocol-capabilities.txt for a list of allowed server capabilities
 and descriptions.
 
+Reference Discovery (v2)
+------------------------
+
+In version 2, reference discovery is initiated by the client with
+"want-refs" line. The client may skip reference discovery phase
+entirely by not sending "want-refs" (e.g. the client has another way
+to retrieve ref list).
+
+----
+  want-refs  =  PKT-LINE("want-refs" SP mode *argument)
+  mode       =  "all"
+  argument   =  SP arg
+  arg        =  1*(LC_ALPHA / DIGIT / "-" / "_" / "=")
+----
+
+Mode "all" sends all visible refs to the client like in version 1. No
+arguments are accepted in this mode. More modes may be available based
+on capabilities.
+
+----
+  advertised-refs  =  (no-refs / list-of-refs)
+		      *shallow
+		      flush-pkt
+
+  no-refs          =  PKT-LINE(zero-id LF)
+
+  list-of-refs     =  *other-ref
+  other-ref        =  PKT-LINE(other-tip / other-peeled)
+  other-tip        =  obj-id SP refname LF
+  other-peeled     =  obj-id SP refname "^{}" LF
+
+  shallow          =  PKT-LINE("shallow" SP obj-id)
+
+  capability-list  =  capability *(SP capability)
+  capability       =  1*(LC_ALPHA / DIGIT / "-" / "_" / "=")
+  LC_ALPHA         =  %x61-7A
+----
+
 Packfile Negotiation
 --------------------
 After reference and capabilities discovery, the client can decide to
diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index 4f8a7bf..ecb0efd 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -3,21 +3,6 @@ Git Protocol Capabilities
 
 Servers SHOULD support all capabilities defined in this document.
 
-On the very first line of the initial server response of either
-receive-pack and upload-pack the first reference is followed by
-a NUL byte and then a list of space delimited server capabilities.
-These allow the server to declare what it can and cannot support
-to the client.
-
-Client will then send a space separated list of capabilities it wants
-to be in effect. The client MUST NOT ask for capabilities the server
-did not say it supports.
-
-Server MUST diagnose and abort if capabilities it does not understand
-was sent.  Server MUST NOT ignore capabilities that client requested
-and server advertised.  As a consequence of these rules, server MUST
-NOT advertise capabilities it does not understand.
-
 The 'atomic', 'report-status', 'delete-refs', 'quiet', and 'push-cert'
 capabilities are sent and recognized by the receive-pack (push to server)
 process.
@@ -268,3 +253,10 @@ to accept a signed push certificate, and asks the <nonce> to be
 included in the push certificate.  A send-pack client MUST NOT
 send a push-cert packet unless the receive-pack server advertises
 this capability.
+
+v2
+--
+
+'git-upload-pack' and 'git-receive-pack' may advertise this capability
+if the server supports 'git-upload-pack-2' and 'git-receive-pack-2'
+respectively.
-- 8< --

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
  2015-03-02  9:21                           ` Duy Nguyen
@ 2015-03-02  9:24                             ` Duy Nguyen
  2015-03-03 10:33                             ` Duy Nguyen
  2015-03-06 23:38                             ` [PATCH] protocol upload-pack-v2 Stefan Beller
  2 siblings, 0 replies; 80+ messages in thread
From: Duy Nguyen @ 2015-03-02  9:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, Git Mailing List

On Mon, Mar 02, 2015 at 04:21:36PM +0700, Duy Nguyen wrote:
> On Sun, Mar 01, 2015 at 07:47:40PM -0800, Junio C Hamano wrote:
> > It seems, however, that our current thinking is that it is OK to do
> > the "allow new v1 clients to notice the availabilty of v2 servers,
> > so that they can talk v2 the next time" thing, so my preference is
> > to throw this "client first and let server notice" into "maybe
> > doable but not our first choice" bin, at least for now.
> 
> OK let's see if first choice like this could work. Very draft but it
> should give some idea how to make a prototype to test it out. Note
> that the server still speaks first in this proposal.

And ref discovery phase could be modified by new capabilities. For
example,

-- 8< --
diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index 56c11b4..56a8c2e 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -304,3 +304,36 @@ language code.
 
 The default language code is unspecified, even though it's usually
 English in ASCII encoding.
+
+compressed-refs
+---------------
+
+This is applicable to upload-pack-2 and receive-pack-2 only. The
+client expects ref list in reference discovery phase to be sent in
+compressed format:
+
+ - Each PKT-LINE may contain more than one ref
+ - SHA-1 is in binary encoding (i.e. 20 bytes instead of
+   40 bytes as hex string)
+ - ref name is prefix compressed, see index-format.txt version 4.
+ - Ref list ends with flush-pkt
+
+glob-refs
+---------
+
+This is applicable to upload-pack-2 and receive-pack-2 only. In
+reference discovery phase, a new mode "glob" is supported. Where the
+arguments are wildmatch patterns. Negative patterns begin with '!'.
+Only refs matching requested patterns are sent to the client.
+
+stateful-refs
+-------------
+
+This is applicable to upload-pack-2 and receive-pack-2 only. In
+reference discovery phase, a new mode "stateful" is supported. Where
+the first argument is a string representing the ref list that was sent
+by the same server last time. The remaining arguments are glob.
+
+The first ref line that the server sends should carry a new state
+string after ref name. The server may send only updated refs it if
+understands the state string sent by the client. Still under discussion.
-- 8< --

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
  2015-03-01 23:06       ` Philip Oakley
@ 2015-03-02  9:32         ` Duy Nguyen
  0 siblings, 0 replies; 80+ messages in thread
From: Duy Nguyen @ 2015-03-02  9:32 UTC (permalink / raw)
  To: Philip Oakley; +Cc: Junio C Hamano, Git Mailing List, Stefan Beller

On Sun, Mar 01, 2015 at 11:06:21PM -0000, Philip Oakley wrote:
> OK, maybe not exactly about protocol, but a possible option would be the 
> ability to send the data as a bundle or multi-bundles; Or perhasps as an 
> archive, zip, or tar.
> 
> Data can then be exchanged across an airgap or pigeon mail. The airgap 
> scenario is likely a real case that's not directly prominent at the 
> moment, just because it's not tha direct.
> 
> There has been discussion about servers having bundles available for 
> clones, but with a multi-bundle, one could package up a large bundle 
> (months) and an increment (weeks, and then days), before an final easy 
> to pack last few hours. That would be a server work trade-off, and 
> support a CDN view if needed.
> 
> If such an approach was reasonable would the protocol support it? etc.

It came up several times. Many people are in favor of it. Some
references..

    http://thread.gmane.org/gmane.comp.version-control.git/264305/focus=264565
    http://thread.gmane.org/gmane.comp.version-control.git/263898/focus=263928
    http://thread.gmane.org/gmane.comp.version-control.git/263898/focus=264000
    http://thread.gmane.org/gmane.comp.version-control.git/238472/focus=238844

This is what I got so far. I think the hard part is how to let
projects control this in a clean and flexible way. Not written in the
patch, but I'm thinking maybe we can allow hooking a remote helper in
standard git://, ssh://, http://... That would give total control to
projects.

-- 8< --
diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index ecb0efd..2b99464 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -260,3 +260,34 @@ v2
 'git-upload-pack' and 'git-receive-pack' may advertise this capability
 if the server supports 'git-upload-pack-2' and 'git-receive-pack-2'
 respectively.
+
+redirect
+--------
+
+This capability is applicable for upload-pack and upload-pack-v2
+only. When the client requests this capability it must specify
+supported transport protocol separated by colon,
+e.g. "redirect=http:ftp:ssh:torrent".
+
+Instead of sending a packfile data to the client, the server may send
+4-byte signature { 'L', 'I', 'N', 'K' } followed by a NUL-terminated
+URLs, each one pointing to a bundle. This fake pack ends with an empty
+string.
+
+The bundle does not have to contain all refs requested by the
+client. Different bundles from different URLs could have different
+content. The client must follow one of the links to get a bundle.
+The server must not send URL in a protocol that the client does not
+support.
+
+FIXME: do we keep current connection alive until the bundle is
+downloaded and get a normal pack, or let the client initiate a new
+connection? Or perhaps if the client fails to get the bundle for
+whatever reason, it could send "NAK" to the server and the server
+sends normal packfile data.
+
+FIXME: how do we implement this exactly? The decision to redirect
+should probably be delegated to some hook. Maybe sending all "want"
+lines to the script is enough.. Sending "have" lines is more difficult
+because the server decides when to stop receiving them. That decision
+must be moved to the hook...
-- 8< --

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
  2015-03-02  9:21                           ` Duy Nguyen
  2015-03-02  9:24                             ` Duy Nguyen
@ 2015-03-03 10:33                             ` Duy Nguyen
  2015-03-03 17:13                               ` Junio C Hamano
  2015-03-06 23:38                             ` [PATCH] protocol upload-pack-v2 Stefan Beller
  2 siblings, 1 reply; 80+ messages in thread
From: Duy Nguyen @ 2015-03-03 10:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, Git Mailing List

On Mon, Mar 02, 2015 at 04:21:36PM +0700, Duy Nguyen wrote:
> On Sun, Mar 01, 2015 at 07:47:40PM -0800, Junio C Hamano wrote:
> > It seems, however, that our current thinking is that it is OK to do
> > the "allow new v1 clients to notice the availabilty of v2 servers,
> > so that they can talk v2 the next time" thing, so my preference is
> > to throw this "client first and let server notice" into "maybe
> > doable but not our first choice" bin, at least for now.
> 
> OK let's see if first choice like this could work. Very draft but it
> should give some idea how to make a prototype to test it out. Note
> that the server still speaks first in this proposal.

Junio pointed out in private that I didn't address the packet length
limit (64k). I thought I could get away with a new capability
(i.e. not worry about it now) but I finally admit that was a bad
hack. So perhaps this on top.

The "WARN" line was originally supposed to be used when packet length
is still 64k and the server has a ref longer than that. It could then
skip that long ref and inform the user, so the user could re-request
again, this time asking for "long packet length" capability.

That's irrelevant now. But I think an option to say something without
abort may still be a good idea, especially if we allow hooks to
intercept the protocol.

-- 8< --
diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt
index 32a1186..e2003c0 100644
--- a/Documentation/technical/pack-protocol.txt
+++ b/Documentation/technical/pack-protocol.txt
@@ -37,6 +37,20 @@ communicates with that invoked process over the SSH connection.
 The file:// transport runs the 'upload-pack' or 'receive-pack'
 process locally and communicates with it over a pipe.
 
+Pkt-line format
+---------------
+
+In version 1, a packet line consists of four bytes containing the
+length of the entire line plus four, in hexadecimal format. A flush
+consists of four zero bytes.
+
+In version 2, the four-byte header format remains supported but the
+maximum length is 0xfffe. If the length is 0xffff, the actual length
+follows in variable encoding in hexadecimal.
+
+XXX: perhaps go with 2-byte length by default instead because we don't
+usually need pkt-line longer than 256?? Maybe not worth saving a couple bytes
+
 Git Transport
 -------------
 
@@ -68,10 +82,12 @@ process on the server side over the Git protocol is this:
      nc -v example.com 9418
 
 If the server refuses the request for some reasons, it could abort
-gracefully with an error message.
+gracefully with an error message, or show a warning but and keep
+moving.
 
 ----
   error-line     =  PKT-LINE("ERR" SP explanation-text)
+  warning-line   =  PKT-LINE("WARN" SP explanation-text)
 ----
 
 SSH Transport
-- 8< --

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
  2015-03-03 10:33                             ` Duy Nguyen
@ 2015-03-03 17:13                               ` Junio C Hamano
  2015-03-03 19:47                                 ` Junio C Hamano
                                                   ` (2 more replies)
  0 siblings, 3 replies; 80+ messages in thread
From: Junio C Hamano @ 2015-03-03 17:13 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Stefan Beller, Git Mailing List

Duy Nguyen <pclouds@gmail.com> writes:

> Junio pointed out in private that I didn't address the packet length
> limit (64k). I thought I could get away with a new capability
> (i.e. not worry about it now) but I finally admit that was a bad
> hack. So perhaps this on top.

No, I didn't ;-) but I tend to agree that "perhaps 4GB huge packet?"
is a bad idea.

The problem I had with the version in your write-up was that it
still assumed that all capabilities must come on one packet-line.

The immediate issue that limitation in the current protocol we had
was that the usual "we can help newer programs to operate better
while getting ignored by existing programs by sending optional
information as part of the capability advert" would not work for
"upload-pack" to enumerate symrefs and their targets to help
"clone".

The lesson to draw from that experience is not "we should have an
option to use large packets".  64kB is plenty but the senders and
the receivers have a lot lower limit in practice to avoid harming
latency (I think it is like 1000 bytes before both ends agree to
switch talking over the sideband multiplexer).  It is not "we should
anticipate and design protocol better", either.  We are humans and
it is hard to predict things, especially things in the future.

The lesson we should learn is that it is important to leave us
enough wiggle room to allow us cope with such unanticipated
limitations ;-).

My recollection is that the consensus from the last time we
discussed protocol revamping was to list one capability per packet
so that packet length limit does not matter, but you may want to
check with the list archive yourself.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
  2015-03-03 17:13                               ` Junio C Hamano
@ 2015-03-03 19:47                                 ` Junio C Hamano
  2015-03-04  1:54                                 ` Duy Nguyen
  2015-03-24 17:42                                 ` Stefan Beller
  2 siblings, 0 replies; 80+ messages in thread
From: Junio C Hamano @ 2015-03-03 19:47 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Stefan Beller, Git Mailing List

Junio C Hamano <gitster@pobox.com> writes:

> Duy Nguyen <pclouds@gmail.com> writes:
>
>> Junio pointed out in private that I didn't address the packet length
>> limit (64k). I thought I could get away with a new capability
>> (i.e. not worry about it now) but I finally admit that was a bad
>> hack. So perhaps this on top.
>
> No, I didn't ;-) but I tend to agree that "perhaps 4GB huge packet?"
> is a bad idea.
> ...

I realize that I responded with "No, I did not complain about X, I
had trouble about Y and here is why" and talked mostly about Y
without talking much about X.  So let's touch X a bit.

As to the packet length, I think it is a good idea to give us an
escape hatch to bust 64k limit.  Refs may not be the reason to do
so, but as I said, we cannot forsee the future needs.

Having X behind us, now back to Y, and then I'll remind us of Z ;-)
[*1*]

> My recollection is that the consensus from the last time we
> discussed protocol revamping was to list one capability per packet
> ...

And the above is "the right thing" from the protocol point of view.
The only reason the current protocol says "capabilities go on a
single line separated by SP" is because the hole we found to add to
the protocol was to piggyback after the ref advertisement lines, and
there was no guarantee that we have more than one ref advertised, so
we needed to be able to stuff everything on a single line.

Stepping back and thinking about what a "packet" in pkt-line
protocol is, we realize that it is the smallest logical unit of
transferring information.  The state of a single ref in a series of
ref advertisement.  The fact that receiving end has all the history
leading up to a single commit.  The request to obtain all history
leading up to a single commit.

That is why I say that one-cap-per-packet is the right thing.

These individual logical units are grouped into a larger logical
unit by (1) being at a specific point in the protocol exchange, (2)
being adjacent to each other and (3) terminated by a "flush
packet".  Examples:

 - A bunch of individual ref states that is at the beginning of the
   upload-pack to fetch-pack commniucation that ends with a flush
   constitutes a ref advertisement.  

 - A series of "want" packets at the beginning of the fetch-pack to
   upload-pack communiucation that ends with a flush constitutes a
   fetch request.

Another thing I didn't find in the updated documentation was a
proposal to define what a "flush" exactly means.  

In my above writing, it should be clear that a "flush" is merely
"the end of a group".  It does not mean (and it never meant, until
smart HTTP) "I am finished talking, now it is your turn."  If a
requestor needs to give two groups of items before the responder can
process the request, we would want to be able to say "A1, A2, ...,
now I am done with As; B1, B2, B3, ..., now I am done with Bs; this
concludes my request, and it is your turn to process and respond to
me".  But you cannot easily do so without affecting smart HTTP, as
it is written in such a way that it assumes "flush" is "I am done,
it is your turn".

I am perfectly OK if v2 redefined "flush" to mean "I am done, it is
yoru turn."  But then the protocol should have another way to allow
packets into larger groups.  A sequence of packets "begin A", "A1",
"A2", ..., "end", "begin B", "B1", "B2", "B3", "end", "flush" may be
a way to do so, and if we continue to rely on the order of packets
to help determine the semantics (aka "being at a specific point in
the protocol exchange" above), we may even be able to omit "begin A"
and "begin B" packets (i.e. the "end" is the new "end of a logical
group", which is what "flush" originally was).

[Footnote]

 *1* For those who haven't been following the discussion:

     X: maximum packet length being 64kB might be problematic.

     Y: requiring capability advertisement and request in a single
        packet is wrong.

     Z: the meaning of "flush" needs to be clarified.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
  2015-03-03 17:13                               ` Junio C Hamano
  2015-03-03 19:47                                 ` Junio C Hamano
@ 2015-03-04  1:54                                 ` Duy Nguyen
  2015-03-04  4:27                                   ` Shawn Pearce
  2015-03-24 17:42                                 ` Stefan Beller
  2 siblings, 1 reply; 80+ messages in thread
From: Duy Nguyen @ 2015-03-04  1:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, Git Mailing List

On Wed, Mar 4, 2015 at 12:13 AM, Junio C Hamano <gitster@pobox.com> wrote:
> My recollection is that the consensus from the last time we
> discussed protocol revamping was to list one capability per packet
> so that packet length limit does not matter, but you may want to
> check with the list archive yourself.

I couldn't find that consensus mail, but this one [1] is good enough
evidence that we can hit packet length limit in capability line
easily.

With an escape hatch to allow maximum packet length up to  uint_max, I
think we'll be fine for a long time even if we don't send one cap per
pkt-line. So I'm trying to see if we really want to go with one cap
per pkt-line..

Pros:

 - better memory management, current pkt-line static buffer is probably fine
 - a capability can contain spaces after '='

Cons:

 - some refactoring needed to hide away differences between v1 and v2

Looks like one cap per pkt-line is winning..

[1] http://thread.gmane.org/gmane.comp.version-control.git/237929
-- 
Duy

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
  2015-03-04  1:54                                 ` Duy Nguyen
@ 2015-03-04  4:27                                   ` Shawn Pearce
  2015-03-04 12:05                                     ` Duy Nguyen
  0 siblings, 1 reply; 80+ messages in thread
From: Shawn Pearce @ 2015-03-04  4:27 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Junio C Hamano, Stefan Beller, Git Mailing List

On Tue, Mar 3, 2015 at 5:54 PM, Duy Nguyen <pclouds@gmail.com> wrote:
> On Wed, Mar 4, 2015 at 12:13 AM, Junio C Hamano <gitster@pobox.com> wrote:
>> My recollection is that the consensus from the last time we
>> discussed protocol revamping was to list one capability per packet
>> so that packet length limit does not matter, but you may want to
>> check with the list archive yourself.
>
> I couldn't find that consensus mail, but this one [1] is good enough
> evidence that we can hit packet length limit in capability line
> easily.
> With an escape hatch to allow maximum packet length up to  uint_max, I

The symbolic ref thing was done badly. There isn't an escape hatch in
current v1 protocol sufficient to allow this but each ref should be
its own pkt-line, or should be a small batch of refs per pkt-line, or
the ref advertisement should be a data stream in a side-band-64k sort
of format inside the pkt-line framing.

At 64k per frame of side-band there is plenty of data to header ratio
that we don't need  to escape to uint_max.

> Looks like one cap per pkt-line is winning..

Yes.

> [1] http://thread.gmane.org/gmane.comp.version-control.git/237929

Let me go on a different tangent a bit from the current protocol.

http://www.grpc.io/ was recently released and is built on the HTTP/2
standard. It uses protobuf as a proven extensibility mechanism.
Including a full C based grpc stack just to speak the Git wire
protocol is quite likely overkill, but I think the embedding of a
proven extensible format inside of a bi-directional framed streaming
system like HTTP/2 offers some good guidance.

Network protocol parsing is hard. Especially in languages like C where
buffer overflows are possible. Or where a client could trivially DoS a
server by sending a packet of size uint_max and the server naively
trying to malloc() that buffer. Defining the network protocol in an
IDL like protobuf 3 and being machine generated from stable well
maintained code has its advantages.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
  2015-03-04  4:27                                   ` Shawn Pearce
@ 2015-03-04 12:05                                     ` Duy Nguyen
  2015-03-04 19:10                                       ` Shawn Pearce
  0 siblings, 1 reply; 80+ messages in thread
From: Duy Nguyen @ 2015-03-04 12:05 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Junio C Hamano, Stefan Beller, Git Mailing List

On Wed, Mar 4, 2015 at 11:27 AM, Shawn Pearce <spearce@spearce.org> wrote:
> Let me go on a different tangent a bit from the current protocol.
>
> http://www.grpc.io/ was recently released and is built on the HTTP/2
> standard. It uses protobuf as a proven extensibility mechanism.
> Including a full C based grpc stack just to speak the Git wire
> protocol is quite likely overkill, but I think the embedding of a
> proven extensible format inside of a bi-directional framed streaming
> system like HTTP/2 offers some good guidance.

I'll take this as "learn from grpc, not just reuse grpc"

> Network protocol parsing is hard. Especially in languages like C where
> buffer overflows are possible. Or where a client could trivially DoS a
> server by sending a packet of size uint_max and the server naively
> trying to malloc() that buffer. Defining the network protocol in an
> IDL like protobuf 3 and being machine generated from stable well
> maintained code has its advantages.

I'm still studying the spec, so I can't comment if using IDL/protobuf3
is a good idea yet.

But I think at least we can avoid DoS by changing the pkt-line (again)
a bit: the length 0xffff means that actual length is 0xfffe and the
next pkt-line is part of this pkt-line. Higher level (upload-pack or
fetch-pack, for example) must set an upper limit for packet_read() so
it won't try to concatenate pkt-lines forever.
-- 
Duy

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
  2015-03-04 12:05                                     ` Duy Nguyen
@ 2015-03-04 19:10                                       ` Shawn Pearce
  2015-03-05  1:03                                         ` Stefan Beller
  0 siblings, 1 reply; 80+ messages in thread
From: Shawn Pearce @ 2015-03-04 19:10 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Junio C Hamano, Stefan Beller, Git Mailing List

On Wed, Mar 4, 2015 at 4:05 AM, Duy Nguyen <pclouds@gmail.com> wrote:
> On Wed, Mar 4, 2015 at 11:27 AM, Shawn Pearce <spearce@spearce.org> wrote:
>> Let me go on a different tangent a bit from the current protocol.
>>
>> http://www.grpc.io/ was recently released and is built on the HTTP/2
>> standard. It uses protobuf as a proven extensibility mechanism.
>> Including a full C based grpc stack just to speak the Git wire
>> protocol is quite likely overkill, but I think the embedding of a
>> proven extensible format inside of a bi-directional framed streaming
>> system like HTTP/2 offers some good guidance.
>
> I'll take this as "learn from grpc, not just reuse grpc"

Correct, that was what I was trying to say and I just wrote it poorly.

HTTP 1.x, HTTP/2 and protobuf have proven themselves to be fairly open
to extension and working well in the wild for transports. There is
useful guidance there that we should draw from to try and leave doors
open for the future.

HTTP/2, protobuf and grpc are fairly complex. I consider any one of
them too complicated for Git specific use. However HTTP/2 is probably
the future of HTTP stacks so we may see it show up in libcurl or
something as popular as libcurl in another 10 years. Hg had some
reasonably sane ideas about building the wire protocol to work well on
HTTP 1.x upfront rather than Git tacking it on much later.

>> Network protocol parsing is hard. Especially in languages like C where
>> buffer overflows are possible. Or where a client could trivially DoS a
>> server by sending a packet of size uint_max and the server naively
>> trying to malloc() that buffer. Defining the network protocol in an
>> IDL like protobuf 3 and being machine generated from stable well
>> maintained code has its advantages.
>
> I'm still studying the spec, so I can't comment if using IDL/protobuf3
> is a good idea yet.
>
> But I think at least we can avoid DoS by changing the pkt-line (again)
> a bit: the length 0xffff means that actual length is 0xfffe and the
> next pkt-line is part of this pkt-line. Higher level (upload-pack or
> fetch-pack, for example) must set an upper limit for packet_read() so
> it won't try to concatenate pkt-lines forever.

pkt-line is a reasonably simple and efficient framing system. A 64 KiB
pkt-line frame only costs ~0.0061% overhead; ~0.0076% overhead if you
are a pack stream in a side-band-64k channel. That is probably more
efficient than HTTP/2 or SSL framing.

I see no reason to attempt to try and reduce that overhead further. 64
KiB frame size is enough for anyone to move data efficiently with
these headers. In practice you are going to wrap that up into SSH or
SSL/TLS and those overheads are so much higher it doesn't matter we
have a tiny loss here.

I think a mistake in the wire protocol was making the pkt-line length
human readable hex, but the sideband channel binary. _If_ we redo the
framing the only change I would make is making the side band readable.
Thus far we have only used 0, 1, 2 for sideband channels. These could
easily be moved into human readable channel ids:

  'd':  currently sideband 0; this is the application data, aka the pack data
  'p':  currently sideband 1; this is the progress stream for stderr
  'e':  currently sideband 2; there was an error, data in this packet
is the message text, and the connection will shutdown after the
packet.

And then leave all other sideband values undefined and reserved for
future use, just like they are all open today.

I am not convinced framing changes are necessary. I would fine with
leaving the sideband streams as 0,1,2... but if we want a text based
protocol for ease of debugging we should be text based across the
board and try very hard to avoid these binary values in the framing,
or ever needing to use a magical NUL byte in the middle of a packet to
find a gap in older parsers for newer data.

If you want to build a larger stream like ref advertisement inside a
pkt-line framing without using a pkt-line per ref record, you should
follow the approach used by pack data streams where it uses the 5 byte
side-band pkt-line framing and a side-band channel is allocated for
that data. Application code can then run a side-band demux to yank out
the inner stream and parse it.

It may be simpler to restrict ref names to be smaller than 64k in
length so you have room for framing and hash value to be transferred
inside of a single pkt-line, then use the pkt-line framing to do the
transfer.

Today's upload-pack ref advertisment has ~25% overhead. Most of that
is in the duplicated tag name for the peeled refs/tags/v1.0^{} lines.
If you drop those names (but keep the pkt-line and SHA-1), its only
about 8% overhead above the packed-refs file.

I think optimization efforts for ref advertisement need to focus on
reducing the number of refs sent back and forth, not shrinking the
individual records down smaller.

Earlier in this thread Junio raised a point that the flush-pkt is
confusing because it has way too many purposes. I agree. IIRC we have
0001-0003 as invalid pkt-line headers that could be used for special
markers in the stream.

For example we could use 0000 flush-pkt as "end of section" and define
0001 as "i am done speaking and now wait for you to reply".

I think we need to keep 0004 as "empty packet" in case we ever get
ourselves into a position where it makes sense to pass an empty string
from one end of the connection to another.

Running pkt-line inside of side-band-64k is... weird to read. IIRC we
only do that in the smart HTTP protocol to the remote-curl helper, and
this is where Junio's remarks about flush-pkt being confusing come
from. If we rewrote the protocol to have that explicit 0001 "i am done
talking" marker the helper wouldn't need the double framing to
understand how to break up the originally bidirectional packet stream
into a sequence of bursts sent on unidirectional HTTP 1.x.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
  2015-03-04 19:10                                       ` Shawn Pearce
@ 2015-03-05  1:03                                         ` Stefan Beller
  2015-03-05 16:03                                           ` Stefan Beller
  0 siblings, 1 reply; 80+ messages in thread
From: Stefan Beller @ 2015-03-05  1:03 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Duy Nguyen, Junio C Hamano, Git Mailing List

On Wed, Mar 4, 2015 at 11:10 AM, Shawn Pearce <spearce@spearce.org> wrote:
> On Wed, Mar 4, 2015 at 4:05 AM, Duy Nguyen <pclouds@gmail.com> wrote:
>> On Wed, Mar 4, 2015 at 11:27 AM, Shawn Pearce <spearce@spearce.org> wrote:
>>> Let me go on a different tangent a bit from the current protocol.
>>>
>>> http://www.grpc.io/ was recently released and is built on the HTTP/2
>>> standard. It uses protobuf as a proven extensibility mechanism.
>>> Including a full C based grpc stack just to speak the Git wire
>>> protocol is quite likely overkill, but I think the embedding of a
>>> proven extensible format inside of a bi-directional framed streaming
>>> system like HTTP/2 offers some good guidance.
>>
>> I'll take this as "learn from grpc, not just reuse grpc"
>
> Correct, that was what I was trying to say and I just wrote it poorly.
>
> HTTP 1.x, HTTP/2 and protobuf have proven themselves to be fairly open
> to extension and working well in the wild for transports. There is
> useful guidance there that we should draw from to try and leave doors
> open for the future.
>
> HTTP/2, protobuf and grpc are fairly complex. I consider any one of
> them too complicated for Git specific use. However HTTP/2 is probably
> the future of HTTP stacks so we may see it show up in libcurl or
> something as popular as libcurl in another 10 years. Hg had some
> reasonably sane ideas about building the wire protocol to work well on
> HTTP 1.x upfront rather than Git tacking it on much later.
>
>>> Network protocol parsing is hard. Especially in languages like C where
>>> buffer overflows are possible. Or where a client could trivially DoS a
>>> server by sending a packet of size uint_max and the server naively
>>> trying to malloc() that buffer. Defining the network protocol in an
>>> IDL like protobuf 3 and being machine generated from stable well
>>> maintained code has its advantages.
>>
>> I'm still studying the spec, so I can't comment if using IDL/protobuf3
>> is a good idea yet.
>>
>> But I think at least we can avoid DoS by changing the pkt-line (again)
>> a bit: the length 0xffff means that actual length is 0xfffe and the
>> next pkt-line is part of this pkt-line. Higher level (upload-pack or
>> fetch-pack, for example) must set an upper limit for packet_read() so
>> it won't try to concatenate pkt-lines forever.
>
> pkt-line is a reasonably simple and efficient framing system. A 64 KiB
> pkt-line frame only costs ~0.0061% overhead; ~0.0076% overhead if you
> are a pack stream in a side-band-64k channel. That is probably more
> efficient than HTTP/2 or SSL framing.
>
> I see no reason to attempt to try and reduce that overhead further. 64
> KiB frame size is enough for anyone to move data efficiently with
> these headers. In practice you are going to wrap that up into SSH or
> SSL/TLS and those overheads are so much higher it doesn't matter we
> have a tiny loss here.
>
> I think a mistake in the wire protocol was making the pkt-line length
> human readable hex, but the sideband channel binary. _If_ we redo the
> framing the only change I would make is making the side band readable.
> Thus far we have only used 0, 1, 2 for sideband channels. These could
> easily be moved into human readable channel ids:
>
>   'd':  currently sideband 0; this is the application data, aka the pack data
>   'p':  currently sideband 1; this is the progress stream for stderr
>   'e':  currently sideband 2; there was an error, data in this packet
> is the message text, and the connection will shutdown after the
> packet.
>
> And then leave all other sideband values undefined and reserved for
> future use, just like they are all open today.
>
> I am not convinced framing changes are necessary. I would fine with
> leaving the sideband streams as 0,1,2... but if we want a text based
> protocol for ease of debugging we should be text based across the
> board and try very hard to avoid these binary values in the framing,
> or ever needing to use a magical NUL byte in the middle of a packet to
> find a gap in older parsers for newer data.
>
>
> If you want to build a larger stream like ref advertisement inside a
> pkt-line framing without using a pkt-line per ref record, you should
> follow the approach used by pack data streams where it uses the 5 byte
> side-band pkt-line framing and a side-band channel is allocated for
> that data. Application code can then run a side-band demux to yank out
> the inner stream and parse it.
>
> It may be simpler to restrict ref names to be smaller than 64k in
> length so you have room for framing and hash value to be transferred
> inside of a single pkt-line, then use the pkt-line framing to do the
> transfer.
>
> Today's upload-pack ref advertisment has ~25% overhead. Most of that
> is in the duplicated tag name for the peeled refs/tags/v1.0^{} lines.
> If you drop those names (but keep the pkt-line and SHA-1), its only
> about 8% overhead above the packed-refs file.
>
> I think optimization efforts for ref advertisement need to focus on
> reducing the number of refs sent back and forth, not shrinking the
> individual records down smaller.
>
>
> Earlier in this thread Junio raised a point that the flush-pkt is
> confusing because it has way too many purposes. I agree. IIRC we have
> 0001-0003 as invalid pkt-line headers that could be used for special
> markers in the stream.
>
> For example we could use 0000 flush-pkt as "end of section" and define
> 0001 as "i am done speaking and now wait for you to reply".
>
> I think we need to keep 0004 as "empty packet" in case we ever get
> ourselves into a position where it makes sense to pass an empty string
> from one end of the connection to another.
>
>
> Running pkt-line inside of side-band-64k is... weird to read. IIRC we
> only do that in the smart HTTP protocol to the remote-curl helper, and
> this is where Junio's remarks about flush-pkt being confusing come
> from. If we rewrote the protocol to have that explicit 0001 "i am done
> talking" marker the helper wouldn't need the double framing to
> understand how to break up the originally bidirectional packet stream
> into a sequence of bursts sent on unidirectional HTTP 1.x.

So I think my next steps are to look at grpc and the sideband channel
a bit more closely to have an opinion on my own.

At the beginning of the zear (Jan 8th) I started collecting data about
the refs advertisement. So I saved the output of `git ls-remote`
for
    https://github.com/gitster/git
    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
    https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
    http://anongit.freedesktop.org/git/systemd/systemd.git

It's 57 days since I started the collection. For git.git the total size (of all
57 outputs) combined is 7.0 MB.

When compressing all into a zip file, we get down to 3.3 MB.
(I figure the zip format is not the right way to measure entropy here)

So when using tar.xz instead, all of the outputs combined compress
to 64.6 kB, while a single output compressed to 52.0 kB
(uncompressed at 128kB, which makes sense as we have
lots of redundancy with the common re-occurring pattern
"refs/heads/" as well as "refs/remotes/". Also the tag lines
should not change over time, so we would not need to
retransmit them if we know the client has it already.

Maybe the transmission of the diff of the ls-remote would help us
as well

    # the current size for each ls-remote in bytes:
    wc -c  git_2015-03-03_01:41:01-08:00
    127889 git_2015-03-03_01:41:01-08:00
    # The diff of one month:
    diff git_2015-03-03_01:41:01-08:00 git_2015-02-01_01\:41\:01-08\:00 |wc -c
    11257
    # Fetch after one week
    diff git_2015-03-03_01:41:01-08:00 git_2015-02-25_01:41:01-08:00 |wc -c
    6308
    # just one day:
    diff git_2015-03-03_01:41:01-08:00 git_2015-03-01_01\:41\:02-08\:00 |wc -c
    1881

If anyone wants to experiment with the data I gathered, I can make them
available.

Thanks for all your different ideas!
Stefan

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
  2015-03-05  1:03                                         ` Stefan Beller
@ 2015-03-05 16:03                                           ` Stefan Beller
  0 siblings, 0 replies; 80+ messages in thread
From: Stefan Beller @ 2015-03-05 16:03 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Junio C Hamano, Git Mailing List, Shawn Pearce

On Wed, Mar 4, 2015 at 5:03 PM, Stefan Beller <sbeller@google.com> wrote:
>
> If anyone wants to experiment with the data I gathered, I can make them
> available.
>

All data of `ls-remote` including the gathering script is found at

(112 kB .tar.xz)
https://drive.google.com/file/d/0B7E93UKgFAfjcHRvM1N2YjBfTzA/view?usp=sharing
(6.6MB in .zip)
https://drive.google.com/file/d/0B7E93UKgFAfjRko3WHhtUWZtTEU/view?usp=sharing

I also do have all the object files which are referenced in the
outputs of ls-remote, though sharing
them is a bit tough as I cannot just git push them (forced pushes make
some of the objects
unreachable, so my local gathering repo is explicitly configured to
not garbage collect), and these
are huge compared to just the output of ls-remote.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCH] protocol upload-pack-v2
  2015-03-02  9:21                           ` Duy Nguyen
  2015-03-02  9:24                             ` Duy Nguyen
  2015-03-03 10:33                             ` Duy Nguyen
@ 2015-03-06 23:38                             ` Stefan Beller
  2015-03-06 23:40                               ` Stefan Beller
                                                 ` (4 more replies)
  2 siblings, 5 replies; 80+ messages in thread
From: Stefan Beller @ 2015-03-06 23:38 UTC (permalink / raw)
  To: pclouds; +Cc: gitster, git, Stefan Beller

From: Duy Nguyen <pclouds@gmail.com>

Signed-off-by: Stefan Beller <sbeller@google.com>
---

Notes:
    This is just aimed at untangling capabilities and refs
    advertisement, no new features.
    
    Hence this is missing the proposal from Duy to save one RTT.
    I have the impression that most of you are dreaming about new
    features which would solve the actual problem, but I'd first try
    to change the protocol which enables such game changers without
    adding new features at first. The optimisation to save a round trip
    or the optimisation how to save sending all the refs will come
    in a follow up series.
    
    XXX: this approach wastes one round trip in smart-http because the
    client would speak first. Perhaps we could allow client speculation.
    It can assume what server caps will send and send commands based on that
    assumption. If it turns out true, we save one round trip. E.g. fast
    path:
       C: You are supposed to send caps A, B. I would respond with cap B.
          Then I would send "want-refs refs/heads/foo".
       S: (yes we are sending caps A and B), validate client caps,
          execute "want-refs" and return ref list
    
    and slow path:
    
       C: You are supposed to send caps A, B. I would respond with cap B.
          Then I would send "want-refs refs/heads/foo".
       S: Send caps A, B and C. ignore the rest from client
       C: Want caps A and C. Send "want-refs foo"
       S: return ref foo

 Documentation/technical/pack-protocol.txt         | 122 ++++++++++++++++++++--
 Documentation/technical/protocol-capabilities.txt |  22 ++--
 2 files changed, 121 insertions(+), 23 deletions(-)

diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt
index 462e206..38bed2c 100644
--- a/Documentation/technical/pack-protocol.txt
+++ b/Documentation/technical/pack-protocol.txt
@@ -1,11 +1,11 @@
 Packfile transfer protocols
 ===========================
 
-Git supports transferring data in packfiles over the ssh://, git:// and
+Git supports transferring data in packfiles over the ssh://, git://, http:// and
 file:// transports.  There exist two sets of protocols, one for pushing
 data from a client to a server and another for fetching data from a
-server to a client.  All three transports (ssh, git, file) use the same
-protocol to transfer data.
+server to a client.  The three transports (ssh, git, file) use the same
+protocol to transfer data. http is documented in http-protocol.txt.
 
 The processes invoked in the canonical Git implementation are 'upload-pack'
 on the server side and 'fetch-pack' on the client side for fetching data;
@@ -14,6 +14,12 @@ data.  The protocol functions to have a server tell a client what is
 currently on the server, then for the two to negotiate the smallest amount
 of data to send in order to fully update one or the other.
 
+"upload-pack-2" and "receive-pack-2" are the next generation of
+"upload-pack" and "receive-pack" respectively. The first two are
+referred as "version 2" in this document and pack-capabilities.txt
+while the last two are "version 1". Unless stated otherwise, "version 1"
+is implied.
+
 Transports
 ----------
 There are three transports over which the packfile protocol is
@@ -42,7 +48,8 @@ hostname parameter, terminated by a NUL byte.
 
 --
    git-proto-request = request-command SP pathname NUL [ host-parameter NUL ]
-   request-command   = "git-upload-pack" / "git-receive-pack" /
+   request-command   = "git-upload-pack" / "git-upload-pack-2" /
+		       "git-receive-pack" / "git-receive-pack-2" /
 		       "git-upload-archive"   ; case sensitive
    pathname          = *( %x01-ff ) ; exclude NUL
    host-parameter    = "host=" hostname [ ":" port ]
@@ -67,7 +74,6 @@ gracefully with an error message.
   error-line     =  PKT-LINE("ERR" SP explanation-text)
 ----
 
-
 SSH Transport
 -------------
 
@@ -124,9 +130,56 @@ has, the first can 'fetch' from the second.  This operation determines
 what data the server has that the client does not then streams that
 data down to the client in packfile format.
 
+Capability discovery (v2)
+-------------------------
 
-Reference Discovery
--------------------
+In version 1, capability discovery is part of reference discovery and
+covered in reference discovery section.
+
+In version 2, when the client initially connects, the server
+immediately sends its capabilities to the client. Then the client must
+send the list of server capabilities it wants to use to the server.
+
+   S: 00XXcapabilities 4\n
+   S: 00XXcap:lang\n
+   S: 00XXcap:thin-pack\n
+   S: 00XXcap:ofs-delta\n
+   S: 00XXagent:agent=git/2:3.4.5+custom-739-gb850f98\n
+
+   C: 00XXcapabilities 3
+   C: 00XXcap:thin-pack\n
+   C: 00XXcap:ofs-delta\n
+   C: 00XXcap:lang=en\n
+   C: 00XXagent:agent=git/custom_string\n
+
+----
+  cap              =  PKT-LINE("capabilities" SP size LF list)
+  size             =  *DIGIT
+  capability-list  =  *(capability) [agent LF]
+  capability       =  "cap:" keyvaluepair LF
+  agent            =  keyvaluepair LF
+  keyvaluepair     =  1*(LC_ALPHA / DIGIT / "-" / "_" / "=")
+  LC_ALPHA         =  %x61-7A
+----
+
+The client MUST ignore any data before the pkt-line starting with "capabilities"
+for future easy of extension.
+
+The server MUST advertise "size" as the decimal number of lines following
+the "capabilities" line. This includes lines starting "cap:" and "agent:" for now.
+The client MUST ignore lines which start with an unknown pattern.
+
+The client MUST NOT ask for capabilities the server did not say it
+supports. The server MUST diagnose and abort if capabilities it does
+not understand was requested. The server MUST NOT ignore capabilities
+that client requested and server advertised.  As a consequence of these
+rules, server MUST NOT advertise capabilities it does not understand.
+
+See protocol-capabilities.txt for a list of allowed server and client
+capabilities and descriptions.
+
+Reference Discovery (v1)
+------------------------
 
 When the client initially connects the server will immediately respond
 with a listing of each reference it has (all branches and tags) along
@@ -178,16 +231,69 @@ MUST peel the ref if it's an annotated tag.
   shallow          =  PKT-LINE("shallow" SP obj-id)
 
   capability-list  =  capability *(SP capability)
-  capability       =  1*(LC_ALPHA / DIGIT / "-" / "_")
+  capability       =  1*(LC_ALPHA / DIGIT / "-" / "_" / "=")
   LC_ALPHA         =  %x61-7A
 ----
 
 Server and client MUST use lowercase for obj-id, both MUST treat obj-id
 as case-insensitive.
 
+On the very first line of the initial server response of either
+receive-pack and upload-pack the first reference is followed by
+a NUL byte and then a list of space delimited server capabilities.
+These allow the server to declare what it can and cannot support
+to the client.
+
+Client will then send a space separated list of capabilities it wants
+to be in effect. The client MUST NOT ask for capabilities the server
+did not say it supports.
+
+Server MUST diagnose and abort if capabilities it does not understand
+was sent.  Server MUST NOT ignore capabilities that client requested
+and server advertised.  As a consequence of these rules, server MUST
+NOT advertise capabilities it does not understand.
+
 See protocol-capabilities.txt for a list of allowed server capabilities
 and descriptions.
 
+Reference Discovery (v2)
+------------------------
+
+In version 2, reference discovery is initiated by the client with
+"want-refs" line. The client may skip reference discovery phase
+entirely by not sending "want-refs" (e.g. the client has another way
+to retrieve ref list).
+
+----
+  want-refs  =  PKT-LINE("want-refs" SP mode *argument)
+  mode       =  "all"
+  argument   =  SP arg
+  arg        =  1*(LC_ALPHA / DIGIT / "-" / "_" / "=")
+----
+
+Mode "all" sends all visible refs to the client like in version 1. No
+arguments are accepted in this mode. More modes may be available based
+on capabilities.
+
+----
+  advertised-refs  =  (no-refs / list-of-refs)
+		      *shallow
+		      flush-pkt
+
+  no-refs          =  PKT-LINE(zero-id LF)
+
+  list-of-refs     =  *other-ref
+  other-ref        =  PKT-LINE(other-tip / other-peeled)
+  other-tip        =  obj-id SP refname LF
+  other-peeled     =  obj-id SP refname "^{}" LF
+
+  shallow          =  PKT-LINE("shallow" SP obj-id)
+
+  capability-list  =  capability *(SP capability)
+  capability       =  1*(LC_ALPHA / DIGIT / "-" / "_" / "=")
+  LC_ALPHA         =  %x61-7A
+----
+
 Packfile Negotiation
 --------------------
 After reference and capabilities discovery, the client can decide to
diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index 4f8a7bf..ecb0efd 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -3,21 +3,6 @@ Git Protocol Capabilities
 
 Servers SHOULD support all capabilities defined in this document.
 
-On the very first line of the initial server response of either
-receive-pack and upload-pack the first reference is followed by
-a NUL byte and then a list of space delimited server capabilities.
-These allow the server to declare what it can and cannot support
-to the client.
-
-Client will then send a space separated list of capabilities it wants
-to be in effect. The client MUST NOT ask for capabilities the server
-did not say it supports.
-
-Server MUST diagnose and abort if capabilities it does not understand
-was sent.  Server MUST NOT ignore capabilities that client requested
-and server advertised.  As a consequence of these rules, server MUST
-NOT advertise capabilities it does not understand.
-
 The 'atomic', 'report-status', 'delete-refs', 'quiet', and 'push-cert'
 capabilities are sent and recognized by the receive-pack (push to server)
 process.
@@ -268,3 +253,10 @@ to accept a signed push certificate, and asks the <nonce> to be
 included in the push certificate.  A send-pack client MUST NOT
 send a push-cert packet unless the receive-pack server advertises
 this capability.
+
+v2
+--
+
+'git-upload-pack' and 'git-receive-pack' may advertise this capability
+if the server supports 'git-upload-pack-2' and 'git-receive-pack-2'
+respectively.
-- 
2.3.0.81.gc37f363

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH] protocol upload-pack-v2
  2015-03-06 23:38                             ` [PATCH] protocol upload-pack-v2 Stefan Beller
@ 2015-03-06 23:40                               ` Stefan Beller
  2015-03-06 23:55                               ` Duy Nguyen
                                                 ` (3 subsequent siblings)
  4 siblings, 0 replies; 80+ messages in thread
From: Stefan Beller @ 2015-03-06 23:40 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Junio C Hamano, git@vger.kernel.org, Stefan Beller

On Fri, Mar 6, 2015 at 3:38 PM, Stefan Beller <sbeller@google.com> wrote:
> From: Duy Nguyen <pclouds@gmail.com>

Oops. I edited the proposal from Duy heavily(?), such that it is
different from what he proposed 4 days ago.

In my impression this is what most of the participants would agree on.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH] protocol upload-pack-v2
  2015-03-06 23:38                             ` [PATCH] protocol upload-pack-v2 Stefan Beller
  2015-03-06 23:40                               ` Stefan Beller
@ 2015-03-06 23:55                               ` Duy Nguyen
  2015-03-07  0:00                               ` Duy Nguyen
                                                 ` (2 subsequent siblings)
  4 siblings, 0 replies; 80+ messages in thread
From: Duy Nguyen @ 2015-03-06 23:55 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Junio C Hamano, Git Mailing List

I'm still wondering if we should reserve more from the packet length.
We have used length 0000 for pkt-flush. Shawn pointed out that we
still have 0001, 0002 and 0003 but we may use some of them to avoid
abuse of pkt-flush in some cases. Perhaps we could limit packet length
to 0xfff0, so we have 0xfff1-0xffff to assign special meanings in
future, if we have to.

On Sat, Mar 7, 2015 at 6:38 AM, Stefan Beller <sbeller@google.com> wrote:
> +In version 2, when the client initially connects, the server
> +immediately sends its capabilities to the client. Then the client must
> +send the list of server capabilities it wants to use to the server.
> +
> +   S: 00XXcapabilities 4\n
> +   S: 00XXcap:lang\n
> +   S: 00XXcap:thin-pack\n
> +   S: 00XXcap:ofs-delta\n
> +   S: 00XXagent:agent=git/2:3.4.5+custom-739-gb850f98\n
> +
> +   C: 00XXcapabilities 3
> +   C: 00XXcap:thin-pack\n
> +   C: 00XXcap:ofs-delta\n
> +   C: 00XXcap:lang=en\n
> +   C: 00XXagent:agent=git/custom_string\n
> +
> +----
> +  cap              =  PKT-LINE("capabilities" SP size LF list)
> +  size             =  *DIGIT
> +  capability-list  =  *(capability) [agent LF]
> +  capability       =  "cap:" keyvaluepair LF
> +  agent            =  keyvaluepair LF
> +  keyvaluepair     =  1*(LC_ALPHA / DIGIT / "-" / "_" / "=")

If we send one cap per pkt-line, cap can contain spaces. The question
is, should we allow them?

Ending cap lines with LF seems redudant because we already know the line length.

> +  LC_ALPHA         =  %x61-7A
> +----
> +
> +The client MUST ignore any data before the pkt-line starting with "capabilities"
> +for future easy of extension.
> +
> +The server MUST advertise "size" as the decimal number of lines following
> +the "capabilities" line. This includes lines starting "cap:" and "agent:" for now.
> +The client MUST ignore lines which start with an unknown pattern.

I think the common pattern in our protocol is to end these with a
pkt-flush, instead of send the number of items upfront. If we do that
we don't have to specify "cap:" or "agent:" either. All pkt-lines
until pkt-flush at the beginning of v2 are cap lines. And agent is
just another "capability".
-- 
Duy

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH] protocol upload-pack-v2
  2015-03-06 23:38                             ` [PATCH] protocol upload-pack-v2 Stefan Beller
  2015-03-06 23:40                               ` Stefan Beller
  2015-03-06 23:55                               ` Duy Nguyen
@ 2015-03-07  0:00                               ` Duy Nguyen
  2015-03-07  0:28                               ` Junio C Hamano
  2015-03-10  1:38                               ` Duy Nguyen
  4 siblings, 0 replies; 80+ messages in thread
From: Duy Nguyen @ 2015-03-07  0:00 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Junio C Hamano, Git Mailing List

On Sat, Mar 7, 2015 at 6:38 AM, Stefan Beller <sbeller@google.com> wrote:
> +Reference Discovery (v2)
> +------------------------
> +
> +In version 2, reference discovery is initiated by the client with
> +"want-refs" line. The client may skip reference discovery phase
> +entirely by not sending "want-refs" (e.g. the client has another way
> +to retrieve ref list).
> +
> +----
> +  want-refs  =  PKT-LINE("want-refs" SP mode *argument)
> +  mode       =  "all"
> +  argument   =  SP arg
> +  arg        =  1*(LC_ALPHA / DIGIT / "-" / "_" / "=")
> +----

On the same line with capabilities, we would not want to run into a
situation where we have too many arguments to send and exceed pkt-line
limit. So perhaps do one argument per pkt-line too, ending with a
pkt-flush.
-- 
Duy

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH] protocol upload-pack-v2
  2015-03-06 23:38                             ` [PATCH] protocol upload-pack-v2 Stefan Beller
                                                 ` (2 preceding siblings ...)
  2015-03-07  0:00                               ` Duy Nguyen
@ 2015-03-07  0:28                               ` Junio C Hamano
  2015-03-07  4:28                                 ` Stefan Beller
  2015-03-10  1:38                               ` Duy Nguyen
  4 siblings, 1 reply; 80+ messages in thread
From: Junio C Hamano @ 2015-03-07  0:28 UTC (permalink / raw)
  To: Stefan Beller; +Cc: pclouds, git

Stefan Beller <sbeller@google.com> writes:

> @@ -67,7 +74,6 @@ gracefully with an error message.
>    error-line     =  PKT-LINE("ERR" SP explanation-text)
>  ----
>  
> -
>  SSH Transport

Noise?

> @@ -124,9 +130,56 @@ has, the first can 'fetch' from the second.  This operation determines
>  what data the server has that the client does not then streams that
>  data down to the client in packfile format.
>  
> +Capability discovery (v2)
> +-------------------------
>  
> +In version 1, capability discovery is part of reference discovery and
> +covered in reference discovery section.
> +
> +In version 2, when the client initially connects, the server
> +immediately sends its capabilities to the client. Then the client must
> +send the list of server capabilities it wants to use to the server.
> +
> +   S: 00XXcapabilities 4\n
> +   S: 00XXcap:lang\n
> +   S: 00XXcap:thin-pack\n
> +   S: 00XXcap:ofs-delta\n
> +   S: 00XXagent:agent=git/2:3.4.5+custom-739-gb850f98\n
> +
> +   C: 00XXcapabilities 3
> +   C: 00XXcap:thin-pack\n
> +   C: 00XXcap:ofs-delta\n
> +   C: 00XXcap:lang=en\n
> +   C: 00XXagent:agent=git/custom_string\n

I do not see a good reason why we want "I am sending N caps"
upfront, instead of "this, that, and here is the end of the group".
If you expect the recipient to benefit by being able to pre-allocate
N slots, then that might make sense, but otherwise, I'd rather see
us stick to a (weaker) flush that says "group ends here".

Besides, I do not know how you counted 4 on the S: side and 3 on
the C: side in the above example and missing LF after 3 ;-).

> +----
> +  cap              =  PKT-LINE("capabilities" SP size LF list)

Isn't a cap packet terminated by LF without any "list" following it?
The notation PKT-LINE(<blah>) is "wrap <blah> in a single packet",
and I do not think you meant to send the capability line and a series
of cap:foo entries in a single packet.

> +  size             =  *DIGIT
> +  capability-list  =  *(capability) [agent LF]
> +  capability       =  "cap:" keyvaluepair LF
> +  agent            =  keyvaluepair LF

I do not see a reason to make 'agent' as part of capability.  That
was an implementation detail of v1 but v2 does not have an
obligation to consider agent announcement as capability.

server-announcement = PKT-LINE("capabilities" ...) capability-list [agent-announcement]
capability-list = as you have it w/o "[agent LF]"
agent-announcement = PKT-LINE("agent=" agent-token LF)

or something, perhaps?

> +The client MUST ignore any data before the pkt-line starting with "capabilities"
> +for future easy of extension.

s/easy/ease/; but I am not sure if this makes sense.  Without
knowing the extended preamble, you cannot even tell if a packet line
that happens to start with "capabilities" is a proper beginning of
0-th iteration of v2 protocol, or an embedded data in the preamble,
no?

> +Reference Discovery (v2)
> +------------------------
> +
> +In version 2, reference discovery is initiated by the client with
> +"want-refs" line. The client may skip reference discovery phase
> +entirely by not sending "want-refs" (e.g. the client has another way
> +to retrieve ref list).
> +
> +----
> +  want-refs  =  PKT-LINE("want-refs" SP mode *argument)
> +  mode       =  "all"
> +  argument   =  SP arg
> +  arg        =  1*(LC_ALPHA / DIGIT / "-" / "_" / "=")
> +----
> +
> +Mode "all" sends all visible refs to the client like in version 1. No
> +arguments are accepted in this mode. More modes may be available based
> +on capabilities.

I tend to agree with Duy that the protocol must anticipate that args
can become longer.

> +----
> +  advertised-refs  =  (no-refs / list-of-refs)
> +		      *shallow
> +		      flush-pkt

I am not sure if defining "shallow" as part of "refs advertisement"
is a good idea.  The latter lives in the same place in the v1
protocol merely because that was how it was later bolted onto the
protocol.  But this concern can easily be allayed by retitling
"advertised-refs" to something else.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH] protocol upload-pack-v2
  2015-03-07  0:28                               ` Junio C Hamano
@ 2015-03-07  4:28                                 ` Stefan Beller
  2015-03-07  5:21                                   ` Duy Nguyen
  2015-03-08 20:36                                   ` Junio C Hamano
  0 siblings, 2 replies; 80+ messages in thread
From: Stefan Beller @ 2015-03-07  4:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Duy Nguyen, git@vger.kernel.org

On Fri, Mar 6, 2015 at 4:28 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> @@ -67,7 +74,6 @@ gracefully with an error message.
>>    error-line     =  PKT-LINE("ERR" SP explanation-text)
>>  ----
>>
>> -
>>  SSH Transport
>
> Noise?
>
>> @@ -124,9 +130,56 @@ has, the first can 'fetch' from the second.  This operation determines
>>  what data the server has that the client does not then streams that
>>  data down to the client in packfile format.
>>
>> +Capability discovery (v2)
>> +-------------------------
>>
>> +In version 1, capability discovery is part of reference discovery and
>> +covered in reference discovery section.
>> +
>> +In version 2, when the client initially connects, the server
>> +immediately sends its capabilities to the client. Then the client must
>> +send the list of server capabilities it wants to use to the server.
>> +
>> +   S: 00XXcapabilities 4\n
>> +   S: 00XXcap:lang\n
>> +   S: 00XXcap:thin-pack\n
>> +   S: 00XXcap:ofs-delta\n
>> +   S: 00XXagent:agent=git/2:3.4.5+custom-739-gb850f98\n
>> +
>> +   C: 00XXcapabilities 3
>> +   C: 00XXcap:thin-pack\n
>> +   C: 00XXcap:ofs-delta\n
>> +   C: 00XXcap:lang=en\n
>> +   C: 00XXagent:agent=git/custom_string\n
>
> I do not see a good reason why we want "I am sending N caps"
> upfront, instead of "this, that, and here is the end of the group".

I thought about having an end marker, so something like
capabilities start
thin-pack
lang
ofs-delta
capabilities done

(Each line a pkt-line)

Though all decisions I thought through I tried to put more weight on
future expandability. Now that I think about it again, it makes no
difference, whether to use a counter or an end marker.

> If you expect the recipient to benefit by being able to pre-allocate
> N slots, then that might make sense, but otherwise, I'd rather see
> us stick to a (weaker) flush that says "group ends here".

I think it's not about pre allocating but counting down. Then you know
at the beginning how much to expect which might become relevant if
that section grows large again. ("The server really wants to send 1500
capability lines? Nope I'll just disconnect because I am on mobile!")

Implementation wise an end marker is easier though (you don't need
to count down, so it feels more stateless to me).

>
> Besides, I do not know how you counted 4 on the S: side and 3 on
> the C: side in the above example and missing LF after 3 ;-).
>

Sorry about that, I added one answer late and forgot to increment the 3.

>> +----
>> +  cap              =  PKT-LINE("capabilities" SP size LF list)
>
> Isn't a cap packet terminated by LF without any "list" following it?
> The notation PKT-LINE(<blah>) is "wrap <blah> in a single packet",
> and I do not think you meant to send the capability line and a series
> of cap:foo entries in a single packet.

Yeah I meant to use one packet per line
So after considering your input, you'd want to have
PKT-LINE("capabilities start")
PKT-LINE("no-prefix-for-capabilities")
PKT-LINE("ofs-delta")
PKT-LINE("agent-as-capability-2.6")
PKT-LINE("capabilities end")

And additionally to that a PKT-LINE should have the ability to grow larger than
0xffff, which would be encoded with 0xffff being an escape character
indicating the
length is encoded somehow different. (Maybe take the next 8 bytes
instead of just 4).


>
>> +  size             =  *DIGIT
>> +  capability-list  =  *(capability) [agent LF]
>> +  capability       =  "cap:" keyvaluepair LF
>> +  agent            =  keyvaluepair LF
>
> I do not see a reason to make 'agent' as part of capability.  That
> was an implementation detail of v1 but v2 does not have an
> obligation to consider agent announcement as capability.

So I think we don't want to drop the agent announcement as it may
reveal useful information ("How many outdated clients connect to our
$HOSTING_SITE?", "I need to debug failures which happen only rarely,
Oh! it can be narrowed down to clients with agent xyz.")

So then we need to decide where to put the agent. And as it is only small
but useful (meta?)-information I'd rather put it at the beginning of the
data exchange, so in case the other side seems to be missbehaving,
it is easier to debug in the hope the agent transmission was still
successful.

>
> server-announcement = PKT-LINE("capabilities" ...) capability-list [agent-announcement]
> capability-list = as you have it w/o "[agent LF]"
> agent-announcement = PKT-LINE("agent=" agent-token LF)
>
> or something, perhaps?

This looks like me as if all capabilities are in one PKT-LINE, which
you seemed to oppose?

>
>> +The client MUST ignore any data before the pkt-line starting with "capabilities"
>> +for future easy of extension.
>
> s/easy/ease/; but I am not sure if this makes sense.  Without
> knowing the extended preamble, you cannot even tell if a packet line
> that happens to start with "capabilities" is a proper beginning of
> 0-th iteration of v2 protocol, or an embedded data in the preamble,
> no?

I rather thought about the case where the implementation would
just close the connection on sight of unknown preamble.
If we want to extend the protocol again and the string
"capabilites" should be part before the actual capabilities start,
we will think about escaping it in the future as then we can still
talk to clients as of this design.

In case we'd close the connection we would have a similar problem as
of now, it cannot be really extended.

>
>> +Reference Discovery (v2)
>> +------------------------
>> +
>> +In version 2, reference discovery is initiated by the client with
>> +"want-refs" line. The client may skip reference discovery phase
>> +entirely by not sending "want-refs" (e.g. the client has another way
>> +to retrieve ref list).
>> +
>> +----
>> +  want-refs  =  PKT-LINE("want-refs" SP mode *argument)
>> +  mode       =  "all"
>> +  argument   =  SP arg
>> +  arg        =  1*(LC_ALPHA / DIGIT / "-" / "_" / "=")
>> +----
>> +
>> +Mode "all" sends all visible refs to the client like in version 1. No
>> +arguments are accepted in this mode. More modes may be available based
>> +on capabilities.
>
> I tend to agree with Duy that the protocol must anticipate that args
> can become longer.

ok, so PKT-LINE needs to be able to deal with larger lines, I'll add that.

>
>> +----
>> +  advertised-refs  =  (no-refs / list-of-refs)
>> +                   *shallow
>> +                   flush-pkt
>
> I am not sure if defining "shallow" as part of "refs advertisement"
> is a good idea.  The latter lives in the same place in the v1
> protocol merely because that was how it was later bolted onto the
> protocol.  But this concern can easily be allayed by retitling
> "advertised-refs" to something else.

I don't quite understand this. What are your concerns about shallow here?

Thanks on the feedback!
Stefan

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH] protocol upload-pack-v2
  2015-03-07  4:28                                 ` Stefan Beller
@ 2015-03-07  5:21                                   ` Duy Nguyen
  2015-03-08 20:36                                   ` Junio C Hamano
  1 sibling, 0 replies; 80+ messages in thread
From: Duy Nguyen @ 2015-03-07  5:21 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Junio C Hamano, git@vger.kernel.org

On Sat, Mar 7, 2015 at 11:28 AM, Stefan Beller <sbeller@google.com> wrote:
>>> +----
>>> +  advertised-refs  =  (no-refs / list-of-refs)
>>> +                   *shallow
>>> +                   flush-pkt
>>
>> I am not sure if defining "shallow" as part of "refs advertisement"
>> is a good idea.  The latter lives in the same place in the v1
>> protocol merely because that was how it was later bolted onto the
>> protocol.  But this concern can easily be allayed by retitling
>> "advertised-refs" to something else.
>
> I don't quite understand this. What are your concerns about shallow here?

This made me look for explanation about these shallow lines (commit
ad49136).They are sent when source repo is a shallow one. They tell
the receiver about the bottom the source repo. My experiment with "git
clone --since" shows that such a shallow clone could end up with a few
thousand shallow lines. Not as many as refs, but sending shallow lines
this way still cost more than necessary. If we want to do that, we
need more flexibility that just sending all shallow lines this way.

One of the approach is, because these shallow lines (on source repo)
rarely change. We could simply exchange SHA-1 of the source repo's
"shallow" file first. The client only requests shallow info when SHA-1
does not match.
-- 
Duy

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH] protocol upload-pack-v2
  2015-03-07  4:28                                 ` Stefan Beller
  2015-03-07  5:21                                   ` Duy Nguyen
@ 2015-03-08 20:36                                   ` Junio C Hamano
  2015-03-31 19:58                                     ` Junio C Hamano
  1 sibling, 1 reply; 80+ messages in thread
From: Junio C Hamano @ 2015-03-08 20:36 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Duy Nguyen, git@vger.kernel.org

Stefan Beller <sbeller@google.com> writes:

>> I do not see a good reason why we want "I am sending N caps"
>> upfront, instead of "this, that, and here is the end of the group".
>
> I thought about having an end marker, so something like
> capabilities start
> thin-pack
> lang
> ofs-delta
> capabilities done
>
> (Each line a pkt-line)
>
> Though all decisions I thought through I tried to put more weight on
> future expandability. Now that I think about it again, it makes no
> difference, whether to use a counter or an end marker.

One reason why I would suggest avoiding "count upfront" is to make
sure we do not repeat the mistake of "Content-Length" which had to
later be corrected by introducing chunked transfer by HTTP folks.
Closer to home, our "type and then size upfront and then contents
and hash the whole thing" loose object format makes it quite hard
to produce without having the whole thing in-core, or otherwise
having a separate way to know the size upfront.

For the capability list, the number of the capabilities you support
may be limited, bounded and may even be known at the compile time,
so count-upfront may not be a burden, but in other parts of the
protocol where you need to feed the result of computation to the
other end, you would need "the group ends here" marker.  It would
be easier for everybody if we can make all types of messages use
the same syntax, regardless of type.

>>> +  cap              =  PKT-LINE("capabilities" SP size LF list)
>>
>> Isn't a cap packet terminated by LF without any "list" following it?
>> The notation PKT-LINE(<blah>) is "wrap <blah> in a single packet",
>> and I do not think you meant to send the capability line and a series
>> of cap:foo entries in a single packet.
>
> Yeah I meant to use one packet per line
> So after considering your input, you'd want to have
> PKT-LINE("capabilities start")
> PKT-LINE("no-prefix-for-capabilities")
> PKT-LINE("ofs-delta")
> PKT-LINE("agent-as-capability-2.6")
> PKT-LINE("capabilities end")

OK, so that "list" at the end is just a typo; there shouldn't be
"list at the end inside PKT-LINE().

>>> +  size             =  *DIGIT
>>> +  capability-list  =  *(capability) [agent LF]
>>> +  capability       =  "cap:" keyvaluepair LF
>>> +  agent            =  keyvaluepair LF
>>
>> I do not see a reason to make 'agent' as part of capability.  That
>> was an implementation detail of v1 but v2 does not have an
>> obligation to consider agent announcement as capability.
>
> So I think we don't want to drop the agent announcement as it may
> reveal useful information ("How many outdated clients connect to our
> $HOSTING_SITE?", "I need to debug failures which happen only rarely,
> Oh! it can be narrowed down to clients with agent xyz.")

Don't be overly defensive and try not to misunderstand and see a
criticism where there is none.  All I am saying is that agent
announcement is not annoucing capability.  You may announce many
things, and server or client version may be something you would want
to announce.

I have a feeling that it is a bit too premature to specify the
details at such a low level as "capaiblities are announced by
prefixing four-byte 'c', 'a', 'p', ':' in front" and "a multi-record
group has its element count at the beginning (or an end marker at
the end, for that matter)", and it may be a better idea to outline
all the necessary elements at a bit higher level first---that would
avoid needs for useless exchanges like what we are having right now.

It's that when you write things in EBNF, you are writing something
that you would eventually want to cast in stone, and the
non-terminal names in EBNF matter (they convey the semantics, what
these named things are), and I was reacting to that because I wanted
to make sure we avoid mislabaling things as something that are not.

The "shallow" vs "reference advertisement" is the same.  I think the
former is _not_ part of reference announcement but is an optional
phase of the protocol, but the level of the detail that would make
the difference matter would appear only when you start writing it in
EBNF and call both "reference advertisement".  If you keep the
discussion at the level like "fetch first asks capabilities it wants
upload-pack-2 to enable, optionally gives the current shallow
boundaries when the capaibilty says the other side supports it, and
then starts showing what it has" while we are trying to achieve
concensus on what kind of protocol elements we would need, and what
information each element would carry, the discussion will help us
reach a shared understanding on what to write down in EBNF form
exactly faster, I would imagine.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH] protocol upload-pack-v2
  2015-03-06 23:38                             ` [PATCH] protocol upload-pack-v2 Stefan Beller
                                                 ` (3 preceding siblings ...)
  2015-03-07  0:28                               ` Junio C Hamano
@ 2015-03-10  1:38                               ` Duy Nguyen
  2015-03-10 19:36                                 ` Kyle J. McKay
  4 siblings, 1 reply; 80+ messages in thread
From: Duy Nguyen @ 2015-03-10  1:38 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Junio C Hamano, Git Mailing List

A minor point on capability negotiation. I remember why I passed
capabilities via command line instead of this proposal. With this
proposal, daemon.c does not recognize "i18n" capability and cannot
switch to the correct language before it reports an error.

But perhaps I approached it the wrong way. Instead of letting the
daemon produce non-English language directly, if could sort of
standardize the messages and send an error code back in "ERR" line,
together with an English message (current pack-protocol.txt says "ERR"
followed by explanation-text). The client can use this error code to
print non-English messages if it wants to. Perhaps we can reuse http
(or ftp) return codes or some other protocol then customize to fit
Git's needs.
-- 
Duy

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH] protocol upload-pack-v2
  2015-03-10  1:38                               ` Duy Nguyen
@ 2015-03-10 19:36                                 ` Kyle J. McKay
  0 siblings, 0 replies; 80+ messages in thread
From: Kyle J. McKay @ 2015-03-10 19:36 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Stefan Beller, Junio C Hamano, Git Mailing List

On Mar 9, 2015, at 18:38, Duy Nguyen wrote:
> A minor point on capability negotiation. I remember why I passed
> capabilities via command line instead of this proposal. With this
> proposal, daemon.c does not recognize "i18n" capability and cannot
> switch to the correct language before it reports an error.
>
> But perhaps I approached it the wrong way. Instead of letting the
> daemon produce non-English language directly, if could sort of
> standardize the messages and send an error code back in "ERR" line,
> together with an English message (current pack-protocol.txt says "ERR"
> followed by explanation-text). The client can use this error code to
> print non-English messages if it wants to. Perhaps we can reuse http
> (or ftp) return codes or some other protocol then customize to fit
> Git's needs.

May I suggest that you take a look at RFC 2034 [1].  It describes  
almost exactly this same situation and how SMTP was enhanced to  
support error code numbers that could then be translated.

No reason this enhancement needs to be restricted to protocol v2 if  
it's just an "enhancedstatuscodes" capability the server sends back to  
indicate that its ERR text is in that format.

-Kyle

[1] http://tools.ietf.org/html/rfc2034

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
  2015-03-03 17:13                               ` Junio C Hamano
  2015-03-03 19:47                                 ` Junio C Hamano
  2015-03-04  1:54                                 ` Duy Nguyen
@ 2015-03-24 17:42                                 ` Stefan Beller
  2015-03-24 18:00                                   ` Junio C Hamano
  2 siblings, 1 reply; 80+ messages in thread
From: Stefan Beller @ 2015-03-24 17:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Duy Nguyen, Git Mailing List

On Tue, Mar 3, 2015 at 9:13 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Duy Nguyen <pclouds@gmail.com> writes:
>
>> Junio pointed out in private that I didn't address the packet length
>> limit (64k). I thought I could get away with a new capability
>> (i.e. not worry about it now) but I finally admit that was a bad
>> hack. So perhaps this on top.
>
> No, I didn't ;-) but I tend to agree that "perhaps 4GB huge packet?"
> is a bad idea.
>
> The problem I had with the version in your write-up was that it
> still assumed that all capabilities must come on one packet-line.
>

So I started looking into extending the buffer size as another 'first step'
towards the protocol version 2 again. But now I think the packed length
limit of 64k is actually a good and useful thing to have and should be
extended/fixed if and only if we run into serious trouble with too small
packets later.

I mean we can add the possibility now by introducing these
special length 0xFFFF or 0xFFFE to mean we'd want to extend it in the
future. But when doing this we need to be extra careful with buffer allocation.
As it is easy to produce a denial of service attack if the receiving side
blindly trusts the length and allocates as much memory. So having a 64k
limit actually helps preventing this attack a bit as it is a very small number.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH 0/3] protocol v2
  2015-03-24 17:42                                 ` Stefan Beller
@ 2015-03-24 18:00                                   ` Junio C Hamano
  0 siblings, 0 replies; 80+ messages in thread
From: Junio C Hamano @ 2015-03-24 18:00 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Duy Nguyen, Git Mailing List

Stefan Beller <sbeller@google.com> writes:

> So I started looking into extending the buffer size as another 'first step'
> towards the protocol version 2 again. But now I think the packed length
> limit of 64k is actually a good and useful thing to have and should be
> extended/fixed if and only if we run into serious trouble with too small
> packets later.

I tend to agree.  Too large a packet size would mean your latency
would also suck, as pkt-line interface will not give you anything
until you read the entire packet.  The new protocol should be
designed around a reasonably sized packets, using multiple packets
to carry larger payload as necessary.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH] protocol upload-pack-v2
  2015-03-08 20:36                                   ` Junio C Hamano
@ 2015-03-31 19:58                                     ` Junio C Hamano
  2015-04-02 12:37                                       ` Duy Nguyen
  2015-04-02 22:18                                       ` Martin Fick
  0 siblings, 2 replies; 80+ messages in thread
From: Junio C Hamano @ 2015-03-31 19:58 UTC (permalink / raw)
  To: git; +Cc: Duy Nguyen, Stefan Beller

Junio C Hamano <gitster@pobox.com> writes:

> I have a feeling that it is a bit too premature to specify the
> details at such a low level as "capaiblities are announced by
> prefixing four-byte 'c', 'a', 'p', ':' in front" and "a multi-record
> group has its element count at the beginning (or an end marker at
> the end, for that matter)", and it may be a better idea to outline
> all the necessary elements at a bit higher level first---that would
> avoid needs for useless exchanges like what we are having right now.
>
> ....  If you keep the
> discussion at the level like "fetch first asks capabilities it wants
> upload-pack-2 to enable, optionally gives the current shallow
> boundaries when the capaibilty says the other side supports it, and
> then starts showing what it has" while we are trying to achieve
> concensus on what kind of protocol elements we would need, and what
> information each element would carry, the discussion will help us
> reach a shared understanding on what to write down in EBNF form
> exactly faster, I would imagine.

And I see we went silent after this, so let's try to stir the pot
again to see if it simmers.

This is a follow-up on $gmane/264553, which is a continuation of
$gmane/264000, but instead of giving two required readings to
readers, I'll start with reproduction of the two, and add a few more
things the current protocol lacks that I would want to see in the
updated protocol.



The current protocol has the following problems that limit us:

 - It is not easy to make it resumable, because we recompute every
   time.  This is especially problematic for the initial fetch aka
   "clone" as we will be talking about a large transfer. Redirection
   to a bundle hosted on CDN might be something we could do
   transparently.

 - The protocol extension has a fairly low length limit.

 - Because the protocol exchange starts by the server side
   advertising all its refs, even when the fetcher is interested in
   a single ref, the initial overhead is nontrivial, especially when
   you are doing a small incremental update.  The worst case is an
   auto-builder that polls every five minutes, even when there is no
   new commits to be fetched.

 - Because we recompute every time, taking into account of what the
   fetcher has, in addition to what the fetcher obtained earlier
   from us in order to reduce the transferred bytes, the payload for
   incremental updates become tailor-made for each fetch and cannot
   be easily reused.

 - The semantics of the side-bands are unclear.

   - Is band #2 meant only for progress output (I think the current
     protocol handlers assume that and unconditionally squelch it
     under --quiet)?  Do we rather want a dedicated "progress" and
     "error message" sidebands instead?

   - Is band #2 meant for human consumption, or do we expect the
     other end to interpret and act on it?  If the former, would it
     make sense to send locale information from the client side and
     ask the server side to produce its output with _("message")?

 - The semantics of packet_flush() is suboptimal, and this
   shortcoming seeps through to the protocol mapped to the
   smart-HTTP transport.

   Originally, packet_flush() was meant as "Here is an end of one
   logical section of what I am going to speak.", hinting that it
   might be a good idea for the underlying implementation to hold
   the packets up to that point in-core and then write(2) them all
   out (i.e. "flush") to the file descriptor only when we handle
   packet_flush().  It never meant "Now I am finished speaking for
   now and it is your turn to speak."

   But because HTTP is inherently a ping-pong protocol where the
   requestor at one point stops talking and lets the responder
   speak, the code to map our protocol to the smart HTTP transport
   made the packet_flush() boundary as "Now I am done talking, it is
   my turn to listen."

   We probably need two kinds of packet_flush().  When a requestor
   needs to say two or more logical groups of things before telling
   the other side "Now I am done talking; it is your turn.", we need
   some marker (i.e. the original meaning of packet_flush()) at the
   end of these logical groups.  And in order to be able to say "Now
   I am done saying everything I need to say at this point for you
   to respond to me.  It is your turn.", we need another kind of
   marker.

 - The fetch-pack direction does the common-parent discovery but the
   push-pack direction does not.  This is OK for the normal
   fast-forward push, in which case we will see a known commit on
   the tip of the branch we are pushing into, but makes forced push
   inefficient.

 - The existing common-parent discovery done on the fetch-pack side
   enumerates commits contiguously traversing the history to the
   past.  We might want to go exponential or Fibonacci to quickly
   find an ancient common commit and bisect the history from there
   (or it might turn out not to be worth it).

 - We may want to revamp the builtin/receive-pack.c::report() that
   reports the final result of a push back to the pusher to tell the
   exact object name that sits at the updated tips of refs, not just
   refnames.  It will allow the server side to accept a push of
   commit X to a branch, do some "magic" on X (e.g. rebase it on top
   of the current tip, merge it with the current tip, or let a hook
   to rewrite the commit in any way it deems appropriate) and put
   the resulting commit Y at the tip of the branch.  Without such a
   revamp, it is currently not possible to sensibly allow the server
   side to rewrite what got pushed.

 - If we were to start allowing the receiver side to rewrite pushed
   commits, the updated send-pack protocol must be able to send the
   new objects created by that "magic" back to the pusher.  The
   current protocol does not allow the receive-pack to send packdata
   back to send-pack.

I'd like to see a new protocol that lets us overcome the above
limitations (did I miss others? I am sure people can help here)
sometime this year.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH] protocol upload-pack-v2
  2015-03-31 19:58                                     ` Junio C Hamano
@ 2015-04-02 12:37                                       ` Duy Nguyen
  2015-04-02 14:45                                         ` Junio C Hamano
  2015-04-02 22:18                                       ` Martin Fick
  1 sibling, 1 reply; 80+ messages in thread
From: Duy Nguyen @ 2015-04-02 12:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Stefan Beller

On Wed, Apr 1, 2015 at 2:58 AM, Junio C Hamano <gitster@pobox.com> wrote:
> This is a follow-up on $gmane/264553, which is a continuation of
> $gmane/264000, but instead of giving two required readings to
> readers, I'll start with reproduction of the two, and add a few more
> things the current protocol lacks that I would want to see in the
> updated protocol.

I think the important thing to get v2 started is making sure we do not
need v3 to get rid of any of these limitations. In other words v2
should be extensible enough to implement them later. I'm looking from
this perspective.

> The current protocol has the following problems that limit us:
>
>  - It is not easy to make it resumable, because we recompute every
>    time.  This is especially problematic for the initial fetch aka
>    "clone" as we will be talking about a large transfer. Redirection
>    to a bundle hosted on CDN might be something we could do
>    transparently.

Sending multiple packs or some redirection instructions could be done
even with v1. The only recompute part that is unavoidable in v1 is ref
advertisement, which I think is solved.

>  - The protocol extension has a fairly low length limit.

One pkt-line per protocol extension should do it.

>  - Because the protocol exchange starts by the server side
>    advertising all its refs, even when the fetcher is interested in
>    a single ref, the initial overhead is nontrivial, especially when
>    you are doing a small incremental update.  The worst case is an
>    auto-builder that polls every five minutes, even when there is no
>    new commits to be fetched.

One of the reason v2 is started, should be ok with current v2 design.

>  - Because we recompute every time, taking into account of what the
>    fetcher has, in addition to what the fetcher obtained earlier
>    from us in order to reduce the transferred bytes, the payload for
>    incremental updates become tailor-made for each fetch and cannot
>    be easily reused.

Well, we reuse at a lower level, pack-objects would try to copy
existing deltas instead of making new ones. We can cache new deltas in
hope that they may be useful for the next fetch. But that has nothing
to do with the protocol..

>  - The semantics of the side-bands are unclear.
>
>    - Is band #2 meant only for progress output (I think the current
>      protocol handlers assume that and unconditionally squelch it
>      under --quiet)?  Do we rather want a dedicated "progress" and
>      "error message" sidebands instead?
>
>    - Is band #2 meant for human consumption, or do we expect the
>      other end to interpret and act on it?  If the former, would it
>      make sense to send locale information from the client side and
>      ask the server side to produce its output with _("message")?

The interpretation of side-band could be changed by introducing a new
extension, couldn't it?

>  - The semantics of packet_flush() is suboptimal, and this
>    shortcoming seeps through to the protocol mapped to the
>    smart-HTTP transport.
>
>    ...

I don't have an answer to this one. So the reaction is, if it is not
"broken" (in pratice, not in theory), don't touch it. I know I'm
burying my head in the sand..

>  - The fetch-pack direction does the common-parent discovery but the
>    push-pack direction does not.  This is OK for the normal
>    fast-forward push, in which case we will see a known commit on
>    the tip of the branch we are pushing into, but makes forced push
>    inefficient.

Introducing the ref exchange in push-pack could be done in an
extension too, I think.

>  - The existing common-parent discovery done on the fetch-pack side
>    enumerates commits contiguously traversing the history to the
>    past.  We might want to go exponential or Fibonacci to quickly
>    find an ancient common commit and bisect the history from there
>    (or it might turn out not to be worth it).

Hm.. i'm wondering if we can already do this with v1 if we have enough
man power.

>  - We may want to revamp the builtin/receive-pack.c::report() that
>    reports the final result of a push back to the pusher to tell the
>    exact object name that sits at the updated tips of refs, not just
>    refnames.  It will allow the server side to accept a push of
>    commit X to a branch, do some "magic" on X (e.g. rebase it on top
>    of the current tip, merge it with the current tip, or let a hook
>    to rewrite the commit in any way it deems appropriate) and put
>    the resulting commit Y at the tip of the branch.  Without such a
>    revamp, it is currently not possible to sensibly allow the server
>    side to rewrite what got pushed.

Sounds more coding than changing the protocol, which should be
possible with another extension.

>  - If we were to start allowing the receiver side to rewrite pushed
>    commits, the updated send-pack protocol must be able to send the
>    new objects created by that "magic" back to the pusher.  The
>    current protocol does not allow the receive-pack to send packdata
>    back to send-pack.

Is it cleaner to just initiate another fetch to get that pack? Maybe
race conditions make it not a good option?
-- 
Duy

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH] protocol upload-pack-v2
  2015-04-02 12:37                                       ` Duy Nguyen
@ 2015-04-02 14:45                                         ` Junio C Hamano
  0 siblings, 0 replies; 80+ messages in thread
From: Junio C Hamano @ 2015-04-02 14:45 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List, Stefan Beller

Duy Nguyen <pclouds@gmail.com> writes:

> On Wed, Apr 1, 2015 at 2:58 AM, Junio C Hamano <gitster@pobox.com> wrote:
>> This is a follow-up on $gmane/264553, which is a continuation of
>> $gmane/264000, but instead of giving two required readings to
>> readers, I'll start with reproduction of the two, and add a few more
>> things the current protocol lacks that I would want to see in the
>> updated protocol.
>
> I think the important thing to get v2 started is making sure we do not
> need v3 to get rid of any of these limitations. In other words v2
> should be extensible enough to implement them later.

Yes.

>>    ...
>
> I don't have an answer to this one. So the reaction is,...
> "broken" (in pratice, not in theory), don't touch it. I know I'm
> burying my head in the sand..
>    ...
> Introducing the ref exchange in push-pack could be done in an
> extension too, I think.

Thanks, but you do not have to "solve" these sample issues here.  As
you said above, your primary goal at this stage is to make sure v2
is extensible to cover future needs, and in order to do so, you need
to make sure "what's lacking" list is fairly complete.  You are not
helping from that point of view.

I'd like to see a new protocol that lets us overcome the limitations
(did I miss others? I am sure people can help here) sometime this
year.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH] protocol upload-pack-v2
  2015-03-31 19:58                                     ` Junio C Hamano
  2015-04-02 12:37                                       ` Duy Nguyen
@ 2015-04-02 22:18                                       ` Martin Fick
  2015-04-02 22:58                                         ` Junio C Hamano
  2015-04-02 23:00                                         ` Stefan Beller
  1 sibling, 2 replies; 80+ messages in thread
From: Martin Fick @ 2015-04-02 22:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Duy Nguyen, Stefan Beller

> The current protocol has the following problems that limit
> us:
> 
>  - It is not easy to make it resumable, because we
> recompute every time.  This is especially problematic for
> the initial fetch aka "clone" as we will be talking about
> a large transfer. Redirection to a bundle hosted on CDN
> might be something we could do transparently.
> 
>  - The protocol extension has a fairly low length limit.
> 
>  - Because the protocol exchange starts by the server side
> advertising all its refs, even when the fetcher is
> interested in a single ref, the initial overhead is
> nontrivial, especially when you are doing a small
> incremental update.  The worst case is an auto-builder
> that polls every five minutes, even when there is no new
> commits to be fetched.

A lot of focus about the problems with ref advertisement is 
about the obvious problem mentioned above (a bad problem 
indeed).  I would like to add that there is another related 
problem that all potential solutions to the above problem do 
not neccessarily improve.   When polling regularly there is 
also no current efficient way to check on the current state of 
all refs.  It would be nice to also be able to get an 
incremental update on large refs spaces.

Thanks,

-Martin

-- 
The Qualcomm Innovation Center, Inc. is a member of Code 
Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH] protocol upload-pack-v2
  2015-04-02 22:18                                       ` Martin Fick
@ 2015-04-02 22:58                                         ` Junio C Hamano
  2015-04-02 23:00                                         ` Stefan Beller
  1 sibling, 0 replies; 80+ messages in thread
From: Junio C Hamano @ 2015-04-02 22:58 UTC (permalink / raw)
  To: Martin Fick; +Cc: git, Duy Nguyen, Stefan Beller

Martin Fick <mfick@codeaurora.org> writes:

>> The current protocol has the following problems that limit
>> us:
>> 
>>  - It is not easy to make it resumable, because we
>> recompute every time.  This is especially problematic for
>> the initial fetch aka "clone" as we will be talking about
>> a large transfer. Redirection to a bundle hosted on CDN
>> might be something we could do transparently.
>> 
>>  - The protocol extension has a fairly low length limit.
>> 
>>  - Because the protocol exchange starts by the server side
>> advertising all its refs, even when the fetcher is
>> interested in a single ref, the initial overhead is
>> nontrivial, especially when you are doing a small
>> incremental update.  The worst case is an auto-builder
>> that polls every five minutes, even when there is no new
>> commits to be fetched.
>
> A lot of focus about the problems with ref advertisement is 
> about the obvious problem mentioned above (a bad problem 
> indeed).  I would like to add that there is another related 
> problem that all potential solutions to the above problem do 
> not neccessarily improve.   When polling regularly there is 
> also no current efficient way to check on the current state of 
> all refs.  It would be nice to also be able to get an 
> incremental update on large refs spaces.

Yes, incremental ref update is an important problem to solve.  I
think one potential solution was indeed mentioned to improve that
exact issue, e.g. footnote #3 in $gmane/264000.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH] protocol upload-pack-v2
  2015-04-02 22:18                                       ` Martin Fick
  2015-04-02 22:58                                         ` Junio C Hamano
@ 2015-04-02 23:00                                         ` Stefan Beller
  2015-04-02 23:14                                           ` Stefan Beller
  1 sibling, 1 reply; 80+ messages in thread
From: Stefan Beller @ 2015-04-02 23:00 UTC (permalink / raw)
  To: Martin Fick; +Cc: Junio C Hamano, git@vger.kernel.org, Duy Nguyen

On Thu, Apr 2, 2015 at 3:18 PM, Martin Fick <mfick@codeaurora.org> wrote:
>> The current protocol has the following problems that limit
>> us:
>>
>>  - It is not easy to make it resumable, because we
>> recompute every time.  This is especially problematic for
>> the initial fetch aka "clone" as we will be talking about
>> a large transfer. Redirection to a bundle hosted on CDN
>> might be something we could do transparently.
>>
>>  - The protocol extension has a fairly low length limit.
>>
>>  - Because the protocol exchange starts by the server side
>> advertising all its refs, even when the fetcher is
>> interested in a single ref, the initial overhead is
>> nontrivial, especially when you are doing a small
>> incremental update.  The worst case is an auto-builder
>> that polls every five minutes, even when there is no new
>> commits to be fetched.
>
> A lot of focus about the problems with ref advertisement is
> about the obvious problem mentioned above (a bad problem
> indeed).  I would like to add that there is another related
> problem that all potential solutions to the above problem do
> not neccessarily improve.   When polling regularly there is
> also no current efficient way to check on the current state of
> all refs.  It would be nice to also be able to get an
> incremental update on large refs spaces.

I think once the new protocol is in place, the server could advertise
the capability to send a differential of refs.

To make sure that works the capability phase should be strictly separated
from the rest, so you can think of any new fancy scheme to transmit
refs or objects, and once both client and server agree on that fancy scheme
both know when to expect the "new changed" protocol.

So from a high level perspective it should look like:
Phase 1) negotiation of capabilities
Phase 2) ref advertisement (i.e. changes in the DAG end points)
Phase 3) transmitting the missing blobs.

The crucial point now is to make sure Phase 1) is not growing to large in
transmission size / required compute power (/ complexity).

And as everybody out there wants to invent new schemes how to do 2) and 3)
efficient, I wonder if we need to do Phase 1) as a differential as well, so I'd
presume the optimum could look like

Client: Last time we talked the capabilities you advertised hashed to $SHA
Server: That's right, but additionally I have "push_cert_nonce=$value"

In the non-optimal case:
Client: Last time we talked the capabilities you advertised hashed to $SHA
Server: I don't know that value, here comes the list of all
capabilities I can do:
 ...
 ...

I like that approach as it would really break down to transmitting the minimal
amount of information most of the time. The downside is to know which
capabilities are cache-able and then hash-able, such that the remote side
only needs to maintain only a very small set of advertised capability lists
and their hash. For example the nonce for signed pushes will hopefully
never be the same, so it makes no sense to have them inside the capabilities
cache.

Having such a capabilities cache would give us a long time until the
phase to negotiate the capabilities will grow too large again (most of the
capabilities I'd assume are rather static per server)

And the way I understand the current situation, it's all about talking this
early negotiation phase, which then allows us to model the refs
advertisement and
the blob transmission later on as a response to upcoming problems in the future.

>
> Thanks,
>
> -Martin
>
> --
> The Qualcomm Innovation Center, Inc. is a member of Code
> Aurora Forum, hosted by The Linux Foundation
>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH] protocol upload-pack-v2
  2015-04-02 23:00                                         ` Stefan Beller
@ 2015-04-02 23:14                                           ` Stefan Beller
  0 siblings, 0 replies; 80+ messages in thread
From: Stefan Beller @ 2015-04-02 23:14 UTC (permalink / raw)
  To: Martin Fick; +Cc: Junio C Hamano, git@vger.kernel.org, Duy Nguyen

After looking at $gmane/264000 again, maybe the client should talk first
stating all the relevant information it wants to get, though I realize this
is not part of capabilities so maybe it could even before, such as:

Client: All I want to do is an ls-remote, so only Phase 2, no
transmission of blobs phase 3
Server: ok
Client[as in the previous mail]: Last time we talked you advertised
hashed to $SHA
Server: that's correct!

As the server knows the client doesn't want to know about Phase3, it
can omit things
relevant to that phase such as the signed push nonce.

So from a high level perspective we'd maybe need 4 phases like
Phase 0) declare the intent (fetch/push all or partial parts)
Phase 1) negotiation of capabilities
Phase 2) ref advertisement (i.e. changes in the DAG end points)
Phase 3) transmitting the missing blobs.

The problem may be that phase 0 and 1 may require mixing, as you may want
to declare new things to do in 0) which you would have needed to advertise as
a capability in 1). So maybe we need to swap that around:

Phase 1a) negotiation of capabilities
Phase 1b) negotiation of intent (fetch/push of all/few branches in
full/shallow/narrow fashion)
Phase 2) ref advertisement (i.e. changes in the DAG end points)
Phase 3) transmitting the missing blobs.

^ permalink raw reply	[flat|nested] 80+ messages in thread

end of thread, other threads:[~2015-04-02 23:14 UTC | newest]

Thread overview: 80+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-24  3:12 [RFC/PATCH 0/3] protocol v2 Stefan Beller
2015-02-24  3:12 ` [PATCH 1/3] Document protocol capabilities extension Stefan Beller
2015-02-24  3:12 ` [PATCH 2/3] receive-pack: add advertisement of different protocol options Stefan Beller
2015-02-24  3:12 ` [PATCH 3/3] receive-pack: enable protocol v2 Stefan Beller
2015-02-24  4:02 ` [RFC/PATCH 0/3] " Duy Nguyen
2015-02-24  5:40   ` Stefan Beller
2015-02-24  6:15   ` Junio C Hamano
2015-02-24 23:37     ` Stefan Beller
2015-02-25 12:44       ` Duy Nguyen
2015-02-25 18:04         ` Junio C Hamano
2015-02-26  7:31           ` Stefan Beller
2015-02-26 10:15             ` Duy Nguyen
2015-02-26 20:08               ` Stefan Beller
     [not found]                 ` <CACsJy8DOS_999ZgW7TqsH-dkrUFpjZf0TFQeFUt9s0bNhHY0Bw@mail.gmail.com>
2015-02-27 22:20                   ` Stefan Beller
2015-02-26 20:13               ` Junio C Hamano
2015-02-27  1:26                 ` Stefan Beller
2015-02-27  2:15                   ` Stefan Beller
2015-02-27 23:05                 ` Junio C Hamano
2015-02-27 23:44                   ` Stefan Beller
2015-02-28  0:33                     ` Junio C Hamano
2015-02-28  0:46                       ` Stefan Beller
2015-02-28  1:01                         ` [RFC/PATCH 0/5] protocol v2 for upload-pack Stefan Beller
2015-02-28  1:01                           ` [RFC/PATCH 1/5] upload-pack: only accept capabilities on the first "want" line Stefan Beller
2015-02-28  1:01                           ` [RFC/PATCH 2/5] upload-pack: support out of band client capability requests Stefan Beller
2015-02-28  7:47                             ` Kyle J. McKay
2015-02-28 11:22                               ` Duy Nguyen
2015-02-28 22:36                                 ` Kyle J. McKay
2015-03-01  0:11                                   ` Duy Nguyen
2015-02-28 11:36                             ` Duy Nguyen
2015-02-28  1:01                           ` [RFC/PATCH 3/5] connect.c: connect to a remote service with some flags Stefan Beller
2015-02-28 11:11                             ` Torsten Bögershausen
2015-03-01  3:22                             ` Junio C Hamano
2015-02-28  1:01                           ` [RFC/PATCH 4/5] daemon.c: accept extra service arguments Stefan Beller
2015-03-01  3:39                             ` Junio C Hamano
2015-02-28  1:01                           ` [RFC/PATCH 5/5] WIP/Document the http protocol change Stefan Beller
2015-02-28 12:26                             ` Duy Nguyen
2015-03-01  9:11                           ` [RFC/PATCH 0/5] protocol v2 for upload-pack Johannes Sixt
2015-03-02  2:58                             ` Junio C Hamano
2015-03-02  3:47                         ` [RFC/PATCH 0/3] protocol v2 Junio C Hamano
2015-03-02  9:21                           ` Duy Nguyen
2015-03-02  9:24                             ` Duy Nguyen
2015-03-03 10:33                             ` Duy Nguyen
2015-03-03 17:13                               ` Junio C Hamano
2015-03-03 19:47                                 ` Junio C Hamano
2015-03-04  1:54                                 ` Duy Nguyen
2015-03-04  4:27                                   ` Shawn Pearce
2015-03-04 12:05                                     ` Duy Nguyen
2015-03-04 19:10                                       ` Shawn Pearce
2015-03-05  1:03                                         ` Stefan Beller
2015-03-05 16:03                                           ` Stefan Beller
2015-03-24 17:42                                 ` Stefan Beller
2015-03-24 18:00                                   ` Junio C Hamano
2015-03-06 23:38                             ` [PATCH] protocol upload-pack-v2 Stefan Beller
2015-03-06 23:40                               ` Stefan Beller
2015-03-06 23:55                               ` Duy Nguyen
2015-03-07  0:00                               ` Duy Nguyen
2015-03-07  0:28                               ` Junio C Hamano
2015-03-07  4:28                                 ` Stefan Beller
2015-03-07  5:21                                   ` Duy Nguyen
2015-03-08 20:36                                   ` Junio C Hamano
2015-03-31 19:58                                     ` Junio C Hamano
2015-04-02 12:37                                       ` Duy Nguyen
2015-04-02 14:45                                         ` Junio C Hamano
2015-04-02 22:18                                       ` Martin Fick
2015-04-02 22:58                                         ` Junio C Hamano
2015-04-02 23:00                                         ` Stefan Beller
2015-04-02 23:14                                           ` Stefan Beller
2015-03-10  1:38                               ` Duy Nguyen
2015-03-10 19:36                                 ` Kyle J. McKay
2015-02-28  0:07                   ` [RFC/PATCH 0/3] protocol v2 Duy Nguyen
2015-02-28  0:26                     ` Junio C Hamano
2015-03-01  8:41     ` Junio C Hamano
2015-03-01 11:32       ` Duy Nguyen
2015-03-01 19:56         ` Stefan Beller
2015-03-02  1:05           ` David Lang
2015-03-01 23:06         ` Junio C Hamano
2015-03-02  1:09           ` David Lang
2015-03-02  3:10             ` Junio C Hamano
2015-03-01 23:06       ` Philip Oakley
2015-03-02  9:32         ` Duy Nguyen

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).