git@vger.kernel.org mailing list mirror (one of many)
* [PATCH] RFC/Add documentation for version protocol 2
@ 2015-04-21 23:19 Stefan Beller
  2015-04-22 19:19 ` Junio C Hamano
  0 siblings, 1 reply; 5+ messages in thread
From: Stefan Beller @ 2015-04-21 23:19 UTC (permalink / raw)
  To: git; +Cc: gitster, mfick, pclouds, Stefan Beller

This adds the design document for protocol version 2.
It's better to rewrite the design document instead of trying to
squash it into the existing pack-protocol.txt and then differentiating
between version 1 and 2 all the time.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
  
 As we discussed at Git Merge in Paris, I was going to start out by implementing
 the new protocol "to deliver something you can play around with". Unfortunately I
 did not come up with an implementation straight away, and I think we should
 reach consensus on the rough design first. If there are no objections
 to the design, I'll go for an implementation.

 Documentation/technical/pack-protocol-2.txt | 88 +++++++++++++++++++++++++++++
 1 file changed, 88 insertions(+)
 create mode 100644 Documentation/technical/pack-protocol-2.txt

diff --git a/Documentation/technical/pack-protocol-2.txt b/Documentation/technical/pack-protocol-2.txt
new file mode 100644
index 0000000..36ddf3e
--- /dev/null
+++ b/Documentation/technical/pack-protocol-2.txt
@@ -0,0 +1,88 @@
+Packfile transfer protocols version 2
+=====================================
+
+This document describes an updated protocol to transfer packs over ssh://,
+git:// and file:// links. All three transports (ssh, git, file) use the same
+protocol to transfer data. This document describes version 2 of the pack
+file protocol, which is incompatible with the previous pack protocol.
+
+The http:// transport has not yet been considered in this phase of the
+protocol design.
+
+As this protocol is introduced rather late in the game, just after Git's 10th
+anniversary, a client SHOULD NOT assume a server speaks protocol version 2
+unless the server advertised that protocol in a prior exchange.
+
+General structure
+=================
+
+There are four phases involved in the protocol, which are described below:
+
+1) capability negotiation
+2) goal announcement
+3) reference advertisement
+4) pack transfer
+
+
+1) Capability negotiation
+-------------------------
+
+In this phase both client and server send their capabilities to the other side
+using the following protocol:
+
+----
+list-of-capabilities = *capability flush-pkt
+capability           =  PKT-LINE(1*(LC_ALPHA / DIGIT / "-" / "_"))
+----
+
+The capabilities themselves are described in protocol-capabilities.txt.
+Sending the capabilities to the other side MAY happen concurrently or
+one after another. There is no prescribed order for who sends first.
+
+Note for developers:
+The amount of data SHOULD be kept very small. Future extensions to the protocol
+SHOULD only add a capability flag to this phase and put any data
+transfers into later phases. This ensures the protocol is extendable over
+time without having to send huge chunks of data in the first phase.
+If both sides agree on a certain feature being used, it is easy to introduce more
+phases at any convenient point after phase 1 is finished.
+
+Notes as a design rationale:
+I thought about caching
+https://www.ll.mit.edu/HPEC/agendas/proc04/invited/patterson_keynote.pdf
+
+2) Goal announcement
+--------------------
+
+The goal of this phase is for the client to tell the server what
+outcome it expects from this communication, such as pushing or
+pulling data from the server.
+
+---
+list-of-goals    = *goal
+goal             = PKT-LINE(action-line)
+action-line      = action *(SP action-parameter)
+action           = "noop" / "ls-remote" / "fetch" / "push" / "fetch-shallow"
+action-parameter = parameter-key *("=" parameter-value)
+parameter-key    = 1*(LC_ALPHA / DIGIT / "-" / "_")
+---
+
+You MAY specify multiple goals such as fetch and push or fetch-shallow.
+You MAY also specify the same goal multiple times with different parameters.
+You MUST omit goals which are part of other goals, such as ls-remote being part
+of fetch.
+
+The action parameter is dependent on the action itself. For now only fetch and push
+take the parameter "mode", whose only allowed value is "version1".
+
+Note:
+The parameters should follow a key=value pattern, where the value can consist of
+arbitrary characters. Having such a pattern would allow us to easily add a new
+capability for narrow clones (e.g. "fetch-narrow=Documentation/*,.git*,.mailmap"
+to fetch only the Documentation and .gitignore/attributes)
+
+3) Ref advertisement
+--------------------
+3) and 4) are highly dependent on the mode for fetch and push. As currently
+only mode "version1" is supported, these phases follow the ref advertisement
+in pack protocol version 1 without capabilities on the first line of the refs.
-- 
2.4.0.rc2.5.g4c2045b.dirty
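
To make the phase 1 grammar concrete, here is a minimal sketch of how a capability list could be framed. It assumes the pkt-line framing of the existing pack protocol (four hex digits giving the total length including the four-digit prefix itself, with "0000" as the flush-pkt) and takes the capability character set from the ABNF in the draft. The function names are hypothetical, and the two capability names in the usage note are illustrative only; the draft does not say which capabilities would be sent.

```python
import re

# Capability names per the draft's ABNF: 1*(LC_ALPHA / DIGIT / "-" / "_")
CAPABILITY_RE = re.compile(r"[a-z0-9_-]+")
FLUSH_PKT = b"0000"


def pkt_line(payload: bytes) -> bytes:
    """Frame a payload as a pkt-line: 4 hex digits of total length, then data."""
    length = len(payload) + 4  # the length prefix counts itself
    if length > 0xFFFF:
        raise ValueError("payload does not fit in a four-hex-digit length")
    return f"{length:04x}".encode("ascii") + payload


def encode_capabilities(caps):
    """Encode list-of-capabilities = *capability flush-pkt."""
    out = []
    for cap in caps:
        if not CAPABILITY_RE.fullmatch(cap):
            raise ValueError(f"invalid capability name: {cap!r}")
        out.append(pkt_line(cap.encode("ascii")))
    out.append(FLUSH_PKT)
    return b"".join(out)


def decode_capabilities(data: bytes):
    """Decode capabilities up to the first flush-pkt."""
    caps = []
    pos = 0
    while True:
        header = data[pos:pos + 4]
        if len(header) != 4:
            raise ValueError("truncated stream: no flush-pkt seen")
        length = int(header.decode("ascii"), 16)
        if length == 0:  # flush-pkt ends the list
            return caps
        if length <= 4:  # the ABNF requires at least one character
            raise ValueError("invalid pkt-line length")
        payload = data[pos + 4:pos + length]
        if len(payload) != length - 4:
            raise ValueError("truncated pkt-line")
        caps.append(payload.decode("ascii"))
        pos += length
```

With this sketch, `encode_capabilities(["side-band-64k", "ofs-delta"])` yields `b"0011side-band-64k000dofs-delta0000"`, which shows how small the first phase stays when it carries only flags.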


* Re: [PATCH] RFC/Add documentation for version protocol 2
  2015-04-21 23:19 [PATCH] RFC/Add documentation for version protocol 2 Stefan Beller
@ 2015-04-22 19:19 ` Junio C Hamano
  2015-04-22 19:43   ` Stefan Beller
  0 siblings, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2015-04-22 19:19 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, mfick, pclouds

Stefan Beller <sbeller@google.com> writes:

> This adds the design document for protocol version 2.
> It's better to rewrite the design document instead of trying to
> squash it into the existing pack-protocol.txt and then differentiating
> between version 1 and 2 all the time.

Just a handful of random thoughts, without expressing agreement or
objection at this point.

> diff --git a/Documentation/technical/pack-protocol-2.txt

I wonder, if we are really revamping, if we want this to be limited
to "pack" protocol (see more below).

> +General structure
> +=================
> +
> +There are four phases involved in the protocol, which are described below:
> +
> +1) capability negotiation
> +2) goal announcement
> +3) reference advertisement
> +4) pack transfer
> +
> +
> +1) Capability negotiation
> +-------------------------
> +
> +In this phase both client and server send their capabilities to the other side
> +using the following protocol:
> +
> +----
> +list-of-capabilities = *capability flush-pkt
> +capability           =  PKT-LINE(1*(LC_ALPHA / DIGIT / "-" / "_"))
> +----
> +
> +The capabilities themselves are described in protocol-capabilities.txt.
> +Sending the capabilities to the other side MAY happen concurrently or
> +one after another. There is no prescribed order for who sends first.

Doesn't that set us up for an easy deadlock (i.e. both sides fill
their outgoing pipe because they are not listening)?
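
One way to picture the concern, as a minimal sketch rather than anything the draft specifies: if both sides write before reading, each can block once its outgoing buffer fills. Fixing a turn order avoids that. The sketch below assumes the server speaks first and the client reads up to the flush-pkt before answering; that ordering, the function names, and the use of a local socket pair are all assumptions made for illustration.

```python
import socket
import threading

FLUSH_PKT = b"0000"


def pkt_line(payload: bytes) -> bytes:
    """Frame a payload with a 4-hex-digit length that includes the prefix."""
    return f"{len(payload) + 4:04x}".encode("ascii") + payload


def send_capabilities(sock, caps):
    for cap in caps:
        sock.sendall(pkt_line(cap.encode("ascii")))
    sock.sendall(FLUSH_PKT)


def read_exact(sock, n):
    """Read exactly n bytes or fail if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed mid-packet")
        buf += chunk
    return buf


def recv_capabilities(sock):
    caps = []
    while True:
        length = int(read_exact(sock, 4).decode("ascii"), 16)
        if length == 0:  # flush-pkt: the peer is done sending
            return caps
        if length <= 4:
            raise ValueError("invalid pkt-line length")
        caps.append(read_exact(sock, length - 4).decode("ascii"))


def server(sock, caps, result):
    send_capabilities(sock, caps)  # assumed order: server speaks first
    result["server_saw"] = recv_capabilities(sock)


def client(sock, caps, result):
    result["client_saw"] = recv_capabilities(sock)  # client listens first
    send_capabilities(sock, caps)


def exchange(server_caps, client_caps):
    """Run both sides over a local socket pair and return what each saw."""
    a, b = socket.socketpair()
    result = {}
    t = threading.Thread(target=server, args=(a, server_caps, result))
    t.start()
    client(b, client_caps, result)
    t.join()
    a.close()
    b.close()
    return result
```

Because one side is always reading while the other writes, neither can stall on a full pipe regardless of how long the lists are. The cost is one extra half round trip compared with fully concurrent sending.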

> +2) Goal announcement
> +--------------------
> +
> +The goal of this phase is for the client to tell the server what
> +outcome it expects from this communication, such as pushing or
> +pulling data from the server.
> +
> +---
> +list-of-goals    = *goal
> +goal             = PKT-LINE(action-line)
> +action-line      = action *(SP action-parameter)
> +action           = "noop" / "ls-remote" / "fetch" / "push" / "fetch-shallow"

This is interesting, in that it implies that you can connect to a
service and, after learning what is running on the other end, then
decide whether you would be fetching or pushing.  Which is *never*
possible with v1, where you first connect to a specific service that
knows how to handle "fetch".

If we are going in this "in-protocol message switches the service"
route, we should also support "archive" as one of the actions, no?
Yes, I know you named the document "pack-protocol" and "archive"
does not give you packs, but "ls-remote" does not transfer pack data,
either.

And when we add "archive" (and later "refer to bundle on CDN" to
help initial clone), it would become clear that steps #3 and #4 are
optional components that are shared by the majority of the protocol
users (i.e. fetch, push and ls-remote use #3; fetch and push use #4),
and other services that also use the protocol need their own
equivalents for steps #3 and #4.

Of course, we do not have to do it this way and can stick to the "one 'goal'
per service is pre-arranged before the protocol exchange happens,
either via git-daemon or ssh shell command line interactions" way of
doing things that we have always used in the v1 protocol.

I have to wonder what role, if any, should "git daemon" (and its
equivalent, "the shell command line", if the transport is over ssh)
play in this new world order.

Note again that I am not objecting. I am trying to fathom the
ramifications of what you wrote here.

> +3) Ref advertisement
> +--------------------
> +3) and 4) are highly dependent on the mode for fetch and push. As currently
> +only mode "version1" is supported, these phases follow the ref advertisement
> +in pack protocol version 1 without capabilities on the first line of the refs.


* Re: [PATCH] RFC/Add documentation for version protocol 2
  2015-04-22 19:19 ` Junio C Hamano
@ 2015-04-22 19:43   ` Stefan Beller
  2015-04-22 23:30     ` Junio C Hamano
  0 siblings, 1 reply; 5+ messages in thread
From: Stefan Beller @ 2015-04-22 19:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git@vger.kernel.org, Martin Fick, Duy Nguyen

On Wed, Apr 22, 2015 at 12:19 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> This adds the design document for protocol version 2.
>> It's better to rewrite the design document instead of trying to
>> squash it into the existing pack-protocol.txt and then differentiating
>> between version 1 and 2 all the time.
>
> Just a handful of random thoughts, without expressing agreement or
> objection at this point.
>
>> diff --git a/Documentation/technical/pack-protocol-2.txt
>
> I wonder, if we are really revamping, if we want this to be limited
> to "pack" protocol (see more below).
>
>> +General structure
>> +=================
>> +
>> +There are four phases involved in the protocol, which are described below:
>> +
>> +1) capability negotiation
>> +2) goal announcement
>> +3) reference advertisement
>> +4) pack transfer
>> +
>> +
>> +1) Capability negotiation
>> +-------------------------
>> +
>> +In this phase both client and server send their capabilities to the other side
>> +using the following protocol:
>> +
>> +----
>> +list-of-capabilities = *capability flush-pkt
>> +capability           =  PKT-LINE(1*(LC_ALPHA / DIGIT / "-" / "_"))
>> +----
>> +
>> +The capabilities themselves are described in protocol-capabilities.txt.
>> +Sending the capabilities to the other side MAY happen concurrently or
>> +one after another. There is no prescribed order for who sends first.
>
> Doesn't that set us up for an easy deadlock (i.e. both sides fill
> their outgoing pipe because they are not listening)?

I did not think of it that way, but rather was focused on the wall clock
time spent waiting for the protocol to be finished, and for that we want
as much concurrency as possible. I don't know if we ever want to touch threads
in git.

>
>> +2) Goal announcement
>> +--------------------
>> +
>> +The goal of this phase is for the client to tell the server what
>> +outcome it expects from this communication, such as pushing or
>> +pulling data from the server.
>> +
>> +---
>> +list-of-goals    = *goal
>> +goal             = PKT-LINE(action-line)
>> +action-line      = action *(SP action-parameter)
>> +action           = "noop" / "ls-remote" / "fetch" / "push" / "fetch-shallow"
>
> This is interesting, in that it implies that you can connect to a
> service and, after learning what is running on the other end, then
> decide whether you would be fetching or pushing.  Which is *never*
> possible with v1, where you first connect to a specific service that
> knows how to handle "fetch".

I originally thought about it as an optimisation. Say you only want to do
a ls-remote, you don't need to start pack file creation (possibly in a
background thread?), but you know what is coming and don't need to
prepare for unknown things.

>
> If we are going in this "in-protocol message switches the service"
> route, we should also support "archive" as one of the actions, no?
> Yes, I know you named the document "pack-protocol" and "archive"
> does not give you packs, but "ls-remote" does not transfer pack data,
> either.

I'll add that. Also I need to incorporate shallow in one way or another.

>
> And when we add "archive" (and later "refer to bundle on CDN" to
> help initial clone), it would become clear that steps #3 and #4 are
> optional components that are shared by the majority of the protocol
> users (i.e. fetch, push and ls-remote use #3; fetch and push use #4),
> and other services that also use the protocol need their own
> equivalents for steps #3 and #4.

That is my thinking as well: #3 and following are completely dependent
on the action we negotiated. Thinking about it, we could also do that
with the current protocol by invoking not just {receive, upload}-pack
but any other program on the server side.

>
> Of course, we do not have to do it this way and can stick to the "one 'goal'
> per service is pre-arranged before the protocol exchange happens,
> either via git-daemon or ssh shell command line interactions" way of
> doing things that we have always used in the v1 protocol.
>
> I have to wonder what role, if any, should "git daemon" (and its
> equivalent, "the shell command line", if the transport is over ssh)
> play in this new world order.

So I guess you can still use a daemon to fetch from, and by now
you could also do the authentication with git daemon (with push
certificates)

What I did not talk about in the proposal is the receiving end point.
So I think there may be a git-protocol-2 binary similar to
git-{receive, upload}-pack which you then invoke via ssh?

>
> Note again that I am not objecting. I am trying to fathom the
> ramifications of what you wrote here.

Thanks for pointing out ramifications I did not think of yet!
What this new protocol is all about is the future flexibility,
so I think it is good to have lots of possibilities available.

(So for example with having 2 "goals" as above inside one
protocol exchange, you could also do a partially narrow/shallow
clone. So shallow for the whole repository, but deepened for
a narrow directory you're really interested in. I am not saying
this comes live in the near future, but it is possible to implement
using this protocol and still have a good compression with the
packfile format as of now?)

My biggest concern is to get the phase 1 somewhat right this time
(an exchange which doesn't grow as large as the current refs
advertisement but still has enough information to be able to change
the protocol 2 years down the road without this upgrade pain of old
and new programs talking to each other, still working without a failure).

Thanks for any input!
Stefan

>
>> +3) Ref advertisement
>> +--------------------
>> +3) and 4) are highly dependent on the mode for fetch and push. As currently
>> +only mode "version1" is supported, these phases follow the ref advertisement
>> +in pack protocol version 1 without capabilities on the first line of the refs.


* Re: [PATCH] RFC/Add documentation for version protocol 2
  2015-04-22 19:43   ` Stefan Beller
@ 2015-04-22 23:30     ` Junio C Hamano
  2015-04-23  6:16       ` Stefan Beller
  0 siblings, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2015-04-22 23:30 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git@vger.kernel.org, Martin Fick, Duy Nguyen

Stefan Beller <sbeller@google.com> writes:

>>> +action           = "noop" / "ls-remote" / "fetch" / "push" / "fetch-shallow"
> ...
>> If we are going in this "in-protocol message switches the service"
>> route, we should also support "archive" as one of the actions, no?
>> Yes, I know you named the document "pack-protocol" and "archive"
>> does not give you packs, but "ls-remote" does not transfer pack data,
>> either.
>
> I'll add that. Also I need to incorporate shallow in one way or another.

This level of detail may not matter at this point yet, but it is
unclear to me why you have "fetch-shallow" as a separate thing
(while not having "push-shallow").  The current infrastructure does
already allow fetching into shallow repositories without needing a
separate action that is different from "fetch" (aka "upload-pack").
I would not be surprised if it were "I can deepen you if you want"
capability, but I do not understand why you are singling out
"shallow" as something that needs such a special treatment.


* Re: [PATCH] RFC/Add documentation for version protocol 2
  2015-04-22 23:30     ` Junio C Hamano
@ 2015-04-23  6:16       ` Stefan Beller
  0 siblings, 0 replies; 5+ messages in thread
From: Stefan Beller @ 2015-04-23  6:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git@vger.kernel.org, Martin Fick, Duy Nguyen

On Wed, Apr 22, 2015 at 4:30 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>>>> +action           = "noop" / "ls-remote" / "fetch" / "push" / "fetch-shallow"
>> ...
>>> If we are going in this "in-protocol message switches the service"
>>> route, we should also support "archive" as one of the actions, no?
>>> Yes, I know you named the document "pack-protocol" and "archive"
>>> does not give you packs, but "ls-remote" does not transfer pack data,
>>> either.
>>
>> I'll add that. Also I need to incorporate shallow in one way or another.
>
> This level of detail may not matter at this point yet, but it is
> unclear to me why you have "fetch-shallow" as a separate thing
> (while not having "push-shallow").

Right, this should have been done via the plain fetch action, with the mode
parameter set to shallow, narrow or whatever we want. Sorry for my shortcut
in thinking there.
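
As a rough sketch of what that could look like on the wire, the following parses an action-line according to the ABNF in the draft, with "fetch-shallow" dropped in favour of a mode parameter on fetch. The mode values "shallow" and "narrow", the function name, and the choice to treat everything after the first "=" as the value are assumptions for illustration; the draft itself only allows mode "version1" and does not define parameter-value. Values containing spaces are not handled, since the grammar uses SP to separate parameters.

```python
import re

# Actions from the draft, minus "fetch-shallow" (assumed to become a mode).
ACTIONS = {"noop", "ls-remote", "fetch", "push"}
# parameter-key = 1*(LC_ALPHA / DIGIT / "-" / "_")
KEY_RE = re.compile(r"[a-z0-9_-]+")


def parse_action_line(line: str):
    """Parse action-line = action *(SP action-parameter)."""
    action, *raw_params = line.split(" ")
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    params = {}
    for raw in raw_params:
        # Everything after the first "=" is taken as the value (assumption).
        key, sep, value = raw.partition("=")
        if not KEY_RE.fullmatch(key):
            raise ValueError(f"invalid parameter key: {key!r}")
        params[key] = value if sep else None
    return action, params
```

Under these assumptions, `parse_action_line("fetch mode=shallow")` gives `("fetch", {"mode": "shallow"})`, and a bare `"ls-remote"` gives `("ls-remote", {})`.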

> The current infrastructure does
> already allow fetching into shallow repositories without needing a
> separate action that is different from "fetch" (aka "upload-pack").
> I would not be surprised if it were "I can deepen you if you want"
> capability, but I do not understand why you are singling out
> "shallow" as something that needs such a special treatment.
>

I should not have done that. I just got confused.
