git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [RFC] protocol version 2
@ 2017-10-20 17:18 Brandon Williams
  2017-10-24  6:48 ` Junio C Hamano
                   ` (4 more replies)
  0 siblings, 5 replies; 47+ messages in thread
From: Brandon Williams @ 2017-10-20 17:18 UTC (permalink / raw)
  To: git
  Cc: Brandon Williams, spearce, git, gitster, jonathantanmy, jrnieder,
	peff, sbeller

 Objective
===========

Replace Git's current wire protocol with a simpler, less wasteful
protocol that can evolve over time.

 Background
============

Git's wire protocol is the language used to clone/fetch/push from/to a
remote git repository.  A detailed explanation of the current protocol
spec can be found
[here](https://git.kernel.org/pub/scm/git/git.git/tree/Documentation/technical/pack-protocol.txt).
Some of the pain points with the current protocol spec are:

  * The server's initial response is the ref advertisement.  This
    advertisement cannot be omitted and can become an issue due to the
    sheer number of refs that can be sent with large repositories.  For
    example, when contacting the internal equivalent of
    `https://android.googlesource.com/`, the server will send
    approximately 1 million refs totaling 71MB.  This is data that is
    sent during each and every fetch and is not scalable.

  * Capabilities were implemented as a hack and are hidden behind a NUL
    byte after the first ref sent from the server during the ref
    advertisement:

	<SHA1> <Ref Name>\0<capabilities space separated> <symref> <agent>

    Since they are sent in the context of a pkt-line they are also subject
    to the same length limitations (1k bytes with old clients).  While we
    may not be close to hitting this limitation with capabilities alone, it
    has become a problem when trying to abuse capabilities for other
    purposes (e.g. [symrefs](https://public-inbox.org/git/20160816161838.klvjhhoxsftvkfmd@x/)).

  * Various other technical debt (e.g. abusing capabilities to
    communicate agent and symref data, service name set using a query
    parameter).

 Overview
==========

This document presents a specification for a version 2 of Git's wire
protocol.  Protocol v2 will improve upon v1 in the following ways:

  * Instead of multiple service names, multiple commands will be
    supported by a single service
  * Easily extendable as capabilities are moved into their own section
    of the protocol, no longer being hidden behind a NUL byte and
    limited by the size of a pkt-line (as there will be a single
    capability per pkt-line).
  * Separate out other information hidden behind NUL bytes (e.g. agent
    string as a capability and symrefs can be requested using 'ls-ref')
  * Ref advertisement will be omitted unless explicitly requested
  * ls-ref command to explicitly request some refs

 Detailed Design
=================

A client can request to speak protocol v2 by sending `version=2` in the
side-channel `GIT_PROTOCOL` in the initial request to the server.

In protocol v2 communication is command oriented.  When first contacting
a server a list of capabilities will advertised.  Some of these
capabilities will be commands which a client can request be executed.
Once a command has completed, a client can reuse the connection and
request that other commands be executed

 Special Packets
-----------------

In protocol v2 these special packets will have the following semantics:

  * '0000' Flush Packet (flush-pkt) - indicates the end of a message
  * '0001' End-of-List delimiter (delim-pkt) - indicates the end of a list

 Capability Advertisement
--------------------------

A server which decides to communicate (based on a request from a client)
using protocol version 2, notifies the client by sending a version
string in its initial response followed by an advertisement of its
capabilities.  Each capability is a key with an optional value.  Clients
must ignore all unknown keys.  Semantics of unknown values are left to
the definition of each key.  Some capabilities will describe commands
which can be requested to be executed by the client.

    capability-advertisement = protocol-version
                               capability-list
                               flush-pkt

    protocol-version = PKT-LINE("version 2" LF)
    capability-list = *capability
    capability = PKT-LINE(key[=value] LF)

    key = 1*CHAR
    value = 1*CHAR
    CHAR = 1*(ALPHA / DIGIT / "-" / "_")

A client then responds to select the command it wants with any
particular capabilities or arguments.  There is then an optional section
where the client can provide any command specific parameters or queries.

    command-request = command
                      capability-list
                      delim-pkt
                      (command specific parameters)
                      flush-pkt
    command = PKT-LINE("command=" key LF)

The server will then acknowledge the command and requested capabilities
by echoing them back to the client and then launch into the command.

    acknowledge-request = command
                          capability-list
                          delim-pkt
                          execute-command
    execute-command = <defined by each command>


A particular command can last for as many rounds as are required to
complete the service (multiple for negotiation during fetch and push or
no additional trips in the case of ls-refs).


 Commands in v2
~~~~~~~~~~~~~~~~

Services are the core actions that a client wants to perform (fetch,
push, etc).  Each service has its own set of capabilities and its own
language of commands (think 'want' lines in fetch).  Optionally a
service can take in initial parameters or data when a client sends it
service request.

 Ls-refs
---------

Ls-refs can be looked at as the equivalent of the current ls-remote as
it is a way to query a remote for the references that it has.  Unlike
the current ls-remote, the filtering of the output is done on the server
side by passing a number of parameters to the server-side command
instead of the filtering occurring on the client.

Ls-ref takes in the following parameters:

  --head, --tags: Limit to only refs/heads or refs/tags
  --refs: Do not show peeled tags or pseudorefs like HEAD
  --symref: In addition to the object pointed by it, show the underlying
            ref pointed by it when showing a symbolic ref
  <refspec>: When specified, only references matching the given patterns
             are displayed.

The output of ls-refs is as follows:

    output = (no-refs / list-of-refs)
	     *symref
             *shallow
             flush-pkt

    no-refs = PKT-LINE(zero-id SP no-refs LF)
    list-of-refs = *ref
    ref = PKT-LINE((tip / peeled) LF)
    tip = obj-id SP refname
    peeled = obj-id SP refname "^{}"

    symref = PKT-LINE("symref" SP symbolic-ref SP resolved-ref LF)
    shallow = PKT-LINE("shallow" SP obj-id LF)

 Fetch
-------

Fetch will need to be a modified version of the v1 fetch protocol.  Some
potential areas for improvement are: Ref-in-want, CDN offloading,
Fetch-options.

Since we'll have an 'ls-ref' service we can eliminate the need of fetch
to perform a ref-advertisement, instead a client can run the 'ls-refs'
service first, in order to find out what refs the server has, and then
request those refs directly using the fetch service.

//TODO Flush out the design

 Fetch-object
--------------

This service could be used by partial clones in order to request missing
objects.

//TODO Flush out the design

 Push
------

Push will need to be a modified version of the v1 push protocol.  Some
potential areas for improvement are: Fix push-options, Negotiation for
force push.

One change that will need to happen is to improve how `push-options` are
sent to the server (so that they aren't sent twice!!).  Also the
report-status needs to be better than it currently is in v1 so that
tools like gerrit can explain what it did with the ref-update the client
sent to it. Maybe have a push-rebase capability or command?

//TODO Flush out the design

 Other Considerations
======================

  * Move away from pkt-line framing?
  * Have responses structured in well known formats (e.g. JSON)
  * Eliminate initial round-trip using 'GIT_PROTOCOL' side-channel
  * Additional commands in a partial clone world (e.g. log, grep)

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC] protocol version 2
  2017-10-20 17:18 [RFC] protocol version 2 Brandon Williams
@ 2017-10-24  6:48 ` Junio C Hamano
  2017-10-24 18:35   ` Brandon Williams
  2017-10-25 13:09 ` Derrick Stolee
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 47+ messages in thread
From: Junio C Hamano @ 2017-10-24  6:48 UTC (permalink / raw)
  To: Brandon Williams
  Cc: git, spearce, git, jonathantanmy, jrnieder, peff, sbeller

Brandon Williams <bmwill@google.com> writes:

>   * Capabilities were implemented as a hack and are hidden behind a NUL
>     byte after the first ref sent from the server during the ref
>     advertisement:
> ...
>
>   * Various other technical debt (e.g. abusing capabilities to
>     communicate agent and symref data, service name set using a query
>     parameter).

This sounds like a duplicate of the above.

>  Special Packets
> -----------------
>
> In protocol v2 these special packets will have the following semantics:
>
>   * '0000' Flush Packet (flush-pkt) - indicates the end of a message
>   * '0001' End-of-List delimiter (delim-pkt) - indicates the end of a list

After reading the remainder of the document twice, I do not think
the reason why we want to introduce a new type delim-pkt is
explained and justified well enough.  If a command request takes a
command packet, zero or more capability packets, a mandatory
delimiter packet, zero or more parameter packets and a mandatory
flush packet, then you can use the same "flush" as delimiter in the
middle.  The delimiter will of course become useful if you can omit
it when not necessary (e.g. after seeing capabilities, you may see a
flush and you will know there is no parameters and save the need to
send one "delim").

I actually have a reasonable guess why you want to have a separate
delimiter (which has nothing to do with "optional delim can be
omitted"), but I want to see it explained in this document clearly
by its designer(s).

>     command-request = command
>                       capability-list
>                       delim-pkt
>                       (command specific parameters)
>                       flush-pkt
>     command = PKT-LINE("command=" key LF)
>
> The server will then acknowledge the command and requested capabilities
> by echoing them back to the client and then launch into the command.
>
>     acknowledge-request = command
>                           capability-list
>                           delim-pkt
>                           execute-command
>     execute-command = <defined by each command>

It is not quite clear what the value of "echoing them back" is,
especially if that is done by always echoing verbatim.  A reader may
naturally expect, when capabilities are exchanged between two
parties, these are negotiated so that the only ones that are
commonly supported by both ends would be used, or something like
that.

> A particular command can last for as many rounds as are required to
> complete the service (multiple for negotiation during fetch and push or
> no additional trips in the case of ls-refs).

OK.

>  Commands in v2
> ~~~~~~~~~~~~~~~~
>
> Services are the core actions that a client wants to perform (fetch,
> push, etc).  Each service has its own set of capabilities and its own
> language of commands (think 'want' lines in fetch).  Optionally a
> service can take in initial parameters or data when a client sends it
> service request.

So a service (like "fetch") employ a set of "command"s (like "want",
"have, etc)?  In the earlier part of the document, we did not see
any mention of "service" and instead saw only "command" mentioned.
Is the state machine on both ends and the transition between states
implicit?  E.g. when one side throws "want" command and the other
side acknowledges, they understand implicitly that they are now in a
"fetch" service session, even though there is nothing that said over
the wire that they are doing so?  One reason I am wondering about
this is what we should do if a command verb may be applicable in
multiple services.

After reading the earlier protocol exchange explanation, I was sort
of expecting that "fetch" would be the command and "want", "have",
and "ack" lines would be exchanged as "command specific parameters",
so a sudden introduction of "services" here was a bit of impedance
mismatch to me.

>  Ls-refs
> ---------
>
> Ls-refs can be looked at as the equivalent of the current ls-remote as
> it is a way to query a remote for the references that it has.  Unlike
> the current ls-remote, the filtering of the output is done on the server
> side by passing a number of parameters to the server-side command
> instead of the filtering occurring on the client.
>
> Ls-ref takes in the following parameters:
>
>   --head, --tags: Limit to only refs/heads or refs/tags

I see no need for the above two as long as "refs/heads/*", etc. are
understood.

>   --refs: Do not show peeled tags or pseudorefs like HEAD

So showing peeled tags is the default?  Then I can sort-of see why
"I am not interested in peeled result".  

I do not see why "do not show HEAD, MERGE_HEAD, etc." is needed,
though.  It should be sufficient to just ask for refs/* if you are
interested only in other things, no?

>   --symref: In addition to the object pointed by it, show the underlying
>             ref pointed by it when showing a symbolic ref

Sort of OK--it probably is easier to always send this, as symbolic
refs are minority anyway, though.

>   <refspec>: When specified, only references matching the given patterns
>              are displayed.

I do not think you meant <refspec> here.

The side that is listing what it has has no reason to know what the
recipient plans to do with the result, so you must be only sending
the LHS of a refspec.  If your explanation says "given patterns",
then replace <refspec> with <pattern>.  Do not abuse a term that has
specific and established meaning for something else.

> The output of ls-refs is as follows:
>
>     output = (no-refs / list-of-refs)
> 	     *symref
>              *shallow
>              flush-pkt
>
>     no-refs = PKT-LINE(zero-id SP no-refs LF)

Can't your list-of-refs have 0 element?  I do not see why you need
no-refs here.  It's not like you need a dummy line, to the end of
which you need to append NUL plus hidden capabilities ;-)

>     list-of-refs = *ref
>     ref = PKT-LINE((tip / peeled) LF)
>     tip = obj-id SP refname
>     peeled = obj-id SP refname "^{}"
>
>     symref = PKT-LINE("symref" SP symbolic-ref SP resolved-ref LF)
>     shallow = PKT-LINE("shallow" SP obj-id LF)

Thanks.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC] protocol version 2
  2017-10-24  6:48 ` Junio C Hamano
@ 2017-10-24 18:35   ` Brandon Williams
  2017-10-25  1:22     ` Junio C Hamano
  2017-10-26  0:59     ` Junio C Hamano
  0 siblings, 2 replies; 47+ messages in thread
From: Brandon Williams @ 2017-10-24 18:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, spearce, git, jonathantanmy, jrnieder, peff, sbeller

On 10/24, Junio C Hamano wrote:
> Brandon Williams <bmwill@google.com> writes:
> 
> >   * Capabilities were implemented as a hack and are hidden behind a NUL
> >     byte after the first ref sent from the server during the ref
> >     advertisement:
> > ...
> >
> >   * Various other technical debt (e.g. abusing capabilities to
> >     communicate agent and symref data, service name set using a query
> >     parameter).
> 
> This sounds like a duplicate of the above.

You're right, it mostly is a duplication of that.

> 
> >  Special Packets
> > -----------------
> >
> > In protocol v2 these special packets will have the following semantics:
> >
> >   * '0000' Flush Packet (flush-pkt) - indicates the end of a message
> >   * '0001' End-of-List delimiter (delim-pkt) - indicates the end of a list
> 
> After reading the remainder of the document twice, I do not think
> the reason why we want to introduce a new type delim-pkt is
> explained and justified well enough.  If a command request takes a
> command packet, zero or more capability packets, a mandatory
> delimiter packet, zero or more parameter packets and a mandatory
> flush packet, then you can use the same "flush" as delimiter in the
> middle.  The delimiter will of course become useful if you can omit
> it when not necessary (e.g. after seeing capabilities, you may see a
> flush and you will know there is no parameters and save the need to
> send one "delim").
> 
> I actually have a reasonable guess why you want to have a separate
> delimiter (which has nothing to do with "optional delim can be
> omitted"), but I want to see it explained in this document clearly
> by its designer(s).

Jonathan Tan suggested that we tighten flush semantics in a newer
protocol so that proxies are easier to work with.  Currently proxies
need to understand the protocol instead of simply waiting for a flush.

Also I've been told the smart http code is more complex because of the
current semantics of flush packets.

> 
> >     command-request = command
> >                       capability-list
> >                       delim-pkt
> >                       (command specific parameters)
> >                       flush-pkt
> >     command = PKT-LINE("command=" key LF)
> >
> > The server will then acknowledge the command and requested capabilities
> > by echoing them back to the client and then launch into the command.
> >
> >     acknowledge-request = command
> >                           capability-list
> >                           delim-pkt
> >                           execute-command
> >     execute-command = <defined by each command>
> 
> It is not quite clear what the value of "echoing them back" is,
> especially if that is done by always echoing verbatim.  A reader may
> naturally expect, when capabilities are exchanged between two
> parties, these are negotiated so that the only ones that are
> commonly supported by both ends would be used, or something like
> that.

The echoing back of the command and requested capabilities may or may
not be needed.  A client should only ever issue a command-request using
advertised capabilities and commands so there really isn't much
negotiation happening, just the server saying "here's what's on the
menu" and the client picking only from that menu.

Where the echoing back may be useful is if we wanted to (at some point)
eliminate this initial round trip of doing the capability advertisement
and then subsequent selection of capabilities and instead stuff that
information into the GIT_PROTOCOL side channel in the initial request.
That way the client could optimistically send capabilities that it
doesn't yet know if the server supports.  I thought this might be an
interesting idea if we really really didn't want to live with the extra
round trip.

> 
> > A particular command can last for as many rounds as are required to
> > complete the service (multiple for negotiation during fetch and push or
> > no additional trips in the case of ls-refs).
> 
> OK.
> 
> >  Commands in v2
> > ~~~~~~~~~~~~~~~~
> >
> > Services are the core actions that a client wants to perform (fetch,
> > push, etc).  Each service has its own set of capabilities and its own
> > language of commands (think 'want' lines in fetch).  Optionally a
> > service can take in initial parameters or data when a client sends it
> > service request.
> 
> So a service (like "fetch") employ a set of "command"s (like "want",
> "have, etc)?  In the earlier part of the document, we did not see
> any mention of "service" and instead saw only "command" mentioned.
> Is the state machine on both ends and the transition between states
> implicit?  E.g. when one side throws "want" command and the other
> side acknowledges, they understand implicitly that they are now in a
> "fetch" service session, even though there is nothing that said over
> the wire that they are doing so?  One reason I am wondering about
> this is what we should do if a command verb may be applicable in
> multiple services.

Looks like I missed changing the word "services" to "commands" here.  I
originally was using the term 'services' for things like 'fetch', 'push',
'ls-refs', etc. and decided for some reason to change to using the word
'commands'.  Naming things is hard, especially when you couldn't decide
on a name and end up with two! ;) 

> After reading the earlier protocol exchange explanation, I was sort
> of expecting that "fetch" would be the command and "want", "have",
> and "ack" lines would be exchanged as "command specific parameters",
> so a sudden introduction of "services" here was a bit of impedance
> mismatch to me.

You are absolutely correct in that i intended 'fetch' to be the command
and 'want', 'have', and 'ack' lines would be parameters.

> 
> >  Ls-refs
> > ---------
> >
> > Ls-refs can be looked at as the equivalent of the current ls-remote as
> > it is a way to query a remote for the references that it has.  Unlike
> > the current ls-remote, the filtering of the output is done on the server
> > side by passing a number of parameters to the server-side command
> > instead of the filtering occurring on the client.
> >
> > Ls-ref takes in the following parameters:
> >
> >   --head, --tags: Limit to only refs/heads or refs/tags
> 
> I see no need for the above two as long as "refs/heads/*", etc. are
> understood.
> 
> >   --refs: Do not show peeled tags or pseudorefs like HEAD
> 
> So showing peeled tags is the default?  Then I can sort-of see why
> "I am not interested in peeled result".  
> 
> I do not see why "do not show HEAD, MERGE_HEAD, etc." is needed,
> though.  It should be sufficient to just ask for refs/* if you are
> interested only in other things, no?
> 
> >   --symref: In addition to the object pointed by it, show the underlying
> >             ref pointed by it when showing a symbolic ref
> 
> Sort of OK--it probably is easier to always send this, as symbolic
> refs are minority anyway, though.
> 
> >   <refspec>: When specified, only references matching the given patterns
> >              are displayed.
> 
> I do not think you meant <refspec> here.
> 
> The side that is listing what it has has no reason to know what the
> recipient plans to do with the result, so you must be only sending
> the LHS of a refspec.  If your explanation says "given patterns",
> then replace <refspec> with <pattern>.  Do not abuse a term that has
> specific and established meaning for something else.

Yes, you're right i intended that to mean <pattern> instead so that the
client could send "refs/heads/*" or some other such pattern and have the
server limit its output.

All of these parameters I just pulled from the current ls-remote command
thinking that whatever filtering the client currently does can end up
being done on the server.  You've illustrated that most of them could
simply be done with apattern so maybe i was overthinking it :)

> 
> > The output of ls-refs is as follows:
> >
> >     output = (no-refs / list-of-refs)
> > 	     *symref
> >              *shallow
> >              flush-pkt
> >
> >     no-refs = PKT-LINE(zero-id SP no-refs LF)
> 
> Can't your list-of-refs have 0 element?  I do not see why you need
> no-refs here.  It's not like you need a dummy line, to the end of
> which you need to append NUL plus hidden capabilities ;-)

Haha! good point.  Yes a list-of-refs can certainly have 0 elements.
One less thing that can be borrowed from the old protocol :)

> 
> >     list-of-refs = *ref
> >     ref = PKT-LINE((tip / peeled) LF)
> >     tip = obj-id SP refname
> >     peeled = obj-id SP refname "^{}"
> >
> >     symref = PKT-LINE("symref" SP symbolic-ref SP resolved-ref LF)
> >     shallow = PKT-LINE("shallow" SP obj-id LF)
> 
> Thanks.

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC] protocol version 2
  2017-10-24 18:35   ` Brandon Williams
@ 2017-10-25  1:22     ` Junio C Hamano
  2017-10-26  0:59     ` Junio C Hamano
  1 sibling, 0 replies; 47+ messages in thread
From: Junio C Hamano @ 2017-10-25  1:22 UTC (permalink / raw)
  To: Brandon Williams
  Cc: git, spearce, git, jonathantanmy, jrnieder, peff, sbeller

Brandon Williams <bmwill@google.com> writes:

>> I actually have a reasonable guess why you want to have a separate
>> delimiter (which has nothing to do with "optional delim can be
>> omitted"), but I want to see it explained in this document clearly
>> by its designer(s).
>
> Jonathan Tan suggested that we tighten flush semantics in a newer
> protocol so that proxies are easier to work with.  Currently proxies
> need to understand the protocol instead of simply waiting for a flush.
>
> Also I've been told the smart http code is more complex because of the
> current semantics of flush packets.

I think the above two are the same thing ;-) but yes, "flush" in the
original protocol were used for both "I am truly finished talking;
now it is your turn" and "I am done with one section of what I need
to say, and a different section now begins; it is still my turn to
speak".  The need to handle the latter makes smart-http quite ugly.

Thanks.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC] protocol version 2
  2017-10-20 17:18 [RFC] protocol version 2 Brandon Williams
  2017-10-24  6:48 ` Junio C Hamano
@ 2017-10-25 13:09 ` Derrick Stolee
  2017-10-25 18:10   ` Brandon Williams
  2017-10-28 22:57 ` Philip Oakley
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 47+ messages in thread
From: Derrick Stolee @ 2017-10-25 13:09 UTC (permalink / raw)
  To: Brandon Williams, git
  Cc: spearce, git, gitster, jonathantanmy, jrnieder, peff, sbeller

On 10/20/2017 1:18 PM, Brandon Williams wrote:
>   Overview
> ==========
>
> This document presents a specification for a version 2 of Git's wire
> protocol.  Protocol v2 will improve upon v1 in the following ways:
>
>    * Instead of multiple service names, multiple commands will be
>      supported by a single service
>    * Easily extendable as capabilities are moved into their own section
>      of the protocol, no longer being hidden behind a NUL byte and
>      limited by the size of a pkt-line (as there will be a single
>      capability per pkt-line).
>    * Separate out other information hidden behind NUL bytes (e.g. agent
>      string as a capability and symrefs can be requested using 'ls-ref')
>    * Ref advertisement will be omitted unless explicitly requested
>    * ls-ref command to explicitly request some refs

Hi Brandon,

I'm very interested in your protocol as a former server-side dev for the 
VSTS Git server, and understand some of these headaches. We built 
limited refs specifically to target the problem you are solving with 
ls-ref, but it requires knowledge about the authenticated user in order 
to work. I believe your suggestion is a better solution for the Git 
protocol.

The "easily extendable" part has specifically caught my interest, as we 
(Microsoft) would like to move most of the GVFS protocol into core Git, 
and this is a great way to do it. Even if not all features are accepted 
by upstream, we could use our GVFS-specific fork of Git to communicate 
to our servers without breaking normal users' interactions.

Please CC me in future versions of this proposal. Let me know if you 
want to chat directly about the "TODO" items below.

Speaking of TODOs, how much of this concept do you have working in a 
prototype? Do you have code that performs this version 2 handshake and 
communicates the ls-refs result?

>   Ls-refs
> ---------
>
> Ls-refs can be looked at as the equivalent of the current ls-remote as
> it is a way to query a remote for the references that it has.  Unlike
> the current ls-remote, the filtering of the output is done on the server
> side by passing a number of parameters to the server-side command
> instead of the filtering occurring on the client.
>
> Ls-ref takes in the following parameters:
>
>    --head, --tags: Limit to only refs/heads or refs/tags

Nit: It would be better to use "--heads" to match refs/heads and your 
use of "--tags" for refs/tags.

>    --refs: Do not show peeled tags or pseudorefs like HEAD

Assuming we are in the case where the server has a HEAD ref, why would 
that ever be advertised? Also, does this imply that without the --refs 
option we would peel annotated tags until we find non-tag OIDs? Neither 
of these functions seem useful as default behavior.

>    --symref: In addition to the object pointed by it, show the underlying
>              ref pointed by it when showing a symbolic ref
>    <refspec>: When specified, only references matching the given patterns
>               are displayed.

Can you be specific about the patterns? For instance, it is not a good 
idea to allow the client to submit a regex for the server to compute. 
Instead, can we limit this pattern-matching to a prefix-set, such as the 
following list of prefixes:

     refs/heads/master
     refs/releases/*
     refs/heads/user/me/*
>   Fetch
> -------
>
> Fetch will need to be a modified version of the v1 fetch protocol.  Some
> potential areas for improvement are: Ref-in-want, CDN offloading,
> Fetch-options.
>
> Since we'll have an 'ls-ref' service we can eliminate the need of fetch
> to perform a ref-advertisement, instead a client can run the 'ls-refs'
> service first, in order to find out what refs the server has, and then
> request those refs directly using the fetch service.
>
> //TODO Flush out the design
>
>   Fetch-object
> --------------
>
> This service could be used by partial clones in order to request missing
> objects.
>
> //TODO Flush out the design

As you flesh our these "fetch" and "fetch-object" commands, keep in mind 
that partial clones could mean any of the following:

  * fetch all reachable objects except for blobs.

  * fetch all reachable objects except for blobs above a
    certain size.

  * fetch all commits, trees, (and blobs?) within a certain
    "cone" of the file system.

>   Push
> ------
>
> Push will need to be a modified version of the v1 push protocol.  Some
> potential areas for improvement are: Fix push-options, Negotiation for
> force push.

Negotiation is something to keep in mind for all pushes, especially in 
an ecosystem full of fork-based workflows. If you are working across 
forks and someone else syncs data between your remotes, you may re-push 
a large chunk of objects that are already present in a fork. Adding an 
ls-refs step before push would be a step in the right direction.
>   Other Considerations
> ======================
>
>    * Move away from pkt-line framing?
>    * Have responses structured in well known formats (e.g. JSON)
>    * Eliminate initial round-trip using 'GIT_PROTOCOL' side-channel
>    * Additional commands in a partial clone world (e.g. log, grep)

[Tangent]

I too have thought about making calls like "log" and "blame" available 
for calling remotes. One reason GVFS sends a "prefetch pack" of _all_ 
commits and trees is because one "git log -- path/to/file" command would 
start downloading thousands of objects one at a time as the walk moves 
through the history. If the remote can compute the commands that require 
historical data, then our partial clones can be more "pure" (i.e. only 
contain objects required for the user's changes).

One major caveat: if someone runs "log" from HEAD, then they may be 
working over data that is not on the remote, which means they would need 
to start the history walk until reaching commits that are known to be on 
the remote. If there are merges in the local history, then this could 
include multiple independent commits.

Further complicating this area, the server may not want to allow certain 
types of commands (i.e. regexes, expensive history options like 
"--simplify-merges").

In conclusion, I think it is a great idea to have the protocol allow 
these extensions, especially in a way that is easy to extend without 
breaking client/server compat scenarios (after both have v2 enabled).

[End Tangent]

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC] protocol version 2
  2017-10-25 13:09 ` Derrick Stolee
@ 2017-10-25 18:10   ` Brandon Williams
  0 siblings, 0 replies; 47+ messages in thread
From: Brandon Williams @ 2017-10-25 18:10 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, spearce, git, gitster, jonathantanmy, jrnieder, peff, sbeller

On 10/25, Derrick Stolee wrote:
> On 10/20/2017 1:18 PM, Brandon Williams wrote:
> >  Overview
> >==========
> >
> >This document presents a specification for a version 2 of Git's wire
> >protocol.  Protocol v2 will improve upon v1 in the following ways:
> >
> >   * Instead of multiple service names, multiple commands will be
> >     supported by a single service
> >   * Easily extendable as capabilities are moved into their own section
> >     of the protocol, no longer being hidden behind a NUL byte and
> >     limited by the size of a pkt-line (as there will be a single
> >     capability per pkt-line).
> >   * Separate out other information hidden behind NUL bytes (e.g. agent
> >     string as a capability and symrefs can be requested using 'ls-ref')
> >   * Ref advertisement will be omitted unless explicitly requested
> >   * ls-ref command to explicitly request some refs
> 
> Hi Brandon,
> 
> I'm very interested in your protocol as a former server-side dev for
> the VSTS Git server, and understand some of these headaches. We

Always happy to hear that someone is excited about the area I'm working
on :)

> built limited refs specifically to target the problem you are
> solving with ls-ref, but it requires knowledge about the
> authenticated user in order to work. I believe your suggestion is a
> better solution for the Git protocol.

Yes one of the big issues we've run into is the ref advertisement, 1
because it makes it difficult to add any additional features to the
protocol and 2 because it doesn't scale well because all refs are
blasted at the client (unless you use a separate system like you
implemented to reduce that number).  So I'm hoping we can solve (1) by
doing some sort of capability advertisement instead of a ref
advertisement upfront and (2) by allowing clients to express a way of
limiting the ref advertisement server side.

> 
> The "easily extendable" part has specifically caught my interest, as
> we (Microsoft) would like to move most of the GVFS protocol into
> core Git, and this is a great way to do it. Even if not all features
> are accepted by upstream, we could use our GVFS-specific fork of Git
> to communicate to our servers without breaking normal users'
> interactions.

This is one thing I'm excited about too and hope there's enough desire
for such a capability to extend the protocol.  I first was interested in
building such a system when i looked at some of the previous work done
by stephan trying to introduce a protocol v2
(https://public-inbox.org/git/1429658342-5295-1-git-send-email-sbeller@google.com/)
and this idea was further reinforced when I sat down and talked with
some mercurial developers about their protocol and what they did to
migrate to a v2.  I also discovered that their v2 protocol is
service/command oriented and it makes it very simple and easy for
extensions to be added server and client side which allow for additional
commands to be executed during a server/client exchange without breaking
the normal fetch/push interaction.

So I'm hoping that what we decided on for v2 will enable exactly what
you want.  It may even allow for doing things like server side log and
grep when a client has a partial clone or a repo which is so big it
would be difficult to do some of those operations locally.  Of course
you identified some of the issues which such operations below :)

> 
> Please CC me in future versions of this proposal. Let me know if you
> want to chat directly about the "TODO" items below.

Of course, I'll make sure to keep you updated.

> 
> Speaking of TODOs, how much of this concept do you have working in a
> prototype? Do you have code that performs this version 2 handshake
> and communicates the ls-refs result?

Most of the TODOs are areas where more thought and design needs to
happen.  My main goal with this RFC is to see if a modular design like
this would have support from the community and if so, to nail down the
design of the exchanges outside of individual commands since that
infrastructure would be harder to change after the fact.

As for the commands themselves, since this design is meant to be module
we could implement them separately or one at a time (though a base set
of commands would need to be implemented and designed before v2 could
roll out).  That being said I've begun working up a rough prototype of
the basic initial capability/command exchange and the ls-refs command.
Since its pretty rough and not integrated into the actual transport code
yet its no where near ready to be sent out though I wouldn't mind
pushing the WIP code out so people can see/play with what's there.

And if you have any comments or suggestions about any parts of the
design or the TODOs I would love to chat about it :)

> 
> >  Ls-refs
> >---------
> >
> >Ls-refs can be looked at as the equivalent of the current ls-remote as
> >it is a way to query a remote for the references that it has.  Unlike
> >the current ls-remote, the filtering of the output is done on the server
> >side by passing a number of parameters to the server-side command
> >instead of the filtering occurring on the client.
> >
> >Ls-ref takes in the following parameters:
> >

For context most of these options i pulled from the existing ls-remote
command and was assuming the existing behavior of the ref advertisement,
so some of these options may definitely need to be tweaked in addition
to what the default would be (without any options).

> >   --head, --tags: Limit to only refs/heads or refs/tags
> 
> Nit: It would be better to use "--heads" to match refs/heads and
> your use of "--tags" for refs/tags.

Thanks for catching this :)

> 
> >   --refs: Do not show peeled tags or pseudorefs like HEAD
> 
> Assuming we are in the case where the server has a HEAD ref, why
> would that ever be advertised? Also, does this imply that without
> the --refs option we would peel annotated tags until we find non-tag
> OIDs? Neither of these functions seem useful as default behavior.

Currently both are true of the existing ref advertisement, all tags are
peeled and HEAD is advertised.  Maybe we don't need this to be the
default though IIRC upon a clone the advertised HEAD is what is checked
out by default once the clone completes.

> 
> >   --symref: In addition to the object pointed by it, show the underlying
> >             ref pointed by it when showing a symbolic ref
> >   <refspec>: When specified, only references matching the given patterns
> >              are displayed.
> 
> Can you be specific about the patterns? For instance, it is not a
> good idea to allow the client to submit a regex for the server to
> compute. Instead, can we limit this pattern-matching to a
> prefix-set, such as the following list of prefixes:
> 
>     refs/heads/master
>     refs/releases/*
>     refs/heads/user/me/*

You're right we may need to think carefully about what kind of limiting
we allow to prevent malicious clients.

> >  Fetch
> >-------
> >
> >Fetch will need to be a modified version of the v1 fetch protocol.  Some
> >potential areas for improvement are: Ref-in-want, CDN offloading,
> >Fetch-options.
> >
> >Since we'll have an 'ls-ref' service we can eliminate the need of fetch
> >to perform a ref-advertisement, instead a client can run the 'ls-refs'
> >service first, in order to find out what refs the server has, and then
> >request those refs directly using the fetch service.
> >
> >//TODO Flush out the design
> >
> >  Fetch-object
> >--------------
> >
> >This service could be used by partial clones in order to request missing
> >objects.
> >
> >//TODO Flush out the design
> 
> As you flesh our these "fetch" and "fetch-object" commands, keep in
> mind that partial clones could mean any of the following:
> 
>  * fetch all reachable objects except for blobs.
> 
>  * fetch all reachable objects except for blobs above a
>    certain size.
> 
>  * fetch all commits, trees, (and blobs?) within a certain
>    "cone" of the file system.
> 
> >  Push
> >------
> >
> >Push will need to be a modified version of the v1 push protocol.  Some
> >potential areas for improvement are: Fix push-options, Negotiation for
> >force push.
> 
> Negotiation is something to keep in mind for all pushes, especially
> in an ecosystem full of fork-based workflows. If you are working
> across forks and someone else syncs data between your remotes, you
> may re-push a large chunk of objects that are already present in a
> fork. Adding an ls-refs step before push would be a step in the
> right direction.
> >  Other Considerations
> >======================
> >
> >   * Move away from pkt-line framing?
> >   * Have responses structured in well known formats (e.g. JSON)
> >   * Eliminate initial round-trip using 'GIT_PROTOCOL' side-channel
> >   * Additional commands in a partial clone world (e.g. log, grep)
> 
> [Tangent]
> 
> I too have thought about making calls like "log" and "blame"
> available for calling remotes. One reason GVFS sends a "prefetch
> pack" of _all_ commits and trees is because one "git log --
> path/to/file" command would start downloading thousands of objects
> one at a time as the walk moves through the history. If the remote
> can compute the commands that require historical data, then our
> partial clones can be more "pure" (i.e. only contain objects
> required for the user's changes).
> 
> One major caveat: if someone runs "log" from HEAD, then they may be
> working over data that is not on the remote, which means they would
> need to start the history walk until reaching commits that are known
> to be on the remote. If there are merges in the local history, then
> this could include multiple independent commits.

These are some of the same points that some of my fellow coworkers have
also pointed out and definitely needs to be kept in mind if we ever
implement such a command.

> 
> Further complicating this area, the server may not want to allow
> certain types of commands (i.e. regexes, expensive history options
> like "--simplify-merges").
> 
> In conclusion, I think it is a great idea to have the protocol allow
> these extensions, especially in a way that is easy to extend without
> breaking client/server compat scenarios (after both have v2
> enabled).
> 
> [End Tangent]
> 
> Thanks,
> -Stolee
> 

Thanks for taking a look and giving your feedback!

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC] protocol version 2
  2017-10-24 18:35   ` Brandon Williams
  2017-10-25  1:22     ` Junio C Hamano
@ 2017-10-26  0:59     ` Junio C Hamano
  1 sibling, 0 replies; 47+ messages in thread
From: Junio C Hamano @ 2017-10-26  0:59 UTC (permalink / raw)
  To: Brandon Williams
  Cc: git, spearce, git, jonathantanmy, jrnieder, peff, sbeller

Brandon Williams <bmwill@google.com> writes:

> On 10/24, Junio C Hamano wrote:
>> Brandon Williams <bmwill@google.com> writes:
>> 
>> >   <refspec>: When specified, only references matching the given patterns
>> >              are displayed.
>> 
>> I do not think you meant <refspec> here.
>> 
>> The side that is listing what it has has no reason to know what the
>> recipient plans to do with the result, so you must be only sending
>> the LHS of a refspec.  If your explanation says "given patterns",
>> then replace <refspec> with <pattern>.  Do not abuse a term that has
>> specific and established meaning for something else.
>
> Yes, you're right i intended that to mean <pattern> instead so that the
> client could send "refs/heads/*" or some other such pattern and have the
> server limit its output.

Speaking of limiting the bandwidth consumed by the ref
advertisement, I think another trick that came up in past
discussions may be worth considering, which is to allow the
requestor to say, "oh by the way, I made exactly the same request as
this one earlier to you, and your response hashed down to this
value".  The responder may choose to give an incremental response
relative to the known result the requestor claims to have.

So for example, a requestor may have made an earlier ls-refs request
and can still recall that it got:

	refs/heads/maint	ObjectID A
	refs/heads/master	ObjectID B
	refs/tags/v1.0		ObjectId C

Also assume that these three lines together (textually) hashes to Z.

When the requestor asks about the two hierarchies, it may say "I know
you gave a result that hashes to Z" with an additional parameter:

	command=ls-ref
		refspec=refs/heads/*
		refspec=refs/tags/*
		known=Z

If the current response for refs/heads/* and refs/tags/* when fully
spelt out were like this (i.e. we updated a ref, gained another, and
lost one):

	refs/heads/master	ObjectID D
	refs/tags/v1.0		ObjectId C
	refs/tags/v1.1		ObjectID E

then the responder can send the fully spelt version, or it can
choose to say, "It's good that you know the state Z; relative to
that, refs/heads/maint no loner exists, refs/heads/master is now at
D and refs/tags/v1.1 at E has been added", if the latter results in
a shorter response (and if it recognises Z and map it back to the
set of refs and their values it responded with).

The "known" request parameter could further be refined (I do not
think this possibility was discussed in the past) to say "among the
values I received earlier from you, the ones that match this pattern
hashes to this", e.g. the earlier example request might become

	command=ls-ref
		refspec=refs/heads/*
		refspec=refs/tags/*
		known=X for refs/heads/*
		known=Y for refs/tags/*


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC] protocol version 2
  2017-10-20 17:18 [RFC] protocol version 2 Brandon Williams
  2017-10-24  6:48 ` Junio C Hamano
  2017-10-25 13:09 ` Derrick Stolee
@ 2017-10-28 22:57 ` Philip Oakley
  2017-10-31 18:42   ` Brandon Williams
  2017-11-10 20:13 ` Jonathan Tan
  2017-12-04 23:58 ` [WIP 00/15] " Brandon Williams
  4 siblings, 1 reply; 47+ messages in thread
From: Philip Oakley @ 2017-10-28 22:57 UTC (permalink / raw)
  To: Brandon Williams, git
  Cc: Brandon Williams, spearce, git, gitster, jonathantanmy, jrnieder,
	peff, sbeller

From: "Brandon Williams" <bmwill@google.com>
Sent: Friday, October 20, 2017 6:18 PM
> Objective
> ===========
>
> Replace Git's current wire protocol with a simpler, less wasteful
> protocol that can evolve over time.
>

<snip>

> Capability Advertisement
> --------------------------
>
> A server which decides to communicate (based on a request from a client)
> using protocol version 2, notifies the client by sending a version
> string in its initial response followed by an advertisement of its
> capabilities.  Each capability is a key with an optional value.  Clients
> must ignore all unknown keys.

>    Semantics of unknown values are left to
> the definition of each key.

This sounds odd. If the keys are known then their semantics are known. Or 
the keys are unknown and they and their values are ignored.

Maybe: Capability keys shall define their response to unknown key values.

>  Some capabilities will describe commands
> which can be requested to be executed by the client.
>
<snip>

> Ls-refs
> ---------
>
> Ls-refs can be looked at as the equivalent of the current ls-remote as
> it is a way to query a remote for the references that it has.  Unlike
> the current ls-remote, the filtering of the output is done on the server
> side by passing a number of parameters to the server-side command
> instead of the filtering occurring on the client.
>
> Ls-ref takes in the following parameters:
>
>  --head, --tags: Limit to only refs/heads or refs/tags
>  --refs: Do not show peeled tags or pseudorefs like HEAD
>  --symref: In addition to the object pointed by it, show the underlying
>            ref pointed by it when showing a symbolic ref
>  <refspec>: When specified, only references matching the given patterns
>             are displayed.

Does the --symref also the pseudorefs?

Isn't there a need somethimes to determine the ref that the remote's HEAD 
points to. This is an issue with the current clone and bundle code when 
there is a choice of refs/branches that could be the current HEAD ref and 
the wrong one is chosen, though this V2 change doesn't affect bundles.

>
> The output of ls-refs is as follows:
>
>    output = (no-refs / list-of-refs)
>      *symref
>             *shallow
>             flush-pkt
>
>    no-refs = PKT-LINE(zero-id SP no-refs LF)
>    list-of-refs = *ref
>    ref = PKT-LINE((tip / peeled) LF)
>    tip = obj-id SP refname
>    peeled = obj-id SP refname "^{}"
>
>    symref = PKT-LINE("symref" SP symbolic-ref SP resolved-ref LF)
>    shallow = PKT-LINE("shallow" SP obj-id LF)
>
> Fetch
> -------
>
> Fetch will need to be a modified version of the v1 fetch protocol.  Some
> potential areas for improvement are: Ref-in-want, CDN offloading,
> Fetch-options.
>
> Since we'll have an 'ls-ref' service we can eliminate the need of fetch
> to perform a ref-advertisement, instead a client can run the 'ls-refs'
> service first, in order to find out what refs the server has, and then
> request those refs directly using the fetch service.
>
> //TODO Flush out the design
>
> Fetch-object
> --------------
>
> This service could be used by partial clones in order to request missing
> objects.
>
> //TODO Flush out the design
>
> Push
> ------
>
> Push will need to be a modified version of the v1 push protocol.  Some
> potential areas for improvement are: Fix push-options, Negotiation for
> force push.
>
> One change that will need to happen is to improve how `push-options` are
> sent to the server (so that they aren't sent twice!!).  Also the
> report-status needs to be better than it currently is in v1 so that
> tools like gerrit can explain what it did with the ref-update the client
> sent to it. Maybe have a push-rebase capability or command?
>
> //TODO Flush out the design
>
> Other Considerations
> ======================
>
>  * Move away from pkt-line framing?
>  * Have responses structured in well known formats (e.g. JSON)
>  * Eliminate initial round-trip using 'GIT_PROTOCOL' side-channel
>  * Additional commands in a partial clone world (e.g. log, grep)
--
Philip 


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC] protocol version 2
  2017-10-28 22:57 ` Philip Oakley
@ 2017-10-31 18:42   ` Brandon Williams
  0 siblings, 0 replies; 47+ messages in thread
From: Brandon Williams @ 2017-10-31 18:42 UTC (permalink / raw)
  To: Philip Oakley
  Cc: git, spearce, git, gitster, jonathantanmy, jrnieder, peff, sbeller

On 10/28, Philip Oakley wrote:
> From: "Brandon Williams" <bmwill@google.com>
> Sent: Friday, October 20, 2017 6:18 PM
> >Objective
> >===========
> >
> >Replace Git's current wire protocol with a simpler, less wasteful
> >protocol that can evolve over time.
> >
> 
> <snip>
> 
> >Capability Advertisement
> >--------------------------
> >
> >A server which decides to communicate (based on a request from a client)
> >using protocol version 2, notifies the client by sending a version
> >string in its initial response followed by an advertisement of its
> >capabilities.  Each capability is a key with an optional value.  Clients
> >must ignore all unknown keys.
> 
> >   Semantics of unknown values are left to
> >the definition of each key.
> 
> This sounds odd. If the keys are known then their semantics are
> known. Or the keys are unknown and they and their values are
> ignored.
> 
> Maybe: Capability keys shall define their response to unknown key values.

I'll work to make the language a little clearer.

> 
> > Some capabilities will describe commands
> >which can be requested to be executed by the client.
> >
> <snip>
> 
> >Ls-refs
> >---------
> >
> >Ls-refs can be looked at as the equivalent of the current ls-remote as
> >it is a way to query a remote for the references that it has.  Unlike
> >the current ls-remote, the filtering of the output is done on the server
> >side by passing a number of parameters to the server-side command
> >instead of the filtering occurring on the client.
> >
> >Ls-ref takes in the following parameters:
> >
> > --head, --tags: Limit to only refs/heads or refs/tags
> > --refs: Do not show peeled tags or pseudorefs like HEAD
> > --symref: In addition to the object pointed by it, show the underlying
> >           ref pointed by it when showing a symbolic ref
> > <refspec>: When specified, only references matching the given patterns
> >            are displayed.
> 
> Does the --symref also the pseudorefs?
> 
> Isn't there a need somethimes to determine the ref that the remote's
> HEAD points to. This is an issue with the current clone and bundle
> code when there is a choice of refs/branches that could be the
> current HEAD ref and the wrong one is chosen, though this V2 change
> doesn't affect bundles.

Yeah, currently the resolution of HEAD is stuffed into the
capability line in v1.  The intention of this would be to allow for the
resolution of all symrefs (including HEAD).

> 
> >
> >The output of ls-refs is as follows:
> >
> >   output = (no-refs / list-of-refs)
> >     *symref
> >            *shallow
> >            flush-pkt
> >
> >   no-refs = PKT-LINE(zero-id SP no-refs LF)
> >   list-of-refs = *ref
> >   ref = PKT-LINE((tip / peeled) LF)
> >   tip = obj-id SP refname
> >   peeled = obj-id SP refname "^{}"
> >
> >   symref = PKT-LINE("symref" SP symbolic-ref SP resolved-ref LF)
> >   shallow = PKT-LINE("shallow" SP obj-id LF)
> >
> >Fetch
> >-------
> >
> >Fetch will need to be a modified version of the v1 fetch protocol.  Some
> >potential areas for improvement are: Ref-in-want, CDN offloading,
> >Fetch-options.
> >
> >Since we'll have an 'ls-ref' service we can eliminate the need of fetch
> >to perform a ref-advertisement, instead a client can run the 'ls-refs'
> >service first, in order to find out what refs the server has, and then
> >request those refs directly using the fetch service.
> >
> >//TODO Flush out the design
> >
> >Fetch-object
> >--------------
> >
> >This service could be used by partial clones in order to request missing
> >objects.
> >
> >//TODO Flush out the design
> >
> >Push
> >------
> >
> >Push will need to be a modified version of the v1 push protocol.  Some
> >potential areas for improvement are: Fix push-options, Negotiation for
> >force push.
> >
> >One change that will need to happen is to improve how `push-options` are
> >sent to the server (so that they aren't sent twice!!).  Also the
> >report-status needs to be better than it currently is in v1 so that
> >tools like gerrit can explain what it did with the ref-update the client
> >sent to it. Maybe have a push-rebase capability or command?
> >
> >//TODO Flush out the design
> >
> >Other Considerations
> >======================
> >
> > * Move away from pkt-line framing?
> > * Have responses structured in well known formats (e.g. JSON)
> > * Eliminate initial round-trip using 'GIT_PROTOCOL' side-channel
> > * Additional commands in a partial clone world (e.g. log, grep)
> --
> Philip
> 

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC] protocol version 2
  2017-10-20 17:18 [RFC] protocol version 2 Brandon Williams
                   ` (2 preceding siblings ...)
  2017-10-28 22:57 ` Philip Oakley
@ 2017-11-10 20:13 ` Jonathan Tan
  2017-12-04 23:58 ` [WIP 00/15] " Brandon Williams
  4 siblings, 0 replies; 47+ messages in thread
From: Jonathan Tan @ 2017-11-10 20:13 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git, spearce, git, gitster, jrnieder, peff, sbeller

On Fri, 20 Oct 2017 10:18:39 -0700
Brandon Williams <bmwill@google.com> wrote:

> Some of the pain points with the current protocol spec are:

After some in-office discussion, I think that the most important pain
point is that we have to implement each protocol twice: once for
HTTP(S), and once for SSH (and friends) that support bidirectional byte
streams.

If it weren't for this, I think that what is discussed in this document
(e.g. ls-refs, fetch-object) can be less invasively accomplished with
v1, specifying "extra parameters" (explained in this e-mail [1]) to
merely tweak the output of upload-pack instead of replacing it nearly
completely, thus acting more as optimizations than changing the mode of
operation entirely.

[1] https://public-inbox.org/git/20171010193956.168385-1-jonathantanmy@google.com/

>   * The server's initial response is the ref advertisement.  This
>     advertisement cannot be omitted and can become an issue due to the
>     sheer number of refs that can be sent with large repositories.  For
>     example, when contacting the internal equivalent of
>     `https://android.googlesource.com/`, the server will send
>     approximately 1 million refs totaling 71MB.  This is data that is
>     sent during each and every fetch and is not scalable.

For me, this is not a compelling one, because we can provide a ref
whitelist as an "extra parameter" in v1.

>   * Capabilities were implemented as a hack and are hidden behind a NUL
>     byte after the first ref sent from the server during the ref
>     advertisement:
> 
> 	<SHA1> <Ref Name>\0<capabilities space separated> <symref> <agent>
> 
>     Since they are sent in the context of a pkt-line they are also subject
>     to the same length limitations (1k bytes with old clients).  While we
>     may not be close to hitting this limitation with capabilities alone, it
>     has become a problem when trying to abuse capabilities for other
>     purposes (e.g. [symrefs](https://public-inbox.org/git/20160816161838.klvjhhoxsftvkfmd@x/)).
> 
>   * Various other technical debt (e.g. abusing capabilities to
>     communicate agent and symref data, service name set using a query
>     parameter).

I think these 2 are the same - I would emphasize the fact that we cannot
add more stuff here, rather than the fact that we're putting this behind
NUL.

>  Special Packets
> -----------------
> 
> In protocol v2 these special packets will have the following semantics:
> 
>   * '0000' Flush Packet (flush-pkt) - indicates the end of a message
>   * '0001' End-of-List delimiter (delim-pkt) - indicates the end of a list

To address the pain point of HTTP(S) being different from the others
(mentioned above), I think the packet semantics should be further
qualified:

 - Communications must be divided up into packets terminated by a
   flush-pkt. Also, each side must be implemented without knowing
   whether packets-in-progress can or cannot be seen by the other side.
 - Each request packet must have a corresponding, possibly empty,
   response packet.
 - A request packet may be sent even if a response packet corresponding
   to a previously sent request packet is awaited. (This allows us to
   retain the existing optimization in fetch-pack wherein, during
   negotiation, the "have" request-response packet pairs are
   interleaved.)

This will allow us to more easily share code between HTTP(S) and the
others.

In summary, I think that we need a big motivation to make the jump from
v1 to v2, instead of merely making small changes to v1 (and I do think
that the proposed new commands, such as "ls-refs" and "fetch-object",
can be implemented merely by small changes). And I think that the
ability to better share code between HTTP(S) and others provides that
motivation.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [WIP 00/15] protocol version 2
  2017-10-20 17:18 [RFC] protocol version 2 Brandon Williams
                   ` (3 preceding siblings ...)
  2017-11-10 20:13 ` Jonathan Tan
@ 2017-12-04 23:58 ` Brandon Williams
  2017-12-04 23:58   ` [WIP 01/15] pkt-line: introduce packet_read_with_status Brandon Williams
                     ` (14 more replies)
  4 siblings, 15 replies; 47+ messages in thread
From: Brandon Williams @ 2017-12-04 23:58 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams

A while back I sent out a rough outline for what a protocol version 2 would
look like at
https://public-inbox.org/git/20171020171839.4188-1-bmwill@google.com/.  After
hacking at both the server and client code I've managed to get some patches for
protocol version 2 which implement listing refs and cloning/fetching working.
I still need to get the push side of things working before this would be viable
to accept and apply but I figured this was in a good enough state to start
getting some early feedback on the concept.  I'm hoping that some comments and
other pairs of eyes can help me identify deficiencies earlier rather than
later.

As a whole, fetch works fairly similar to the way it works in v1 except I
removed the support for shallow clients for the time being in order to make it
easier to implement at first.  I haven't decided if it makes more sense to have
the deepening as a separate server command or to keep it in fetch like it is in
v1, just something to think about.

Any comments or criticism is welcome. Thanks!

Brandon Williams (15):
  pkt-line: introduce packet_read_with_status
  pkt-line: introduce struct packet_reader
  pkt-line: add delim packet support
  upload-pack: convert to a builtin
  upload-pack: factor out processing lines
  transport: use get_refs_via_connect to get refs
  connect: convert get_remote_heads to use struct packet_reader
  connect: discover protocol version outside of get_remote_heads
  transport: store protocol version
  protocol: introduce enum protocol_version value protocol_v2
  serve: introduce git-serve
  ls-refs: introduce ls-refs server command
  connect: request remote refs using v2
  upload_pack: introduce fetch server command
  fetch-pack: perform a fetch using v2

 .gitignore             |   1 +
 Makefile               |   6 +-
 builtin.h              |   2 +
 builtin/fetch-pack.c   |  19 ++-
 builtin/receive-pack.c |   3 +
 builtin/send-pack.c    |  18 ++-
 builtin/serve.c        |  25 ++++
 connect.c              | 210 +++++++++++++++++++++---------
 connect.h              |   3 +
 fetch-pack.c           | 237 +++++++++++++++++++++++++++++++++-
 fetch-pack.h           |   4 +-
 git.c                  |   2 +
 ls-refs.c              |  96 ++++++++++++++
 ls-refs.h              |   9 ++
 pkt-line.c             | 133 +++++++++++++++++--
 pkt-line.h             |  31 +++++
 protocol.c             |   2 +
 protocol.h             |   1 +
 remote-curl.c          |  21 ++-
 remote.h               |   7 +-
 serve.c                | 195 ++++++++++++++++++++++++++++
 serve.h                |   6 +
 t/t5701-protocol-v2.sh |  54 ++++++++
 transport.c            |  84 ++++++++----
 upload-pack.c          | 342 +++++++++++++++++++++++++++++++++++++++++++------
 upload-pack.h          |   9 ++
 26 files changed, 1371 insertions(+), 149 deletions(-)
 create mode 100644 builtin/serve.c
 create mode 100644 ls-refs.c
 create mode 100644 ls-refs.h
 create mode 100644 serve.c
 create mode 100644 serve.h
 create mode 100755 t/t5701-protocol-v2.sh
 create mode 100644 upload-pack.h

-- 
2.15.1.424.g9478a66081-goog


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [WIP 01/15] pkt-line: introduce packet_read_with_status
  2017-12-04 23:58 ` [WIP 00/15] " Brandon Williams
@ 2017-12-04 23:58   ` Brandon Williams
  2017-12-07 20:53     ` Stefan Beller
  2017-12-04 23:58   ` [WIP 02/15] pkt-line: introduce struct packet_reader Brandon Williams
                     ` (13 subsequent siblings)
  14 siblings, 1 reply; 47+ messages in thread
From: Brandon Williams @ 2017-12-04 23:58 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams

The current pkt-line API encodes the status of a pkt-line read in the
length of the read content.  An error is indicated with '-1', a flush
with '0' (which can be confusing since a return value of '0' can also
indicate an empty pkt-line), and a positive integer for the length of
the read content otherwise.  This doesn't leave much room for allowing
the addition of additional special packets in the future.

To solve this introduce 'packet_read_with_status()' which reads a packet
and returns the status of the read encoded as an 'enum packet_status'
type.  This allows for easily identifying between special and normal
packets as well as errors.  It also enables easily adding a new special
packet in the future.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 pkt-line.c | 55 ++++++++++++++++++++++++++++++++++++++++++-------------
 pkt-line.h |  8 ++++++++
 2 files changed, 50 insertions(+), 13 deletions(-)

diff --git a/pkt-line.c b/pkt-line.c
index 7006b3587..ac619f05b 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -280,28 +280,33 @@ static int packet_length(const char *linelen)
 	return (val < 0) ? val : (val << 8) | hex2chr(linelen + 2);
 }
 
-int packet_read(int fd, char **src_buf, size_t *src_len,
-		char *buffer, unsigned size, int options)
+enum packet_read_status packet_read_with_status(int fd, char **src_buffer, size_t *src_len,
+						char *buffer, unsigned size, int *pktlen,
+						int options)
 {
-	int len, ret;
+	int len;
 	char linelen[4];
 
-	ret = get_packet_data(fd, src_buf, src_len, linelen, 4, options);
-	if (ret < 0)
-		return ret;
+	if (get_packet_data(fd, src_buffer, src_len, linelen, 4, options) < 0)
+		return PACKET_READ_ERROR;
+
 	len = packet_length(linelen);
 	if (len < 0)
 		die("protocol error: bad line length character: %.4s", linelen);
-	if (!len) {
+
+	if (len == 0) {
 		packet_trace("0000", 4, 0);
-		return 0;
+		return PACKET_READ_FLUSH;
+	} else if (len >= 1 && len <= 3) {
+		die("protocol error: bad line length character: %.4s", linelen);
 	}
+
 	len -= 4;
-	if (len >= size)
+	if ((len < 0) || ((unsigned)len >= size))
 		die("protocol error: bad line length %d", len);
-	ret = get_packet_data(fd, src_buf, src_len, buffer, len, options);
-	if (ret < 0)
-		return ret;
+
+	if (get_packet_data(fd, src_buffer, src_len, buffer, len, options) < 0)
+		return PACKET_READ_ERROR;
 
 	if ((options & PACKET_READ_CHOMP_NEWLINE) &&
 	    len && buffer[len-1] == '\n')
@@ -309,7 +314,31 @@ int packet_read(int fd, char **src_buf, size_t *src_len,
 
 	buffer[len] = 0;
 	packet_trace(buffer, len, 0);
-	return len;
+	*pktlen = len;
+	return PACKET_READ_NORMAL;
+}
+
+int packet_read(int fd, char **src_buffer, size_t *src_len,
+		char *buffer, unsigned size, int options)
+{
+	enum packet_read_status status;
+	int pktlen;
+
+	status = packet_read_with_status(fd, src_buffer, src_len,
+					 buffer, size, &pktlen,
+					 options);
+	switch (status) {
+	case PACKET_READ_ERROR:
+		pktlen = -1;
+		break;
+	case PACKET_READ_NORMAL:
+		break;
+	case PACKET_READ_FLUSH:
+		pktlen = 0;
+		break;
+	}
+
+	return pktlen;
 }
 
 static char *packet_read_line_generic(int fd,
diff --git a/pkt-line.h b/pkt-line.h
index 3dad583e2..f1545929b 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -60,8 +60,16 @@ int write_packetized_from_buf(const char *src_in, size_t len, int fd_out);
  * If options contains PACKET_READ_CHOMP_NEWLINE, a trailing newline (if
  * present) is removed from the buffer before returning.
  */
+enum packet_read_status {
+	PACKET_READ_ERROR = -1,
+	PACKET_READ_NORMAL,
+	PACKET_READ_FLUSH,
+};
 #define PACKET_READ_GENTLE_ON_EOF (1u<<0)
 #define PACKET_READ_CHOMP_NEWLINE (1u<<1)
+enum packet_read_status packet_read_with_status(int fd, char **src_buffer, size_t *src_len,
+						char *buffer, unsigned size, int *pktlen,
+						int options);
 int packet_read(int fd, char **src_buffer, size_t *src_len, char
 		*buffer, unsigned size, int options);
 
-- 
2.15.1.424.g9478a66081-goog


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [WIP 02/15] pkt-line: introduce struct packet_reader
  2017-12-04 23:58 ` [WIP 00/15] " Brandon Williams
  2017-12-04 23:58   ` [WIP 01/15] pkt-line: introduce packet_read_with_status Brandon Williams
@ 2017-12-04 23:58   ` Brandon Williams
  2017-12-07 22:01     ` Stefan Beller
  2017-12-04 23:58   ` [WIP 03/15] pkt-line: add delim packet support Brandon Williams
                     ` (12 subsequent siblings)
  14 siblings, 1 reply; 47+ messages in thread
From: Brandon Williams @ 2017-12-04 23:58 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams

Sometimes it is advantageous to be able to peek the next packet line
without consuming it (e.g. to be able to determine the protocol version
a server is speaking).  In order to do that introduce 'struct
packet_reader' which is an abstraction around the normal packet reading
logic.  This enables a caller to be able to peek a single line at a time
using 'packet_reader_peek()' and having a caller consume a line by
calling 'packet_reader_read()'.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 pkt-line.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 pkt-line.h | 20 ++++++++++++++++++++
 2 files changed, 81 insertions(+)

diff --git a/pkt-line.c b/pkt-line.c
index ac619f05b..518109bbe 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -406,3 +406,64 @@ ssize_t read_packetized_to_strbuf(int fd_in, struct strbuf *sb_out)
 	}
 	return sb_out->len - orig_len;
 }
+
+/* Packet Reader Functions */
+void packet_reader_init(struct packet_reader *reader, int fd,
+			char *src_buffer, size_t src_len)
+{
+	reader->fd = fd;
+	reader->src_buffer = src_buffer;
+	reader->src_len = src_len;
+	reader->buffer = packet_buffer;
+	reader->buffer_size = sizeof(packet_buffer);
+	reader->options = PACKET_READ_CHOMP_NEWLINE | PACKET_READ_GENTLE_ON_EOF;
+
+	reader->line = NULL;
+	reader->line_peeked = 0;
+	reader->pktlen = 0;
+	reader->status = PACKET_READ_ERROR;
+}
+
+enum packet_read_status packet_reader_read(struct packet_reader *reader)
+{
+	if (reader->line_peeked) {
+		reader->line_peeked = 0;
+		return reader->status;
+	}
+
+	reader->status = packet_read_with_status(reader->fd,
+						 &reader->src_buffer,
+						 &reader->src_len,
+						 reader->buffer,
+						 reader->buffer_size,
+						 &reader->pktlen,
+						 reader->options);
+
+	switch (reader->status) {
+	case PACKET_READ_ERROR:
+		reader->pktlen = -1;
+		reader->line = NULL;
+		break;
+	case PACKET_READ_NORMAL:
+		reader->line = reader->buffer;
+		break;
+	case PACKET_READ_FLUSH:
+		reader->pktlen = 0;
+		reader->line = NULL;
+		break;
+	}
+
+	return reader->status;
+}
+
+enum packet_read_status packet_reader_peek(struct packet_reader *reader)
+{
+	/* Only allow peeking a single line */
+	if (reader->line_peeked)
+		return reader->status;
+
+	/* Peek a line by reading it and setting peeked flag */
+	packet_reader_read(reader);
+	reader->line_peeked = 1;
+	return reader->status;
+}
diff --git a/pkt-line.h b/pkt-line.h
index f1545929b..2b5c7cf11 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -104,6 +104,26 @@ char *packet_read_line_buf(char **src_buf, size_t *src_len, int *size);
  */
 ssize_t read_packetized_to_strbuf(int fd_in, struct strbuf *sb_out);
 
+struct packet_reader {
+	int fd;
+	char *src_buffer;
+	size_t src_len;
+
+	char *buffer;
+	unsigned buffer_size;
+	int options;
+
+	enum packet_read_status status;
+	int pktlen;
+	const char *line;
+	int line_peeked;
+};
+
+extern void packet_reader_init(struct packet_reader *reader, int fd,
+			       char *src_buffer, size_t src_len);
+extern enum packet_read_status packet_reader_read(struct packet_reader *reader);
+extern enum packet_read_status packet_reader_peek(struct packet_reader *reader);
+
 #define DEFAULT_PACKET_MAX 1000
 #define LARGE_PACKET_MAX 65520
 #define LARGE_PACKET_DATA_MAX (LARGE_PACKET_MAX - 4)
-- 
2.15.1.424.g9478a66081-goog


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [WIP 03/15] pkt-line: add delim packet support
  2017-12-04 23:58 ` [WIP 00/15] " Brandon Williams
  2017-12-04 23:58   ` [WIP 01/15] pkt-line: introduce packet_read_with_status Brandon Williams
  2017-12-04 23:58   ` [WIP 02/15] pkt-line: introduce struct packet_reader Brandon Williams
@ 2017-12-04 23:58   ` Brandon Williams
  2017-12-07 22:30     ` Stefan Beller
  2017-12-04 23:58   ` [WIP 04/15] upload-pack: convert to a builtin Brandon Williams
                     ` (11 subsequent siblings)
  14 siblings, 1 reply; 47+ messages in thread
From: Brandon Williams @ 2017-12-04 23:58 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams

One of the design goals of protocol-v2 is to improve the semantics of
flush packets.  Currently in protocol-v1, flush packets are used both to
indicate a break in a list of packet lines as well as an indication that
one side has finished speaking.  This makes it particularly difficult
to implement proxies as a proxy would need to completely understand git
protocol instead of simply looking for a flush packet.

To do this, introduce the special deliminator packet '0001'.  A delim
packet can then be used as a deliminator between lists of packet lines
while flush packets can be reserved to indicate the end of a response.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 pkt-line.c | 19 ++++++++++++++++++-
 pkt-line.h |  3 +++
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/pkt-line.c b/pkt-line.c
index 518109bbe..222e1e310 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -91,6 +91,12 @@ void packet_flush(int fd)
 	write_or_die(fd, "0000", 4);
 }
 
+void packet_delim(int fd)
+{
+	packet_trace("0001", 4, 1);
+	write_or_die(fd, "0001", 4);
+}
+
 int packet_flush_gently(int fd)
 {
 	packet_trace("0000", 4, 1);
@@ -105,6 +111,12 @@ void packet_buf_flush(struct strbuf *buf)
 	strbuf_add(buf, "0000", 4);
 }
 
+void packet_buf_delim(struct strbuf *buf)
+{
+	packet_trace("0001", 4, 1);
+	strbuf_add(buf, "0001", 4);
+}
+
 static void set_packet_header(char *buf, const int size)
 {
 	static char hexchar[] = "0123456789abcdef";
@@ -297,7 +309,10 @@ enum packet_read_status packet_read_with_status(int fd, char **src_buffer, size_
 	if (len == 0) {
 		packet_trace("0000", 4, 0);
 		return PACKET_READ_FLUSH;
-	} else if (len >= 1 && len <= 3) {
+	} else if (len == 1) {
+		packet_trace("0001", 4, 0);
+		return PACKET_READ_DELIM;
+	} else if (len >= 2 && len <= 3) {
 		die("protocol error: bad line length character: %.4s", linelen);
 	}
 
@@ -333,6 +348,7 @@ int packet_read(int fd, char **src_buffer, size_t *src_len,
 		break;
 	case PACKET_READ_NORMAL:
 		break;
+	case PACKET_READ_DELIM:
 	case PACKET_READ_FLUSH:
 		pktlen = 0;
 		break;
@@ -447,6 +463,7 @@ enum packet_read_status packet_reader_read(struct packet_reader *reader)
 	case PACKET_READ_NORMAL:
 		reader->line = reader->buffer;
 		break;
+	case PACKET_READ_DELIM:
 	case PACKET_READ_FLUSH:
 		reader->pktlen = 0;
 		reader->line = NULL;
diff --git a/pkt-line.h b/pkt-line.h
index 2b5c7cf11..49ec80c80 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -20,8 +20,10 @@
  * side can't, we stay with pure read/write interfaces.
  */
 void packet_flush(int fd);
+void packet_delim(int fd);
 void packet_write_fmt(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
 void packet_buf_flush(struct strbuf *buf);
+void packet_buf_delim(struct strbuf *buf);
 void packet_write(int fd_out, const char *buf, size_t size);
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
 int packet_flush_gently(int fd);
@@ -64,6 +66,7 @@ enum packet_read_status {
 	PACKET_READ_ERROR = -1,
 	PACKET_READ_NORMAL,
 	PACKET_READ_FLUSH,
+	PACKET_READ_DELIM,
 };
 #define PACKET_READ_GENTLE_ON_EOF (1u<<0)
 #define PACKET_READ_CHOMP_NEWLINE (1u<<1)
-- 
2.15.1.424.g9478a66081-goog


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [WIP 04/15] upload-pack: convert to a builtin
  2017-12-04 23:58 ` [WIP 00/15] " Brandon Williams
                     ` (2 preceding siblings ...)
  2017-12-04 23:58   ` [WIP 03/15] pkt-line: add delim packet support Brandon Williams
@ 2017-12-04 23:58   ` Brandon Williams
  2017-12-06 21:59     ` Junio C Hamano
  2017-12-04 23:58   ` [WIP 05/15] upload-pack: factor out processing lines Brandon Williams
                     ` (10 subsequent siblings)
  14 siblings, 1 reply; 47+ messages in thread
From: Brandon Williams @ 2017-12-04 23:58 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams

In order to allow for code sharing with the server-side of fetch in
protocol-v2 convert upload-pack to be a builtin.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 Makefile      | 3 ++-
 builtin.h     | 1 +
 git.c         | 1 +
 upload-pack.c | 2 +-
 4 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index 9ce68cded..86394b69d 100644
--- a/Makefile
+++ b/Makefile
@@ -630,7 +630,6 @@ PROGRAM_OBJS += imap-send.o
 PROGRAM_OBJS += sh-i18n--envsubst.o
 PROGRAM_OBJS += shell.o
 PROGRAM_OBJS += show-index.o
-PROGRAM_OBJS += upload-pack.o
 PROGRAM_OBJS += remote-testsvn.o
 
 # Binary suffix, set to .exe for Windows builds
@@ -692,6 +691,7 @@ BUILT_INS += git-merge-subtree$X
 BUILT_INS += git-show$X
 BUILT_INS += git-stage$X
 BUILT_INS += git-status$X
+BUILT_INS += git-upload-pack$X
 BUILT_INS += git-whatchanged$X
 
 # what 'all' will build and 'install' will install in gitexecdir,
@@ -890,6 +890,7 @@ LIB_OBJS += tree-diff.o
 LIB_OBJS += tree.o
 LIB_OBJS += tree-walk.o
 LIB_OBJS += unpack-trees.o
+LIB_OBJS += upload-pack.o
 LIB_OBJS += url.o
 LIB_OBJS += urlmatch.o
 LIB_OBJS += usage.o
diff --git a/builtin.h b/builtin.h
index 42378f3aa..f332a1257 100644
--- a/builtin.h
+++ b/builtin.h
@@ -231,6 +231,7 @@ extern int cmd_update_ref(int argc, const char **argv, const char *prefix);
 extern int cmd_update_server_info(int argc, const char **argv, const char *prefix);
 extern int cmd_upload_archive(int argc, const char **argv, const char *prefix);
 extern int cmd_upload_archive_writer(int argc, const char **argv, const char *prefix);
+extern int cmd_upload_pack(int argc, const char **argv, const char *prefix);
 extern int cmd_var(int argc, const char **argv, const char *prefix);
 extern int cmd_verify_commit(int argc, const char **argv, const char *prefix);
 extern int cmd_verify_tag(int argc, const char **argv, const char *prefix);
diff --git a/git.c b/git.c
index f31dca696..e32e16f2d 100644
--- a/git.c
+++ b/git.c
@@ -474,6 +474,7 @@ static struct cmd_struct commands[] = {
 	{ "update-server-info", cmd_update_server_info, RUN_SETUP },
 	{ "upload-archive", cmd_upload_archive },
 	{ "upload-archive--writer", cmd_upload_archive_writer },
+	{ "upload-pack", cmd_upload_pack },
 	{ "var", cmd_var, RUN_SETUP_GENTLY },
 	{ "verify-commit", cmd_verify_commit, RUN_SETUP },
 	{ "verify-pack", cmd_verify_pack },
diff --git a/upload-pack.c b/upload-pack.c
index ef99a029c..2d16952a3 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -1033,7 +1033,7 @@ static int upload_pack_config(const char *var, const char *value, void *unused)
 	return parse_hide_refs_config(var, value, "uploadpack");
 }
 
-int cmd_main(int argc, const char **argv)
+int cmd_upload_pack(int argc, const char **argv, const char *prefix)
 {
 	const char *dir;
 	int strict = 0;
-- 
2.15.1.424.g9478a66081-goog


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [WIP 05/15] upload-pack: factor out processing lines
  2017-12-04 23:58 ` [WIP 00/15] " Brandon Williams
                     ` (3 preceding siblings ...)
  2017-12-04 23:58   ` [WIP 04/15] upload-pack: convert to a builtin Brandon Williams
@ 2017-12-04 23:58   ` Brandon Williams
  2017-12-04 23:58   ` [WIP 06/15] transport: use get_refs_via_connect to get refs Brandon Williams
                     ` (9 subsequent siblings)
  14 siblings, 0 replies; 47+ messages in thread
From: Brandon Williams @ 2017-12-04 23:58 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams

Factor out the logic for processing shallow, deepen, deepen_since, and
deepen_not lines into their own functions to simplify the
'receive_needs()' function in addition to making it easier to reuse some
of this logic when implementing protocol_v2.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 upload-pack.c | 113 ++++++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 74 insertions(+), 39 deletions(-)

diff --git a/upload-pack.c b/upload-pack.c
index 2d16952a3..d2711e4ee 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -731,6 +731,75 @@ static void deepen_by_rev_list(int ac, const char **av,
 	packet_flush(1);
 }
 
+static int process_shallow(const char *line, struct object_array *shallows)
+{
+	const char *arg;
+	if (skip_prefix(line, "shallow ", &arg)) {
+		struct object_id oid;
+		struct object *object;
+		if (get_oid_hex(arg, &oid))
+			die("invalid shallow line: %s", line);
+		object = parse_object(&oid);
+		if (!object)
+			return 1;
+		if (object->type != OBJ_COMMIT)
+			die("invalid shallow object %s", oid_to_hex(&oid));
+		if (!(object->flags & CLIENT_SHALLOW)) {
+			object->flags |= CLIENT_SHALLOW;
+			add_object_array(object, NULL, shallows);
+		}
+		return 1;
+	}
+
+	return 0;
+}
+
+static int process_deepen(const char *line, int *depth)
+{
+	const char *arg;
+	if (skip_prefix(line, "deepen ", &arg)) {
+		char *end = NULL;
+		*depth = strtol(arg, &end, 0);
+		if (!end || *end || depth <= 0)
+			die("Invalid deepen: %s", line);
+		return 1;
+	}
+
+	return 0;
+}
+
+static int process_deepen_since(const char *line, timestamp_t *deepen_since, int *deepen_rev_list)
+{
+	const char *arg;
+	if (skip_prefix(line, "deepen-since ", &arg)) {
+		char *end = NULL;
+		*deepen_since = parse_timestamp(arg, &end, 0);
+		if (!end || *end || !deepen_since ||
+		    /* revisions.c's max_age -1 is special */
+		    *deepen_since == -1)
+			die("Invalid deepen-since: %s", line);
+		*deepen_rev_list = 1;
+		return 1;
+	}
+	return 0;
+}
+
+static int process_deepen_not(const char *line, struct string_list *deepen_not, int *deepen_rev_list)
+{
+	const char *arg;
+	if (skip_prefix(line, "deepen-not ", &arg)) {
+		char *ref = NULL;
+		struct object_id oid;
+		if (expand_ref(arg, strlen(arg), oid.hash, &ref) != 1)
+			die("git upload-pack: ambiguous deepen-not: %s", line);
+		string_list_append(deepen_not, ref);
+		free(ref);
+		*deepen_rev_list = 1;
+		return 1;
+	}
+	return 0;
+}
+
 static void receive_needs(void)
 {
 	struct object_array shallows = OBJECT_ARRAY_INIT;
@@ -752,49 +821,15 @@ static void receive_needs(void)
 		if (!line)
 			break;
 
-		if (skip_prefix(line, "shallow ", &arg)) {
-			struct object_id oid;
-			struct object *object;
-			if (get_oid_hex(arg, &oid))
-				die("invalid shallow line: %s", line);
-			object = parse_object(&oid);
-			if (!object)
-				continue;
-			if (object->type != OBJ_COMMIT)
-				die("invalid shallow object %s", oid_to_hex(&oid));
-			if (!(object->flags & CLIENT_SHALLOW)) {
-				object->flags |= CLIENT_SHALLOW;
-				add_object_array(object, NULL, &shallows);
-			}
+		if (process_shallow(line, &shallows))
 			continue;
-		}
-		if (skip_prefix(line, "deepen ", &arg)) {
-			char *end = NULL;
-			depth = strtol(arg, &end, 0);
-			if (!end || *end || depth <= 0)
-				die("Invalid deepen: %s", line);
+		if (process_deepen(line, &depth))
 			continue;
-		}
-		if (skip_prefix(line, "deepen-since ", &arg)) {
-			char *end = NULL;
-			deepen_since = parse_timestamp(arg, &end, 0);
-			if (!end || *end || !deepen_since ||
-			    /* revisions.c's max_age -1 is special */
-			    deepen_since == -1)
-				die("Invalid deepen-since: %s", line);
-			deepen_rev_list = 1;
+		if (process_deepen_since(line, &deepen_since, &deepen_rev_list))
 			continue;
-		}
-		if (skip_prefix(line, "deepen-not ", &arg)) {
-			char *ref = NULL;
-			struct object_id oid;
-			if (expand_ref(arg, strlen(arg), oid.hash, &ref) != 1)
-				die("git upload-pack: ambiguous deepen-not: %s", line);
-			string_list_append(&deepen_not, ref);
-			free(ref);
-			deepen_rev_list = 1;
+		if (process_deepen_not(line, &deepen_not, &deepen_rev_list))
 			continue;
-		}
+
 		if (!skip_prefix(line, "want ", &arg) ||
 		    get_oid_hex(arg, &oid_buf))
 			die("git upload-pack: protocol error, "
-- 
2.15.1.424.g9478a66081-goog


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [WIP 06/15] transport: use get_refs_via_connect to get refs
  2017-12-04 23:58 ` [WIP 00/15] " Brandon Williams
                     ` (4 preceding siblings ...)
  2017-12-04 23:58   ` [WIP 05/15] upload-pack: factor out processing lines Brandon Williams
@ 2017-12-04 23:58   ` Brandon Williams
  2017-12-06 22:10     ` Junio C Hamano
  2017-12-04 23:58   ` [WIP 07/15] connect: convert get_remote_heads to use struct packet_reader Brandon Williams
                     ` (8 subsequent siblings)
  14 siblings, 1 reply; 47+ messages in thread
From: Brandon Williams @ 2017-12-04 23:58 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams

Remove code duplication and use the existing 'get_refs_via_connect()'
function to retrieve a remote's heads in 'fetch_refs_via_pack()' and
'git_transport_push()'.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 transport.c | 18 ++++--------------
 1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/transport.c b/transport.c
index d75ff0514..7c969f285 100644
--- a/transport.c
+++ b/transport.c
@@ -230,12 +230,8 @@ static int fetch_refs_via_pack(struct transport *transport,
 	args.cloning = transport->cloning;
 	args.update_shallow = data->options.update_shallow;
 
-	if (!data->got_remote_heads) {
-		connect_setup(transport, 0);
-		get_remote_heads(data->fd[0], NULL, 0, &refs_tmp, 0,
-				 NULL, &data->shallow);
-		data->got_remote_heads = 1;
-	}
+	if (!data->got_remote_heads)
+		refs_tmp = get_refs_via_connect(transport, 0);
 
 	refs = fetch_pack(&args, data->fd, data->conn,
 			  refs_tmp ? refs_tmp : transport->remote_refs,
@@ -542,14 +538,8 @@ static int git_transport_push(struct transport *transport, struct ref *remote_re
 	struct send_pack_args args;
 	int ret;
 
-	if (!data->got_remote_heads) {
-		struct ref *tmp_refs;
-		connect_setup(transport, 1);
-
-		get_remote_heads(data->fd[0], NULL, 0, &tmp_refs, REF_NORMAL,
-				 NULL, &data->shallow);
-		data->got_remote_heads = 1;
-	}
+	if (!data->got_remote_heads)
+		get_refs_via_connect(transport, 1);
 
 	memset(&args, 0, sizeof(args));
 	args.send_mirror = !!(flags & TRANSPORT_PUSH_MIRROR);
-- 
2.15.1.424.g9478a66081-goog


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [WIP 07/15] connect: convert get_remote_heads to use struct packet_reader
  2017-12-04 23:58 ` [WIP 00/15] " Brandon Williams
                     ` (5 preceding siblings ...)
  2017-12-04 23:58   ` [WIP 06/15] transport: use get_refs_via_connect to get refs Brandon Williams
@ 2017-12-04 23:58   ` Brandon Williams
  2017-12-06 22:39     ` Junio C Hamano
  2017-12-04 23:58   ` [WIP 08/15] connect: discover protocol version outside of get_remote_heads Brandon Williams
                     ` (7 subsequent siblings)
  14 siblings, 1 reply; 47+ messages in thread
From: Brandon Williams @ 2017-12-04 23:58 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams

In order to allow for better control flow when protocol_v2 is introduced
convert 'get_remote_heads()' to use 'struct packet_reader' to read
packet lines.  This enables a client to be able to peek the first line
of a server's response (without consuming it) in order to determine the
protocol version its speaking and then passing control to the
appropriate handler.

This is needed because the initial response from a server speaking
protocol_v0 includes the first ref, while subsequent protocol versions
respond with a version line.  We want to be able to read this first line
without consuming the first ref sent in the protocol_v0 case so that the
protocol version the server is speaking can be determined outside of
'get_remote_heads()' in a future patch.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 connect.c | 125 +++++++++++++++++++++++++++++++++++---------------------------
 1 file changed, 70 insertions(+), 55 deletions(-)

diff --git a/connect.c b/connect.c
index 7fbd396b3..f79ea9179 100644
--- a/connect.c
+++ b/connect.c
@@ -48,6 +48,12 @@ int check_ref_type(const struct ref *ref, int flags)
 
 static void die_initial_contact(int unexpected)
 {
+	/*
+	 * A hang-up after seeing some response from the other end
+	 * means that it is unexpected, as we know the other end is
+	 * willing to talk to us.  A hang-up before seeing any
+	 * response does not necessarily mean an ACL problem, though.
+	 */
 	if (unexpected)
 		die(_("The remote end hung up upon initial contact"));
 	else
@@ -56,6 +62,41 @@ static void die_initial_contact(int unexpected)
 		      "and the repository exists."));
 }
 
+static enum protocol_version discover_version(struct packet_reader *reader)
+{
+	enum protocol_version version = protocol_unknown_version;
+
+	/*
+	 * Peek the first line of the server's response to
+	 * determine the protocol version the server is speaking.
+	 */
+	switch (packet_reader_peek(reader)) {
+	case PACKET_READ_ERROR:
+		die_initial_contact(0);
+	case PACKET_READ_FLUSH:
+	case PACKET_READ_DELIM:
+		version = protocol_v0;
+		break;
+	case PACKET_READ_NORMAL:
+		version = determine_protocol_version_client(reader->line);
+		break;
+	}
+
+	/* Maybe process capabilities here, at least for v2 */
+	switch (version) {
+	case protocol_v1:
+		/* Read the peeked version line */
+		packet_reader_read(reader);
+		break;
+	case protocol_v0:
+		break;
+	case protocol_unknown_version:
+		BUG("ERROR");
+	}
+
+	return version;
+}
+
 static void parse_one_symref_info(struct string_list *symref, const char *val, int len)
 {
 	char *sym, *target;
@@ -109,44 +150,10 @@ static void annotate_refs_with_symref_info(struct ref *ref)
 	string_list_clear(&symref, 0);
 }
 
-/*
- * Read one line of a server's ref advertisement into packet_buffer.
- */
-static int read_remote_ref(int in, char **src_buf, size_t *src_len,
-			   int *responded)
-{
-	int len = packet_read(in, src_buf, src_len,
-			      packet_buffer, sizeof(packet_buffer),
-			      PACKET_READ_GENTLE_ON_EOF |
-			      PACKET_READ_CHOMP_NEWLINE);
-	const char *arg;
-	if (len < 0)
-		die_initial_contact(*responded);
-	if (len > 4 && skip_prefix(packet_buffer, "ERR ", &arg))
-		die("remote error: %s", arg);
-
-	*responded = 1;
-
-	return len;
-}
-
-#define EXPECTING_PROTOCOL_VERSION 0
-#define EXPECTING_FIRST_REF 1
-#define EXPECTING_REF 2
-#define EXPECTING_SHALLOW 3
-
-/* Returns 1 if packet_buffer is a protocol version pkt-line, 0 otherwise. */
-static int process_protocol_version(void)
-{
-	switch (determine_protocol_version_client(packet_buffer)) {
-	case protocol_v1:
-		return 1;
-	case protocol_v0:
-		return 0;
-	default:
-		die("server is speaking an unknown protocol");
-	}
-}
+#define EXPECTING_FIRST_REF 0
+#define EXPECTING_REF 1
+#define EXPECTING_SHALLOW 2
+#define EXPECTING_DONE 3
 
 static void process_capabilities(int *len)
 {
@@ -230,28 +237,34 @@ struct ref **get_remote_heads(int in, char *src_buf, size_t src_len,
 			      struct oid_array *shallow_points)
 {
 	struct ref **orig_list = list;
+	int len = 0;
+	int state = EXPECTING_FIRST_REF;
+	struct packet_reader reader;
+	const char *arg;
 
-	/*
-	 * A hang-up after seeing some response from the other end
-	 * means that it is unexpected, as we know the other end is
-	 * willing to talk to us.  A hang-up before seeing any
-	 * response does not necessarily mean an ACL problem, though.
-	 */
-	int responded = 0;
-	int len;
-	int state = EXPECTING_PROTOCOL_VERSION;
+	packet_reader_init(&reader, in, src_buf, src_len);
+
+	discover_version(&reader);
 
 	*list = NULL;
 
-	while ((len = read_remote_ref(in, &src_buf, &src_len, &responded))) {
+	while (state != EXPECTING_DONE) {
+		switch (packet_reader_read(&reader)) {
+		case PACKET_READ_ERROR:
+			die_initial_contact(1);
+		case PACKET_READ_NORMAL:
+			len = reader.pktlen;
+			if (len > 4 && skip_prefix(packet_buffer, "ERR ", &arg))
+				die("remote error: %s", arg);
+			break;
+		case PACKET_READ_FLUSH:
+			state = EXPECTING_DONE;
+			break;
+		case PACKET_READ_DELIM:
+			die("invalid packet\n");
+		}
+
 		switch (state) {
-		case EXPECTING_PROTOCOL_VERSION:
-			if (process_protocol_version()) {
-				state = EXPECTING_FIRST_REF;
-				break;
-			}
-			state = EXPECTING_FIRST_REF;
-			/* fallthrough */
 		case EXPECTING_FIRST_REF:
 			process_capabilities(&len);
 			if (process_dummy_ref()) {
@@ -269,6 +282,8 @@ struct ref **get_remote_heads(int in, char *src_buf, size_t src_len,
 			if (process_shallow(len, shallow_points))
 				break;
 			die("protocol error: unexpected '%s'", packet_buffer);
+		case EXPECTING_DONE:
+			break;
 		default:
 			die("unexpected state %d", state);
 		}
-- 
2.15.1.424.g9478a66081-goog


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [WIP 08/15] connect: discover protocol version outside of get_remote_heads
  2017-12-04 23:58 ` [WIP 00/15] " Brandon Williams
                     ` (6 preceding siblings ...)
  2017-12-04 23:58   ` [WIP 07/15] connect: convert get_remote_heads to use struct packet_reader Brandon Williams
@ 2017-12-04 23:58   ` Brandon Williams
  2017-12-07 18:50     ` Junio C Hamano
  2017-12-04 23:58   ` [WIP 09/15] transport: store protocol version Brandon Williams
                     ` (6 subsequent siblings)
  14 siblings, 1 reply; 47+ messages in thread
From: Brandon Williams @ 2017-12-04 23:58 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams

In order to prepare for the addition of protocol_v2 push the protocol
version discovery outside of 'get_remote_heads()'.  This will allow for
keeping the logic for processing the reference advertisement for
protocol_v1 and protocol_v0 separate from the logic for protocol_v2.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 builtin/fetch-pack.c | 14 +++++++++++++-
 builtin/send-pack.c  | 15 +++++++++++++--
 connect.c            | 13 ++++---------
 connect.h            |  3 +++
 remote-curl.c        | 18 ++++++++++++++++--
 remote.h             |  5 +++--
 transport.c          | 22 +++++++++++++++++-----
 7 files changed, 69 insertions(+), 21 deletions(-)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 366b9d13f..4873e9572 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -4,6 +4,7 @@
 #include "remote.h"
 #include "connect.h"
 #include "sha1-array.h"
+#include "protocol.h"
 
 static const char fetch_pack_usage[] =
 "git fetch-pack [--all] [--stdin] [--quiet | -q] [--keep | -k] [--thin] "
@@ -52,6 +53,7 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 	struct fetch_pack_args args;
 	struct oid_array shallow = OID_ARRAY_INIT;
 	struct string_list deepen_not = STRING_LIST_INIT_DUP;
+	struct packet_reader reader;
 
 	packet_trace_identity("fetch-pack");
 
@@ -193,7 +195,17 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 		if (!conn)
 			return args.diag_url ? 0 : 1;
 	}
-	get_remote_heads(fd[0], NULL, 0, &ref, 0, NULL, &shallow);
+
+	packet_reader_init(&reader, fd[0], NULL, 0);
+
+	switch (discover_version(&reader)) {
+	case protocol_v1:
+	case protocol_v0:
+		get_remote_heads(&reader, &ref, 0, NULL, &shallow);
+		break;
+	case protocol_unknown_version:
+		BUG("unknown protocol version");
+	}
 
 	ref = fetch_pack(&args, fd, conn, ref, dest, sought, nr_sought,
 			 &shallow, pack_lockfile_ptr);
diff --git a/builtin/send-pack.c b/builtin/send-pack.c
index fc4f0bb5f..9c2ca80c8 100644
--- a/builtin/send-pack.c
+++ b/builtin/send-pack.c
@@ -14,6 +14,7 @@
 #include "sha1-array.h"
 #include "gpg-interface.h"
 #include "gettext.h"
+#include "protocol.h"
 
 static const char * const send_pack_usage[] = {
 	N_("git send-pack [--all | --mirror] [--dry-run] [--force] "
@@ -154,6 +155,7 @@ int cmd_send_pack(int argc, const char **argv, const char *prefix)
 	int progress = -1;
 	int from_stdin = 0;
 	struct push_cas_option cas = {0};
+	struct packet_reader reader;
 
 	struct option options[] = {
 		OPT__VERBOSITY(&verbose),
@@ -256,8 +258,17 @@ int cmd_send_pack(int argc, const char **argv, const char *prefix)
 			args.verbose ? CONNECT_VERBOSE : 0);
 	}
 
-	get_remote_heads(fd[0], NULL, 0, &remote_refs, REF_NORMAL,
-			 &extra_have, &shallow);
+	packet_reader_init(&reader, fd[0], NULL, 0);
+
+	switch (discover_version(&reader)) {
+	case protocol_v1:
+	case protocol_v0:
+		get_remote_heads(&reader, &remote_refs, REF_NORMAL,
+				 &extra_have, &shallow);
+		break;
+	case protocol_unknown_version:
+		BUG("unknown protocol version");
+	}
 
 	transport_verify_remote_names(nr_refspecs, refspecs);
 
diff --git a/connect.c b/connect.c
index f79ea9179..5f7cf05c7 100644
--- a/connect.c
+++ b/connect.c
@@ -62,7 +62,7 @@ static void die_initial_contact(int unexpected)
 		      "and the repository exists."));
 }
 
-static enum protocol_version discover_version(struct packet_reader *reader)
+enum protocol_version discover_version(struct packet_reader *reader)
 {
 	enum protocol_version version = protocol_unknown_version;
 
@@ -231,7 +231,7 @@ static int process_shallow(int len, struct oid_array *shallow_points)
 /*
  * Read all the refs from the other end
  */
-struct ref **get_remote_heads(int in, char *src_buf, size_t src_len,
+struct ref **get_remote_heads(struct packet_reader *reader,
 			      struct ref **list, unsigned int flags,
 			      struct oid_array *extra_have,
 			      struct oid_array *shallow_points)
@@ -239,21 +239,16 @@ struct ref **get_remote_heads(int in, char *src_buf, size_t src_len,
 	struct ref **orig_list = list;
 	int len = 0;
 	int state = EXPECTING_FIRST_REF;
-	struct packet_reader reader;
 	const char *arg;
 
-	packet_reader_init(&reader, in, src_buf, src_len);
-
-	discover_version(&reader);
-
 	*list = NULL;
 
 	while (state != EXPECTING_DONE) {
-		switch (packet_reader_read(&reader)) {
+		switch (packet_reader_read(reader)) {
 		case PACKET_READ_ERROR:
 			die_initial_contact(1);
 		case PACKET_READ_NORMAL:
-			len = reader.pktlen;
+			len = reader->pktlen;
 			if (len > 4 && skip_prefix(packet_buffer, "ERR ", &arg))
 				die("remote error: %s", arg);
 			break;
diff --git a/connect.h b/connect.h
index 01f14cdf3..cdb8979dc 100644
--- a/connect.h
+++ b/connect.h
@@ -13,4 +13,7 @@ extern int parse_feature_request(const char *features, const char *feature);
 extern const char *server_feature_value(const char *feature, int *len_ret);
 extern int url_is_local_not_ssh(const char *url);
 
+struct packet_reader;
+extern enum protocol_version discover_version(struct packet_reader *reader);
+
 #endif
diff --git a/remote-curl.c b/remote-curl.c
index 0053b0954..74c6c3049 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -1,6 +1,7 @@
 #include "cache.h"
 #include "config.h"
 #include "remote.h"
+#include "connect.h"
 #include "strbuf.h"
 #include "walker.h"
 #include "http.h"
@@ -13,6 +14,7 @@
 #include "credential.h"
 #include "sha1-array.h"
 #include "send-pack.h"
+#include "protocol.h"
 
 static struct remote *remote;
 /* always ends with a trailing slash */
@@ -176,8 +178,20 @@ static struct discovery *last_discovery;
 static struct ref *parse_git_refs(struct discovery *heads, int for_push)
 {
 	struct ref *list = NULL;
-	get_remote_heads(-1, heads->buf, heads->len, &list,
-			 for_push ? REF_NORMAL : 0, NULL, &heads->shallow);
+	struct packet_reader reader;
+
+	packet_reader_init(&reader, -1, heads->buf, heads->len);
+
+	switch (discover_version(&reader)) {
+	case protocol_v1:
+	case protocol_v0:
+		get_remote_heads(&reader, &list, for_push ? REF_NORMAL : 0,
+				 NULL, &heads->shallow);
+		break;
+	case protocol_unknown_version:
+		BUG("unknown protocol version");
+	}
+
 	return list;
 }
 
diff --git a/remote.h b/remote.h
index 2ecf4c8c7..af34e44a4 100644
--- a/remote.h
+++ b/remote.h
@@ -150,10 +150,11 @@ int check_ref_type(const struct ref *ref, int flags);
 void free_refs(struct ref *ref);
 
 struct oid_array;
-extern struct ref **get_remote_heads(int in, char *src_buf, size_t src_len,
+struct packet_reader;
+extern struct ref **get_remote_heads(struct packet_reader *reader,
 				     struct ref **list, unsigned int flags,
 				     struct oid_array *extra_have,
-				     struct oid_array *shallow);
+				     struct oid_array *shallow_points);
 
 int resolve_remote_symref(struct ref *ref, struct ref *list);
 int ref_newer(const struct object_id *new_oid, const struct object_id *old_oid);
diff --git a/transport.c b/transport.c
index 7c969f285..28b744ec2 100644
--- a/transport.c
+++ b/transport.c
@@ -17,6 +17,7 @@
 #include "string-list.h"
 #include "sha1-array.h"
 #include "sigchain.h"
+#include "protocol.h"
 
 static void set_upstreams(struct transport *transport, struct ref *refs,
 	int pretend)
@@ -190,13 +191,24 @@ static int connect_setup(struct transport *transport, int for_push)
 static struct ref *get_refs_via_connect(struct transport *transport, int for_push)
 {
 	struct git_transport_data *data = transport->data;
-	struct ref *refs;
+	struct ref *refs = NULL;
+	struct packet_reader reader;
 
 	connect_setup(transport, for_push);
-	get_remote_heads(data->fd[0], NULL, 0, &refs,
-			 for_push ? REF_NORMAL : 0,
-			 &data->extra_have,
-			 &data->shallow);
+
+	packet_reader_init(&reader, data->fd[0], NULL, 0);
+
+	switch (discover_version(&reader)) {
+	case protocol_v1:
+	case protocol_v0:
+		get_remote_heads(&reader, &refs,
+				 for_push ? REF_NORMAL : 0,
+				 &data->extra_have,
+				 &data->shallow);
+		break;
+	case protocol_unknown_version:
+		BUG("unknown protocol version");
+	}
 	data->got_remote_heads = 1;
 
 	return refs;
-- 
2.15.1.424.g9478a66081-goog


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [WIP 09/15] transport: store protocol version
  2017-12-04 23:58 ` [WIP 00/15] " Brandon Williams
                     ` (7 preceding siblings ...)
  2017-12-04 23:58   ` [WIP 08/15] connect: discover protocol version outside of get_remote_heads Brandon Williams
@ 2017-12-04 23:58   ` Brandon Williams
  2017-12-04 23:58   ` [WIP 10/15] protocol: introduce enum protocol_version value protocol_v2 Brandon Williams
                     ` (5 subsequent siblings)
  14 siblings, 0 replies; 47+ messages in thread
From: Brandon Williams @ 2017-12-04 23:58 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams

Once protocol_v2 is introduced requesting a fetch or a push will need to
be handled differently depending on the protocol version.  Store the
protocol version the serving is speaking in 'struct git_transport_data'
and use it to determine what to do in the case of a fetch or a push.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 transport.c | 35 ++++++++++++++++++++++++++---------
 1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/transport.c b/transport.c
index 28b744ec2..4160c4167 100644
--- a/transport.c
+++ b/transport.c
@@ -118,6 +118,7 @@ struct git_transport_data {
 	struct child_process *conn;
 	int fd[2];
 	unsigned got_remote_heads : 1;
+	enum protocol_version version;
 	struct oid_array extra_have;
 	struct oid_array shallow;
 };
@@ -198,7 +199,8 @@ static struct ref *get_refs_via_connect(struct transport *transport, int for_pus
 
 	packet_reader_init(&reader, data->fd[0], NULL, 0);
 
-	switch (discover_version(&reader)) {
+	data->version = discover_version(&reader);
+	switch (data->version) {
 	case protocol_v1:
 	case protocol_v0:
 		get_remote_heads(&reader, &refs,
@@ -219,7 +221,7 @@ static int fetch_refs_via_pack(struct transport *transport,
 {
 	int ret = 0;
 	struct git_transport_data *data = transport->data;
-	struct ref *refs;
+	struct ref *refs = NULL;
 	char *dest = xstrdup(transport->url);
 	struct fetch_pack_args args;
 	struct ref *refs_tmp = NULL;
@@ -245,10 +247,18 @@ static int fetch_refs_via_pack(struct transport *transport,
 	if (!data->got_remote_heads)
 		refs_tmp = get_refs_via_connect(transport, 0);
 
-	refs = fetch_pack(&args, data->fd, data->conn,
-			  refs_tmp ? refs_tmp : transport->remote_refs,
-			  dest, to_fetch, nr_heads, &data->shallow,
-			  &transport->pack_lockfile);
+	switch (data->version) {
+	case protocol_v1:
+	case protocol_v0:
+		refs = fetch_pack(&args, data->fd, data->conn,
+				  refs_tmp ? refs_tmp : transport->remote_refs,
+				  dest, to_fetch, nr_heads, &data->shallow,
+				  &transport->pack_lockfile);
+		break;
+	case protocol_unknown_version:
+		BUG("unknown protocol version");
+	}
+
 	close(data->fd[0]);
 	close(data->fd[1]);
 	if (finish_connect(data->conn))
@@ -548,7 +558,7 @@ static int git_transport_push(struct transport *transport, struct ref *remote_re
 {
 	struct git_transport_data *data = transport->data;
 	struct send_pack_args args;
-	int ret;
+	int ret = 0;
 
 	if (!data->got_remote_heads)
 		get_refs_via_connect(transport, 1);
@@ -573,8 +583,15 @@ static int git_transport_push(struct transport *transport, struct ref *remote_re
 	else
 		args.push_cert = SEND_PACK_PUSH_CERT_NEVER;
 
-	ret = send_pack(&args, data->fd, data->conn, remote_refs,
-			&data->extra_have);
+	switch (data->version) {
+	case protocol_v1:
+	case protocol_v0:
+		ret = send_pack(&args, data->fd, data->conn, remote_refs,
+				&data->extra_have);
+		break;
+	case protocol_unknown_version:
+		BUG("unknown protocol version");
+	}
 
 	close(data->fd[1]);
 	close(data->fd[0]);
-- 
2.15.1.424.g9478a66081-goog


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [WIP 10/15] protocol: introduce enum protocol_version value protocol_v2
  2017-12-04 23:58 ` [WIP 00/15] " Brandon Williams
                     ` (8 preceding siblings ...)
  2017-12-04 23:58   ` [WIP 09/15] transport: store protocol version Brandon Williams
@ 2017-12-04 23:58   ` Brandon Williams
  2017-12-04 23:58   ` [WIP 11/15] serve: introduce git-serve Brandon Williams
                     ` (4 subsequent siblings)
  14 siblings, 0 replies; 47+ messages in thread
From: Brandon Williams @ 2017-12-04 23:58 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams

Introduce protocol_v2, a new value for 'enum protocol_version'.
Subsequent patches will fill in the implementation of protocol_v2.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 builtin/fetch-pack.c   | 3 +++
 builtin/receive-pack.c | 3 +++
 builtin/send-pack.c    | 3 +++
 connect.c              | 3 +++
 protocol.c             | 2 ++
 protocol.h             | 1 +
 remote-curl.c          | 3 +++
 transport.c            | 9 +++++++++
 upload-pack.c          | 3 +++
 9 files changed, 30 insertions(+)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 4873e9572..061c278b4 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -199,6 +199,9 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 	packet_reader_init(&reader, fd[0], NULL, 0);
 
 	switch (discover_version(&reader)) {
+	case protocol_v2:
+		die("support for protocol v2 not implemented yet");
+		break;
 	case protocol_v1:
 	case protocol_v0:
 		get_remote_heads(&reader, &ref, 0, NULL, &shallow);
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 839c1462d..4e141d521 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -1965,6 +1965,9 @@ int cmd_receive_pack(int argc, const char **argv, const char *prefix)
 		unpack_limit = receive_unpack_limit;
 
 	switch (determine_protocol_version_server()) {
+	case protocol_v2:
+		die("support for protocol v2 not implemented yet");
+		break;
 	case protocol_v1:
 		/*
 		 * v1 is just the original protocol with a version string,
diff --git a/builtin/send-pack.c b/builtin/send-pack.c
index 9c2ca80c8..9441f1eed 100644
--- a/builtin/send-pack.c
+++ b/builtin/send-pack.c
@@ -261,6 +261,9 @@ int cmd_send_pack(int argc, const char **argv, const char *prefix)
 	packet_reader_init(&reader, fd[0], NULL, 0);
 
 	switch (discover_version(&reader)) {
+	case protocol_v2:
+		die("support for protocol v2 not implemented yet");
+		break;
 	case protocol_v1:
 	case protocol_v0:
 		get_remote_heads(&reader, &remote_refs, REF_NORMAL,
diff --git a/connect.c b/connect.c
index 5f7cf05c7..433f08649 100644
--- a/connect.c
+++ b/connect.c
@@ -84,6 +84,9 @@ enum protocol_version discover_version(struct packet_reader *reader)
 
 	/* Maybe process capabilities here, at least for v2 */
 	switch (version) {
+	case protocol_v2:
+		die("support for protocol v2 not implemented yet");
+		break;
 	case protocol_v1:
 		/* Read the peeked version line */
 		packet_reader_read(reader);
diff --git a/protocol.c b/protocol.c
index 43012b7eb..5e636785d 100644
--- a/protocol.c
+++ b/protocol.c
@@ -8,6 +8,8 @@ static enum protocol_version parse_protocol_version(const char *value)
 		return protocol_v0;
 	else if (!strcmp(value, "1"))
 		return protocol_v1;
+	else if (!strcmp(value, "2"))
+		return protocol_v2;
 	else
 		return protocol_unknown_version;
 }
diff --git a/protocol.h b/protocol.h
index 1b2bc94a8..2ad35e433 100644
--- a/protocol.h
+++ b/protocol.h
@@ -5,6 +5,7 @@ enum protocol_version {
 	protocol_unknown_version = -1,
 	protocol_v0 = 0,
 	protocol_v1 = 1,
+	protocol_v2 = 2,
 };
 
 /*
diff --git a/remote-curl.c b/remote-curl.c
index 74c6c3049..abb6e2ac1 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -183,6 +183,9 @@ static struct ref *parse_git_refs(struct discovery *heads, int for_push)
 	packet_reader_init(&reader, -1, heads->buf, heads->len);
 
 	switch (discover_version(&reader)) {
+	case protocol_v2:
+		die("support for protocol v2 not implemented yet");
+		break;
 	case protocol_v1:
 	case protocol_v0:
 		get_remote_heads(&reader, &list, for_push ? REF_NORMAL : 0,
diff --git a/transport.c b/transport.c
index 4160c4167..8a3735cf5 100644
--- a/transport.c
+++ b/transport.c
@@ -201,6 +201,9 @@ static struct ref *get_refs_via_connect(struct transport *transport, int for_pus
 
 	data->version = discover_version(&reader);
 	switch (data->version) {
+	case protocol_v2:
+		die("support for protocol v2 not implemented yet");
+		break;
 	case protocol_v1:
 	case protocol_v0:
 		get_remote_heads(&reader, &refs,
@@ -248,6 +251,9 @@ static int fetch_refs_via_pack(struct transport *transport,
 		refs_tmp = get_refs_via_connect(transport, 0);
 
 	switch (data->version) {
+	case protocol_v2:
+		die("support for protocol v2 not implemented yet");
+		break;
 	case protocol_v1:
 	case protocol_v0:
 		refs = fetch_pack(&args, data->fd, data->conn,
@@ -584,6 +590,9 @@ static int git_transport_push(struct transport *transport, struct ref *remote_re
 		args.push_cert = SEND_PACK_PUSH_CERT_NEVER;
 
 	switch (data->version) {
+	case protocol_v2:
+		die("support for protocol v2 not implemented yet");
+		break;
 	case protocol_v1:
 	case protocol_v0:
 		ret = send_pack(&args, data->fd, data->conn, remote_refs,
diff --git a/upload-pack.c b/upload-pack.c
index d2711e4ee..d706175e4 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -1105,6 +1105,9 @@ int cmd_upload_pack(int argc, const char **argv, const char *prefix)
 	git_config(upload_pack_config, NULL);
 
 	switch (determine_protocol_version_server()) {
+	case protocol_v2:
+		die("support for protocol v2 not implemented yet");
+		break;
 	case protocol_v1:
 		/*
 		 * v1 is just the original protocol with a version string,
-- 
2.15.1.424.g9478a66081-goog


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [WIP 11/15] serve: introduce git-serve
  2017-12-04 23:58 ` [WIP 00/15] " Brandon Williams
                     ` (9 preceding siblings ...)
  2017-12-04 23:58   ` [WIP 10/15] protocol: introduce enum protocol_version value protocol_v2 Brandon Williams
@ 2017-12-04 23:58   ` Brandon Williams
  2017-12-07 23:42     ` Junio C Hamano
  2017-12-04 23:58   ` [WIP 12/15] ls-refs: introduce ls-refs server command Brandon Williams
                     ` (3 subsequent siblings)
  14 siblings, 1 reply; 47+ messages in thread
From: Brandon Williams @ 2017-12-04 23:58 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams

Introduce git-serve, the base server for protocol version 2.

When connecting to a server supporting protocol version 2, the server
will send a list all of its capabilities and then wait for the client to
send a command request.  Some capabilities advertised are 'commands'
which the client can request (push and fetch are examples of such
commands).  A command request is comprised of a list of capabilities,
including a command request "command=<command>", a delimiter packet,
followed by a list of parameters for the requested command.

At the end of each command a client can request that another command be
executed or can terminate the connection by sending a flush packet.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 .gitignore      |   1 +
 Makefile        |   2 +
 builtin.h       |   1 +
 builtin/serve.c |  25 ++++++++
 git.c           |   1 +
 serve.c         | 185 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 serve.h         |   6 ++
 7 files changed, 221 insertions(+)
 create mode 100644 builtin/serve.c
 create mode 100644 serve.c
 create mode 100644 serve.h

diff --git a/.gitignore b/.gitignore
index 833ef3b0b..2d0450c26 100644
--- a/.gitignore
+++ b/.gitignore
@@ -140,6 +140,7 @@
 /git-rm
 /git-send-email
 /git-send-pack
+/git-serve
 /git-sh-i18n
 /git-sh-i18n--envsubst
 /git-sh-setup
diff --git a/Makefile b/Makefile
index 86394b69d..710672cf4 100644
--- a/Makefile
+++ b/Makefile
@@ -862,6 +862,7 @@ LIB_OBJS += revision.o
 LIB_OBJS += run-command.o
 LIB_OBJS += send-pack.o
 LIB_OBJS += sequencer.o
+LIB_OBJS += serve.o
 LIB_OBJS += server-info.o
 LIB_OBJS += setup.o
 LIB_OBJS += sha1-array.o
@@ -995,6 +996,7 @@ BUILTIN_OBJS += builtin/rev-parse.o
 BUILTIN_OBJS += builtin/revert.o
 BUILTIN_OBJS += builtin/rm.o
 BUILTIN_OBJS += builtin/send-pack.o
+BUILTIN_OBJS += builtin/serve.o
 BUILTIN_OBJS += builtin/shortlog.o
 BUILTIN_OBJS += builtin/show-branch.o
 BUILTIN_OBJS += builtin/show-ref.o
diff --git a/builtin.h b/builtin.h
index f332a1257..3f3fdfc28 100644
--- a/builtin.h
+++ b/builtin.h
@@ -215,6 +215,7 @@ extern int cmd_rev_parse(int argc, const char **argv, const char *prefix);
 extern int cmd_revert(int argc, const char **argv, const char *prefix);
 extern int cmd_rm(int argc, const char **argv, const char *prefix);
 extern int cmd_send_pack(int argc, const char **argv, const char *prefix);
+extern int cmd_serve(int argc, const char **argv, const char *prefix);
 extern int cmd_shortlog(int argc, const char **argv, const char *prefix);
 extern int cmd_show(int argc, const char **argv, const char *prefix);
 extern int cmd_show_branch(int argc, const char **argv, const char *prefix);
diff --git a/builtin/serve.c b/builtin/serve.c
new file mode 100644
index 000000000..2ecaad3b6
--- /dev/null
+++ b/builtin/serve.c
@@ -0,0 +1,25 @@
+#include "cache.h"
+#include "builtin.h"
+#include "parse-options.h"
+#include "serve.h"
+
+static char const * const grep_usage[] = {
+	N_("git serve [<options>]"),
+	NULL
+};
+
+int cmd_serve(int argc, const char **argv, const char *prefix)
+{
+
+	struct option options[] = {
+		OPT_END()
+	};
+
+	/* ignore all unknown cmdline switches for now */
+	argc = parse_options(argc, argv, prefix, options, grep_usage,
+			     PARSE_OPT_KEEP_DASHDASH |
+			     PARSE_OPT_KEEP_UNKNOWN);
+	serve();
+
+	return 0;
+}
diff --git a/git.c b/git.c
index e32e16f2d..527086eaf 100644
--- a/git.c
+++ b/git.c
@@ -457,6 +457,7 @@ static struct cmd_struct commands[] = {
 	{ "revert", cmd_revert, RUN_SETUP | NEED_WORK_TREE },
 	{ "rm", cmd_rm, RUN_SETUP },
 	{ "send-pack", cmd_send_pack, RUN_SETUP },
+	{ "serve", cmd_serve, RUN_SETUP },
 	{ "shortlog", cmd_shortlog, RUN_SETUP_GENTLY | USE_PAGER },
 	{ "show", cmd_show, RUN_SETUP },
 	{ "show-branch", cmd_show_branch, RUN_SETUP },
diff --git a/serve.c b/serve.c
new file mode 100644
index 000000000..476e73b54
--- /dev/null
+++ b/serve.c
@@ -0,0 +1,185 @@
+#include "cache.h"
+#include "repository.h"
+#include "config.h"
+#include "pkt-line.h"
+#include "version.h"
+#include "argv-array.h"
+#include "serve.h"
+
+static int agent_advertise(struct repository *r,
+			   struct strbuf *value)
+{
+	strbuf_addstr(value, git_user_agent_sanitized());
+	return 1;
+}
+
+struct protocol_capability {
+	const char *name;
+	int advertised; /* capability was advertised */
+	/* int advertise(struct strbuf *value, struct repository *r) */
+	int (*advertise)(struct repository *r, struct strbuf *value);
+	/* int command(struct repository *r, struct argv_array *keys, struct argv_array *args)*/
+	int (*command)(struct repository *r,
+		       struct argv_array *keys,
+		       struct argv_array *args);
+};
+
+static struct protocol_capability capabilities[] = {
+	{ "agent", 0, agent_advertise, NULL },
+};
+
+static void advertise_capabilities(void)
+{
+	struct strbuf capability = STRBUF_INIT;
+	struct strbuf value = STRBUF_INIT;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(capabilities); i++) {
+		struct protocol_capability *c = &capabilities[i];
+
+		c->advertised = c->advertise(the_repository, &value);
+		if (c->advertised) {
+			strbuf_addstr(&capability, c->name);
+
+			if (value.len) {
+				strbuf_addch(&capability, '=');
+				strbuf_addbuf(&capability, &value);
+			}
+
+			strbuf_addch(&capability, '\n');
+			packet_write(1, capability.buf, capability.len);
+		}
+
+		strbuf_reset(&capability);
+		strbuf_reset(&value);
+	}
+
+	packet_flush(1);
+	strbuf_release(&capability);
+	strbuf_release(&value);
+}
+
+static struct protocol_capability *get_capability(const char *key)
+{
+	int i;
+
+	if (!key)
+		return NULL;
+
+	for (i = 0; i < ARRAY_SIZE(capabilities); i++) {
+		struct protocol_capability *c = &capabilities[i];
+		const char *out;
+		if (skip_prefix(key, c->name, &out) && (!*out || *out == '='))
+			return c;
+	}
+
+	return NULL;
+}
+
+static int is_valid_capability(const char *key)
+{
+	const struct protocol_capability *c = get_capability(key);
+
+	return c && c->advertised;
+}
+
+static int is_command(const char *key, struct protocol_capability **command)
+{
+	const char *out;
+
+	if (skip_prefix(key, "command=", &out)) {
+		struct protocol_capability *cmd = get_capability(out);
+
+		if (!cmd || !cmd->advertised || !cmd->command)
+			die("invalid cmd '%s'", out);
+		if (*command)
+			die("command already requested");
+
+		*command = cmd;
+		return 1;
+	}
+
+	return 0;
+}
+
+#define PROCESS_REQUEST_KEYS 0
+#define PROCESS_REQUEST_ARGS 1
+#define PROCESS_REQUEST_DONE 2
+
+static int process_request(void)
+{
+	int state = PROCESS_REQUEST_KEYS;
+	struct packet_reader reader;
+	struct argv_array keys = ARGV_ARRAY_INIT;
+	struct argv_array args = ARGV_ARRAY_INIT;
+	struct protocol_capability *command = NULL;
+
+	packet_reader_init(&reader, 0, NULL, 0);
+
+	while (state != PROCESS_REQUEST_DONE) {
+		switch (packet_reader_read(&reader)) {
+		case PACKET_READ_ERROR:
+			BUG("invalid state");
+		case PACKET_READ_NORMAL:
+			break;
+		case PACKET_READ_FLUSH:
+			state = PROCESS_REQUEST_DONE;
+			continue;
+		case PACKET_READ_DELIM:
+			if (state != PROCESS_REQUEST_KEYS)
+				die("protocol error");
+			state = PROCESS_REQUEST_ARGS;
+			/*
+			 * maybe include a check to make sure that a
+			 * command/capabilities were given.
+			 */
+			continue;
+		}
+
+		switch (state) {
+		case PROCESS_REQUEST_KEYS:
+			/* collect request; a sequence of keys and values */
+			if (is_command(reader.line, &command) ||
+			    is_valid_capability(reader.line))
+				argv_array_push(&keys, reader.line);
+			break;
+		case PROCESS_REQUEST_ARGS:
+			/* collect arguments for the requested command */
+			argv_array_push(&args, reader.line);
+			break;
+		case PROCESS_REQUEST_DONE:
+			continue;
+		default:
+			BUG("invalid state");
+		}
+	}
+
+	/*
+	 * If no command and no keys were given then the client wanted to
+	 * terminate the connection.
+	 */
+	if (!keys.argc && !args.argc)
+		return 1;
+
+	if (!command)
+		die("no command requested");
+
+	command->command(the_repository, &keys, &args);
+
+	argv_array_clear(&keys);
+	argv_array_clear(&args);
+	return 0;
+}
+
+/* Main serve loop for protocol version 2 */
+void serve(void)
+{
+	/* serve by default supports v2 */
+	packet_write_fmt(1, "version 2\n");
+
+	advertise_capabilities();
+
+	for (;;)
+		if (process_request())
+			break;
+}
diff --git a/serve.h b/serve.h
new file mode 100644
index 000000000..1ed9685ca
--- /dev/null
+++ b/serve.h
@@ -0,0 +1,6 @@
+#ifndef SERVE_H
+#define SERVE_H
+
+extern void serve(void);
+
+#endif /* SERVE_H */
-- 
2.15.1.424.g9478a66081-goog


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [WIP 12/15] ls-refs: introduce ls-refs server command
  2017-12-04 23:58 ` [WIP 00/15] " Brandon Williams
                     ` (10 preceding siblings ...)
  2017-12-04 23:58   ` [WIP 11/15] serve: introduce git-serve Brandon Williams
@ 2017-12-04 23:58   ` Brandon Williams
  2017-12-13 16:30     ` Philip Oakley
  2017-12-04 23:58   ` [WIP 13/15] connect: request remote refs using v2 Brandon Williams
                     ` (2 subsequent siblings)
  14 siblings, 1 reply; 47+ messages in thread
From: Brandon Williams @ 2017-12-04 23:58 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams

Introduce the ls-refs server command.  In protocol v2, the ls-refs
command is used to request the ref advertisement from the server.  Since
it is a command which can be requested (as opposed to manditory in v1),
a clinet can sent a number of parameters in its request to limit the ref
advertisement based on provided ref-patterns.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 Makefile  |  1 +
 ls-refs.c | 96 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 ls-refs.h |  9 ++++++
 serve.c   |  8 ++++++
 4 files changed, 114 insertions(+)
 create mode 100644 ls-refs.c
 create mode 100644 ls-refs.h

diff --git a/Makefile b/Makefile
index 710672cf4..be3c2f98b 100644
--- a/Makefile
+++ b/Makefile
@@ -807,6 +807,7 @@ LIB_OBJS += list-objects.o
 LIB_OBJS += ll-merge.o
 LIB_OBJS += lockfile.o
 LIB_OBJS += log-tree.o
+LIB_OBJS += ls-refs.o
 LIB_OBJS += mailinfo.o
 LIB_OBJS += mailmap.o
 LIB_OBJS += match-trees.o
diff --git a/ls-refs.c b/ls-refs.c
new file mode 100644
index 000000000..591dd105d
--- /dev/null
+++ b/ls-refs.c
@@ -0,0 +1,96 @@
+#include "cache.h"
+#include "repository.h"
+#include "refs.h"
+#include "remote.h"
+#include "argv-array.h"
+#include "ls-refs.h"
+#include "pkt-line.h"
+
+struct ls_refs_data {
+	unsigned peel;
+	unsigned symrefs;
+	struct argv_array patterns;
+};
+
+/*
+ * Is there one among the list of patterns that match the tail part
+ * of the path?
+ */
+static int tail_match(const char **pattern, const char *path)
+{
+	const char *p;
+	char *pathbuf;
+
+	if (!pattern)
+		return 1; /* no restriction */
+
+	pathbuf = xstrfmt("/%s", path);
+	while ((p = *(pattern++)) != NULL) {
+		if (!wildmatch(p, pathbuf, 0)) {
+			free(pathbuf);
+			return 1;
+		}
+	}
+	free(pathbuf);
+	return 0;
+}
+
+static int send_ref(const char *refname, const struct object_id *oid,
+		    int flag, void *cb_data)
+{
+	struct ls_refs_data *data = cb_data;
+	const char *refname_nons = strip_namespace(refname);
+	struct strbuf refline = STRBUF_INIT;
+
+	if (data->patterns.argc && !tail_match(data->patterns.argv, refname))
+		return 0;
+
+	strbuf_addf(&refline, "%s %s", oid_to_hex(oid), refname_nons);
+	if (data->symrefs && flag & REF_ISSYMREF) {
+		struct object_id unused;
+		const char *symref_target = resolve_ref_unsafe(refname, 0,
+							       unused.hash,
+							       &flag);
+
+		if (!symref_target)
+			die("'%s' is a symref but it is not?", refname);
+
+		strbuf_addf(&refline, " %s", symref_target);
+	}
+
+	strbuf_addch(&refline, '\n');
+
+	packet_write(1, refline.buf, refline.len);
+	if (data->peel) {
+		struct object_id peeled;
+		if (!peel_ref(refname, peeled.hash))
+			packet_write_fmt(1, "%s %s^{}\n", oid_to_hex(&peeled),
+					 refname_nons);
+	}
+
+	strbuf_release(&refline);
+	return 0;
+}
+
+int ls_refs(struct repository *r, struct argv_array *keys, struct argv_array *args)
+{
+	int i;
+	struct ls_refs_data data = { 0, 0, ARGV_ARRAY_INIT };
+
+	for (i = 0; i < args->argc; i++) {
+		if (!strcmp("--peeled", args->argv[i]))
+			data.peel = 1;
+		else if (!strcmp("--symrefs", args->argv[i]))
+			data.symrefs = 1;
+		else
+			/* Pattern */
+			argv_array_pushf(&data.patterns, "*/%s", args->argv[i]);
+
+	}
+
+	head_ref_namespaced(send_ref, &data);
+	for_each_namespaced_ref(send_ref, &data);
+	packet_flush(1);
+	argv_array_clear(&data.patterns);
+	return 0;
+}
diff --git a/ls-refs.h b/ls-refs.h
new file mode 100644
index 000000000..9e4c57bfe
--- /dev/null
+++ b/ls-refs.h
@@ -0,0 +1,9 @@
+#ifndef LS_REFS_H
+#define LS_REFS_H
+
+struct repository;
+struct argv_array;
+extern int ls_refs(struct repository *r, struct argv_array *keys,
+		   struct argv_array *args);
+
+#endif /* LS_REFS_H */
diff --git a/serve.c b/serve.c
index 476e73b54..36f77c365 100644
--- a/serve.c
+++ b/serve.c
@@ -4,8 +4,15 @@
 #include "pkt-line.h"
 #include "version.h"
 #include "argv-array.h"
+#include "ls-refs.h"
 #include "serve.h"
 
+static int always_advertise(struct repository *r,
+			    struct strbuf *value)
+{
+	return 1;
+}
+
 static int agent_advertise(struct repository *r,
 			   struct strbuf *value)
 {
@@ -26,6 +33,7 @@ struct protocol_capability {
 
 static struct protocol_capability capabilities[] = {
 	{ "agent", 0, agent_advertise, NULL },
+	{ "ls-refs", 0, always_advertise, ls_refs },
 };
 
 static void advertise_capabilities(void)
-- 
2.15.1.424.g9478a66081-goog


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [WIP 13/15] connect: request remote refs using v2
  2017-12-04 23:58 ` [WIP 00/15] " Brandon Williams
                     ` (11 preceding siblings ...)
  2017-12-04 23:58   ` [WIP 12/15] ls-refs: introduce ls-refs server command Brandon Williams
@ 2017-12-04 23:58   ` Brandon Williams
  2017-12-04 23:58   ` [WIP 14/15] upload_pack: introduce fetch server command Brandon Williams
  2017-12-04 23:58   ` [WIP 15/15] fetch-pack: perform a fetch using v2 Brandon Williams
  14 siblings, 0 replies; 47+ messages in thread
From: Brandon Williams @ 2017-12-04 23:58 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams

Teach the client to be able to request a remote's refs using protocol
v2.  This is done by having a client issue a 'ls-refs' request to a v2
server.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 connect.c              | 85 +++++++++++++++++++++++++++++++++++++++++++++++++-
 remote.h               |  2 ++
 t/t5701-protocol-v2.sh | 28 +++++++++++++++++
 transport.c            |  2 +-
 upload-pack.c          |  3 +-
 5 files changed, 117 insertions(+), 3 deletions(-)
 create mode 100755 t/t5701-protocol-v2.sh

diff --git a/connect.c b/connect.c
index 433f08649..b3c933fc1 100644
--- a/connect.c
+++ b/connect.c
@@ -15,6 +15,7 @@
 #include "protocol.h"
 
 static char *server_capabilities;
+static struct argv_array server_capabilities_v2 = ARGV_ARRAY_INIT;
 static const char *parse_feature_value(const char *, const char *, int *);
 
 static int check_ref(const char *name, unsigned int flags)
@@ -62,6 +63,30 @@ static void die_initial_contact(int unexpected)
 		      "and the repository exists."));
 }
 
+static int server_supports_v2(const char *c)
+{
+	int i;
+
+	for (i = 0; i < server_capabilities_v2.argc; i++) {
+		const char *out;
+		if (skip_prefix(server_capabilities_v2.argv[i], c, &out) &&
+		    (!*out || *out == '='))
+			return 1;
+	}
+
+	return 0;
+}
+
+static void process_capabilities_v2(struct packet_reader *reader)
+{
+	while (packet_reader_read(reader) == PACKET_READ_NORMAL) {
+		argv_array_push(&server_capabilities_v2, reader->line);
+	}
+
+	if (reader->status != PACKET_READ_FLUSH)
+		die("protocol error");
+}
+
 enum protocol_version discover_version(struct packet_reader *reader)
 {
 	enum protocol_version version = protocol_unknown_version;
@@ -85,7 +110,7 @@ enum protocol_version discover_version(struct packet_reader *reader)
 	/* Maybe process capabilities here, at least for v2 */
 	switch (version) {
 	case protocol_v2:
-		die("support for protocol v2 not implemented yet");
+		process_capabilities_v2(reader);
 		break;
 	case protocol_v1:
 		/* Read the peeked version line */
@@ -292,6 +317,64 @@ struct ref **get_remote_heads(struct packet_reader *reader,
 	return list;
 }
 
+static int process_ref_v2(const char *line, struct ref ***list)
+{
+	struct object_id old_oid;
+	const char *end_of_name;
+	struct ref *ref;
+
+	if (parse_oid_hex(line, &old_oid, &line))
+		return 0;
+	if (*line != ' ')
+		return 0;
+	line++;
+
+	end_of_name = strchr(line, ' ');
+
+	if (!end_of_name)
+		ref = alloc_ref(line);
+	else {
+		struct strbuf name = STRBUF_INIT;
+		/* symref info */
+		strbuf_add(&name, line, end_of_name - line);
+		ref = alloc_ref(name.buf);
+		ref->symref = xstrdup(end_of_name + 1);
+
+		strbuf_release(&name);
+	}
+
+	oidcpy(&ref->old_oid, &old_oid);
+	**list = ref;
+	*list = &ref->next;
+
+	return 1;
+}
+struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
+			     struct ref **list, unsigned int flags)
+{
+	*list = NULL;
+	/* Check that the server supports the ls-refs command */
+	if (!server_supports_v2("ls-refs"))
+		die("server doesn't support 'ls-refs' command");
+
+	/* Issue request for ls-refs */
+	packet_write_fmt(fd_out, "command=ls-refs");
+	packet_delim(fd_out);
+	packet_write_fmt(fd_out, "--symrefs");
+	packet_flush(fd_out);
+
+	/* Process response from server */
+	while (packet_reader_read(reader) == PACKET_READ_NORMAL) {
+		if (!process_ref_v2(reader->line, &list))
+			die("invalid ls-refs response: %s", reader->line);
+	}
+
+	if (reader->status != PACKET_READ_FLUSH)
+		die("protocol error");
+
+	return list;
+}
+
 static const char *parse_feature_value(const char *feature_list, const char *feature, int *lenp)
 {
 	int len;
diff --git a/remote.h b/remote.h
index af34e44a4..64d87bf5b 100644
--- a/remote.h
+++ b/remote.h
@@ -155,6 +155,8 @@ extern struct ref **get_remote_heads(struct packet_reader *reader,
 				     struct ref **list, unsigned int flags,
 				     struct oid_array *extra_have,
 				     struct oid_array *shallow_points);
+extern struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
+				    struct ref **list, unsigned int flags);
 
 int resolve_remote_symref(struct ref *ref, struct ref *list);
 int ref_newer(const struct object_id *new_oid, const struct object_id *old_oid);
diff --git a/t/t5701-protocol-v2.sh b/t/t5701-protocol-v2.sh
new file mode 100755
index 000000000..4bf4d61ac
--- /dev/null
+++ b/t/t5701-protocol-v2.sh
@@ -0,0 +1,28 @@
+#!/bin/sh
+
+test_description='test git wire-protocol version 2'
+
+TEST_NO_CREATE_REPO=1
+
+. ./test-lib.sh
+
+# Test protocol v2 with 'file://' transport
+#
+test_expect_success 'create repo to be served by file:// transport' '
+	git init file_parent &&
+	test_commit -C file_parent one
+'
+
+test_expect_success 'list refs with file:// using protocol v2' '
+	GIT_TRACE_PACKET=1 git -c protocol.version=2 \
+		ls-remote --symref "file://$(pwd)/file_parent" >actual 2>log &&
+
+	# Server responded using protocol v2
+	cat log &&
+	grep "git< version 2" log &&
+
+	git ls-remote --symref "file://$(pwd)/file_parent" >expect &&
+	test_cmp actual expect
+'
+
+test_done
diff --git a/transport.c b/transport.c
index 8a3735cf5..ddd0d6174 100644
--- a/transport.c
+++ b/transport.c
@@ -202,7 +202,7 @@ static struct ref *get_refs_via_connect(struct transport *transport, int for_pus
 	data->version = discover_version(&reader);
 	switch (data->version) {
 	case protocol_v2:
-		die("support for protocol v2 not implemented yet");
+		get_remote_refs(data->fd[1], &reader, &refs, for_push ? REF_NORMAL : 0);
 		break;
 	case protocol_v1:
 	case protocol_v0:
diff --git a/upload-pack.c b/upload-pack.c
index d706175e4..3296831f8 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -19,6 +19,7 @@
 #include "argv-array.h"
 #include "prio-queue.h"
 #include "protocol.h"
+#include "serve.h"
 
 static const char * const upload_pack_usage[] = {
 	N_("git upload-pack [<options>] <dir>"),
@@ -1106,7 +1107,7 @@ int cmd_upload_pack(int argc, const char **argv, const char *prefix)
 
 	switch (determine_protocol_version_server()) {
 	case protocol_v2:
-		die("support for protocol v2 not implemented yet");
+		serve();
 		break;
 	case protocol_v1:
 		/*
-- 
2.15.1.424.g9478a66081-goog


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [WIP 14/15] upload_pack: introduce fetch server command
  2017-12-04 23:58 ` [WIP 00/15] " Brandon Williams
                     ` (12 preceding siblings ...)
  2017-12-04 23:58   ` [WIP 13/15] connect: request remote refs using v2 Brandon Williams
@ 2017-12-04 23:58   ` Brandon Williams
  2017-12-04 23:58   ` [WIP 15/15] fetch-pack: perform a fetch using v2 Brandon Williams
  14 siblings, 0 replies; 47+ messages in thread
From: Brandon Williams @ 2017-12-04 23:58 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams

Introduce the 'fetch' server command.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 serve.c       |   2 +
 upload-pack.c | 223 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 upload-pack.h |   9 +++
 3 files changed, 234 insertions(+)
 create mode 100644 upload-pack.h

diff --git a/serve.c b/serve.c
index 36f77c365..7823ef0d9 100644
--- a/serve.c
+++ b/serve.c
@@ -6,6 +6,7 @@
 #include "argv-array.h"
 #include "ls-refs.h"
 #include "serve.h"
+#include "upload-pack.h"
 
 static int always_advertise(struct repository *r,
 			    struct strbuf *value)
@@ -34,6 +35,7 @@ struct protocol_capability {
 static struct protocol_capability capabilities[] = {
 	{ "agent", 0, agent_advertise, NULL },
 	{ "ls-refs", 0, always_advertise, ls_refs },
+	{ "fetch", 0, always_advertise, upload_pack_v2 },
 };
 
 static void advertise_capabilities(void)
diff --git a/upload-pack.c b/upload-pack.c
index 3296831f8..dc4421ab3 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -20,6 +20,7 @@
 #include "prio-queue.h"
 #include "protocol.h"
 #include "serve.h"
+#include "upload-pack.h"
 
 static const char * const upload_pack_usage[] = {
 	N_("git upload-pack [<options>] <dir>"),
@@ -1041,6 +1042,228 @@ static void upload_pack(void)
 	}
 }
 
+static int process_want(const char *line)
+{
+	const char *arg;
+	if (skip_prefix(line, "want ", &arg)) {
+		struct object_id oid;
+		struct object *o;
+
+		if (get_oid_hex(arg, &oid))
+			die("git upload-pack: protocol error, "
+			    "expected to get oid, not '%s'", line);
+
+		o = parse_object(&oid);
+		if (!o) {
+			packet_write_fmt(1,
+					 "ERR upload-pack: not our ref %s",
+					 oid_to_hex(&oid));
+			die("git upload-pack: not our ref %s",
+			    oid_to_hex(&oid));
+		}
+
+		if (!(o->flags & WANTED)) {
+			o->flags |= WANTED;
+			add_object_array(o, NULL, &want_obj);
+		}
+
+		return 1;
+	}
+
+	return 0;
+}
+
+static int parse_have(const char *line, struct oid_array *haves)
+{
+	const char *arg;
+	if (skip_prefix(line, "have ", &arg)) {
+		struct object_id oid;
+
+		if (get_oid_hex(arg, &oid))
+			die("git upload-pack: expected SHA1 object, got '%s'", arg);
+		oid_array_append(haves, &oid);
+		return 1;
+	}
+
+	return 0;
+}
+
+static void process_args(struct argv_array *args, struct oid_array *haves_oid)
+{
+	int i;
+
+	for (i = 0; i < args->argc; i++) {
+		const char *arg = args->argv[i];
+
+		/* process want */
+		if (process_want(arg))
+			continue;
+		/* process have line */
+		if (parse_have(arg, haves_oid))
+			continue;
+
+		/* process args like thin-pack */
+		if (!strcmp(arg, "thin-pack")) {
+			use_thin_pack = 1;
+			continue;
+		}
+		if (!strcmp(arg, "ofs-delta")) {
+			use_ofs_delta = 1;
+			continue;
+		}
+		if (!strcmp(arg, "no-progress")) {
+			no_progress = 1;
+			continue;
+		}
+		if (!strcmp(arg, "include-tag")) {
+			use_include_tag = 1;
+			continue;
+		}
+
+		/* ignore unknown lines maybe? */
+		die("unexpect line: '%s'", arg);
+	}
+}
+
+static int process_haves(struct oid_array *haves, struct oid_array *common)
+{
+	int i;
+
+	/* Process haves */
+	for (i = 0; i < haves->nr; i++) {
+		const struct object_id *oid = &haves->oid[i];
+		struct object *o;
+		int we_knew_they_have = 0;
+
+		if (!has_object_file(oid))
+			continue;
+
+		oid_array_append(common, oid);
+
+		o = parse_object(oid);
+		if (!o)
+			die("oops (%s)", oid_to_hex(oid));
+		if (o->type == OBJ_COMMIT) {
+			struct commit_list *parents;
+			struct commit *commit = (struct commit *)o;
+			if (o->flags & THEY_HAVE)
+				we_knew_they_have = 1;
+			else
+				o->flags |= THEY_HAVE;
+			if (!oldest_have || (commit->date < oldest_have))
+				oldest_have = commit->date;
+			for (parents = commit->parents;
+			     parents;
+			     parents = parents->next)
+				parents->item->object.flags |= THEY_HAVE;
+		}
+		if (!we_knew_they_have)
+			add_object_array(o, NULL, &have_obj);
+	}
+
+	return 0;
+}
+
+static int send_acks(struct oid_array *acks, struct strbuf *response)
+{
+	int i;
+	/* Send Acks */
+	if (!acks->nr)
+		packet_buf_write(response, "NAK\n");
+
+	for (i = 0; i < acks->nr; i++) {
+		packet_buf_write(response, "ACK %s common\n",
+				 oid_to_hex(&acks->oid[i]));
+	}
+
+	if (ok_to_give_up()) {
+		/* Send Ready */
+		packet_buf_write(response, "ACK %s ready\n",
+				 oid_to_hex(&acks->oid[i-1]));
+		return 1;
+	}
+
+	return 0;
+}
+
+#define COMMON_START 0
+#define COMMON_DONE 1
+#define COMMON_READ_HAVES 2
+#define COMMON_PROCESS_HAVES 3
+
+static int get_common_commits_v2(struct oid_array *haves_oid)
+{
+	struct packet_reader reader;
+	int done = 0;
+	int state = haves_oid->nr ? COMMON_PROCESS_HAVES : COMMON_DONE;
+	struct oid_array common = OID_ARRAY_INIT;
+	struct strbuf response = STRBUF_INIT;
+	packet_reader_init(&reader, 0, NULL, 0);
+
+	while (state != COMMON_DONE) {
+		switch (state) {
+		case COMMON_START:
+			break;
+		case COMMON_READ_HAVES:
+			while (packet_reader_read(&reader) == PACKET_READ_NORMAL) {
+
+				if (parse_have(reader.line, haves_oid))
+					continue;
+				if (!strcmp(reader.line, "done")) {
+					done = 1;
+					continue;
+				}
+			}
+			if (reader.status != PACKET_READ_FLUSH)
+				die("ERROR");
+
+			state = COMMON_PROCESS_HAVES;
+			break;
+		case COMMON_PROCESS_HAVES:
+			process_haves(haves_oid, &common);
+			if (done) {
+				state = COMMON_DONE;
+			} else if (send_acks(&common, &response)) {
+				packet_buf_delim(&response);
+				state = COMMON_DONE;
+			} else {
+				/* Add Flush */
+				packet_buf_flush(&response);
+				state = COMMON_READ_HAVES;
+			}
+
+			/* Send response */
+			write_or_die(1, response.buf, response.len);
+			strbuf_reset(&response);
+			oid_array_clear(haves_oid);
+			oid_array_clear(&common);
+			break;
+		case COMMON_DONE:
+			break;
+		default:
+			BUG("invalid state");
+		}
+	}
+	return 0;
+}
+
+int upload_pack_v2(struct repository *r, struct argv_array *keys,
+		   struct argv_array *args)
+{
+	struct oid_array haves_oid = OID_ARRAY_INIT;
+	use_sideband = LARGE_PACKET_MAX;
+	process_args(args, &haves_oid);
+
+	if (want_obj.nr) {
+		/* Determine Common Commits */
+		get_common_commits_v2(&haves_oid);
+		/* create packfile */
+		create_pack_file();
+	}
+
+	return 0;
+}
+
 static int upload_pack_config(const char *var, const char *value, void *unused)
 {
 	if (!strcmp("uploadpack.allowtipsha1inwant", var)) {
diff --git a/upload-pack.h b/upload-pack.h
new file mode 100644
index 000000000..54c429563
--- /dev/null
+++ b/upload-pack.h
@@ -0,0 +1,9 @@
+#ifndef UPLOAD_PACK_H
+#define UPLOAD_PACK_H
+
+struct repository;
+struct argv_array;
+extern int upload_pack_v2(struct repository *r, struct argv_array *keys,
+			  struct argv_array *args);
+
+#endif /* UPLOAD_PACK_H */
-- 
2.15.1.424.g9478a66081-goog


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [WIP 15/15] fetch-pack: perform a fetch using v2
  2017-12-04 23:58 ` [WIP 00/15] " Brandon Williams
                     ` (13 preceding siblings ...)
  2017-12-04 23:58   ` [WIP 14/15] upload_pack: introduce fetch server command Brandon Williams
@ 2017-12-04 23:58   ` Brandon Williams
  14 siblings, 0 replies; 47+ messages in thread
From: Brandon Williams @ 2017-12-04 23:58 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams

When communicating with a v2 server, perform a fetch by requesting the
'fetch' command.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 builtin/fetch-pack.c   |   2 +-
 fetch-pack.c           | 237 ++++++++++++++++++++++++++++++++++++++++++++++++-
 fetch-pack.h           |   4 +-
 t/t5701-protocol-v2.sh |  26 ++++++
 transport.c            |   8 +-
 5 files changed, 270 insertions(+), 7 deletions(-)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 061c278b4..e02c7761b 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -211,7 +211,7 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 	}
 
 	ref = fetch_pack(&args, fd, conn, ref, dest, sought, nr_sought,
-			 &shallow, pack_lockfile_ptr);
+			 &shallow, pack_lockfile_ptr, protocol_v0);
 	if (pack_lockfile) {
 		printf("lock %s\n", pack_lockfile);
 		fflush(stdout);
diff --git a/fetch-pack.c b/fetch-pack.c
index 105506e9a..a8ef8f3e5 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1007,6 +1007,232 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 	return ref;
 }
 
+static void add_wants(const struct ref *wants, struct strbuf *req_buf)
+{
+	for ( ; wants ; wants = wants->next) {
+		const struct object_id *remote = &wants->old_oid;
+		const char *remote_hex;
+		struct object *o;
+
+		/*
+		 * If that object is complete (i.e. it is an ancestor of a
+		 * local ref), we tell them we have it but do not have to
+		 * tell them about its ancestors, which they already know
+		 * about.
+		 *
+		 * We use lookup_object here because we are only
+		 * interested in the case we *know* the object is
+		 * reachable and we have already scanned it.
+		 */
+		if (((o = lookup_object(remote->hash)) != NULL) &&
+		    (o->flags & COMPLETE)) {
+			continue;
+		}
+
+		remote_hex = oid_to_hex(remote);
+		packet_buf_write(req_buf, "want %s\n", remote_hex);
+	}
+}
+
+/* */
+static int add_haves(struct strbuf *req_buf)
+{
+	int count = 0;
+	const struct object_id *oid;
+	while ((oid = get_rev())) {
+		packet_buf_write(req_buf, "have %s\n", oid_to_hex(oid));
+		if (++count >= INITIAL_FLUSH)
+			break;
+	};
+
+	return count;
+}
+
+static int send_fetch_request(int fd_out, const struct fetch_pack_args *args,
+			      const struct ref *wants)
+{
+	int ret = 0;
+	struct strbuf req_buf = STRBUF_INIT;
+
+	use_sideband = 2;
+	packet_buf_write(&req_buf, "command=fetch");
+	packet_buf_write(&req_buf, "agent=%s", git_user_agent_sanitized());
+	packet_buf_delim(&req_buf);
+	if (args->use_thin_pack)
+		packet_buf_write(&req_buf, "thin-pack");
+	if (args->no_progress)
+		packet_buf_write(&req_buf, "no-progress");
+	if (args->include_tag)
+		packet_buf_write(&req_buf, "include-tag");
+	if (prefer_ofs_delta)
+		packet_buf_write(&req_buf, "ofs-delta");
+
+	/* add wants */
+	add_wants(wants, &req_buf);
+
+	/* Add initial haves */
+	ret = add_haves(&req_buf);
+
+	/* Send request */
+	packet_buf_flush(&req_buf);
+	write_or_die(fd_out, req_buf.buf, req_buf.len);
+
+	strbuf_release(&req_buf);
+	return ret;
+}
+
+static enum ack_type process_ack(const char *line, struct object_id *oid)
+{
+	const char *arg;
+
+	if (!strcmp(line, "NAK"))
+		return NAK;
+	if (skip_prefix(line, "ACK ", &arg)) {
+		if (!parse_oid_hex(arg, oid, &arg)) {
+			if (strstr(arg, "continue"))
+				return ACK_continue;
+			if (strstr(arg, "common"))
+				return ACK_common;
+			if (strstr(arg, "ready"))
+				return ACK_ready;
+			return ACK;
+		}
+	}
+	if (skip_prefix(line, "ERR ", &arg))
+		die(_("remote error: %s"), arg);
+	die(_("git fetch-pack: expected ACK/NAK, got '%s'"), line);
+}
+
+static int process_acks(struct packet_reader *reader)
+{
+	int got_ready = 0;
+	int got_common = 0;
+	while (packet_reader_read(reader) == PACKET_READ_NORMAL) {
+		struct object_id oid;
+		struct commit *commit;
+		enum ack_type ack = process_ack(reader->line, &oid);
+
+		switch (ack) {
+		case ACK_ready:
+			clear_prio_queue(&rev_list);
+			got_ready = 1;
+			/* fallthrough */
+		case ACK_common:
+			commit = lookup_commit(&oid);
+			mark_common(commit, 0, 1);
+			got_common = 1;
+			break;
+		case NAK:
+			break;
+		case ACK:
+		case ACK_continue:
+			die("ACK/ACK_continue not supported");
+		}
+	}
+
+	if (reader->status != PACKET_READ_FLUSH &&
+	    reader->status != PACKET_READ_DELIM)
+		die("Error during processing acks: %d",reader->status);
+
+	/* return 0 if no common, 1 if there are common, or 2 if ready */
+	return got_ready + got_common;
+}
+
+static int send_haves(int fd_out, int *in_vain)
+{
+	int ret = 0;
+	struct strbuf req_buf = STRBUF_INIT;
+	int haves_added = add_haves(&req_buf);
+
+	*in_vain += haves_added;
+	if (!haves_added || *in_vain >= MAX_IN_VAIN) {
+		/* Send Done */
+		packet_buf_write(&req_buf, "done\n");
+		ret = 1;
+	}
+
+	/* Send request */
+	packet_buf_flush(&req_buf);
+	write_or_die(fd_out, req_buf.buf, req_buf.len);
+
+	strbuf_release(&req_buf);
+	return ret;
+}
+
+#define FETCH_PACK_CHECK_LOCAL 0
+#define FETCH_PACK_REQUEST 1
+#define FETCH_PACK_ACKS 2
+#define FETCH_PACK_HAVES 3
+#define FETCH_PACK_GET_PACK 4
+#define FETCH_PACK_DONE 5
+
+static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
+				    int fd[2],
+				    const struct ref *orig_ref,
+				    struct ref **sought, int nr_sought,
+				    char **pack_lockfile)
+{
+	struct ref *ref = copy_ref_list(orig_ref);
+	int state = FETCH_PACK_CHECK_LOCAL;
+	struct packet_reader reader;
+	int in_vain = 0;
+	packet_reader_init(&reader, fd[0], NULL, 0);
+
+	while (state != FETCH_PACK_DONE) {
+		switch (state) {
+		case FETCH_PACK_CHECK_LOCAL:
+			sort_ref_list(&ref, ref_compare_name);
+			QSORT(sought, nr_sought, cmp_ref_by_name);
+
+			/* Filter 'ref' by 'sought' and those that aren't local */
+			if (everything_local(args, &ref, sought, nr_sought))
+				state = FETCH_PACK_DONE;
+			else
+				state = FETCH_PACK_REQUEST;
+			break;
+		case FETCH_PACK_REQUEST:
+			if (send_fetch_request(fd[1], args, ref))
+				state = FETCH_PACK_ACKS;
+			else
+				state = FETCH_PACK_GET_PACK;
+			break;
+		case FETCH_PACK_ACKS:
+			/* Process ACKs/NAKs */
+			switch (process_acks(&reader)) {
+			case 2:
+				state = FETCH_PACK_GET_PACK;
+				break;
+			case 1:
+				in_vain = 0;
+				/* fallthrough */
+			default:
+				state = FETCH_PACK_HAVES;
+				break;
+			}
+			break;
+		case FETCH_PACK_HAVES:
+			if (send_haves(fd[1], &in_vain))
+				state = FETCH_PACK_GET_PACK;
+			else
+				state = FETCH_PACK_ACKS;
+			break;
+		case FETCH_PACK_GET_PACK:
+			/* get the pack */
+			if (get_pack(args, fd, pack_lockfile))
+				die(_("git fetch-pack: fetch failed."));
+
+			state = FETCH_PACK_DONE;
+			break;
+		case FETCH_PACK_DONE:
+			break;
+		default:
+			die("invalid state");
+		}
+	}
+
+	return ref;
+}
+
 static void fetch_pack_config(void)
 {
 	git_config_get_int("fetch.unpacklimit", &fetch_unpack_limit);
@@ -1152,7 +1378,8 @@ struct ref *fetch_pack(struct fetch_pack_args *args,
 		       const char *dest,
 		       struct ref **sought, int nr_sought,
 		       struct oid_array *shallow,
-		       char **pack_lockfile)
+		       char **pack_lockfile,
+		       enum protocol_version version)
 {
 	struct ref *ref_cpy;
 	struct shallow_info si;
@@ -1166,8 +1393,12 @@ struct ref *fetch_pack(struct fetch_pack_args *args,
 		die(_("no matching remote head"));
 	}
 	prepare_shallow_info(&si, shallow);
-	ref_cpy = do_fetch_pack(args, fd, ref, sought, nr_sought,
-				&si, pack_lockfile);
+	if (version == protocol_v2)
+		ref_cpy = do_fetch_pack_v2(args, fd, ref, sought, nr_sought,
+					   pack_lockfile);
+	else
+		ref_cpy = do_fetch_pack(args, fd, ref, sought, nr_sought,
+					&si, pack_lockfile);
 	reprepare_packed_git();
 	update_shallow(args, sought, nr_sought, &si);
 	clear_shallow_info(&si);
diff --git a/fetch-pack.h b/fetch-pack.h
index b6aeb43a8..7afca7305 100644
--- a/fetch-pack.h
+++ b/fetch-pack.h
@@ -3,6 +3,7 @@
 
 #include "string-list.h"
 #include "run-command.h"
+#include "protocol.h"
 
 struct oid_array;
 
@@ -43,7 +44,8 @@ struct ref *fetch_pack(struct fetch_pack_args *args,
 		       struct ref **sought,
 		       int nr_sought,
 		       struct oid_array *shallow,
-		       char **pack_lockfile);
+		       char **pack_lockfile,
+		       enum protocol_version version);
 
 /*
  * Print an appropriate error message for each sought ref that wasn't
diff --git a/t/t5701-protocol-v2.sh b/t/t5701-protocol-v2.sh
index 4bf4d61ac..c788ea597 100755
--- a/t/t5701-protocol-v2.sh
+++ b/t/t5701-protocol-v2.sh
@@ -25,4 +25,30 @@ test_expect_success 'list refs with file:// using protocol v2' '
 	test_cmp actual expect
 '
 
+test_expect_success 'clone with file:// using protocol v2' '
+	GIT_TRACE_PACKET=1 git -c protocol.version=2 \
+		clone "file://$(pwd)/file_parent" file_child 2>log &&
+
+	git -C file_child log -1 --format=%s >actual &&
+	git -C file_parent log -1 --format=%s >expect &&
+	test_cmp expect actual &&
+
+	# Server responded using protocol v1
+	grep "clone< version 2" log
+'
+
+test_expect_success 'fetch with file:// using protocol v2' '
+	test_commit -C file_parent two &&
+
+	GIT_TRACE_PACKET=1 git -C file_child -c protocol.version=2 \
+		fetch origin 2>log &&
+
+	git -C file_child log -1 --format=%s origin/master >actual &&
+	git -C file_parent log -1 --format=%s >expect &&
+	test_cmp expect actual &&
+
+	# Server responded using protocol v1
+	grep "fetch< version 2" log
+'
+
 test_done
diff --git a/transport.c b/transport.c
index ddd0d6174..50354abed 100644
--- a/transport.c
+++ b/transport.c
@@ -252,14 +252,18 @@ static int fetch_refs_via_pack(struct transport *transport,
 
 	switch (data->version) {
 	case protocol_v2:
-		die("support for protocol v2 not implemented yet");
+		refs = fetch_pack(&args, data->fd, data->conn,
+				  refs_tmp ? refs_tmp : transport->remote_refs,
+				  dest, to_fetch, nr_heads, &data->shallow,
+				  &transport->pack_lockfile, data->version);
+		packet_flush(data->fd[1]);
 		break;
 	case protocol_v1:
 	case protocol_v0:
 		refs = fetch_pack(&args, data->fd, data->conn,
 				  refs_tmp ? refs_tmp : transport->remote_refs,
 				  dest, to_fetch, nr_heads, &data->shallow,
-				  &transport->pack_lockfile);
+				  &transport->pack_lockfile, data->version);
 		break;
 	case protocol_unknown_version:
 		BUG("unknown protocol version");
-- 
2.15.1.424.g9478a66081-goog


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [WIP 04/15] upload-pack: convert to a builtin
  2017-12-04 23:58   ` [WIP 04/15] upload-pack: convert to a builtin Brandon Williams
@ 2017-12-06 21:59     ` Junio C Hamano
  2017-12-07 16:14       ` Johannes Schindelin
  2017-12-08 20:12       ` Brandon Williams
  0 siblings, 2 replies; 47+ messages in thread
From: Junio C Hamano @ 2017-12-06 21:59 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git

Brandon Williams <bmwill@google.com> writes:

> In order to allow for code sharing with the server-side of fetch in
> protocol-v2 convert upload-pack to be a builtin.
>
> Signed-off-by: Brandon Williams <bmwill@google.com>
> ---

This looks obvious and straight-forward to a cursory look.

I vaguely recalled and feared that we on purpose kept this program
separate from the rest of the system for a reason, but my mailing
list search is coming up empty.

>  Makefile      | 3 ++-
>  builtin.h     | 1 +
>  git.c         | 1 +
>  upload-pack.c | 2 +-
>  4 files changed, 5 insertions(+), 2 deletions(-)
> ...
> diff --git a/upload-pack.c b/upload-pack.c
> index ef99a029c..2d16952a3 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
> @@ -1033,7 +1033,7 @@ static int upload_pack_config(const char *var, const char *value, void *unused)
>  	return parse_hide_refs_config(var, value, "uploadpack");
>  }
>  
> -int cmd_main(int argc, const char **argv)
> +int cmd_upload_pack(int argc, const char **argv, const char *prefix)
>  {
>  	const char *dir;
>  	int strict = 0;

Shouldn't this file be moved to builtin/ directory, though?

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [WIP 06/15] transport: use get_refs_via_connect to get refs
  2017-12-04 23:58   ` [WIP 06/15] transport: use get_refs_via_connect to get refs Brandon Williams
@ 2017-12-06 22:10     ` Junio C Hamano
  2017-12-07 18:40       ` Brandon Williams
  0 siblings, 1 reply; 47+ messages in thread
From: Junio C Hamano @ 2017-12-06 22:10 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git

Brandon Williams <bmwill@google.com> writes:

> Remove code duplication and use the existing 'get_refs_via_connect()'
> function to retrieve a remote's heads in 'fetch_refs_via_pack()' and
> 'git_transport_push()'.
>
> Signed-off-by: Brandon Williams <bmwill@google.com>
> ---
>  transport.c | 18 ++++--------------
>  1 file changed, 4 insertions(+), 14 deletions(-)
>
> diff --git a/transport.c b/transport.c
> index d75ff0514..7c969f285 100644
> --- a/transport.c
> +++ b/transport.c
> @@ -230,12 +230,8 @@ static int fetch_refs_via_pack(struct transport *transport,
>  	args.cloning = transport->cloning;
>  	args.update_shallow = data->options.update_shallow;
>  
> -	if (!data->got_remote_heads) {
> -		connect_setup(transport, 0);
> -		get_remote_heads(data->fd[0], NULL, 0, &refs_tmp, 0,
> -				 NULL, &data->shallow);
> -		data->got_remote_heads = 1;
> -	}
> +	if (!data->got_remote_heads)
> +		refs_tmp = get_refs_via_connect(transport, 0);

The updated version is equivalent to the original as long as
transport->data->extra_have is empty at this point.  Were we
deliberately sending NULL, instead of &data->extra_have, in the
original, or is it a mere oversight?

The same comment applies to the other hunk of this patch.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [WIP 07/15] connect: convert get_remote_heads to use struct packet_reader
  2017-12-04 23:58   ` [WIP 07/15] connect: convert get_remote_heads to use struct packet_reader Brandon Williams
@ 2017-12-06 22:39     ` Junio C Hamano
  2017-12-08 20:19       ` Brandon Williams
  0 siblings, 1 reply; 47+ messages in thread
From: Junio C Hamano @ 2017-12-06 22:39 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git

Brandon Williams <bmwill@google.com> writes:

> @@ -56,6 +62,41 @@ static void die_initial_contact(int unexpected)
>  		      "and the repository exists."));
>  }
>  
> +static enum protocol_version discover_version(struct packet_reader *reader)
> +{
> +	enum protocol_version version = protocol_unknown_version;
> +
> +	/*
> +	 * Peek the first line of the server's response to
> +	 * determine the protocol version the server is speaking.
> +	 */
> +	switch (packet_reader_peek(reader)) {
> +	case PACKET_READ_ERROR:
> +		die_initial_contact(0);
> +	case PACKET_READ_FLUSH:
> +	case PACKET_READ_DELIM:
> +		version = protocol_v0;
> +		break;
> +	case PACKET_READ_NORMAL:
> +		version = determine_protocol_version_client(reader->line);
> +		break;
> +	}
> +
> +	/* Maybe process capabilities here, at least for v2 */
> +	switch (version) {
> +	case protocol_v1:
> +		/* Read the peeked version line */
> +		packet_reader_read(reader);
> +		break;
> +	case protocol_v0:
> +		break;
> +	case protocol_unknown_version:
> +		BUG("ERROR");
> +	}
> +
> +	return version;
> +}
> +

This discovers and consumes the "version" thing, but for an older
protocol that does not have the "version" thing, it does not clobber
the first "ref", thanks to its use of peek.  Makes sense.

> +#define EXPECTING_FIRST_REF 0
> +#define EXPECTING_REF 1
> +#define EXPECTING_SHALLOW 2
> +#define EXPECTING_DONE 3
>  
>  static void process_capabilities(int *len)
>  {
> @@ -230,28 +237,34 @@ struct ref **get_remote_heads(int in, char *src_buf, size_t src_len,
>  			      struct oid_array *shallow_points)
>  {
>  	struct ref **orig_list = list;
> +	int len = 0;
> +	int state = EXPECTING_FIRST_REF;
> +	struct packet_reader reader;
> +	const char *arg;
>  
> -	/*
> -	 * A hang-up after seeing some response from the other end
> -	 * means that it is unexpected, as we know the other end is
> -	 * willing to talk to us.  A hang-up before seeing any
> -	 * response does not necessarily mean an ACL problem, though.
> -	 */
> -	int responded = 0;
> -	int len;
> -	int state = EXPECTING_PROTOCOL_VERSION;
> +	packet_reader_init(&reader, in, src_buf, src_len);
> +
> +	discover_version(&reader);
>  
>  	*list = NULL;

And thanks to the "peeking" implementation, the version handling is
cleanly separated from the rest of the exchange, which is good.

> -	while ((len = read_remote_ref(in, &src_buf, &src_len, &responded))) {
> +	while (state != EXPECTING_DONE) {
> +		switch (packet_reader_read(&reader)) {
> +		case PACKET_READ_ERROR:
> +			die_initial_contact(1);
> +		case PACKET_READ_NORMAL:
> +			len = reader.pktlen;
> +			if (len > 4 && skip_prefix(packet_buffer, "ERR ", &arg))
> +				die("remote error: %s", arg);
> +			break;
> +		case PACKET_READ_FLUSH:
> +			state = EXPECTING_DONE;
> +			break;
> +		case PACKET_READ_DELIM:
> +			die("invalid packet\n");
> +		}
> +

EXPECTING_DONE sounded like we are expecting to see 'done' packet
sent from the other side, but I was mistaken.  It is the state
where we are "done" expecting anything ;-).

Having an (unconditional) assignment to 'state' in the above switch
makes me feel somewhat uneasy, as the next "switch (state)" is what
is meant as the state machine that would allow us to say things like
"from this state, transition to that state is impossible".  When we
get a flush while we are expecting the first ref, for example, we'd
just go into the "done" state.  There is no provision for a future
update to say "no, getting a flush in this state is an error".

That is no different from the current code; when read_remote_ref()
notices that it got a flush, it just leaves the loop without even
touching 'state' variable.  But at least, I find that the current
code is more honest---it does not even touch 'state' and allows the
code after the loop to inspect it, if needed.  From that point of
vhew, the way the new code uses 'state' to leave the loop upon
seeing a flush is a regression---it makes it harder to notice and
report when we got a flush in a wrong state.

Perhaps getting rid of "EXPECTING_DONE" from the enum and then:

	int got_flush = 0;
	while (1) {
		switch (reader_read()) {
		case PACKET_READ_FLUSH:
			got_flush = 1;
			break;
		... other cases ...
		}

		if (got_flush)
			break;

		switch (state) {
		... current code ...
		}
	}

would be an improvement; we can later extend "if (got_flush)" part
to check what state we are in if we wanted to notice and report an
error there before breaking out of the loop.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [WIP 04/15] upload-pack: convert to a builtin
  2017-12-06 21:59     ` Junio C Hamano
@ 2017-12-07 16:14       ` Johannes Schindelin
  2017-12-08 20:26         ` Junio C Hamano
  2017-12-08 20:12       ` Brandon Williams
  1 sibling, 1 reply; 47+ messages in thread
From: Johannes Schindelin @ 2017-12-07 16:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Brandon Williams, git

Hi Junio,

On Wed, 6 Dec 2017, Junio C Hamano wrote:

> Brandon Williams <bmwill@google.com> writes:
> 
> > In order to allow for code sharing with the server-side of fetch in
> > protocol-v2 convert upload-pack to be a builtin.
> >
> > Signed-off-by: Brandon Williams <bmwill@google.com>
> > ---
> 
> This looks obvious and straight-forward to a cursory look.
> 
> I vaguely recalled and feared that we on purpose kept this program
> separate from the rest of the system for a reason, but my mailing
> list search is coming up empty.

I only recall that we kept it in the bin/ directory (as opposed to
mlibexec/git-core/) to help with fetching via SSH.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [WIP 06/15] transport: use get_refs_via_connect to get refs
  2017-12-06 22:10     ` Junio C Hamano
@ 2017-12-07 18:40       ` Brandon Williams
  0 siblings, 0 replies; 47+ messages in thread
From: Brandon Williams @ 2017-12-07 18:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On 12/06, Junio C Hamano wrote:
> Brandon Williams <bmwill@google.com> writes:
> 
> > Remove code duplication and use the existing 'get_refs_via_connect()'
> > function to retrieve a remote's heads in 'fetch_refs_via_pack()' and
> > 'git_transport_push()'.
> >
> > Signed-off-by: Brandon Williams <bmwill@google.com>
> > ---
> >  transport.c | 18 ++++--------------
> >  1 file changed, 4 insertions(+), 14 deletions(-)
> >
> > diff --git a/transport.c b/transport.c
> > index d75ff0514..7c969f285 100644
> > --- a/transport.c
> > +++ b/transport.c
> > @@ -230,12 +230,8 @@ static int fetch_refs_via_pack(struct transport *transport,
> >  	args.cloning = transport->cloning;
> >  	args.update_shallow = data->options.update_shallow;
> >  
> > -	if (!data->got_remote_heads) {
> > -		connect_setup(transport, 0);
> > -		get_remote_heads(data->fd[0], NULL, 0, &refs_tmp, 0,
> > -				 NULL, &data->shallow);
> > -		data->got_remote_heads = 1;
> > -	}
> > +	if (!data->got_remote_heads)
> > +		refs_tmp = get_refs_via_connect(transport, 0);
> 
> The updated version is equivalent to the original as long as
> transport->data->extra_have is empty at this point.  Were we
> deliberately sending NULL, instead of &data->extra_have, in the
> original, or is it a mere oversight?
> 
> The same comment applies to the other hunk of this patch.

extra_have is only ever used by the push logic, so they're shouldn't be
any harm is passing it through on the fetch side, especially since
upload-pack doesn't send .have lines.

The push side is what uses the .have lines.  From a quick look through
the code it seems like get_refs_via_connect is always called before
git_transport_push, so the extra check to make sure ref's have been
retrieved doesn't get executed.  But if it ever did get executed, we
would silently ignore a server's .have lines.

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [WIP 08/15] connect: discover protocol version outside of get_remote_heads
  2017-12-04 23:58   ` [WIP 08/15] connect: discover protocol version outside of get_remote_heads Brandon Williams
@ 2017-12-07 18:50     ` Junio C Hamano
  2017-12-07 19:04       ` Brandon Williams
  0 siblings, 1 reply; 47+ messages in thread
From: Junio C Hamano @ 2017-12-07 18:50 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git

Brandon Williams <bmwill@google.com> writes:

> @@ -193,7 +195,17 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
>  		if (!conn)
>  			return args.diag_url ? 0 : 1;
>  	}
> -	get_remote_heads(fd[0], NULL, 0, &ref, 0, NULL, &shallow);
> +
> +	packet_reader_init(&reader, fd[0], NULL, 0);
> +
> +	switch (discover_version(&reader)) {
> +	case protocol_v1:
> +	case protocol_v0:
> +		get_remote_heads(&reader, &ref, 0, NULL, &shallow);
> +		break;
> +	case protocol_unknown_version:
> +		BUG("unknown protocol version");
> +	}

We see quite a few hunks just like this one appear in this patch.
The repetition is a bit disturbing.  I wonder if we want a wrapper
around the "reader-init && discover-version && return an error if
the protocol version is not known" dance.  That would allow us to
write this part of the code like so:

	struct packet_reader reader;

	if (connection_preamble(&reader, fd[0]))
		die("unknown protocol version");
	get_remote_heads(&reader, &ref, 0, NULL, &shallow);

or something like that.

By the way, is that really a BUG()?  Getting a connection and the
other end declaring a protocol version you do not yet understand is
not a bug in your program---it is a normal runtime error, no?

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [WIP 08/15] connect: discover protocol version outside of get_remote_heads
  2017-12-07 18:50     ` Junio C Hamano
@ 2017-12-07 19:04       ` Brandon Williams
  2017-12-07 19:30         ` Junio C Hamano
  0 siblings, 1 reply; 47+ messages in thread
From: Brandon Williams @ 2017-12-07 19:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On 12/07, Junio C Hamano wrote:
> Brandon Williams <bmwill@google.com> writes:
> 
> > @@ -193,7 +195,17 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
> >  		if (!conn)
> >  			return args.diag_url ? 0 : 1;
> >  	}
> > -	get_remote_heads(fd[0], NULL, 0, &ref, 0, NULL, &shallow);
> > +
> > +	packet_reader_init(&reader, fd[0], NULL, 0);
> > +
> > +	switch (discover_version(&reader)) {
> > +	case protocol_v1:
> > +	case protocol_v0:
> > +		get_remote_heads(&reader, &ref, 0, NULL, &shallow);
> > +		break;
> > +	case protocol_unknown_version:
> > +		BUG("unknown protocol version");
> > +	}
> 
> We see quite a few hunks just like this one appear in this patch.
> The repetition is a bit disturbing.  I wonder if we want a wrapper
> around the "reader-init && discover-version && return an error if
> the protocol version is not known" dance.  That would allow us to
> write this part of the code like so:
> 
> 	struct packet_reader reader;
> 
> 	if (connection_preamble(&reader, fd[0]))
> 		die("unknown protocol version");
> 	get_remote_heads(&reader, &ref, 0, NULL, &shallow);
> 
> or something like that.
> 
> By the way, is that really a BUG()?  Getting a connection and the
> other end declaring a protocol version you do not yet understand is
> not a bug in your program---it is a normal runtime error, no?

While we could wrap the preamble into a function it sort of defeats the
purpose since you want to be able to call different functions based on
the protocol version you're speaking.  That way you can have hard
separations between the code paths which operate on v0/v1 and v2.

As for that case being a BUG, yes it should be a BUG.  the
discover_version function won't ever return a protocol_unknown_version
value.  Its only there to make sure the switch cases are exhaustive and
cover all the enum values.  That does bring up a good point though.
This error should be handled as a run-time error and not a BUG in
discover_version, which I'll fix.


-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [WIP 08/15] connect: discover protocol version outside of get_remote_heads
  2017-12-07 19:04       ` Brandon Williams
@ 2017-12-07 19:30         ` Junio C Hamano
  2017-12-08 20:11           ` Brandon Williams
  0 siblings, 1 reply; 47+ messages in thread
From: Junio C Hamano @ 2017-12-07 19:30 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git

Brandon Williams <bmwill@google.com> writes:

> While we could wrap the preamble into a function it sort of defeats the
> purpose since you want to be able to call different functions based on
> the protocol version you're speaking.  That way you can have hard
> separations between the code paths which operate on v0/v1 and v2.

As I envision that the need for different processing across protocol
versions in real code would become far greater than it would fit in
cases within a single switch() statement, I imagined that the reader
structure would have a field that says which version of the protocol
it is, so that the caller of the preamble thing can inspect it later
to switch on it.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [WIP 01/15] pkt-line: introduce packet_read_with_status
  2017-12-04 23:58   ` [WIP 01/15] pkt-line: introduce packet_read_with_status Brandon Williams
@ 2017-12-07 20:53     ` Stefan Beller
  2017-12-08 18:03       ` Brandon Williams
  0 siblings, 1 reply; 47+ messages in thread
From: Stefan Beller @ 2017-12-07 20:53 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git

On Mon, Dec 4, 2017 at 3:58 PM, Brandon Williams <bmwill@google.com> wrote:

> diff --git a/pkt-line.h b/pkt-line.h
> index 3dad583e2..f1545929b 100644
> --- a/pkt-line.h
> +++ b/pkt-line.h
> @@ -60,8 +60,16 @@ int write_packetized_from_buf(const char *src_in, size_t len, int fd_out);
>   * If options contains PACKET_READ_CHOMP_NEWLINE, a trailing newline (if
>   * present) is removed from the buffer before returning.
>   */
> +enum packet_read_status {
> +       PACKET_READ_ERROR = -1,
> +       PACKET_READ_NORMAL,
> +       PACKET_READ_FLUSH,
> +};
>  #define PACKET_READ_GENTLE_ON_EOF (1u<<0)
>  #define PACKET_READ_CHOMP_NEWLINE (1u<<1)
> +enum packet_read_status packet_read_with_status(int fd, char **src_buffer, size_t *src_len,
> +                                               char *buffer, unsigned size, int *pktlen,
> +                                               int options);
>  int packet_read(int fd, char **src_buffer, size_t *src_len, char
>                 *buffer, unsigned size, int options);

The documentation that is preceding these lines is very specific to
packet_read, e.g.

    If options does contain PACKET_READ_GENTLE_ON_EOF,
    we will not die on condition 4 (truncated input), but instead return -1

which doesn't hold true for the _status version. Can you adapt the comment
to explain both read functions?

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [WIP 02/15] pkt-line: introduce struct packet_reader
  2017-12-04 23:58   ` [WIP 02/15] pkt-line: introduce struct packet_reader Brandon Williams
@ 2017-12-07 22:01     ` Stefan Beller
  2017-12-08 18:11       ` Brandon Williams
  0 siblings, 1 reply; 47+ messages in thread
From: Stefan Beller @ 2017-12-07 22:01 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git

On Mon, Dec 4, 2017 at 3:58 PM, Brandon Williams <bmwill@google.com> wrote:
> Sometimes it is advantageous to be able to peek the next packet line
> without consuming it (e.g. to be able to determine the protocol version
> a server is speaking).  In order to do that introduce 'struct
> packet_reader' which is an abstraction around the normal packet reading
> logic.  This enables a caller to be able to peek a single line at a time
> using 'packet_reader_peek()' and having a caller consume a line by
> calling 'packet_reader_read()'.
>
> Signed-off-by: Brandon Williams <bmwill@google.com>
> ---
>  pkt-line.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  pkt-line.h | 20 ++++++++++++++++++++
>  2 files changed, 81 insertions(+)
>
> diff --git a/pkt-line.c b/pkt-line.c
> index ac619f05b..518109bbe 100644
> --- a/pkt-line.c
> +++ b/pkt-line.c
> @@ -406,3 +406,64 @@ ssize_t read_packetized_to_strbuf(int fd_in, struct strbuf *sb_out)
>         }
>         return sb_out->len - orig_len;
>  }
> +
> +/* Packet Reader Functions */
> +void packet_reader_init(struct packet_reader *reader, int fd,
> +                       char *src_buffer, size_t src_len)
> +{
> +       reader->fd = fd;
> +       reader->src_buffer = src_buffer;
> +       reader->src_len = src_len;
> +       reader->buffer = packet_buffer;
> +       reader->buffer_size = sizeof(packet_buffer);
> +       reader->options = PACKET_READ_CHOMP_NEWLINE | PACKET_READ_GENTLE_ON_EOF;

I assume the future user of this packet reader will need exactly
these options coincidentally. ;)

I think it might be ok for now and later we can extend the reader if needed
to also take the flags as args. However given this set of args, this is a gentle
packet reader, as it corresponds to the _gently version of reading
packets AFAICT.

Unlike the pkt_read function this constructor of a packet reader doesn't need
the arguments for its buffer (packet_buffer and sizeof thereof), which
packet_read
unfortunately needs. We pass in packet_buffer all the time except in
builtin/receive-pack
for obtaining the gpg cert. I think that's ok here.

> +enum packet_read_status packet_reader_read(struct packet_reader *reader)
> +{
> +       if (reader->line_peeked) {
> +               reader->line_peeked = 0;
> +               return reader->status;
> +       }
> +
> +       reader->status = packet_read_with_status(reader->fd,
> +                                                &reader->src_buffer,
> +                                                &reader->src_len,
> +                                                reader->buffer,
> +                                                reader->buffer_size,
> +                                                &reader->pktlen,
> +                                                reader->options);
> +
> +       switch (reader->status) {
> +       case PACKET_READ_ERROR:
> +               reader->pktlen = -1;

In case of error the read function might not
have assigned the pktlen as requested, so we assign
it to -1/NULL here. Though the caller ought to already know
that they handle bogus, as the state is surely the first thing
they'd inspect?

> +               reader->line = NULL;
> +               break;
> +       case PACKET_READ_NORMAL:
> +               reader->line = reader->buffer;
> +               break;
> +       case PACKET_READ_FLUSH:
> +               reader->pktlen = 0;
> +               reader->line = NULL;
> +               break;
> +       }

Oh, this gives an interesting interface for someone who is
just inspecting the len/line instead of the state, so it might be
worth keeping it this way.

Can we have an API documentation in the header file,
explaining what to expect in each field given the state
of the (read, peaked) packet?

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [WIP 03/15] pkt-line: add delim packet support
  2017-12-04 23:58   ` [WIP 03/15] pkt-line: add delim packet support Brandon Williams
@ 2017-12-07 22:30     ` Stefan Beller
  2017-12-08 20:08       ` Brandon Williams
  0 siblings, 1 reply; 47+ messages in thread
From: Stefan Beller @ 2017-12-07 22:30 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git

On Mon, Dec 4, 2017 at 3:58 PM, Brandon Williams <bmwill@google.com> wrote:
> One of the design goals of protocol-v2 is to improve the semantics of
> flush packets.  Currently in protocol-v1, flush packets are used both to
> indicate a break in a list of packet lines as well as an indication that
> one side has finished speaking.  This makes it particularly difficult
> to implement proxies as a proxy would need to completely understand git
> protocol instead of simply looking for a flush packet.
>
> To do this, introduce the special deliminator packet '0001'.  A delim
> packet can then be used as a deliminator between lists of packet lines
> while flush packets can be reserved to indicate the end of a response.
>
> Signed-off-by: Brandon Williams <bmwill@google.com>

I presume the update for Documentation/technical/* comes at a later patch in the
series, clarifying the exact semantic difference between the packet types?

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [WIP 11/15] serve: introduce git-serve
  2017-12-04 23:58   ` [WIP 11/15] serve: introduce git-serve Brandon Williams
@ 2017-12-07 23:42     ` Junio C Hamano
  2017-12-08 20:25       ` Brandon Williams
  0 siblings, 1 reply; 47+ messages in thread
From: Junio C Hamano @ 2017-12-07 23:42 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git

Brandon Williams <bmwill@google.com> writes:

> +static struct protocol_capability *get_capability(const char *key)
> +{
> +	int i;
> +
> +	if (!key)
> +		return NULL;
> +
> +	for (i = 0; i < ARRAY_SIZE(capabilities); i++) {
> +		struct protocol_capability *c = &capabilities[i];
> +		const char *out;
> +		if (skip_prefix(key, c->name, &out) && (!*out || *out == '='))
> +			return c;

Looks familiar and resembles what was recently discussed on list ;-)

> +int cmd_serve(int argc, const char **argv, const char *prefix)
> +{
> +
> +	struct option options[] = {
> +		OPT_END()
> +	};
> +
> +	/* ignore all unknown cmdline switches for now */
> +	argc = parse_options(argc, argv, prefix, options, grep_usage,
> +			     PARSE_OPT_KEEP_DASHDASH |
> +			     PARSE_OPT_KEEP_UNKNOWN);
> +	serve();
> +
> +	return 0;
> +}
> ...
> +/* Main serve loop for protocol version 2 */
> +void serve(void)
> +{
> +	/* serve by default supports v2 */
> +	packet_write_fmt(1, "version 2\n");
> +
> +	advertise_capabilities();
> +
> +	for (;;)
> +		if (process_request())
> +			break;
> +}

I am guessing that this would be run just like upload-pack,
i.e. invoked via ssh or via git-daemon, and that is why it can just
assume that fd#0/fd#1 are already connected to the other end.  It
may be helpful to document somewhere how we envision to invoke this
program.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [WIP 01/15] pkt-line: introduce packet_read_with_status
  2017-12-07 20:53     ` Stefan Beller
@ 2017-12-08 18:03       ` Brandon Williams
  0 siblings, 0 replies; 47+ messages in thread
From: Brandon Williams @ 2017-12-08 18:03 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

On 12/07, Stefan Beller wrote:
> On Mon, Dec 4, 2017 at 3:58 PM, Brandon Williams <bmwill@google.com> wrote:
> 
> > diff --git a/pkt-line.h b/pkt-line.h
> > index 3dad583e2..f1545929b 100644
> > --- a/pkt-line.h
> > +++ b/pkt-line.h
> > @@ -60,8 +60,16 @@ int write_packetized_from_buf(const char *src_in, size_t len, int fd_out);
> >   * If options contains PACKET_READ_CHOMP_NEWLINE, a trailing newline (if
> >   * present) is removed from the buffer before returning.
> >   */
> > +enum packet_read_status {
> > +       PACKET_READ_ERROR = -1,
> > +       PACKET_READ_NORMAL,
> > +       PACKET_READ_FLUSH,
> > +};
> >  #define PACKET_READ_GENTLE_ON_EOF (1u<<0)
> >  #define PACKET_READ_CHOMP_NEWLINE (1u<<1)
> > +enum packet_read_status packet_read_with_status(int fd, char **src_buffer, size_t *src_len,
> > +                                               char *buffer, unsigned size, int *pktlen,
> > +                                               int options);
> >  int packet_read(int fd, char **src_buffer, size_t *src_len, char
> >                 *buffer, unsigned size, int options);
> 
> The documentation that is preceding these lines is very specific to
> packet_read, e.g.
> 
>     If options does contain PACKET_READ_GENTLE_ON_EOF,
>     we will not die on condition 4 (truncated input), but instead return -1
> 
> which doesn't hold true for the _status version. Can you adapt the comment
> to explain both read functions?

Good point, I'll makes changes and document the _status version
separately.

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [WIP 02/15] pkt-line: introduce struct packet_reader
  2017-12-07 22:01     ` Stefan Beller
@ 2017-12-08 18:11       ` Brandon Williams
  0 siblings, 0 replies; 47+ messages in thread
From: Brandon Williams @ 2017-12-08 18:11 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

On 12/07, Stefan Beller wrote:
> On Mon, Dec 4, 2017 at 3:58 PM, Brandon Williams <bmwill@google.com> wrote:
> > Sometimes it is advantageous to be able to peek the next packet line
> > without consuming it (e.g. to be able to determine the protocol version
> > a server is speaking).  In order to do that introduce 'struct
> > packet_reader' which is an abstraction around the normal packet reading
> > logic.  This enables a caller to be able to peek a single line at a time
> > using 'packet_reader_peek()' and having a caller consume a line by
> > calling 'packet_reader_read()'.
> >
> > Signed-off-by: Brandon Williams <bmwill@google.com>
> > ---
> >  pkt-line.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  pkt-line.h | 20 ++++++++++++++++++++
> >  2 files changed, 81 insertions(+)
> >
> > diff --git a/pkt-line.c b/pkt-line.c
> > index ac619f05b..518109bbe 100644
> > --- a/pkt-line.c
> > +++ b/pkt-line.c
> > @@ -406,3 +406,64 @@ ssize_t read_packetized_to_strbuf(int fd_in, struct strbuf *sb_out)
> >         }
> >         return sb_out->len - orig_len;
> >  }
> > +
> > +/* Packet Reader Functions */
> > +void packet_reader_init(struct packet_reader *reader, int fd,
> > +                       char *src_buffer, size_t src_len)
> > +{
> > +       reader->fd = fd;
> > +       reader->src_buffer = src_buffer;
> > +       reader->src_len = src_len;
> > +       reader->buffer = packet_buffer;
> > +       reader->buffer_size = sizeof(packet_buffer);
> > +       reader->options = PACKET_READ_CHOMP_NEWLINE | PACKET_READ_GENTLE_ON_EOF;
> 
> I assume the future user of this packet reader will need exactly
> these options coincidentally. ;)
> 
> I think it might be ok for now and later we can extend the reader if needed
> to also take the flags as args. However given this set of args, this is a gentle
> packet reader, as it corresponds to the _gently version of reading
> packets AFAICT.
> 
> Unlike the pkt_read function this constructor of a packet reader doesn't need
> the arguments for its buffer (packet_buffer and sizeof thereof), which
> packet_read
> unfortunately needs. We pass in packet_buffer all the time except in
> builtin/receive-pack
> for obtaining the gpg cert. I think that's ok here.

Yeah, all of the callers I wanted to migrate all passed the same flags
and same buffer.  I figured there may be a point in the future where
we'd want to extend this so instead of hard coding the flags in the read
functions, I did so in the constructor.  That way if we wanted to extend
it in the future, it would be a little bit easier.

> 
> > +enum packet_read_status packet_reader_read(struct packet_reader *reader)
> > +{
> > +       if (reader->line_peeked) {
> > +               reader->line_peeked = 0;
> > +               return reader->status;
> > +       }
> > +
> > +       reader->status = packet_read_with_status(reader->fd,
> > +                                                &reader->src_buffer,
> > +                                                &reader->src_len,
> > +                                                reader->buffer,
> > +                                                reader->buffer_size,
> > +                                                &reader->pktlen,
> > +                                                reader->options);
> > +
> > +       switch (reader->status) {
> > +       case PACKET_READ_ERROR:
> > +               reader->pktlen = -1;
> 
> In case of error the read function might not
> have assigned the pktlen as requested, so we assign
> it to -1/NULL here. Though the caller ought to already know
> that they handle bogus, as the state is surely the first thing
> they'd inspect?

Exactly, I thought it would be easier to ensure that pktlen and line
were had consistent state no matter what the output of the read was.
But yes, callers should be looking at the status to determine what to
do.

> 
> > +               reader->line = NULL;
> > +               break;
> > +       case PACKET_READ_NORMAL:
> > +               reader->line = reader->buffer;
> > +               break;
> > +       case PACKET_READ_FLUSH:
> > +               reader->pktlen = 0;
> > +               reader->line = NULL;
> > +               break;
> > +       }
> 
> Oh, this gives an interesting interface for someone who is
> just inspecting the len/line instead of the state, so it might be
> worth keeping it this way.
> 
> Can we have an API documentation in the header file,
> explaining what to expect in each field given the state
> of the (read, peaked) packet?

Yep I can write some better API docs for these functions.

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [WIP 03/15] pkt-line: add delim packet support
  2017-12-07 22:30     ` Stefan Beller
@ 2017-12-08 20:08       ` Brandon Williams
  0 siblings, 0 replies; 47+ messages in thread
From: Brandon Williams @ 2017-12-08 20:08 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

On 12/07, Stefan Beller wrote:
> On Mon, Dec 4, 2017 at 3:58 PM, Brandon Williams <bmwill@google.com> wrote:
> > One of the design goals of protocol-v2 is to improve the semantics of
> > flush packets.  Currently in protocol-v1, flush packets are used both to
> > indicate a break in a list of packet lines as well as an indication that
> > one side has finished speaking.  This makes it particularly difficult
> > to implement proxies as a proxy would need to completely understand git
> > protocol instead of simply looking for a flush packet.
> >
> > To do this, introduce the special deliminator packet '0001'.  A delim
> > packet can then be used as a deliminator between lists of packet lines
> > while flush packets can be reserved to indicate the end of a response.
> >
> > Signed-off-by: Brandon Williams <bmwill@google.com>
> 
> I presume the update for Documentation/technical/* comes at a later patch in the
> series, clarifying the exact semantic difference between the packet types?

Yeah, currently there isn't a use for the delim packet but there will be
one when v2 is introduced.

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [WIP 08/15] connect: discover protocol version outside of get_remote_heads
  2017-12-07 19:30         ` Junio C Hamano
@ 2017-12-08 20:11           ` Brandon Williams
  0 siblings, 0 replies; 47+ messages in thread
From: Brandon Williams @ 2017-12-08 20:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On 12/07, Junio C Hamano wrote:
> Brandon Williams <bmwill@google.com> writes:
> 
> > While we could wrap the preamble into a function it sort of defeats the
> > purpose since you want to be able to call different functions based on
> > the protocol version you're speaking.  That way you can have hard
> > separations between the code paths which operate on v0/v1 and v2.
> 
> As I envision that the need for different processing across protocol
> versions in real code would become far greater than it would fit in
> cases within a single switch() statement, I imagined that the reader
> structure would have a field that says which version of the protocol
> it is, so that the caller of the preamble thing can inspect it later
> to switch on it.
> 

Yes, patch 9 changes this so that the protocol version is stored in the
transport structure.  This way the fetch and push code can know what to
do in v2.

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [WIP 04/15] upload-pack: convert to a builtin
  2017-12-06 21:59     ` Junio C Hamano
  2017-12-07 16:14       ` Johannes Schindelin
@ 2017-12-08 20:12       ` Brandon Williams
  1 sibling, 0 replies; 47+ messages in thread
From: Brandon Williams @ 2017-12-08 20:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On 12/06, Junio C Hamano wrote:
> Brandon Williams <bmwill@google.com> writes:
> 
> > In order to allow for code sharing with the server-side of fetch in
> > protocol-v2 convert upload-pack to be a builtin.
> >
> > Signed-off-by: Brandon Williams <bmwill@google.com>
> > ---
> 
> This looks obvious and straight-forward to a cursory look.
> 
> I vaguely recalled and feared that we on purpose kept this program
> separate from the rest of the system for a reason, but my mailing
> list search is coming up empty.
> 
> >  Makefile      | 3 ++-
> >  builtin.h     | 1 +
> >  git.c         | 1 +
> >  upload-pack.c | 2 +-
> >  4 files changed, 5 insertions(+), 2 deletions(-)
> > ...
> > diff --git a/upload-pack.c b/upload-pack.c
> > index ef99a029c..2d16952a3 100644
> > --- a/upload-pack.c
> > +++ b/upload-pack.c
> > @@ -1033,7 +1033,7 @@ static int upload_pack_config(const char *var, const char *value, void *unused)
> >  	return parse_hide_refs_config(var, value, "uploadpack");
> >  }
> >  
> > -int cmd_main(int argc, const char **argv)
> > +int cmd_upload_pack(int argc, const char **argv, const char *prefix)
> >  {
> >  	const char *dir;
> >  	int strict = 0;
> 
> Shouldn't this file be moved to builtin/ directory, though?

I can definitely move the file to builtin if you would prefer.

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [WIP 07/15] connect: convert get_remote_heads to use struct packet_reader
  2017-12-06 22:39     ` Junio C Hamano
@ 2017-12-08 20:19       ` Brandon Williams
  0 siblings, 0 replies; 47+ messages in thread
From: Brandon Williams @ 2017-12-08 20:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On 12/06, Junio C Hamano wrote:
> Brandon Williams <bmwill@google.com> writes:
> 
> 
> EXPECTING_DONE sounded like we are expecting to see 'done' packet
> sent from the other side, but I was mistaken.  It is the state
> where we are "done" expecting anything ;-).
> 
> Having an (unconditional) assignment to 'state' in the above switch
> makes me feel somewhat uneasy, as the next "switch (state)" is what
> is meant as the state machine that would allow us to say things like
> "from this state, transition to that state is impossible".  When we
> get a flush while we are expecting the first ref, for example, we'd
> just go into the "done" state.  There is no provision for a future
> update to say "no, getting a flush in this state is an error".

I believe this is accepted behavior, receiving a flush in that state.
And I don't think there is ever an instance during the ref advertisement
where a flush would be an error.  It just indicates that the
advertisement is finished.

> 
> That is no different from the current code; when read_remote_ref()
> notices that it got a flush, it just leaves the loop without even
> touching 'state' variable.  But at least, I find that the current
> code is more honest---it does not even touch 'state' and allows the
> code after the loop to inspect it, if needed.  From that point of
> vhew, the way the new code uses 'state' to leave the loop upon
> seeing a flush is a regression---it makes it harder to notice and
> report when we got a flush in a wrong state.
> 
> Perhaps getting rid of "EXPECTING_DONE" from the enum and then:
> 
> 	int got_flush = 0;
> 	while (1) {
> 		switch (reader_read()) {
> 		case PACKET_READ_FLUSH:
> 			got_flush = 1;
> 			break;
> 		... other cases ...
> 		}
> 
> 		if (got_flush)
> 			break;
> 
> 		switch (state) {
> 		... current code ...
> 		}
> 	}
> 
> would be an improvement; we can later extend "if (got_flush)" part
> to check what state we are in if we wanted to notice and report an
> error there before breaking out of the loop.
> 

I don't really see how this is any clearer from what this patch does
though.  I thought it made it easier to read as you no longer have an
infinite loop, but rather know when it will end (when you move to the
done state).

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [WIP 11/15] serve: introduce git-serve
  2017-12-07 23:42     ` Junio C Hamano
@ 2017-12-08 20:25       ` Brandon Williams
  0 siblings, 0 replies; 47+ messages in thread
From: Brandon Williams @ 2017-12-08 20:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On 12/07, Junio C Hamano wrote:
> Brandon Williams <bmwill@google.com> writes:
> 
> > +static struct protocol_capability *get_capability(const char *key)
> > +{
> > +	int i;
> > +
> > +	if (!key)
> > +		return NULL;
> > +
> > +	for (i = 0; i < ARRAY_SIZE(capabilities); i++) {
> > +		struct protocol_capability *c = &capabilities[i];
> > +		const char *out;
> > +		if (skip_prefix(key, c->name, &out) && (!*out || *out == '='))
> > +			return c;
> 
> Looks familiar and resembles what was recently discussed on list ;-)
> 
> > +int cmd_serve(int argc, const char **argv, const char *prefix)
> > +{
> > +
> > +	struct option options[] = {
> > +		OPT_END()
> > +	};
> > +
> > +	/* ignore all unknown cmdline switches for now */
> > +	argc = parse_options(argc, argv, prefix, options, grep_usage,
> > +			     PARSE_OPT_KEEP_DASHDASH |
> > +			     PARSE_OPT_KEEP_UNKNOWN);
> > +	serve();
> > +
> > +	return 0;
> > +}

I assume that at some point we may want to have a new endpoint that just
does v2 without needing the side channel to tell it to do so.  Maybe for
brand new server commands, like a remote grep or a remote object-stat or
something that don't have a v1 equivalent that can be fallen back to.
That's why I included a builtin/serve.c 

> > ...
> > +/* Main serve loop for protocol version 2 */
> > +void serve(void)
> > +{
> > +	/* serve by default supports v2 */
> > +	packet_write_fmt(1, "version 2\n");
> > +
> > +	advertise_capabilities();
> > +
> > +	for (;;)
> > +		if (process_request())
> > +			break;
> > +}
> 
> I am guessing that this would be run just like upload-pack,
> i.e. invoked via ssh or via git-daemon, and that is why it can just
> assume that fd#0/fd#1 are already connected to the other end.  It
> may be helpful to document somewhere how we envision to invoke this
> program.
> 

This function I was planning to just be executed by upload-pack and
receive-pack when a client requests protocol v2.  But yes the idea would
be that fd#0/fd#1 would be already setup like they are for upload-pack
and receive-pack.

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [WIP 04/15] upload-pack: convert to a builtin
  2017-12-07 16:14       ` Johannes Schindelin
@ 2017-12-08 20:26         ` Junio C Hamano
  0 siblings, 0 replies; 47+ messages in thread
From: Junio C Hamano @ 2017-12-08 20:26 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Brandon Williams, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> On Wed, 6 Dec 2017, Junio C Hamano wrote:
> ...
>> I vaguely recalled and feared that we on purpose kept this program
>> separate from the rest of the system for a reason, but my mailing
>> list search is coming up empty.
>
> I only recall that we kept it in the bin/ directory (as opposed to
> mlibexec/git-core/) to help with fetching via SSH.

Yes, that is about where it is installed (i.e. on $PATH), which is a
different issue.  

My vague recollection was about what is (and what is not) included
in and linked into the program built, with some reason that is
different from but similar to the reason why remote helpers that
link to curl and openssl libraries are excluded from the builtin
deliberately.  I know we exclude remote-helpers from builtin in
order to save the start-up overhead for other more important
built-in commands by not having to link these heavyweight libs.  I
suspect there was some valid reason why we didn't make upload-pack
a built-in, but am failing to recall what the reason was.



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [WIP 12/15] ls-refs: introduce ls-refs server command
  2017-12-04 23:58   ` [WIP 12/15] ls-refs: introduce ls-refs server command Brandon Williams
@ 2017-12-13 16:30     ` Philip Oakley
  0 siblings, 0 replies; 47+ messages in thread
From: Philip Oakley @ 2017-12-13 16:30 UTC (permalink / raw)
  To: Brandon Williams, git; +Cc: Brandon Williams

From: "Brandon Williams" <bmwill@google.com>
Sent: Monday, December 04, 2017 11:58 PM
> Introduce the ls-refs server command.  In protocol v2, the ls-refs
> command is used to request the ref advertisement from the server.  Since
> it is a command which can be requested (as opposed to manditory in v1),
> a clinet can sent a number of parameters in its request to limit the ref

s/clinet/client/

> advertisement based on provided ref-patterns.
> 
> Signed-off-by: Brandon Williams <bmwill@google.com>
> ---
Philip

^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2017-12-13 16:30 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-20 17:18 [RFC] protocol version 2 Brandon Williams
2017-10-24  6:48 ` Junio C Hamano
2017-10-24 18:35   ` Brandon Williams
2017-10-25  1:22     ` Junio C Hamano
2017-10-26  0:59     ` Junio C Hamano
2017-10-25 13:09 ` Derrick Stolee
2017-10-25 18:10   ` Brandon Williams
2017-10-28 22:57 ` Philip Oakley
2017-10-31 18:42   ` Brandon Williams
2017-11-10 20:13 ` Jonathan Tan
2017-12-04 23:58 ` [WIP 00/15] " Brandon Williams
2017-12-04 23:58   ` [WIP 01/15] pkt-line: introduce packet_read_with_status Brandon Williams
2017-12-07 20:53     ` Stefan Beller
2017-12-08 18:03       ` Brandon Williams
2017-12-04 23:58   ` [WIP 02/15] pkt-line: introduce struct packet_reader Brandon Williams
2017-12-07 22:01     ` Stefan Beller
2017-12-08 18:11       ` Brandon Williams
2017-12-04 23:58   ` [WIP 03/15] pkt-line: add delim packet support Brandon Williams
2017-12-07 22:30     ` Stefan Beller
2017-12-08 20:08       ` Brandon Williams
2017-12-04 23:58   ` [WIP 04/15] upload-pack: convert to a builtin Brandon Williams
2017-12-06 21:59     ` Junio C Hamano
2017-12-07 16:14       ` Johannes Schindelin
2017-12-08 20:26         ` Junio C Hamano
2017-12-08 20:12       ` Brandon Williams
2017-12-04 23:58   ` [WIP 05/15] upload-pack: factor out processing lines Brandon Williams
2017-12-04 23:58   ` [WIP 06/15] transport: use get_refs_via_connect to get refs Brandon Williams
2017-12-06 22:10     ` Junio C Hamano
2017-12-07 18:40       ` Brandon Williams
2017-12-04 23:58   ` [WIP 07/15] connect: convert get_remote_heads to use struct packet_reader Brandon Williams
2017-12-06 22:39     ` Junio C Hamano
2017-12-08 20:19       ` Brandon Williams
2017-12-04 23:58   ` [WIP 08/15] connect: discover protocol version outside of get_remote_heads Brandon Williams
2017-12-07 18:50     ` Junio C Hamano
2017-12-07 19:04       ` Brandon Williams
2017-12-07 19:30         ` Junio C Hamano
2017-12-08 20:11           ` Brandon Williams
2017-12-04 23:58   ` [WIP 09/15] transport: store protocol version Brandon Williams
2017-12-04 23:58   ` [WIP 10/15] protocol: introduce enum protocol_version value protocol_v2 Brandon Williams
2017-12-04 23:58   ` [WIP 11/15] serve: introduce git-serve Brandon Williams
2017-12-07 23:42     ` Junio C Hamano
2017-12-08 20:25       ` Brandon Williams
2017-12-04 23:58   ` [WIP 12/15] ls-refs: introduce ls-refs server command Brandon Williams
2017-12-13 16:30     ` Philip Oakley
2017-12-04 23:58   ` [WIP 13/15] connect: request remote refs using v2 Brandon Williams
2017-12-04 23:58   ` [WIP 14/15] upload_pack: introduce fetch server command Brandon Williams
2017-12-04 23:58   ` [WIP 15/15] fetch-pack: perform a fetch using v2 Brandon Williams

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).