git@vger.kernel.org mailing list mirror (one of many)
* [RFC] push: add documentation on push v2
@ 2018-07-17 21:09 Brandon Williams
  2018-07-17 23:25 ` Stefan Beller
  2018-07-24 19:28 ` Brandon Williams
  0 siblings, 2 replies; 19+ messages in thread
From: Brandon Williams @ 2018-07-17 21:09 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams

Signed-off-by: Brandon Williams <bmwill@google.com>
---

Since introducing protocol v2 and enabling fetch I've been thinking
about what its inverse 'push' would look like.  After talking with a
number of people I have a longish list of things that could be done to
improve push and I think I've been able to distill the core features we
want in push v2.  Thankfully (due to the capability system) most of the
other features/improvements can be added later with ease.

What I've got now is a rough design for a more flexible push, more
flexible because it allows for the server to do what it wants with the
refs that are pushed and has the ability to communicate back what was
done to the client.  The main motivation for this is to work around
issues when working with Gerrit and other code-review systems where you
need to have Change-Ids in the commit messages (now the server can just
insert them for you and send back new commits) and you need to push to
magic refs to get around various limitations (now a Gerrit server should
be able to communicate that pushing to 'master' doesn't update master
but instead creates a refs/changes/<id> ref).

Before actually moving to write any code I'm hoping to get some feedback
on whether this is an acceptable base design for push (other
features like atomic-push, signed-push, etc. can be added as
capabilities), so any comments are appreciated.

 Documentation/technical/protocol-v2.txt | 76 +++++++++++++++++++++++++
 1 file changed, 76 insertions(+)

diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
index 49bda76d23..16c1ce60dd 100644
--- a/Documentation/technical/protocol-v2.txt
+++ b/Documentation/technical/protocol-v2.txt
@@ -403,6 +403,82 @@ header.
 		2 - progress messages
 		3 - fatal error message just before stream aborts
 
+ push
+~~~~~~
+
+`push` is the command used to push ref-updates and a packfile to a remote
+server in v2.
+
+Additional features not supported in the base command will be advertised
+as the value of the command in the capability advertisement in the form
+of a space separated list of features: "<command>=<feature 1> <feature 2>"
+
+The format of a push request is as follows:
+
+    request = *section
+    section = (ref-updates | packfile)
+	       (delim-pkt | flush-pkt)
+
+    ref-updates = PKT-LINE("ref-updates" LF)
+		  *PKT-LINE((update | force-update) LF)
+
+    update = txn_id SP action SP refname SP old_oid SP new_oid
+    force-update = txn_id SP "force" SP action SP refname SP new_oid
+    action = ("create" | "delete" | "update")
+    txn_id = 1*DIGIT
+
+    packfile = PKT-LINE("packfile" LF)
+	       *PKT-LINE(*%x00-ff)
+
+    ref-updates section
+	* Transaction ids allow for mapping what was requested to what the
+	  server actually did with the ref-update.
+	* Normal ref-updates require that the old value of a ref is supplied so
+	  that the server can verify that the reference that is being updated
+	  hasn't changed while the request was being processed.
+	* Forced ref-updates only include the new value of a ref as we don't
+	  care what the old value was.
+
+    packfile section
+	* A packfile MAY not be included if only delete commands are used or if
+	  an update only incorporates objects the server already has.
+
+The server will receive the packfile, unpack it, then validate each ref-update,
+and it will run any update hooks to make sure that the update is acceptable.
+If all of that is fine, the server will then update the references.
+
+The format of a push response is as follows:
+
+    response = *section
+    section = (unpack-error | ref-update-status | packfile)
+	      (delim-pkt | flush-pkt)
+
+    unpack-error = PKT-LINE("ERR" SP error-msg LF)
+
+    ref-update-status = *(update-result | update-error)
+    update-result = PKT-LINE(txn_id SP result LF)
+    result = ("created" | "deleted" | "updated") SP refname SP old_oid SP new_oid
+    update-error = PKT-LINE(txn_id SP "error" SP error-msg LF)
+
+    packfile = PKT-LINE("packfile" LF)
+	       *PKT-LINE(*%x00-ff)
+
+    ref-update-status section
+	* This section is always included unless there was an error unpacking
+	  the packfile sent in the request.
+	* The server is given the freedom to do what it wants with the
+	  ref-updates provided in the request.  This means that an update sent
+	  to the server may result in the creation of a ref or rebasing the
+	  update on the server.
+	* If a server creates any new objects due to a ref-update, a packfile
+	  MUST be sent back in the response.
+
+    packfile section
+	* This section is included if the server decided to do something with
+	  the ref-updates that involved creating new objects.
+
  server-option
 ~~~~~~~~~~~~~~~
 
-- 
2.18.0.203.gfac676dfb9-goog


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [RFC] push: add documentation on push v2
  2018-07-17 21:09 [RFC] push: add documentation on push v2 Brandon Williams
@ 2018-07-17 23:25 ` Stefan Beller
  2018-07-18 13:31   ` Derrick Stolee
  2018-07-18 17:08   ` Brandon Williams
  2018-07-24 19:28 ` Brandon Williams
  1 sibling, 2 replies; 19+ messages in thread
From: Stefan Beller @ 2018-07-17 23:25 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git

On Tue, Jul 17, 2018 at 2:09 PM Brandon Williams <bmwill@google.com> wrote:
>
> Signed-off-by: Brandon Williams <bmwill@google.com>
> ---
>
> Since introducing protocol v2 and enabling fetch I've been thinking
> about what its inverse 'push' would look like.  After talking with a
> number of people I have a longish list of things that could be done to
> improve push and I think I've been able to distill the core features we
> want in push v2.

It would be nice to know which things you want to improve.

>  Thankfully (due to the capability system) most of the
> other features/improvements can be added later with ease.
>
> What I've got now is a rough design for a more flexible push, more
> flexible because it allows for the server to do what it wants with the
> refs that are pushed and has the ability to communicate back what was
> done to the client.  The main motivation for this is to work around
> issues when working with Gerrit and other code-review systems where you
> need to have Change-Ids in the commit messages (now the server can just
> insert them for you and send back new commits) and you need to push to
> magic refs to get around various limitations (now a Gerrit server should
> be able to communicate that pushing to 'master' doesn't update master
> but instead creates a refs/changes/<id> ref).

Well Gerrit is our main motivation, but this allows for other workflows as well.
For example Facebook uses hg internally and they have a
"rebase-on-the-server-after-push" workflow IIRC as pushing to a single repo
brings up quite some contention. The protocol outlined below would allow
for such a workflow as well? (This might be an easier sell to the Git
community as most are not quite familiar with Gerrit)

> Before actually moving to write any code I'm hoping to get some feedback
> on if we think this is an acceptable base design for push (other
> features like atomic-push, signed-push, etc can be added as
> capabilities), so any comments are appreciated.
>
>  Documentation/technical/protocol-v2.txt | 76 +++++++++++++++++++++++++
>  1 file changed, 76 insertions(+)
>
> diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
> index 49bda76d23..16c1ce60dd 100644
> --- a/Documentation/technical/protocol-v2.txt
> +++ b/Documentation/technical/protocol-v2.txt
> @@ -403,6 +403,82 @@ header.
>                 2 - progress messages
>                 3 - fatal error message just before stream aborts
>
> + push
> +~~~~~~
> +
> +`push` is the command used to push ref-updates and a packfile to a remote
> +server in v2.
> +
> +Additional features not supported in the base command will be advertised
> +as the value of the command in the capability advertisement in the form
> +of a space separated list of features: "<command>=<feature 1> <feature 2>"
> +
> +The format of a push request is as follows:
> +
> +    request = *section
> +    section = (ref-updates | packfile)

This reads as if a request consists of sections, which
each can be a "ref-updates" or a packfile, no order given,
such that multiple ref-update sections mixed with packfiles
are possible.

I would assume we'd only want to allow for ref-updates
followed by the packfile.

Given the example above for "rebase-on-push" though
it is better to first send the packfile (as that is assumed to
take longer) and then send the ref updates, such that the
rebasing could be faster and has no bottleneck.
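
For concreteness, the ref-updates section of the grammar quoted above
could be encoded roughly like this (an illustrative Python sketch, not
code from the series; the pkt-line framing of four hex length digits
plus payload is the one described in protocol-common.txt):

```python
# Sketch: encode a ref-updates section as pkt-lines per the proposed
# grammar.  All helper names here are made up for illustration.

def pkt_line(payload: str) -> bytes:
    """Frame a payload as a pkt-line: 4 hex digits of total length, then data."""
    data = payload.encode()
    return b"%04x" % (len(data) + 4) + data

FLUSH_PKT = b"0000"

def ref_updates_section(updates):
    """updates: iterable of (txn_id, action, refname, old_oid, new_oid)."""
    out = [pkt_line("ref-updates\n")]
    for txn_id, action, refname, old_oid, new_oid in updates:
        out.append(pkt_line(f"{txn_id} {action} {refname} {old_oid} {new_oid}\n"))
    out.append(FLUSH_PKT)
    return b"".join(out)

section = ref_updates_section(
    [(0, "update", "refs/heads/master", "a" * 40, "b" * 40)]
)
```

Whether several such sections may appear, and in which order relative to
the packfile, is exactly the ambiguity in question.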

> +              (delim-pkt | flush-pkt)



> +
> +    ref-updates = PKT-LINE("ref-updates" LF)
> +                 *PKT-Line(update/force-update LF)
> +
> +    update = txn_id SP action SP refname SP old_oid SP new_oid
> +    force-update = txn_id SP "force" SP action SP refname SP new_oid

So we insert "force" after the transaction id if we want to force it.
When adding the atomic capability later we could imagine another insert here

  1 atomic create refs/heads/new-ref <0-hash> <hash>
  1 atomic delete refs/heads/old-ref <hash> <0-hash>

which would look like a "rename" that we could also add instead.
The transaction numbers are an interesting concept, how do you
envision them to be used? In the example I put them both in the same
transaction to demonstrate the "atomic-ness", but one could also
imagine different transactions numbers per ref (i.e. exactly one
ref per txn_id) to have a better understanding of what the server did
to each individual ref.

> +    action = ("create" | "delete" | "update")
> +    txn_id = 1*DIGIT
> +
> +    packfile = PKT-LINE("packfile" LF)
> +              *PKT-LINE(*%x00-ff)
> +
> +    ref-updates section
> +       * Transaction id's allow for mapping what was requested to what the
> +         server actually did with the ref-update.

this would imply the client ought to have at most one ref per transaction id.
Is the client allowed to put multiple refs per id?

Are new capabilities attached to ref updates or transactions?
Unlike the example above, stating "atomic" on each line, you could just
say "transaction 1 should be atomic" in another line, that would address
all refs in that transaction.

> +       * Normal ref-updates require that the old value of a ref is supplied so
> +         that the server can verify that the reference that is being updated
> +         hasn't changed while the request was being processed.

create/delete assume <00..00> for either old or new? (We could also
omit the second hash for create/delete, which is more appealing to me)

> +       * Forced ref-updates only include the new value of a ref as we don't
> +         care what the old value was.

How are you implementing force-with-lease then?

> +    packfile section
> +       * A packfile MAY not be included if only delete commands are used or if
> +         an update only incorporates objects the server already has

Or rather: "An empty pack SHALL be omitted" ?

> +The server will receive the packfile, unpack it, then validate each ref-update,
> +and it will run any update hooks to make sure that the update is acceptable.
> +If all of that is fine, the server will then update the references.
> +
> +The format of a push response is as follows:
> +
> +    response = *section
> +    section = (unpack-error | ref-update-status | packfile)

As above, I assume they ought to go in the order as written,
or would it make sense to allow for any order?

> +             (delim-pkt | flush-pkt)
> +
> +    unpack-error = PKT-LINE("ERR" SP error-msg LF)
> +
> +    ref-update-status = *(update-result | update-error)
> +    update-result = *PKT-LINE(txn_id SP result LF)
> +    result = ("created" | "deleted" | "updated") SP refname SP old_oid SP new_oid
> +    update-error = PKT-LINE(txn_id SP "error" SP error-msg LF)

Can we unify "ERR" and "error" ?
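
Either way a client has to handle both shapes; a rough sketch of
parsing the ref-update-status lines quoted above (hypothetical helper,
pkt-line framing assumed already stripped):

```python
# Sketch: parse one ref-update-status line into either a result or an
# error record, per the proposed response grammar.

def parse_status_line(line: str) -> dict:
    txn_id, rest = line.rstrip("\n").split(" ", 1)
    if rest.startswith("error "):
        return {"txn_id": int(txn_id), "error": rest[len("error "):]}
    result, refname, old_oid, new_oid = rest.split(" ")
    return {"txn_id": int(txn_id), "result": result, "refname": refname,
            "old_oid": old_oid, "new_oid": new_oid}

ok = parse_status_line("0 created refs/changes/1 " + "0" * 40 + " " + "b" * 40 + "\n")
err = parse_status_line("1 error non-fast-forward\n")
```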

> +    packfile = PKT-LINE("packfile" LF)
> +              *PKT-LINE(*%x00-ff)
> +
> +    ref-update-status section
> +       * This section is always included unless there was an error unpacking
> +         the packfile sent in the request.
> +       * The server is given the freedom to do what it wants with the
> +         ref-updates provided in the request.  This means that an update sent
> +         to the server may result in the creation of a ref or rebasing the
> +         update on the server.
> +       * If a server creates any new objects due to a ref-update, a packfile
> +         MUST be sent back in the response.
> +
> +    packfile section
> +       * This section is included if the server decided to do something with
> +         the ref-updates that involved creating new objects.
> +
>   server-option
>  ~~~~~~~~~~~~~~~
>
> --
> 2.18.0.203.gfac676dfb9-goog
>


* Re: [RFC] push: add documentation on push v2
  2018-07-17 23:25 ` Stefan Beller
@ 2018-07-18 13:31   ` Derrick Stolee
  2018-07-18 16:56     ` Stefan Beller
  2018-07-18 17:11     ` Brandon Williams
  2018-07-18 17:08   ` Brandon Williams
  1 sibling, 2 replies; 19+ messages in thread
From: Derrick Stolee @ 2018-07-18 13:31 UTC (permalink / raw)
  To: Stefan Beller, Brandon Williams; +Cc: git

On 7/17/2018 7:25 PM, Stefan Beller wrote:
> On Tue, Jul 17, 2018 at 2:09 PM Brandon Williams <bmwill@google.com> wrote:
>> Signed-off-by: Brandon Williams <bmwill@google.com>
>> ---
>>
>> Since introducing protocol v2 and enabling fetch I've been thinking
>> about what its inverse 'push' would look like.  After talking with a
>> number of people I have a longish list of things that could be done to
>> improve push and I think I've been able to distill the core features we
>> want in push v2.
> It would be nice to know which things you want to improve.

Hopefully we can also get others to chime in with things they don't like 
about the existing protocol. What pain points exist, and what can we do 
to improve at the transport layer before considering new functionality?

>>   Thankfully (due to the capability system) most of the
>> other features/improvements can be added later with ease.
>>
>> What I've got now is a rough design for a more flexible push, more
>> flexible because it allows for the server to do what it wants with the
>> refs that are pushed and has the ability to communicate back what was
>> done to the client.  The main motivation for this is to work around
>> issues when working with Gerrit and other code-review systems where you
>> need to have Change-Ids in the commit messages (now the server can just
>> insert them for you and send back new commits) and you need to push to
>> magic refs to get around various limitations (now a Gerrit server should
>> be able to communicate that pushing to 'master' doesn't update master
>> but instead creates a refs/changes/<id> ref).
> Well Gerrit is our main motivation, but this allows for other workflows as well.
> For example Facebook uses hg internally and they have a
> "rebase-on-the-server-after-push" workflow IIRC as pushing to a single repo
> brings up quite some contention. The protocol outlined below would allow
> for such a workflow as well? (This might be an easier sell to the Git
> community as most are not quite familiar with Gerrit)

I'm also curious how this "change commits on push" would be helpful to 
other scenarios.

Since I'm not familiar with Gerrit: what is preventing you from having a 
commit hook that inserts (or requests) a Change-Id when not present? How 
can the server identify the Change-Id automatically when it isn't present?

>> Before actually moving to write any code I'm hoping to get some feedback
>> on if we think this is an acceptable base design for push (other
>> features like atomic-push, signed-push, etc can be added as
>> capabilities), so any comments are appreciated.
>>
>>   Documentation/technical/protocol-v2.txt | 76 +++++++++++++++++++++++++
>>   1 file changed, 76 insertions(+)
>>
>> diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
>> index 49bda76d23..16c1ce60dd 100644
>> --- a/Documentation/technical/protocol-v2.txt
>> +++ b/Documentation/technical/protocol-v2.txt
>> @@ -403,6 +403,82 @@ header.
>>                  2 - progress messages
>>                  3 - fatal error message just before stream aborts
>>
>> + push
>> +~~~~~~
>> +
>> +`push` is the command used to push ref-updates and a packfile to a remote
>> +server in v2.
>> +
>> +Additional features not supported in the base command will be advertised
>> +as the value of the command in the capability advertisement in the form
>> +of a space separated list of features: "<command>=<feature 1> <feature 2>"
>> +
>> +The format of a push request is as follows:
>> +
>> +    request = *section
>> +    section = (ref-updates | packfile)
> This reads as if a request consists of sections, which
> each can be a "ref-updates" or a packfile, no order given,
> such that multiple ref-update sections mixed with packfiles
> are possible.
>
> I would assume we'd only want to allow for ref-updates
> followed by the packfile.
>
> Given the example above for "rebase-on-push" though
> it is better to first send the packfile (as that is assumed to
> take longer) and then send the ref updates, such that the
> rebasing could be faster and has no bottleneck.
>
>> +              (delim-pkt | flush-pkt)
>
>
>> +
>> +    ref-updates = PKT-LINE("ref-updates" LF)
>> +                 *PKT-Line(update/force-update LF)
>> +
>> +    update = txn_id SP action SP refname SP old_oid SP new_oid
>> +    force-update = txn_id SP "force" SP action SP refname SP new_oid
> So we insert "force" after the transaction id if we want to force it.
> When adding the atomic capability later we could imagine another insert here
>
>    1 atomic create refs/heads/new-ref <0-hash> <hash>
>    1 atomic delete refs/heads/old-ref <hash> <0-hash>
>
> which would look like a "rename" that we could also add instead.
> The transaction numbers are an interesting concept, how do you
> envision them to be used? In the example I put them both in the same
> transaction to demonstrate the "atomic-ness", but one could also
> imagine different transactions numbers per ref (i.e. exactly one
> ref per txn_id) to have a better understanding of what the server did
> to each individual ref.
>
>> +    action = ("create" | "delete" | "update")
>> +    txn_id = 1*DIGIT
>> +
>> +    packfile = PKT-LINE("packfile" LF)
>> +              *PKT-LINE(*%x00-ff)
>> +
>> +    ref-updates section
>> +       * Transaction id's allow for mapping what was requested to what the
>> +         server actually did with the ref-update.
> this would imply the client ought to have at most one ref per transaction id.
> Is the client allowed to put multiple refs per id?
>
> Are new capabilities attached to ref updates or transactions?
> Unlike the example above, stating "atomic" on each line, you could just
> say "transaction 1 should be atomic" in another line, that would address
> all refs in that transaction.
>
>> +       * Normal ref-updates require that the old value of a ref is supplied so
>> +         that the server can verify that the reference that is being updated
>> +         hasn't changed while the request was being processed.
> create/delete assume <00..00> for either old or new ? (We could also
> omit the second hash for create delete, which is more appealing to me)
>
>> +       * Forced ref-updates only include the new value of a ref as we don't
>> +         care what the old value was.
> How are you implementing force-with-lease then?

I had the same question.

>
>> +    packfile section
>> +       * A packfile MAY not be included if only delete commands are used or if
>> +         an update only incorporates objects the server already has
> Or rather: "An empty pack SHALL be omitted" ?
>
>> +The server will receive the packfile, unpack it, then validate each ref-update,
>> +and it will run any update hooks to make sure that the update is acceptable.
>> +If all of that is fine, the server will then update the references.
>> +
>> +The format of a push response is as follows:
>> +
>> +    response = *section
>> +    section = (unpack-error | ref-update-status | packfile)
> As above, I assume they ought to go in the order as written,
> or would it make sense to allow for any order?
>
>> +             (delim-pkt | flush-pkt)
>> +
>> +    unpack-error = PKT-LINE("ERR" SP error-msg LF)
>> +
>> +    ref-update-status = *(update-result | update-error)
>> +    update-result = *PKT-LINE(txn_id SP result LF)
>> +    result = ("created" | "deleted" | "updated") SP refname SP old_oid SP new_oid
>> +    update-error = PKT-LINE(txn_id SP "error" SP error-msg LF)
> Can we unify "ERR" and "error" ?
>
>> +    packfile = PKT-LINE("packfile" LF)
>> +              *PKT-LINE(*%x00-ff)
>> +
>> +    ref-update-status section
>> +       * This section is always included unless there was an error unpacking
>> +         the packfile sent in the request.
>> +       * The server is given the freedom to do what it wants with the
>> +         ref-updates provided in the request.  This means that an update sent
>> +         to the server may result in the creation of a ref or rebasing the
>> +         update on the server.
>> +       * If a server creates any new objects due to a ref-update, a packfile
>> +         MUST be sent back in the response.
>> +
>> +    packfile section
>> +       * This section is included if the server decided to do something with
>> +         the ref-updates that involved creating new objects.
>> +
>>    server-option
>>   ~~~~~~~~~~~~~~~
>>
>> --
>> 2.18.0.203.gfac676dfb9-goog
>>


* Re: [RFC] push: add documentation on push v2
  2018-07-18 13:31   ` Derrick Stolee
@ 2018-07-18 16:56     ` Stefan Beller
  2018-07-18 17:15       ` Brandon Williams
  2018-07-18 17:11     ` Brandon Williams
  1 sibling, 1 reply; 19+ messages in thread
From: Stefan Beller @ 2018-07-18 16:56 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Brandon Williams, git

On Wed, Jul 18, 2018 at 6:31 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 7/17/2018 7:25 PM, Stefan Beller wrote:
> > On Tue, Jul 17, 2018 at 2:09 PM Brandon Williams <bmwill@google.com> wrote:
> >> Signed-off-by: Brandon Williams <bmwill@google.com>
> >> ---
> >>
> >> Since introducing protocol v2 and enabling fetch I've been thinking
> >> about what its inverse 'push' would look like.  After talking with a
> >> number of people I have a longish list of things that could be done to
> >> improve push and I think I've been able to distill the core features we
> >> want in push v2.
> > It would be nice to know which things you want to improve.
>
> Hopefully we can also get others to chime in with things they don't like
> about the existing protocol. What pain points exist, and what can we do
> to improve at the transport layer before considering new functionality?

Another thing that I realized last night was the possibility of chunking requests.
The web of today is driven by lots of small http(s) requests. I know our server
team fights with the internal tools all the time because the communication
involved in git-fetch is usually a large http request (large packfile).
So it would be nice to have the possibility of chunking the request.
But I think that can be added as a capability? (Not sure how)

> >>   Thankfully (due to the capability system) most of the
> >> other features/improvements can be added later with ease.
> >>
> >> What I've got now is a rough design for a more flexible push, more
> >> flexible because it allows for the server to do what it wants with the
> >> refs that are pushed and has the ability to communicate back what was
> >> done to the client.  The main motivation for this is to work around
> >> issues when working with Gerrit and other code-review systems where you
> >> need to have Change-Ids in the commit messages (now the server can just
> >> insert them for you and send back new commits) and you need to push to
> >> magic refs to get around various limitations (now a Gerrit server should
> >> be able to communicate that pushing to 'master' doesn't update master
> >> but instead creates a refs/changes/<id> ref).
> > Well Gerrit is our main motivation, but this allows for other workflows as well.
> > For example Facebook uses hg internally and they have a
> > "rebase-on-the-server-after-push" workflow IIRC as pushing to a single repo
> > brings up quite some contention. The protocol outlined below would allow
> > for such a workflow as well? (This might be an easier sell to the Git
> > community as most are not quite familiar with Gerrit)
>
> I'm also curious how this "change commits on push" would be helpful to
> other scenarios.
>
> Since I'm not familiar with Gerrit: what is preventing you from having a
> commit hook that inserts (or requests) a Change-Id when not present?

That is how you do it normally.  But what if you are just getting
started, or want to send a one-off to the server?  (I once wanted to
upload a git patch to our internal Gerrit, and as my repository is
configured to work with upstream Git, which doesn't carry Change-Ids,
I ran into this problem.  I had to add one manually to have the server
accept it.)

> How
> can the server identify the Change-Id automatically when it isn't present?

The change id is just a randomly assigned id, which can be made up,
but should stay consistent in further revisions. (Put another way:
change ids solve the 'linear assignment problem' of range-diff at scale)

So once the protocol support is in, the client would need some UX update
to replace the commits it just pushed with the server's answer, in order
to work well with server-side generated change ids.
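
To make "just a randomly assigned id" concrete (a loose sketch, not the
actual hook implementation): a Gerrit-style Change-Id is the letter "I"
followed by 40 hex digits, and any unique value works as long as later
revisions of the same change reuse it.

```python
# Sketch: mint a Gerrit-style Change-Id from random bytes.  Real Gerrit
# hashes commit metadata instead, but any unique 40-hex-digit value
# after the "I" serves the same purpose.
import os

def new_change_id() -> str:
    return "I" + os.urandom(20).hex()

change_id = new_change_id()
```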

But as I said, I am not sure how much we want to discuss in that
direction; let's see if there are other use cases:
Instead of just rebasing to solve the contention problem server-side,
we could also offer a "coding helper as a service" server.  That would
work similarly to the change-id workflow outlined above:
you push to the server, the server performs some action (style-formatting
your code, for example, or linting), and you download the result to have
it locally again.

I think that would be pretty cool actually.

Thanks,
Stefan


* Re: [RFC] push: add documentation on push v2
  2018-07-17 23:25 ` Stefan Beller
  2018-07-18 13:31   ` Derrick Stolee
@ 2018-07-18 17:08   ` Brandon Williams
  2018-07-18 18:07     ` Stefan Beller
  1 sibling, 1 reply; 19+ messages in thread
From: Brandon Williams @ 2018-07-18 17:08 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

On 07/17, Stefan Beller wrote:
> On Tue, Jul 17, 2018 at 2:09 PM Brandon Williams <bmwill@google.com> wrote:
> >
> > Signed-off-by: Brandon Williams <bmwill@google.com>
> > ---
> >
> > Since introducing protocol v2 and enabling fetch I've been thinking
> > about what its inverse 'push' would look like.  After talking with a
> > number of people I have a longish list of things that could be done to
> > improve push and I think I've been able to distill the core features we
> > want in push v2.
> 
> It would be nice to know which things you want to improve.

I mean this tackles the main point I want to improve.  Others include:
rebase/merge-on-push, push symrefs, and then the inclusion of other
capabilities from v0 like atomic push, signed push, etc.

> 
> >  Thankfully (due to the capability system) most of the
> > other features/improvements can be added later with ease.
> >
> > What I've got now is a rough design for a more flexible push, more
> > flexible because it allows for the server to do what it wants with the
> > refs that are pushed and has the ability to communicate back what was
> > done to the client.  The main motivation for this is to work around
> > issues when working with Gerrit and other code-review systems where you
> > need to have Change-Ids in the commit messages (now the server can just
> > insert them for you and send back new commits) and you need to push to
> > magic refs to get around various limitations (now a Gerrit server should
> > be able to communicate that pushing to 'master' doesn't update master
> > but instead creates a refs/changes/<id> ref).
> 
> Well Gerrit is our main motivation, but this allows for other workflows as well.
> For example Facebook uses hg internally and they have a
> "rebase-on-the-server-after-push" workflow IIRC as pushing to a single repo
> brings up quite some contention. The protocol outlined below would allow
> for such a workflow as well? (This might be an easier sell to the Git
> community as most are not quite familiar with Gerrit)

Yes the idea would be such that we could easily add a "rebase" or
"merge" verb later for explicit user controlled workflows like that.
This proposal would already make it possible (although the server would
need to be configured as such) since the server can do what it wants
with the updates a client sends to it.

> 
> > Before actually moving to write any code I'm hoping to get some feedback
> > on if we think this is an acceptable base design for push (other
> > features like atomic-push, signed-push, etc can be added as
> > capabilities), so any comments are appreciated.
> >
> >  Documentation/technical/protocol-v2.txt | 76 +++++++++++++++++++++++++
> >  1 file changed, 76 insertions(+)
> >
> > diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
> > index 49bda76d23..16c1ce60dd 100644
> > --- a/Documentation/technical/protocol-v2.txt
> > +++ b/Documentation/technical/protocol-v2.txt
> > @@ -403,6 +403,82 @@ header.
> >                 2 - progress messages
> >                 3 - fatal error message just before stream aborts
> >
> > + push
> > +~~~~~~
> > +
> > +`push` is the command used to push ref-updates and a packfile to a remote
> > +server in v2.
> > +
> > +Additional features not supported in the base command will be advertised
> > +as the value of the command in the capability advertisement in the form
> > +of a space separated list of features: "<command>=<feature 1> <feature 2>"
> > +
> > +The format of a push request is as follows:
> > +
> > +    request = *section
> > +    section = (ref-updates | packfile)
> 
> This reads as if a request consists of sections, which
> each can be a "ref-updates" or a packfile, no order given,
> such that multiple ref-update sections mixed with packfiles
> are possible.
> 
> I would assume we'd only want to allow for ref-updates
> followed by the packfile.
> 
> Given the example above for "rebase-on-push" though
> it is better to first send the packfile (as that is assumed to
> take longer) and then send the ref updates, such that the
> rebasing could be faster and has no bottleneck.

I don't really follow this logic.  I don't think it would change much,
considering the ref-updates section would usually be much smaller than
the packfile itself; of course I don't have any data, so I can't say
for sure.

> 
> > +              (delim-pkt | flush-pkt)
> 
> 
> 
> > +
> > +    ref-updates = PKT-LINE("ref-updates" LF)
> > +                 *PKT-Line(update/force-update LF)
> > +
> > +    update = txn_id SP action SP refname SP old_oid SP new_oid
> > +    force-update = txn_id SP "force" SP action SP refname SP new_oid
> 
> So we insert "force" after the transaction id if we want to force it.
> When adding the atomic capability later we could imagine another insert here
> 
>   1 atomic create refs/heads/new-ref <0-hash> <hash>
>   1 atomic delete refs/heads/old-ref <hash> <0-hash>
> 
> which would look like a "rename" that we could also add instead.
> The transaction numbers are an interesting concept, how do you
> envision them to be used? In the example I put them both in the same
> transaction to demonstrate the "atomic-ness", but one could also
> imagine different transactions numbers per ref (i.e. exactly one
> ref per txn_id) to have a better understanding of what the server did
> to each individual ref.

I believe I outlined their use later.  Basically, if you give the server
free rein to do what it wants with the updates you send it, then you
need a way for the client to be able to map the result back to what it
requested.  For example, I could push to "master" but instead of updating
master the server creates a refs/changes/1 ref and puts my changes there
instead of updating master.  The client needs to know that the ref
update it requested to master is what caused the creation of the
refs/changes/1 ref.
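To make that concrete, here is a sketch (all names and oids invented) of how a client could use transaction ids to pair what it asked for with what the server actually did, e.g. a push to master that the server turned into the creation of a refs/changes/1 ref:

```python
# Hypothetical client-side bookkeeping; not from any real implementation.

def map_results(requested, results):
    """Pair each server result with the ref-update that caused it.

    requested: {txn_id: (action, refname, old_oid, new_oid)}
    results:   [(txn_id, verb, refname, old_oid, new_oid), ...]
    """
    return [(requested[txn][1], verb, refname)
            for txn, verb, refname, _old, _new in results]

requested = {"1": ("update", "refs/heads/master", "a" * 40, "b" * 40)}
results = [("1", "created", "refs/changes/1", "0" * 40, "b" * 40)]
# The client learns that its requested update of master is what
# caused refs/changes/1 to be created.
mapping = map_results(requested, results)
```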

> 
> > +    action = ("create" | "delete" | "update")
> > +    txn_id = 1*DIGIT
> > +
> > +    packfile = PKT-LINE("packfile" LF)
> > +              *PKT-LINE(*%x00-ff)
> > +
> > +    ref-updates section
> > +       * Transaction id's allow for mapping what was requested to what the
> > +         server actually did with the ref-update.
> 
> this would imply the client ought to have at most one ref per transaction id.
> Is the client allowed to put multiple refs per id?

No code has been written yet, so I can't say.  Right now I don't see any
reason to have multiple updates use the same id, but that could change.

> 
> Are new capabilities attached to ref updates or transactions?
> Unlike the example above, stating "atomic" on each line, you could just
> say "transaction 1 should be atomic" in another line, that would address
> all refs in that transaction.

I haven't thought through "atomic" so I have no idea what you'd want
that to look like.

> 
> > +       * Normal ref-updates require that the old value of a ref is supplied so
> > +         that the server can verify that the reference that is being updated
> > +         hasn't changed while the request was being processed.
> 
> create/delete assume <00..00> for either old or new ? (We could also
> omit the second hash for create delete, which is more appealing to me)

Well that depends, in the case of a create you want to ensure that no
ref with that name exists and would want it to fail if one already
existed.  If you want to force it then you don't care if one existed or
not, you just want the ref to have a certain value.
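That decision could be sketched as a tiny server-side predicate (illustrative only; real receive-pack logic also runs hooks, permission checks, etc.):

```python
def check_update(current, action, old_oid, forced=False):
    """Decide whether a ref-update is acceptable.

    current: the oid the server has for the ref, or None if it doesn't exist.
    """
    if forced:
        return True                 # forced: don't care what the old value was
    if action == "create":
        return current is None      # must fail if the ref already exists
    # plain update/delete: compare-and-swap against the supplied old value
    return current == old_oid
```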

> 
> > +       * Forced ref-updates only include the new value of a ref as we don't
> > +         care what the old value was.
> 
> How are you implementing force-with-lease then?

Currently force-with-lease/force is implemented 100% on the client side;
this proposal extends both to be implemented on the server as well.
The non-forced variant is basically the "with-lease" case, and "force" now
actually forces an update.  Right now you can still have a "forced"
update fail to update when using a stateless transport, because by the
time you send the ref-updates they may have changed from what you read in
the ref-advertisement (this can happen with high-velocity projects).

If you just want to preserve the existing force-with-lease/force
behavior, you can simply use the non-forced variant of a ref-update.
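In terms of the grammar proposed in the RFC, the client would express the two behaviors like this (a sketch; field values are invented):

```python
def encode_update(txn_id, action, refname, new_oid, old_oid=None):
    """Render one ref-update line per the proposed grammar.

    Passing old_oid uses the non-forced form (the "with-lease"-like
    compare-and-swap); omitting it produces a force-update, which
    carries only the new value.
    """
    if old_oid is None:
        return f"{txn_id} force {action} {refname} {new_oid}"
    return f"{txn_id} {action} {refname} {old_oid} {new_oid}"
```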

> 
> > +    packfile section
> > +       * A packfile MAY not be included if only delete commands are used or if
> > +         an update only incorporates objects the server already has
> 
> Or rather: "An empty pack SHALL be omitted" ?
> 
> > +The server will receive the packfile, unpack it, then validate each ref-update,
> > +and it will run any update hooks to make sure that the update is acceptable.
> > +If all of that is fine, the server will then update the references.
> > +
> > +The format of a push response is as follows:
> > +
> > +    response = *section
> > +    section = (unpack-error | ref-update-status | packfile)
> 
> As above, I assume they ought to go in the order as written,
> or would it make sense to allow for any order?
> 
> > +             (delim-pkt | flush-pkt)
> > +
> > +    unpack-error = PKT-LINE("ERR" SP error-msg LF)
> > +
> > +    ref-update-status = *(update-result | update-error)
> > +    update-result = *PKT-LINE(txn_id SP result LF)
> > +    result = ("created" | "deleted" | "updated") SP refname SP old_oid SP new_oid
> > +    update-error = PKT-LINE(txn_id SP "error" SP error-msg LF)
> 
> Can we unify "ERR" and "error" ?

No, these are very different.  You could have one ref update succeed
while another doesn't for some reason, unless you want everything to be
atomic.
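For illustration, a partially successful response might carry lines like the following (hypothetical content): a top-level "ERR" would abort the whole response (e.g. an unpack failure), while a per-update "error" line only marks one transaction as failed alongside successful ones.

```python
response = [
    "1 updated refs/heads/master " + "a" * 40 + " " + "b" * 40,
    "2 error not a fast-forward",
]

def failed_txns(lines):
    """Collect the txn ids whose individual update failed."""
    return [line.split()[0] for line in lines if line.split()[1] == "error"]
```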

> 
> > +    packfile = PKT-LINE("packfile" LF)
> > +              *PKT-LINE(*%x00-ff)
> > +
> > +    ref-update-status section
> > +       * This section is always included unless there was an error unpacking
> > +         the packfile sent in the request.
> > +       * The server is given the freedom to do what it wants with the
> > +         ref-updates provided in the request.  This means that an update sent
> > +         from the server may result in the creation of a ref or rebasing the
> > +         update on the server.
> > +       * If a server creates any new objects due to a ref-update, a packfile
> > +         MUST be sent back in the response.
> > +
> > +    packfile section
> > +       * This section is included if the server decided to do something with
> > +         the ref-updates that involved creating new objects.
> > +
> >   server-option
> >  ~~~~~~~~~~~~~~~
> >
> > --
> > 2.18.0.203.gfac676dfb9-goog
> >

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC] push: add documentation on push v2
  2018-07-18 13:31   ` Derrick Stolee
  2018-07-18 16:56     ` Stefan Beller
@ 2018-07-18 17:11     ` Brandon Williams
  2018-07-18 17:19       ` Duy Nguyen
  1 sibling, 1 reply; 19+ messages in thread
From: Brandon Williams @ 2018-07-18 17:11 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Stefan Beller, git

On 07/18, Derrick Stolee wrote:
> On 7/17/2018 7:25 PM, Stefan Beller wrote:
> > On Tue, Jul 17, 2018 at 2:09 PM Brandon Williams <bmwill@google.com> wrote:
> > > Signed-off-by: Brandon Williams <bmwill@google.com>
> > > ---
> > > 
> > > Since introducing protocol v2 and enabling fetch I've been thinking
> > > about what its inverse 'push' would look like.  After talking with a
> > > number of people I have a longish list of things that could be done to
> > > improve push and I think I've been able to distill the core features we
> > > want in push v2.
> > It would be nice to know which things you want to improve.
> 
> Hopefully we can also get others to chime in with things they don't like
> about the existing protocol. What pain points exist, and what can we do to
> improve at the transport layer before considering new functionality?
> 
> > >   Thankfully (due to the capability system) most of the
> > > other features/improvements can be added later with ease.
> > > 
> > > What I've got now is a rough design for a more flexible push, more
> > > flexible because it allows for the server to do what it wants with the
> > > refs that are pushed and has the ability to communicate back what was
> > > done to the client.  The main motivation for this is to work around
> > > issues when working with Gerrit and other code-review systems where you
> > > need to have Change-Ids in the commit messages (now the server can just
> > > insert them for you and send back new commits) and you need to push to
> > > magic refs to get around various limitations (now a Gerrit server should
> > > be able to communicate that pushing to 'master' doesn't update master
> > > but instead creates a refs/changes/<id> ref).
> > Well Gerrit is our main motivation, but this allows for other workflows as well.
> > For example Facebook uses hg internally and they have a
> > "rebase-on-the-server-after-push" workflow IIRC as pushing to a single repo
> > brings up quite some contention. The protocol outlined below would allow
> > for such a workflow as well? (This might be an easier sell to the Git
> > community as most are not quite familiar with Gerrit)
> 
> I'm also curious how this "change commits on push" would be helpful to other
> scenarios.
> 
> Since I'm not familiar with Gerrit: what is preventing you from having a
> commit hook that inserts (or requests) a Change-Id when not present? How can
> the server identify the Change-Id automatically when it isn't present?

Right now all Gerrit users have a commit hook installed which inserts
the Change-Id.  The issue is that if you push to Gerrit and you don't
have Change-Ids, the push fails and you're prompted to blindly run a
command to install the commit hook.  So if we could just have the server
handle this completely, then users of Gerrit wouldn't ever need to
have a hook installed in the first place.
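For context, a rough sketch of what that commit-msg hook does: append a Change-Id trailer when the message lacks one.  The real Gerrit hook is a shell script and derives the id from the commit contents; hashing the message here is purely illustrative.

```python
import hashlib

def ensure_change_id(message):
    """Return the commit message with a Change-Id trailer appended
    if one is not already present."""
    if "Change-Id:" in message:
        return message
    change_id = "I" + hashlib.sha1(message.encode()).hexdigest()
    return message.rstrip("\n") + "\n\nChange-Id: " + change_id + "\n"
```

Git invokes the hook with the path of the commit message file; with server-side insertion, this client-side step would disappear entirely.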


-- 
Brandon Williams


* Re: [RFC] push: add documentation on push v2
  2018-07-18 16:56     ` Stefan Beller
@ 2018-07-18 17:15       ` Brandon Williams
  2018-07-20 13:12         ` Jeff Hostetler
  0 siblings, 1 reply; 19+ messages in thread
From: Brandon Williams @ 2018-07-18 17:15 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Derrick Stolee, git

On 07/18, Stefan Beller wrote:
> On Wed, Jul 18, 2018 at 6:31 AM Derrick Stolee <stolee@gmail.com> wrote:
> >
> > On 7/17/2018 7:25 PM, Stefan Beller wrote:
> > > On Tue, Jul 17, 2018 at 2:09 PM Brandon Williams <bmwill@google.com> wrote:
> > >> Signed-off-by: Brandon Williams <bmwill@google.com>
> > >> ---
> > >>
> > >> Since introducing protocol v2 and enabling fetch I've been thinking
> > >> about what its inverse 'push' would look like.  After talking with a
> > >> number of people I have a longish list of things that could be done to
> > >> improve push and I think I've been able to distill the core features we
> > >> want in push v2.
> > > It would be nice to know which things you want to improve.
> >
> > Hopefully we can also get others to chime in with things they don't like
> > about the existing protocol. What pain points exist, and what can we do
> > to improve at the transport layer before considering new functionality?
> 
> Another thing that I realized last night was the possibility to chunk requests.
> The web of today is driven by lots of small http(s) requests. I know our server
> team fights with the internal tools all the time because the communication
> involved in git-fetch is usually a large http request (large packfile).
> So it would be nice to have the possibility of chunking the request.
> But I think that can be added as a capability? (Not sure how)

Fetch and push requests/responses are already "chunked" when using the
http transport, so I'm not sure what you mean by adding a capability,
because the protocol doesn't care about which transport you're using.
That is, unless you're talking about a different kind of "chunking"
than what it means to chunk an http request/response.

-- 
Brandon Williams


* Re: [RFC] push: add documentation on push v2
  2018-07-18 17:11     ` Brandon Williams
@ 2018-07-18 17:19       ` Duy Nguyen
  2018-07-18 17:46         ` Brandon Williams
  0 siblings, 1 reply; 19+ messages in thread
From: Duy Nguyen @ 2018-07-18 17:19 UTC (permalink / raw)
  To: Brandon Williams; +Cc: Derrick Stolee, Stefan Beller, Git Mailing List

On Wed, Jul 18, 2018 at 7:13 PM Brandon Williams <bmwill@google.com> wrote:
> > > > What I've got now is a rough design for a more flexible push, more
> > > > flexible because it allows for the server to do what it wants with the
> > > > refs that are pushed and has the ability to communicate back what was
> > > > done to the client.  The main motivation for this is to work around
> > > > issues when working with Gerrit and other code-review systems where you
> > > > need to have Change-Ids in the commit messages (now the server can just
> > > > insert them for you and send back new commits) and you need to push to
> > > > magic refs to get around various limitations (now a Gerrit server should
> > > > be able to communicate that pushing to 'master' doesn't update master
> > > > but instead creates a refs/changes/<id> ref).
> > > Well Gerrit is our main motivation, but this allows for other workflows as well.
> > > For example Facebook uses hg internally and they have a
> > > "rebase-on-the-server-after-push" workflow IIRC as pushing to a single repo
> > > brings up quite some contention. The protocol outlined below would allow
> > > for such a workflow as well? (This might be an easier sell to the Git
> > > community as most are not quite familiar with Gerrit)
> >
> > I'm also curious how this "change commits on push" would be helpful to other
> > scenarios.
> >
> > Since I'm not familiar with Gerrit: what is preventing you from having a
> > commit hook that inserts (or requests) a Change-Id when not present? How can
> > the server identify the Change-Id automatically when it isn't present?
>
> Right now all Gerrit users have a commit hook installed which inserts
> the Change-Id.  The issue is that if you push to gerrit and you don't
> have Change-ids, the push fails and you're prompted to blindly run a
> command to install the commit-hook.  So if we could just have the server
> handle this completely then the users of gerrit wouldn't ever need to
> have a hook installed in the first place.

I don't trust the server side to rewrite commits for me.  And this is
basically rewriting history (e.g. I can push multiple commits to
Gerrit if I remember correctly; if they don't all have a Change-Id, then
the history must be rewritten for the Change-Ids to be inserted).  Don't we
already have "plans" to push config from server to client?  There's
also talk about configuring hooks with a config file.  These should make
it possible to deal with Change-Id generation with minimal manual
intervention.
-- 
Duy


* Re: [RFC] push: add documentation on push v2
  2018-07-18 17:19       ` Duy Nguyen
@ 2018-07-18 17:46         ` Brandon Williams
  2018-07-18 17:57           ` Duy Nguyen
  0 siblings, 1 reply; 19+ messages in thread
From: Brandon Williams @ 2018-07-18 17:46 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Derrick Stolee, Stefan Beller, Git Mailing List

On 07/18, Duy Nguyen wrote:
> On Wed, Jul 18, 2018 at 7:13 PM Brandon Williams <bmwill@google.com> wrote:
> > > > > What I've got now is a rough design for a more flexible push, more
> > > > > flexible because it allows for the server to do what it wants with the
> > > > > refs that are pushed and has the ability to communicate back what was
> > > > > done to the client.  The main motivation for this is to work around
> > > > > issues when working with Gerrit and other code-review systems where you
> > > > > need to have Change-Ids in the commit messages (now the server can just
> > > > > insert them for you and send back new commits) and you need to push to
> > > > > magic refs to get around various limitations (now a Gerrit server should
> > > > > be able to communicate that pushing to 'master' doesn't update master
> > > > > but instead creates a refs/changes/<id> ref).
> > > > Well Gerrit is our main motivation, but this allows for other workflows as well.
> > > > For example Facebook uses hg internally and they have a
> > > > "rebase-on-the-server-after-push" workflow IIRC as pushing to a single repo
> > > > brings up quite some contention. The protocol outlined below would allow
> > > > for such a workflow as well? (This might be an easier sell to the Git
> > > > community as most are not quite familiar with Gerrit)
> > >
> > > I'm also curious how this "change commits on push" would be helpful to other
> > > scenarios.
> > >
> > > Since I'm not familiar with Gerrit: what is preventing you from having a
> > > commit hook that inserts (or requests) a Change-Id when not present? How can
> > > the server identify the Change-Id automatically when it isn't present?
> >
> > Right now all Gerrit users have a commit hook installed which inserts
> > the Change-Id.  The issue is that if you push to gerrit and you don't
> > have Change-ids, the push fails and you're prompted to blindly run a
> > command to install the commit-hook.  So if we could just have the server
> > handle this completely then the users of gerrit wouldn't ever need to
> > have a hook installed in the first place.
> 
> I don't trust the server side to rewrite commits for me. And this is

That's a fair point.  Though I think there are a number of projects
where this would be very beneficial for contributors.  The main reason
for wanting a feature like this is to make the UX easier for Gerrit
users (having the server insert Change-Ids, as well as potentially getting
rid of the weird HEAD:refs/for/master syntax you need to push for
review).  Also, if you don't control a server yourself, then whoever
controls it can do what it wants with the objects/ref-updates you send
it.  Of course, even if it rewrites history, that doesn't mean your
local copy needs to mimic those changes if you don't want it to.  So
even if we move forward with a design like this, there would need to be
some config option to actually accept and apply the changes a server
makes and sends back to you.  This RFC doesn't actually address those
sorts of UX implications because I expect those are things which can be
added and tweaked at some point in the future.  I'm just trying to build
the foundation for such changes.

> basically rewriting history (e.g. I can push multiple commits to
> gerrit if I remember correctly; if they all don't have change-id, then
> the history must be rewritten for change-id to be inserted). Don't we
> already have "plans" to push config from server to client? There's

I know there has been talk about this, but I don't know of any
current proposals or work being done in this area.

> also talk about configuring hooks with config file. These should make
> it possible to deal with change-id generation with minimum manual
> intervention.

-- 
Brandon Williams


* Re: [RFC] push: add documentation on push v2
  2018-07-18 17:46         ` Brandon Williams
@ 2018-07-18 17:57           ` Duy Nguyen
  0 siblings, 0 replies; 19+ messages in thread
From: Duy Nguyen @ 2018-07-18 17:57 UTC (permalink / raw)
  To: Brandon Williams; +Cc: Derrick Stolee, Stefan Beller, Git Mailing List

On Wed, Jul 18, 2018 at 7:46 PM Brandon Williams <bmwill@google.com> wrote:
>
> On 07/18, Duy Nguyen wrote:
> > On Wed, Jul 18, 2018 at 7:13 PM Brandon Williams <bmwill@google.com> wrote:
> > > > > > What I've got now is a rough design for a more flexible push, more
> > > > > > flexible because it allows for the server to do what it wants with the
> > > > > > refs that are pushed and has the ability to communicate back what was
> > > > > > done to the client.  The main motivation for this is to work around
> > > > > > issues when working with Gerrit and other code-review systems where you
> > > > > > need to have Change-Ids in the commit messages (now the server can just
> > > > > > insert them for you and send back new commits) and you need to push to
> > > > > > magic refs to get around various limitations (now a Gerrit server should
> > > > > > be able to communicate that pushing to 'master' doesn't update master
> > > > > > but instead creates a refs/changes/<id> ref).
> > > > > Well Gerrit is our main motivation, but this allows for other workflows as well.
> > > > > For example Facebook uses hg internally and they have a
> > > > > "rebase-on-the-server-after-push" workflow IIRC as pushing to a single repo
> > > > > brings up quite some contention. The protocol outlined below would allow
> > > > > for such a workflow as well? (This might be an easier sell to the Git
> > > > > community as most are not quite familiar with Gerrit)
> > > >
> > > > I'm also curious how this "change commits on push" would be helpful to other
> > > > scenarios.
> > > >
> > > > Since I'm not familiar with Gerrit: what is preventing you from having a
> > > > commit hook that inserts (or requests) a Change-Id when not present? How can
> > > > the server identify the Change-Id automatically when it isn't present?
> > >
> > > Right now all Gerrit users have a commit hook installed which inserts
> > > the Change-Id.  The issue is that if you push to gerrit and you don't
> > > have Change-ids, the push fails and you're prompted to blindly run a
> > > command to install the commit-hook.  So if we could just have the server
> > > handle this completely then the users of gerrit wouldn't ever need to
> > > have a hook installed in the first place.
> >
> > I don't trust the server side to rewrite commits for me. And this is
>
> That's a fair point.  Though I think there are a number of projects
> where this would be very beneficial for contributors. The main reason
> for wanting a feature like this is to make the UX easier for Gerrit
> users (Having server insert change ids as well as potentially getting
> rid of the weird HEAD:refs/for/master syntax you need to push for
> review).  Also, if you don't control a server yourself, then who ever
> controls it can do what it wants with the objects/ref-updates you send
> them.  Of course even if they rewrite history that doesn't mean your
> local copy needs to mimic those changes if you don't want them too.  So
> even if we move forward with a design like this, there would need to be
> some config option to actually accept and apply the changes a server
> makes and sends back to you.

This is the main pain point for me.  I almost wrote a follow-up mail along
the lines of "having said that, if we can be transparent about what the
changes are, or have some protection at the client side against
"dangerous" changes like tree/blob/commit-header replacement, then it's
probably ok".

There are also other things, like signing, which will not work well with
remote updates like this.  I guess you can cross that out as "not
supported" after consideration, though.

> This RFC doesn't actually address those
> sorts of UX implications because I expect those are things which can be
> added and tweaked at some point in the future.  I'm just trying to build
> the foundation for such changes.

Speaking of UX, Gerrit, and this server-side ref-update: my experience
with average Gerrit users is that they tend to stick to a very basic set
of commands, and if this is not handled well, you just replace the
one-time pain of installing hooks with a new one that happens much
more often and is potentially more confusing too.

> > basically rewriting history (e.g. I can push multiple commits to
> > gerrit if I remember correctly; if they all don't have change-id, then
> > the history must be rewritten for change-id to be inserted). Don't we
> > already have "plans" to push config from server to client? There's
>
> I know there has been talk about this, but I don't know any of any
> current proposals or work being done in this area.

As far as I know, nobody has worked on it. You're welcome to start of course ;-)
-- 
Duy


* Re: [RFC] push: add documentation on push v2
  2018-07-18 17:08   ` Brandon Williams
@ 2018-07-18 18:07     ` Stefan Beller
  2018-07-18 18:17       ` Duy Nguyen
  2018-07-18 18:21       ` Brandon Williams
  0 siblings, 2 replies; 19+ messages in thread
From: Stefan Beller @ 2018-07-18 18:07 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git

> > Given the example above for "rebase-on-push" though
> > it is better to first send the packfile (as that is assumed to
> > take longer) and then send the ref updates, such that the
> > rebasing could be faster and has no bottleneck.
>
> I don't really follow this logic.  I don't think it would change
> anything much considering the ref-updates section would usually be
> much smaller than the packfile itself, course I don't have any data so
> idk.

The server would need to serialize all incoming requests and apply
them in order. The receiving of the packfile and the response to the client
are not part of the critical section that needs to happen serialized but
can be spread out to threads. So for that use case it would make
sense to allow sending the packfile first.

> > > +    update = txn_id SP action SP refname SP old_oid SP new_oid
> > > +    force-update = txn_id SP "force" SP action SP refname SP new_oid
> >
> > So we insert "force" after the transaction id if we want to force it.
> > When adding the atomic capability later we could imagine another insert here
> >
> >   1 atomic create refs/heads/new-ref <0-hash> <hash>
> >   1 atomic delete refs/heads/old-ref <hash> <0-hash>
> >
> > which would look like a "rename" that we could also add instead.
> > The transaction numbers are an interesting concept, how do you
> > envision them to be used? In the example I put them both in the same
> > transaction to demonstrate the "atomic-ness", but one could also
> > imagine different transactions numbers per ref (i.e. exactly one
> > ref per txn_id) to have a better understanding of what the server did
> > to each individual ref.
>
> I believe I outlined their use later.  Basically if you give the server
> free reign to do what it wants with the updates you send it, then you
> need a way for the client to be able to map the result back to what it
> requested.  Since now i could push to "master" but instead of updating
> master the server creates a refs/changes/1 ref and puts my changes there
> instead of updating master.  The client needs to know that the ref
> update it requested to master is what caused the creation of the
> refs/changes/1 ref.

Understood; the question was more about how you envision what
the client/server SHOULD be doing here.  (I think one txn_id per
ref is what SHOULD be done to best implement the thoughts above.
Also, is the client ALLOWED to put many refs in one txn, or would we
just disallow that already at this stage so as not to confuse
the server?)

>
> >
> > Are new capabilities attached to ref updates or transactions?
> > Unlike the example above, stating "atomic" on each line, you could just
> > say "transaction 1 should be atomic" in another line, that would address
> > all refs in that transaction.
>
> I haven't thought through "atomic" so i have no idea what you'd want
> that to look like.

Yeah, I have not really thought about them either.  I just see two ways:
(A) adding more keywords in each ref-update (like "force"), or
(B) adding new subsections somewhere where we talk about the capabilities
  instead.

Depending on which way we want to go, this might have an impact on
how the code is designed.

> > > +       * Normal ref-updates require that the old value of a ref is supplied so
> > > +         that the server can verify that the reference that is being updated
> > > +         hasn't changed while the request was being processed.
> >
> > create/delete assume <00..00> for either old or new ? (We could also
> > omit the second hash for create delete, which is more appealing to me)
>
> Well that depends, in the case of a create you want to ensure that no
> ref with that name exists and would want it to fail if one already
> existed.  If you want to force it then you don't care if one existed or
> not, you just want the ref to have a certain value.

What I was trying to say is to have

    update = txn_id SP (modifier SP) action
    modifier = "force" | "atomic"
    action = (create | delete | update)
    create = "create" SP <hash>
    update = "update" SP <hash> SP <hash>
    delete = "delete" SP <hash>

i.e. only one hash for the create and delete action.
(I added the "atomic" modifier to demonstrate (A) from above, not needed here)
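A sketch of parsing that shape (assuming the refname sits between the action and the oids, which the productions above leave implicit):

```python
MODIFIERS = {"force", "atomic"}
OIDS_EXPECTED = {"create": 1, "delete": 1, "update": 2}

def parse_update(line):
    """Parse one ref-update line of the proposed grammar into its parts."""
    fields = line.split()
    txn_id, rest = fields[0], fields[1:]
    modifier = None
    if rest and rest[0] in MODIFIERS:         # optional "force"/"atomic"
        modifier, rest = rest[0], rest[1:]
    action, refname, oids = rest[0], rest[1], rest[2:]
    if len(oids) != OIDS_EXPECTED[action]:    # one oid for create/delete
        raise ValueError(f"{action} expects {OIDS_EXPECTED[action]} oid(s)")
    return txn_id, modifier, action, refname, oids
```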

> >
> > > +       * Forced ref-updates only include the new value of a ref as we don't
> > > +         care what the old value was.
> >
> > How are you implementing force-with-lease then?
>
> Currently force-with-lease/force is implemented 100% on the client side,

Uh? That would be bad. Reading 631b5ef219c (push --force-with-lease: tie
it all together, 2013-07-08) I think that send-pack is done server-side?

> this proposal extends these two to be implemented on the server as well.
> non-forced variant are basically the "with-lease" case and "force" now
> actually forces an update.

I think we have 3 modes:
(1) standard push, where both client and server check for a fast-forward
(2) "force" that blindly overwrites the ref, but as that has a race condition
    in case multiple people can push to the remote, we have
(3) "with-lease", disables the fast forward check both on client and server
    but still takes out a lock on the server to ensure no races happen
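The three modes can be sketched as a server-side predicate (illustrative only; 'expected' is the old oid the client sent, and 'is_fast_forward' would come from a real ancestry check):

```python
def acceptable(mode, current, expected, is_fast_forward):
    if mode == "standard":        # (1) fast-forward check plus compare
        return current == expected and is_fast_forward
    if mode == "with-lease":      # (3) compare under lock, no ff check
        return current == expected
    if mode == "force":           # (2) blindly overwrite
        return True
    raise ValueError(mode)
```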

Now you propose to have only two, making (1) and (3) the same and
deferring the "fast-forwards only" check to the client alone?
The server surely wants to ensure that, too (maybe you need
special permission for non-ff; that depends on the server implementation).

I am not sure I like it: on the protocol level these indeed look the same,
and the server and client need to take care in their implementations of
how/when the ff-check is done.  Though it would be nice for the client UX
to only need a flag to check for ff (on both the client and server side?
or can we rely on the client alone then?)

> > > +             (delim-pkt | flush-pkt)
> > > +
> > > +    unpack-error = PKT-LINE("ERR" SP error-msg LF)
> > > +
> > > +    ref-update-status = *(update-result | update-error)
> > > +    update-result = *PKT-LINE(txn_id SP result LF)
> > > +    result = ("created" | "deleted" | "updated") SP refname SP old_oid SP new_oid
> > > +    update-error = PKT-LINE(txn_id SP "error" SP error-msg LF)
> >
> > Can we unify "ERR" and "error" ?
>
> No, these are very different.  You could have one ref update succeed
> while another doesn't for some reason, unless you want everything to be
> atomic.

I did not mean to unify them on the semantic level, but on the
representation level, i.e. have both of them spelled the same,
as they can still be differentiated by the leading txn id?


Thanks,
Stefan

P.S.: another feature that just came to mind is localisation of error messages.
But that is also easy to do with capabilities (the client sends a capability
such as "preferred-i18n=DE" and the server may translate all its errors
if it can).

That brings me to another point: do we assume all errors to be read
by humans? or do we want some markup things in there, too, similar to
EAGAIN?


* Re: [RFC] push: add documentation on push v2
  2018-07-18 18:07     ` Stefan Beller
@ 2018-07-18 18:17       ` Duy Nguyen
  2018-07-18 18:21       ` Brandon Williams
  1 sibling, 0 replies; 19+ messages in thread
From: Duy Nguyen @ 2018-07-18 18:17 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Brandon Williams, Git Mailing List

On Wed, Jul 18, 2018 at 8:08 PM Stefan Beller <sbeller@google.com> wrote:
> P.S.: another feature that just came to mind is localisation of error messages.
> But that is also easy to do with capabilities (the client sends a capability
> such as "preferred-i18n=DE" and the server may translate all its errors
> if it can.
>
> That brings me to another point: do we assume all errors to be read
> by humans? or do we want some markup things in there, too, similar to
> EAGAIN?

We do have several classes of errors (fatal, error, warning), so at
least some machine-friendly code should be there. Perhaps we could just
follow many protocols out there (HTTP in particular) and separate error
codes into groups. Then we can standardize some specific error codes
later if we want to, but by default error(), warning() or die() when
running in server context would have a fixed error code in each group.
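One way to picture the grouping (everything below is invented for
illustration only, loosely mirroring HTTP status classes; no such codes
or status-line format exist in Git today):

```python
# Hypothetical error-code groups for push v2 status lines, loosely
# mirroring HTTP status classes (3xx/4xx/5xx). Codes and line layout
# are made up for illustration.

ERROR_GROUPS = {
    "warning": 300,  # non-fatal, the update was still applied
    "error":   400,  # this ref update was rejected, others may proceed
    "fatal":   500,  # the whole request was aborted
}

def status_line(txn_id, klass, msg):
    """Render a machine-parsable status line: '<txn_id> <code> <msg>'."""
    return f"{txn_id} {ERROR_GROUPS[klass]} {msg}"

print(status_line(1, "error", "non-fast-forward"))
```

A client could then branch on the code group without parsing the
human-readable message, while still standardizing specific codes later.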
-- 
Duy


* Re: [RFC] push: add documentation on push v2
  2018-07-18 18:07     ` Stefan Beller
  2018-07-18 18:17       ` Duy Nguyen
@ 2018-07-18 18:21       ` Brandon Williams
  1 sibling, 0 replies; 19+ messages in thread
From: Brandon Williams @ 2018-07-18 18:21 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

On 07/18, Stefan Beller wrote:
> > > Given the example above for "rebase-on-push" though
> > > it is better to first send the packfile (as that is assumed to
> > > take longer) and then send the ref updates, such that the
> > > rebasing could be faster and has no bottleneck.
> >
> > I don't really follow this logic.  I don't think it would change
> > anything much considering the ref-updates section would usually be
> > much smaller than the packfile itself, course I don't have any data so
> > idk.
> 
> The server would need to serialize all incoming requests and apply
> them in order. The receiving of the packfile and the response to the client
> are not part of the critical section that needs to happen serialized but
> can be spread out to threads. So for that use case it would make
> sense to allow sending the packfile first.

I would think that a server needs to read the whole request before any
actions can be done, but maybe I need to think about this a bit more.

> 
> > > > +    update = txn_id SP action SP refname SP old_oid SP new_oid
> > > > +    force-update = txn_id SP "force" SP action SP refname SP new_oid
> > >
> > > So we insert "force" after the transaction id if we want to force it.
> > > When adding the atomic capability later we could imagine another insert here
> > >
> > >   1 atomic create refs/heads/new-ref <0-hash> <hash>
> > >   1 atomic delete refs/heads/old-ref <hash> <0-hash>
> > >
> > > which would look like a "rename" that we could also add instead.
> > > The transaction numbers are an interesting concept, how do you
> > > envision them to be used? In the example I put them both in the same
> > > transaction to demonstrate the "atomic-ness", but one could also
> > > imagine different transactions numbers per ref (i.e. exactly one
> > > ref per txn_id) to have a better understanding of what the server did
> > > to each individual ref.
> >
> > I believe I outlined their use later.  Basically if you give the server
> > free reign to do what it wants with the updates you send it, then you
> > need a way for the client to be able to map the result back to what it
> > requested.  Since now i could push to "master" but instead of updating
> > master the server creates a refs/changes/1 ref and puts my changes there
> > instead of updating master.  The client needs to know that the ref
> > update it requested to master is what caused the creation of the
> > refs/changes/1 ref.
> 
> understood, the question was more related to how you envision what
> the client/server SHOULD be doing here, (and I think a one txn_id per
> ref is what SHOULD be done is how this is best to implement the
> thoughts above, also the client is ALLOWED to put many refs in one
> txn, or would we just disallow that already at this stage to not confuse
> the server?)

Oh, sorry for misunderstanding.  Yeah, as of right now I think having a
one-to-one relationship makes it easier to write/implement, but I don't
know if there are other workflows which would benefit from multiple refs
per txn_id.

> 
> >
> > >
> > > Are new capabilities attached to ref updates or transactions?
> > > Unlike the example above, stating "atomic" on each line, you could just
> > > say "transaction 1 should be atomic" in another line, that would address
> > > all refs in that transaction.
> >
> > I haven't thought through "atomic" so i have no idea what you'd want
> > that to look like.
> 
> Yeah I have not really thought about them either, I just see two ways:
> (A) adding more keywords in each ref-update (like "force") or
> (B) adding new subsections somewhere where we talk about the capabilities
>   instead.
> 
> Depending on why way we want to go this might have impact on the
> design how to write the code.
> 
> > > > +       * Normal ref-updates require that the old value of a ref is supplied so
> > > > +         that the server can verify that the reference that is being updated
> > > > +         hasn't changed while the request was being processed.
> > >
> > > create/delete assume <00..00> for either old or new ? (We could also
> > > omit the second hash for create delete, which is more appealing to me)
> >
> > Well that depends, in the case of a create you want to ensure that no
> > ref with that name exists and would want it to fail if one already
> > existed.  If you want to force it then you don't care if one existed or
> > not, you just want the ref to have a certain value.
> 
> What I was trying to say is to have
> 
>     update = txn_id SP (modifier SP) action
>     modifier = "force" | "atomic"
>     action = (create | delete | update)
>     create = "create" SP <hash>
>     update = "update" SP <hash> SP <hash>
>     delete = "delete" SP <hash>
> 
> i.e. only one hash for the create and delete action.
> (I added the "atomic" modifier to demonstrate (A) from above, not needed here)

I understood what you were asking; I was just pointing out the rationale
for including the zero-id, but I guess you're correct that simply
omitting it would work just as well for what I outlined above.  So I
think I'll go with what you suggested here.
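For concreteness, the one-hash create/delete form being agreed on above
could be framed as pkt-lines roughly like this (the pkt_line helper
follows the standard pkt-line rule of a 4-byte hex length prefix that
counts itself; the command layout is only the grammar under discussion,
not an implemented format):

```python
def pkt_line(payload):
    """pkt-line framing: 4-byte hex length (including the 4 header
    bytes themselves) followed by the payload."""
    data = payload + "\n"
    return f"{len(data) + 4:04x}{data}"

def update_cmd(txn_id, action, refname, *oids, force=False):
    """Build a ref-update pkt-line per the sketched grammar:
    create/delete carry one oid, update carries old and new."""
    parts = [str(txn_id)]
    if force:
        parts.append("force")
    parts += [action, refname, *oids]
    return pkt_line(" ".join(parts))

# create takes only the new oid, delete only the old oid
print(update_cmd(1, "create", "refs/heads/new-ref", "a" * 40))
print(update_cmd(2, "delete", "refs/heads/old-ref", "b" * 40, force=True))
```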

> 
> > >
> > > > +       * Forced ref-updates only include the new value of a ref as we don't
> > > > +         care what the old value was.
> > >
> > > How are you implementing force-with-lease then?
> >
> > Currently force-with-lease/force is implemented 100% on the client side,
> 
> Uh? That would be bad. Reading 631b5ef219c (push --force-with-lease: tie
> it all together, 2013-07-08) I think that send-pack is done server-side?

send-pack.c is the client side of push while receive-pack is the
server side.  There currently doesn't exist a way for a server to
understand the difference between a force/force-with-lease/non-force
push.

> 
> > this proposal extends these two to be implemented on the server as well.
> > non-forced variant are basically the "with-lease" case and "force" now
> > actually forces an update.
> 
> I think we have 3 modes:
> (1) standard push, where both client and server check for a fast-forward
> (2) "force" that blindly overwrites the ref, but as that has a race condition
>     in case multiple people can push to the remove we have
> (3) "with-lease", disables the fast forward check both on client and server
>     but still takes out a lock on the server to ensure no races happen
> 
> Now you propose to have only 2, making (1) and (3) the same, deferring
> the check to have "fast forwards only" to be client only?
> The server surely wants to ensure that, too (maybe you need
> special permission for non-ff; depends on the server implementation).
> 
> I am not sure I like it, as on the protocol level this indeed looks the same
> and the server and client need to care in their implementation how/when
> the ff-check is done. Though it would be nice for the client UX that you
> need to give a flag to check for ff (both client and server side? or can we rely
> on the client alone then?)

IIRC there are no checks done server-side for force, with-lease, or
fast-forward (well, fast-forward can be checked for, but that is a
config option and has nothing to do with the protocol).  Most of the
work is done by the client to ensure these checks are made.

> 
> > > > +             (delim-pkt | flush-pkt)
> > > > +
> > > > +    unpack-error = PKT-LINE("ERR" SP error-msg LF)
> > > > +
> > > > +    ref-update-status = *(update-result | update-error)
> > > > +    update-result = *PKT-LINE(txn_id SP result LF)
> > > > +    result = ("created" | "deleted" | "updated") SP refname SP old_oid SP new_oid
> > > > +    update-error = PKT-LINE(txn_id SP "error" SP error-msg LF)
> > >
> > > Can we unify "ERR" and "error" ?
> >
> > No, these are very different.  You could have one ref update succeed
> > while another doesn't for some reason, unless you want everything to be
> > atomic.
> 
> I did not mean to unify them on the semantic level, but on the
> representation level, i.e. have both of them spelled the same,
> as they can still be differentiated by the leading txn id?

Oh, I misunderstood again :) yes, we could standardize on ERR.

> 
> 
> Thanks,
> Stefan
> 
> P.S.: another feature that just came to mind is localisation of error messages.
> But that is also easy to do with capabilities (the client sends a capability
> such as "preferred-i18n=DE" and the server may translate all its errors
> if it can.
> 
> That brings me to another point: do we assume all errors to be read
> by humans? or do we want some markup things in there, too, similar to
> EAGAIN?

This sort of thing could be added as a protocol-level capability where
the client sends LANG=<some language> so that those sorts of messages
could be translated server-side before being sent.

-- 
Brandon Williams


* Re: [RFC] push: add documentation on push v2
  2018-07-18 17:15       ` Brandon Williams
@ 2018-07-20 13:12         ` Jeff Hostetler
  2018-07-24 19:00           ` Brandon Williams
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff Hostetler @ 2018-07-20 13:12 UTC (permalink / raw)
  To: Brandon Williams, Stefan Beller; +Cc: Derrick Stolee, git



On 7/18/2018 1:15 PM, Brandon Williams wrote:
> On 07/18, Stefan Beller wrote:
>> On Wed, Jul 18, 2018 at 6:31 AM Derrick Stolee <stolee@gmail.com> wrote:
>>>
>>> On 7/17/2018 7:25 PM, Stefan Beller wrote:
>>>> On Tue, Jul 17, 2018 at 2:09 PM Brandon Williams <bmwill@google.com> wrote:
>>>>> Signed-off-by: Brandon Williams <bmwill@google.com>
>>>>> ---
>>>>>
>>>>> Since introducing protocol v2 and enabling fetch I've been thinking
>>>>> about what its inverse 'push' would look like.  After talking with a
>>>>> number of people I have a longish list of things that could be done to
>>>>> improve push and I think I've been able to distill the core features we
>>>>> want in push v2.
>>>> It would be nice to know which things you want to improve.
>>>
>>> Hopefully we can also get others to chime in with things they don't like
>>> about the existing protocol. What pain points exist, and what can we do
>>> to improve at the transport layer before considering new functionality?
>>
>> Another thing that I realized last night was the possibility to chunk requests.
>> The web of today is driven by lots of small http(s) requests. I know our server
>> team fights with the internal tools all the time because the communication
>> involved in git-fetch is usually a large http request (large packfile).
>> So it would be nice to have the possibility of chunking the request.
>> But I think that can be added as a capability? (Not sure how)
> 
> Fetch and push requests/responses are already "chunked" when using the
> http transport.  So I'm not sure what you mean by adding a capability
> because the protocol doesn't care about which transport you're using.
> This is of course unless you're talking about a different "chunking"
> from what it means to chunk an http request/response.
> 

Internally, we've talked about wanting to have resumable pushes and
fetches.  I realize this is difficult to do when the server is
replicated and the repeated request might be talking to a different
server instance.  And there's a problem with temp files littering the
server as it waits for the repeated attempt.  But still, the packfile
sent/received can be large and connections do get dropped.

That is, assuming we send one large packfile and just use a
byte-range-like approach to resume the transfer.

Suppose instead we allowed the request to send a series of packfiles,
with each "chunk" being self-contained and usable.  Then if a push
connection was dropped, the server could apply the successfully
received packfile(s) (add the received objects and update the refs to
the commits received so far), ignore the interrupted and unreceived
packfile(s), and let the client retry later.  When/if the client
retried the push, it would renegotiate haves/wants and send a new
series of packfile(s), with the assumption being that the server would
have updated refs from the earlier aborted push, so the packfile(s)
computed for the second attempt would not repeat the content
successfully transmitted in the first attempt.

This would require that the client build an ordered set of packfiles
from oldest to newest so that the server can apply them in-order and
the graph remain connected.  That may be outside your scope here.

Also, we might have to add a few messages to the protocol after the
negotiation, for the client to say that it is going to send the push
content in 'n' packfiles and send 'n' messages with the intermediate
ref values being updated in each packfile.
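A toy model of the chunked-push idea (everything here is hypothetical;
real index-pack, connectivity checks, and negotiation are elided):

```python
# The client splits the push into self-contained chunks, oldest first,
# each carrying the intermediate ref tip reached once that chunk is
# applied. The server applies chunks in order and only advances the
# ref to the tip of the last chunk it fully received, so an
# interrupted push still makes forward progress.

def apply_chunks(server_refs, refname, chunks, received):
    """Apply the first `received` chunks in order; drop the rest.

    Each chunk is (packfile_bytes, intermediate_tip)."""
    for pack, tip in chunks[:received]:
        # ... index-pack / connectivity check would happen here ...
        server_refs[refname] = tip
    return server_refs

chunks = [(b"PACK1", "c1"), (b"PACK2", "c2"), (b"PACK3", "c3")]
refs = apply_chunks({}, "refs/heads/master", chunks, received=2)
print(refs)  # ref advanced only to the tip of the last complete chunk
```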

Just thinking out loud here.
Jeff


* Re: [RFC] push: add documentation on push v2
  2018-07-20 13:12         ` Jeff Hostetler
@ 2018-07-24 19:00           ` Brandon Williams
  0 siblings, 0 replies; 19+ messages in thread
From: Brandon Williams @ 2018-07-24 19:00 UTC (permalink / raw)
  To: Jeff Hostetler; +Cc: Stefan Beller, Derrick Stolee, git

On 07/20, Jeff Hostetler wrote:
> 
> 
> On 7/18/2018 1:15 PM, Brandon Williams wrote:
> > On 07/18, Stefan Beller wrote:
> > > On Wed, Jul 18, 2018 at 6:31 AM Derrick Stolee <stolee@gmail.com> wrote:
> > > > 
> > > > On 7/17/2018 7:25 PM, Stefan Beller wrote:
> > > > > On Tue, Jul 17, 2018 at 2:09 PM Brandon Williams <bmwill@google.com> wrote:
> > > > > > Signed-off-by: Brandon Williams <bmwill@google.com>
> > > > > > ---
> > > > > > 
> > > > > > Since introducing protocol v2 and enabling fetch I've been thinking
> > > > > > about what its inverse 'push' would look like.  After talking with a
> > > > > > number of people I have a longish list of things that could be done to
> > > > > > improve push and I think I've been able to distill the core features we
> > > > > > want in push v2.
> > > > > It would be nice to know which things you want to improve.
> > > > 
> > > > Hopefully we can also get others to chime in with things they don't like
> > > > about the existing protocol. What pain points exist, and what can we do
> > > > to improve at the transport layer before considering new functionality?
> > > 
> > > Another thing that I realized last night was the possibility to chunk requests.
> > > The web of today is driven by lots of small http(s) requests. I know our server
> > > team fights with the internal tools all the time because the communication
> > > involved in git-fetch is usually a large http request (large packfile).
> > > So it would be nice to have the possibility of chunking the request.
> > > But I think that can be added as a capability? (Not sure how)
> > 
> > Fetch and push requests/responses are already "chunked" when using the
> > http transport.  So I'm not sure what you mean by adding a capability
> > because the protocol doesn't care about which transport you're using.
> > This is of course unless you're talking about a different "chunking"
> > from what it means to chunk an http request/response.
> > 
> 
> Internally, we've talked about wanting to have resumable pushes and
> fetches.  I realize this is difficult to do when the server is
> replicated and the repeated request might be talking to a different
> server instance.  And there's a problem with temp files littering the
> server as it waits for the repeated attempt.  But still, the packfile
> sent/received can be large and connections do get dropped.
> 
> That is, if we think about sending 1 large packfile and just using a
> byte-range-like approach to resuming the transfer.
> 
> If we allowed the request to send a series of packfiles, with each
> "chunk" being self-contained and usable.  So if a push connection was
> dropped the server could apply the successfully received packfile(s)
> (add the received objects and update the refs to the commits received so
> far).  And ignore the interrupted and unreceived packfile(s) and let the
> client retry later.  When/if the client retried the push, it would
> renegotiate haves/wants and send a new series of packfile(s).  With the
> assumption being that the server would have updated refs from the
> earlier aborted push, so the packfile(s) computed for the second attempt
> would not repeat the content successfully transmitted in the first
> attempt.
> 
> This would require that the client build an ordered set of packfiles
> from oldest to newest so that the server can apply them in-order and
> the graph remain connected.  That may be outside your scope here.
> 
> Also, we might have to add a few messages to the protocol after the
> negotiation, for the client to say that it is going to send the push
> content in 'n' packfiles and send 'n' messages with the intermediate
> ref values being updated in each packfile.
> 
> Just thinking out loud here.
> Jeff

We've talked about working on resumable fetch/push (both of which are
outside the scope of this work), but we haven't started working on
anything just yet.

There's a couple of different ways to do this, like you've pointed out.
We can have the server redirect the client to fetch from a CDN (where
it has put the packfile), and then the client can use ranged requests
to fetch until the server decides to remove it from the CDN.  This can
be tricky because every fetch can produce a unique packfile, so maybe
you don't want to put a freshly constructed, unique packfile up on a
CDN somewhere for each client request.

Breaking up a response into multiple packfiles and small ref updates
could also work; that way, as long as some of the smaller packs/updates
are applied, the client is making headway towards being up to date with
the server.
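The byte-range approach can be sketched like this (the "CDN" is just an
in-memory stand-in and all names are invented; the only real-world piece
is the idea of retrying with an HTTP Range request from the current
offset):

```python
import io

PACK = b"PACK....fake-packfile-bytes....done"  # stand-in for a parked packfile

def fetch_range(start, fail_after=None):
    """Fake 'Range: bytes=<start>-' request against the parked packfile;
    fail_after simulates a dropped connection mid-transfer."""
    data = PACK[start:]
    return data[:fail_after] if fail_after else data

def resumable_get(total_size):
    buf = io.BytesIO()
    # first attempt drops after 10 bytes, mimicking a lost connection
    buf.write(fetch_range(0, fail_after=10))
    while buf.tell() < total_size:      # retry from where we left off
        buf.write(fetch_range(buf.tell()))
    return buf.getvalue()

assert resumable_get(len(PACK)) == PACK
```

This only works if the parked packfile stays byte-stable between
attempts, which is exactly why per-client unique packfiles make the CDN
approach awkward.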

-- 
Brandon Williams


* Re: [RFC] push: add documentation on push v2
  2018-07-17 21:09 [RFC] push: add documentation on push v2 Brandon Williams
  2018-07-17 23:25 ` Stefan Beller
@ 2018-07-24 19:28 ` Brandon Williams
  2018-07-25 15:15   ` Duy Nguyen
  1 sibling, 1 reply; 19+ messages in thread
From: Brandon Williams @ 2018-07-24 19:28 UTC (permalink / raw)
  To: git

On 07/17, Brandon Williams wrote:
> Signed-off-by: Brandon Williams <bmwill@google.com>
> ---
> 
> Since introducing protocol v2 and enabling fetch I've been thinking
> about what its inverse 'push' would look like.  After talking with a
> number of people I have a longish list of things that could be done to
> improve push and I think I've been able to distill the core features we
> want in push v2.  Thankfully (due to the capability system) most of the
> other features/improvements can be added later with ease.
> 
> What I've got now is a rough design for a more flexible push, more
> flexible because it allows for the server to do what it wants with the
> refs that are pushed and has the ability to communicate back what was
> done to the client.  The main motivation for this is to work around
> issues when working with Gerrit and other code-review systems where you
> need to have Change-Ids in the commit messages (now the server can just
> insert them for you and send back new commits) and you need to push to
> magic refs to get around various limitations (now a Gerrit server should
> be able to communicate that pushing to 'master' doesn't update master
> but instead creates a refs/changes/<id> ref).
> 
> Before actually moving to write any code I'm hoping to get some feedback
> on if we think this is an acceptable base design for push (other
> features like atomic-push, signed-push, etc can be added as
> capabilities), so any comments are appreciated.
> 
>  Documentation/technical/protocol-v2.txt | 76 +++++++++++++++++++++++++
>  1 file changed, 76 insertions(+)

Pinging this thread again to hopefully reach some more people for
commentary.  Looking back through the comments so far, there are
concerns that a server shouldn't be trusted to rewrite my local
changes, so to address that we could add a config option which defaults
to not taking changes from a server.

Apart from that I didn't see any other major concerns.  I'm hoping to
get a bit more discussion going before actually beginning work on this.

-- 
Brandon Williams


* Re: [RFC] push: add documentation on push v2
  2018-07-24 19:28 ` Brandon Williams
@ 2018-07-25 15:15   ` Duy Nguyen
  2018-07-25 17:46     ` Brandon Williams
  0 siblings, 1 reply; 19+ messages in thread
From: Duy Nguyen @ 2018-07-25 15:15 UTC (permalink / raw)
  To: Brandon Williams; +Cc: Git Mailing List

On Tue, Jul 24, 2018 at 9:29 PM Brandon Williams <bmwill@google.com> wrote:
>
> On 07/17, Brandon Williams wrote:
> > Signed-off-by: Brandon Williams <bmwill@google.com>
> > ---
> >
> > Since introducing protocol v2 and enabling fetch I've been thinking
> > about what its inverse 'push' would look like.  After talking with a
> > number of people I have a longish list of things that could be done to
> > improve push and I think I've been able to distill the core features we
> > want in push v2.  Thankfully (due to the capability system) most of the
> > other features/improvements can be added later with ease.
> >
> > What I've got now is a rough design for a more flexible push, more
> > flexible because it allows for the server to do what it wants with the
> > refs that are pushed and has the ability to communicate back what was
> > done to the client.  The main motivation for this is to work around
> > issues when working with Gerrit and other code-review systems where you
> > need to have Change-Ids in the commit messages (now the server can just
> > insert them for you and send back new commits) and you need to push to
> > magic refs to get around various limitations (now a Gerrit server should
> > be able to communicate that pushing to 'master' doesn't update master
> > but instead creates a refs/changes/<id> ref).
> >
> > Before actually moving to write any code I'm hoping to get some feedback
> > on if we think this is an acceptable base design for push (other
> > features like atomic-push, signed-push, etc can be added as
> > capabilities), so any comments are appreciated.
> >
> >  Documentation/technical/protocol-v2.txt | 76 +++++++++++++++++++++++++
> >  1 file changed, 76 insertions(+)
>
> Pinging this thread again to hopefully reach some more people for
> commentary.

Could you send a v2 that covers all the push features in pack protocol
version 1? I see some are discussed, but it's probably good to
summarize them in this document too.

A few other comments

If I remember correctly, we always update the remote refs locally
after a push, assuming that the next 'fetch' would do the same anyway.
This is not true for special refs (like those from Gerrit). Looking at
this document, I don't think the server can say "yes, we have received
your pack and stored it 'somewhere', but there's no visible ref created
for it" so that we can skip the local remote-ref update?

Is it simpler to tell a push client at the end that "yes, there's new
stuff now on the server, do another fetch", sort of like an HTTP
redirect? Then the client can switch to the fetch protocol to get the
new stuff that the server has created (e.g. rebased commits). I assume
we could reuse the same connection for both push and fetch if needed.
This way both fetch and push send packs in just one direction.
-- 
Duy


* Re: [RFC] push: add documentation on push v2
  2018-07-25 15:15   ` Duy Nguyen
@ 2018-07-25 17:46     ` Brandon Williams
  2018-08-02 15:17       ` Duy Nguyen
  0 siblings, 1 reply; 19+ messages in thread
From: Brandon Williams @ 2018-07-25 17:46 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List

On 07/25, Duy Nguyen wrote:
> On Tue, Jul 24, 2018 at 9:29 PM Brandon Williams <bmwill@google.com> wrote:
> >
> > On 07/17, Brandon Williams wrote:
> > > Signed-off-by: Brandon Williams <bmwill@google.com>
> > > ---
> > >
> > > Since introducing protocol v2 and enabling fetch I've been thinking
> > > about what its inverse 'push' would look like.  After talking with a
> > > number of people I have a longish list of things that could be done to
> > > improve push and I think I've been able to distill the core features we
> > > want in push v2.  Thankfully (due to the capability system) most of the
> > > other features/improvements can be added later with ease.
> > >
> > > What I've got now is a rough design for a more flexible push, more
> > > flexible because it allows for the server to do what it wants with the
> > > refs that are pushed and has the ability to communicate back what was
> > > done to the client.  The main motivation for this is to work around
> > > issues when working with Gerrit and other code-review systems where you
> > > need to have Change-Ids in the commit messages (now the server can just
> > > insert them for you and send back new commits) and you need to push to
> > > magic refs to get around various limitations (now a Gerrit server should
> > > be able to communicate that pushing to 'master' doesn't update master
> > > but instead creates a refs/changes/<id> ref).
> > >
> > > Before actually moving to write any code I'm hoping to get some feedback
> > > on if we think this is an acceptable base design for push (other
> > > features like atomic-push, signed-push, etc can be added as
> > > capabilities), so any comments are appreciated.
> > >
> > >  Documentation/technical/protocol-v2.txt | 76 +++++++++++++++++++++++++
> > >  1 file changed, 76 insertions(+)
> >
> > Pinging this thread again to hopefully reach some more people for
> > commentary.
> 
> Could you send a v2 that covers all the push features in pack version
> 1? I see some are discussed but it's probably good to summarize in
> this document too.

I can mention the ones we want to implement, but I expect that a push v2
would not require that all features in the current push are supported
out of the box.  Some servers may not want to support signed-push, etc.
Also, I don't want to have to implement every single existing feature
before getting something merged.  This way, follow-on series can be
written to implement those as new features on top of push v2.

> 
> A few other comments
> 
> If I remember correctly, we always update the remote refs locally
> after a push, assuming that the next 'fetch' will do the same anyway.
> This is not true for special refs (like those from gerrit). Looking
> from this I don't think it can say "yes we have received your pack and
> stored it "somewhere" but there's no visible ref created for it" so
> that we can skip the local remote ref update?

This is one of the pain points for Gerrit and one of the reasons why
they have this funky push syntax "push origin HEAD:refs/for/master".
Because it's not a remote-tracking branch for the current branch, we
don't update (or create) a local branch "for/master" under the
"refs/remotes/origin" namespace (at least that's how I understand it).

So in order to support the server doing more things (rebasing, or
creating new branches) based on what is pushed, the status that the
server sends in response needs to be more fluid, so that the server can
describe what it did in a way that lets the client either update the
remote-tracking branches or not (and in the latter case maybe do what
you suggest below).

> 
> Is it simpler to tell a push client at the end that "yes there's new
> stuff now on the server, do another fetch", sort of like HTTP
> redirect, then the client can switch to fetch protocol to get the new
> stuff that the server has created (e.g. rebase stuff)? I assume we
> could reuse the same connection for both push and fetch if needed.
> This way both fetch and push send packs in just one direction.

I really, really like this suggestion.  Thank you for your input.  This
would actually make the protocol much simpler by keeping push for
pushing packs and fetch for fetching packs.  And since my plan is to
have the status report for push include all the relevant ref changes
that the server made, if there are any that don't correspond with what
was pushed (either the server rebased a change or created a new ref I
didn't ask for), then we can skip the ref advertisement and go straight
to fetching those refs.  And yes, since protocol v2 is command-based, we
could reuse the existing connection and simply send a "fetch" request
after our "push" one.

This does mean an extra round trip compared to what I originally had in
mind, but it should be simpler.
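The client side of that push-then-fetch flow could look something like
this sketch (the dict shapes and names are invented; in the real
protocol the status report would arrive as pkt-lines):

```python
# The push status report lists every ref the server touched. Any ref
# whose final oid differs from what the client itself pushed (a
# rebased commit, or a server-created ref like refs/changes/1) becomes
# a target for a follow-up "fetch" over the same connection.

def refs_to_fetch(pushed, status_report):
    """Refs the server created or rewrote that the client must fetch."""
    return {ref: oid for ref, oid in status_report.items()
            if pushed.get(ref) != oid}

pushed = {"refs/heads/master": "abc"}
report = {"refs/changes/1": "abc",        # server redirected the update
          "refs/heads/master": "def"}     # server rebased the commit
print(refs_to_fetch(pushed, report))
```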

-- 
Brandon Williams


* Re: [RFC] push: add documentation on push v2
  2018-07-25 17:46     ` Brandon Williams
@ 2018-08-02 15:17       ` Duy Nguyen
  0 siblings, 0 replies; 19+ messages in thread
From: Duy Nguyen @ 2018-08-02 15:17 UTC (permalink / raw)
  To: Brandon Williams; +Cc: Git Mailing List

On Wed, Jul 25, 2018 at 7:46 PM Brandon Williams <bmwill@google.com> wrote:
> > Could you send a v2 that covers all the push features in pack version
> > 1? I see some are discussed but it's probably good to summarize in
> > this document too.
>
> I can mention the ones we want to implement, but I expect that a push v2
> would not require that all features in the current push are supported
> out of the box.  Some servers may not want to support signed-push, etc.
> Also I don't want to have to implement every single feature that exists
> before getting something merged.  This way follow on series can be
> written to implement those as new features to push v2.

I thought I had written this mail, but apparently I did not, for some
unknown reason. My concern here is painting ourselves into a corner;
having a rough sketch of how current features will be implemented or
replaced in v2 would help reduce that risk.
-- 
Duy


end of thread, other threads:[~2018-08-02 15:18 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-17 21:09 [RFC] push: add documentation on push v2 Brandon Williams
2018-07-17 23:25 ` Stefan Beller
2018-07-18 13:31   ` Derrick Stolee
2018-07-18 16:56     ` Stefan Beller
2018-07-18 17:15       ` Brandon Williams
2018-07-20 13:12         ` Jeff Hostetler
2018-07-24 19:00           ` Brandon Williams
2018-07-18 17:11     ` Brandon Williams
2018-07-18 17:19       ` Duy Nguyen
2018-07-18 17:46         ` Brandon Williams
2018-07-18 17:57           ` Duy Nguyen
2018-07-18 17:08   ` Brandon Williams
2018-07-18 18:07     ` Stefan Beller
2018-07-18 18:17       ` Duy Nguyen
2018-07-18 18:21       ` Brandon Williams
2018-07-24 19:28 ` Brandon Williams
2018-07-25 15:15   ` Duy Nguyen
2018-07-25 17:46     ` Brandon Williams
2018-08-02 15:17       ` Duy Nguyen

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).