From: Jakub Narebski <jnareb@gmail.com>
To: Scott Chacon <schacon@gmail.com>
Cc: "Shawn O. Pearce" <spearce@spearce.org>, git@vger.kernel.org
Subject: Re: Request for detailed documentation of git pack protocol
Date: Tue, 2 Jun 2009 23:39:07 +0200
Message-ID: <200906022339.08639.jnareb@gmail.com>
In-Reply-To: <d411cc4a0905140655y244f21aem44f1e246dd74d80c@mail.gmail.com>

On Thu, 14 May 2009, Scott Chacon wrote:
> On Tue, May 12, 2009 at 4:34 PM, Shawn O. Pearce <spearce@spearce.org> wrote:
>> Jakub Narebski <jnareb@gmail.com> wrote:

>>> We have now proliferation of different (re)implementations of git:
>>> JGit in Java, Dulwich in Python, Grit in Ruby; and there are other
>>> planned: git# / managed git in C# (GSoC Mono project), ObjectiveGit
>>> in Objective-C (for iPhone IIRC).  At some time they would reach
>>> the point (or reached it already) of implementing git-daemon...
>>> but currently the documentation of git protocol is lacking.
[...]
 
> It seems like if anyone would do what you're asking, it's probably me.
> [...]  I'm also working with Shawn
> on the Apress book, where I was going to try to document much of this
> information, perhaps I could try writing an RFC as an appendix or
> something - then that will force him to spend time correcting
> everything I got wrong :)  At least that might be a good starting
> point - I'm unfamiliar with the actual RFC process, so I'll research
> that a bit today.  I don't mind writing it, I think it would be really
> really useful to have, I just am unfamiliar with the process.

[...]
>>> It would be really nice, I think, to have RFC for git pack protocol.
>>> And it would help avoid incompatibilities between different clients
>>> and servers.  If the document would contain expected behaviour of
>>> client and server and Best Current Practices it would help avoid
>>> pitfalls when implementing git-daemon in other implementations.
>>
>> Yea, it would be nice.  But find me someone who knows the protocol
>> and who has the time to document the #!@* thing.  Maybe I'll try
>> to work on this myself, but I'm strapped for time, especially over
>> the next two-to-three months.

I see that there is (at least beginnings of) description of git pack
protocol in section "Transfer Protocols"[1][2] of chapter "7. Internals
and Plumbing" of "Git Community Book".

 [1] http://book.git-scm.com/7_transfer_protocols.html
 [2] http://github.com/schacon/gitbook/blob/master/text/54_Transfer_Protocols/0_Transfer_Protocols.markdown

Let me quote the relevant part of this chapter here, with some comments
whose validity I am not sure of. That is why I'd like to ask for
feedback here first, rather than sending a patch or pull request.


> ### Fetching Data with Upload Pack ###
>
> For the smarter protocols, fetching objects is much more efficient. 
> A socket is opened, either over ssh or over port 9418 (in the case of
> the git:// protocol), and the linkgit:git-fetch-pack[1] command on
> the client begins communicating with a forked
> linkgit:git-upload-pack[1] process on the server.

Is fetching over SSH exactly the same as fetching over git:// protocol?

>
> Then the server will tell the client which SHAs it has for each ref,
> and the client figures out what it needs and responds with a list of
> SHAs it wants and already has.
>
> At this point, the server will generate a packfile with all the
> objects that the client needs and begin streaming it down to the
> client.

We would probably want an overview of the client-server communication
here, as described in Documentation/technical/pack-protocol.txt

>
> Let's look at an example.
>
> The client connects and sends the request header. The clone command
>
> 	$ git clone git://myserver.com/project.git
>
> produces the following request:
>
> 	0032git-upload-pack /project.git\\000host=myserver.com\\000
>
> The first four bytes contain the hex length of the line (including 4
> byte line length and trailing newline if present). Following are the
> command and arguments. This is followed by a null byte and then the
> host information. The request is terminated by a null byte.

There is a question of how to organize this information. Should we
describe the pkt-line format up front, e.g. using the ABNF notation from
RFC 5234 that RFC documents use:

  <pkt-line>   = ( <pkt-length> <pkt-payload> [ LF ] ) / <pkt-flush>
  <pkt-length> = 4HEXDIG                    ; length of <pkt-line>
  <pkt-flush>  = "0000"

or something like that?
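
A minimal sketch of the framing that such a grammar would describe,
written in Python. The function names are hypothetical and not taken
from git itself; the only assumption about the format is what the quoted
text says, namely that the four hex digits count the whole line
including themselves, and that "0000" is the flush.

```python
FLUSH_PKT = b"0000"


def pkt_line(payload: bytes) -> bytes:
    """Frame payload as one pkt-line: 4 hex digits of total length, then payload."""
    length = len(payload) + 4  # the length field counts its own 4 bytes
    if length > 0xFFFF:
        raise ValueError("payload does not fit in a 4-hex-digit length")
    return b"%04x" % length + payload


def read_pkt_line(buf: bytes):
    """Split one pkt-line off the front of buf.

    Returns (payload, rest); payload is None when the line is a flush.
    """
    if len(buf) < 4:
        raise ValueError("truncated pkt-line length")
    length = int(buf[:4], 16)
    if length == 0:
        return None, buf[4:]
    if length < 4 or length > len(buf):
        raise ValueError("bad pkt-line length")
    return buf[4:length], buf[length:]


# Usage: the lengths match the "want" and "done" lines quoted in this mail.
line = pkt_line(b"want 74730d410fcb6603ace96f1dc55ea6196122532d\n")
assert line[:4] == b"0032"
assert pkt_line(b"done\n") == b"0009done\n"
```

Whether the trailing LF is part of the payload or of the framing is
exactly the open question; the sketch simply treats it as payload.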


Sidenote: wouldn't it be better to use \0 (\\0 in source) for the NUL
character rather than the \000 (\\000 in source) octal representation?

>
> The request is processed and turned into a call to git-upload-pack:
>
>  	$ git-upload-pack /path/to/repos/project.git

Is it "git-upload-pack" or "git upload-pack" nowadays?

Additionally, this chapter currently does not explain how the request
for "/project.git" is turned into the repository path
/path/to/repos/project.git, either for git-daemon (git:// protocol) or
for SSH.

>
> This immediately returns information of the repo:

To be more exact, this is information about references (I guess this
is information about heads only, isn't it?), with information about
the server capabilities stuffed in.

>
> 	007c74730d410fcb6603ace96f1dc55ea6196122532d HEAD\\000multi_ack thin-pack side-band side-band-64k ofs-delta shallow no-progress
>       003e7d1665144a3a975c05f1f43902ddaf084e784dbe refs/heads/debug
>       003d5a3f6be755bbb7deae50065988cbfa1ffa9ab68a refs/heads/dist
>       003e7e47fe2bd8d01d481f44d7af0531bd93d3b21c01 refs/heads/local
>       003f74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/master
>       0000 
>
> Each line starts with a four byte line length declaration in hex. The
> section is terminated by a line length declaration of 0000.

Should we describe all currently supported client and server
capabilities here, or in an appendix, a sidenote, or a footnote?

 * multi_ack (why not multi-ack?)
 * thin-pack 
 * side-band 
 * side-band-64k 
 * ofs-delta 
 * shallow 
 * no-progress

Is each line terminated by "\n" or "\0"? Is the 'flush' line terminated
as well? This is not clear from the above description. From simply
playing with nc (netcat) it looks like each line, with the exception of
'0000', is terminated with "\n".
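
A minimal sketch of how a client could take this advertisement apart, in
Python. It assumes what the netcat experiment suggests (each line ends
in "\n", and only the first line carries the NUL followed by the
capability list); the function name is hypothetical.

```python
def parse_ref_advertisement(payloads):
    """Parse pkt-line payloads (length prefix already removed).

    Returns (refs, capabilities): refs maps ref name to SHA-1,
    capabilities is the list stuffed behind NUL on the first line.
    """
    refs = {}
    capabilities = []
    for index, payload in enumerate(payloads):
        line = payload.rstrip(b"\n")
        if index == 0 and b"\0" in line:
            # Assumption: capabilities appear only on the first line.
            line, caps = line.split(b"\0", 1)
            capabilities = caps.decode("ascii").split()
        sha, name = line.split(b" ", 1)
        refs[name.decode("utf-8")] = sha.decode("ascii")
    return refs, capabilities


# Usage with the lines quoted above (flush line not included).
advertised = [
    b"74730d410fcb6603ace96f1dc55ea6196122532d HEAD\0multi_ack thin-pack "
    b"side-band side-band-64k ofs-delta shallow no-progress\n",
    b"7d1665144a3a975c05f1f43902ddaf084e784dbe refs/heads/debug\n",
    b"5a3f6be755bbb7deae50065988cbfa1ffa9ab68a refs/heads/dist\n",
    b"7e47fe2bd8d01d481f44d7af0531bd93d3b21c01 refs/heads/local\n",
    b"74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/master\n",
]
refs, capabilities = parse_ref_advertisement(advertised)
assert refs["HEAD"] == refs["refs/heads/master"]
```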

>
> This is sent back to the client verbatim. The client responds with
> another request:
>
> 	0054want 74730d410fcb6603ace96f1dc55ea6196122532d multi_ack side-band-64k ofs-delta 
> 	0032want 7d1665144a3a975c05f1f43902ddaf084e784dbe
> 	0032want 5a3f6be755bbb7deae50065988cbfa1ffa9ab68a
> 	0032want 7e47fe2bd8d01d481f44d7af0531bd93d3b21c01
> 	0032want 74730d410fcb6603ace96f1dc55ea6196122532d

The semantics (meaning) of those 'want' lines is not described here,
although one can easily guess that those are the commits that the
client does not have and wants. In the case of "git clone" those are
all the unique SHA-1s that the client got (what happens if the server
has a detached HEAD?)

It is not clear, but one can guess that the set of capabilities that
the client sends (without stuffing it behind a NUL character this
time?) is the subset of server capabilities that the client supports
and wants.

> 	00000009done

At first I thought that this was an error... but no, the 'flush'
("0000") is not LF terminated.
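
A minimal sketch of how the request quoted above could be assembled, in
Python. It assumes the guesses made so far: the capabilities follow the
first 'want' after a space (no NUL), every 'want' and the 'done' end in
LF, and the flush in between is the bare "0000". The helper names are
hypothetical, and the clone case shown here sends no 'have' lines.

```python
def pkt_line(payload: bytes) -> bytes:
    """Prefix payload with its total length as 4 hex digits."""
    return b"%04x" % (len(payload) + 4) + payload


def build_clone_request(wanted_shas, capabilities):
    """Build the 'want' list, a flush, and 'done' as one byte string."""
    out = []
    for index, sha in enumerate(wanted_shas):
        line = "want " + sha
        if index == 0 and capabilities:
            # Capabilities ride along on the first 'want' line only.
            line += " " + " ".join(capabilities)
        out.append(pkt_line((line + "\n").encode("ascii")))
    out.append(b"0000")  # flush: no payload and no trailing LF
    out.append(pkt_line(b"done\n"))
    return b"".join(out)


request = build_clone_request(
    [
        "74730d410fcb6603ace96f1dc55ea6196122532d",
        "7d1665144a3a975c05f1f43902ddaf084e784dbe",
    ],
    ["multi_ack", "side-band-64k", "ofs-delta"],
)
assert request.startswith(b"0054want ")
assert request.endswith(b"00000009done\n")
```

The "00000009done" run in the quoted example falls out of this
naturally: it is the flush immediately followed by the 9-byte 'done'
line.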

>
> This is sent to the open git-upload-pack process which then streams
> out the final response:

Hmmm... a different notation is used here than above; everything is
within quotes, and the end-of-line character is stated explicitly this
time.

>
> 	"0008NAK\n"

What does this server response mean? That the server doesn't need more
info? Having an overview of the client-server communication up front
would help here (there would be a point to refer to).

> 	"0023\\002Counting objects: 2797, done.\n"
> 	"002b\\002Compressing objects:   0% (1/1177)   \r"
> 	"002c\\002Compressing objects:   1% (12/1177)   \r"
> 	"002c\\002Compressing objects:   2% (24/1177)   \r"
> 	"002c\\002Compressing objects:   3% (36/1177)   \r"
> 	"002c\\002Compressing objects:   4% (48/1177)   \r"
> 	"002c\\002Compressing objects:   5% (59/1177)   \r"
> 	"002c\\002Compressing objects:   6% (71/1177)   \r"
> 	"0053\\002Compressing objects:   7% (83/1177)   \rCompressing objects:   8% (95/1177)   \r" ...
> 	"005b\\002Compressing objects: 100% (1177/1177)   \rCompressing objects: 100% (1177/1177), done.\n"

I guess that this is sideband support: after the pkt-length there is
the number of the stream (multiplexing), where 2 = \002 means stderr.

I wonder why sometimes there is one line per update, and sometimes
more than one update is stuffed into a single line.

>       "2004\\001PACK\\000\\000\\000\\002\\000\\000\n\\355\\225\\017x\\234\\235\\216K\n\\302"...
>       "2005\\001\\360\\204{\\225\\376\\330\\345]z\226\273"...
> 	...
> 	"0037\\002Total 2797 (delta 1799), reused 2360 (delta 1529)\n"
> 	...

I guess that this is an example of multiplexing at work. Here again
some kind of ABNF notation would be IMHO useful, e.g.

  <pkt-line-sideband> = <pkt-length-sideband> <sideband-channel> <pkt-payload> [ LF / CR ]
  <pkt-length-sideband> = 4HEXDIG     ; length of <pkt-line-sideband>
  <sideband-channel> = %x01-02

(Or something like that; I am not sure about ABNF details here).
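
A minimal sketch of the demultiplexing this grammar implies, in Python.
It assumes only what the example shows: the first payload byte is the
channel, 1 carries pack data and 2 carries the progress messages. The
function name is hypothetical, and any other channel is simply rejected
here.

```python
def demux_sideband(payloads):
    """Separate sideband pkt-line payloads into (pack_data, progress)."""
    pack_data = bytearray()
    progress = bytearray()
    for payload in payloads:
        channel, data = payload[0], payload[1:]
        if channel == 1:
            pack_data += data  # packfile bytes
        elif channel == 2:
            progress += data  # human-readable progress, meant for stderr
        else:
            raise ValueError("unexpected sideband channel %d" % channel)
    return bytes(pack_data), bytes(progress)


# Usage with payloads shaped like the ones quoted above.
pack_data, progress = demux_sideband(
    [
        b"\x02Counting objects: 2797, done.\n",
        b"\x01PACK\x00\x00\x00\x02",
        b"\x02Total 2797 (delta 1799), reused 2360 (delta 1529)\n",
    ]
)
assert pack_data.startswith(b"PACK")
```

Because the receiver just concatenates per channel, it would not matter
to it whether one progress update or several arrive in a single packet.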

> 	"<\\276\\255L\\273s\\005\\001w0006\\001[0000"

Hmmm... strange, this is not in pkt-line format...
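
One possible reading, offered only as a guess: the bytes before "0006"
could be the tail of the previous packet's payload, and what follows
would then be two well-formed pkt-lines, a 6-byte sideband packet
carrying a single byte on channel 1 and the final flush. A minimal
Python sketch that checks this arithmetic (the function name is
hypothetical):

```python
def split_pkt_lines(stream: bytes):
    """Split a byte string of back-to-back pkt-lines into payloads.

    A flush ("0000") is reported as None.
    """
    packets = []
    pos = 0
    while pos < len(stream):
        if pos + 4 > len(stream):
            raise ValueError("truncated pkt-line length")
        length = int(stream[pos:pos + 4], 16)
        if length == 0:
            packets.append(None)
            pos += 4
            continue
        if length < 4 or pos + length > len(stream):
            raise ValueError("bad pkt-line length")
        packets.append(stream[pos + 4:pos + length])
        pos += length
    return packets


# The tail of the quoted line, starting at "0006".
tail = b"0006\x01[0000"
assert split_pkt_lines(tail) == [b"\x01[", None]
```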

>
> See the Packfile chapter previously for the actual format of the
> packfile data in the response.

-- 
Jakub Narebski
Poland

