git@vger.kernel.org mailing list mirror (one of many)
From: Jakub Narebski <jnareb@gmail.com>
To: Scott Chacon <schacon@gmail.com>
Cc: "Shawn O. Pearce" <spearce@spearce.org>,
	git@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: Request for detailed documentation of git pack protocol
Date: Thu, 4 Jun 2009 22:55:43 +0200
Message-ID: <200906042255.43952.jnareb@gmail.com>
In-Reply-To: <200906022339.08639.jnareb@gmail.com>

This is a combined response to various messages in this thread, following
my discoveries made with a simple Perl script (using IO::Socket) that
plays the role of a git client, tested against github.com (IIRC it uses
a Ruby implementation), git.kernel.org (C Git), and "nc -l 9418".

By the way, is there a publicly accessible JGit (Java) or Dulwich
(Python) git-daemon one can test against?

  sp = Shawn O. Pearce
  jn = Jakub Narebski
  gb = Git Community Book (http://book.git-scm.com)


jn>> I meant that in the request line for fetching via git:// protocol
jn>>
jn>>       0033git-upload-pack /project.git\0host=myserver.com\0
jn>>
jn>> you separate the path to the repository from extra options using "\0" / NUL
jn>> as a separator. Well, this is the only sane separator, as it is the path
jn>> terminator, the only character which cannot appear in a pathname
jn>> (although I do wonder whether project names with e.g. control
jn>> characters or UTF-8 characters would work correctly).
sp>
sp> No, that isn't the reason '\0' is used here.  But yea, that is true.
sp>
sp> The reason \0 is used is, git-daemon reads the 4-byte length, decodes
sp> that, then reads that many bytes.  Finally it writes a '\0' at the
sp> end of what it read, so that the entire "line" is NUL terminated.
sp> Then it reads the "command path" part from the resulting C string.
sp>
sp> The host=myserver.com part came later, after many daemons were
sp> already running all over the world.  By hiding it behind the '\0'
sp> an old daemon would never see it (strlen() returned a value that was
sp> less than the length read, but the old daemons didn't care).
sp> Newer daemons look for where strlen() < length, and assume that
sp> the host header follows.
sp>
sp> The host header ends with '\0' in case additional headers would
sp> also appear here in the future.  IOW, like HTTP allows new headers
sp> to be added before the "\r\n\r\n" terminator that precedes the body,
sp> we allow them between "\0" separators.
[...]
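To make Shawn's description concrete, here is a minimal Python sketch of
that parsing.  It is not git-daemon's actual C code; `read_request` and
`parse_daemon_request` are hypothetical names.  It assumes the 4 hex
digits count themselves (as in pkt-line), so the body is length - 4
bytes, and that each extra header is a key=value pair terminated by its
own NUL.

```python
import io


def parse_daemon_request(payload: bytes):
    """Split a git:// request body into "command path" and extra headers.

    Hypothetical helper following Shawn's description, not git-daemon's code.
    """
    # Everything up to the first NUL is what strlen() would see; an old
    # daemon looks only at this part.
    nul = payload.find(b"\0")
    if nul == -1:
        return payload.decode(), {}
    command_path = payload[:nul].decode()
    # strlen() < length: the rest is NUL-terminated key=value headers.
    headers = {}
    for field in payload[nul + 1:].split(b"\0"):
        if field:  # skip the empty string after the final NUL
            key, _, value = field.decode().partition("=")
            headers[key] = value
    return command_path, headers


def read_request(stream):
    """Read one length-prefixed request from a binary stream."""
    # Assumption: the 4 hex digits count themselves, as in pkt-line,
    # so the body is length - 4 bytes.
    length = int(stream.read(4).decode("ascii"), 16)
    return parse_daemon_request(stream.read(length - 4))


cmd, hdrs = read_request(
    io.BytesIO(b"003bgit-upload-pack /project.git\0host=myserver.com\0user=me\0"))
print(cmd, hdrs)  # git-upload-pack /project.git {'host': 'myserver.com', 'user': 'me'}
```

An old-style request without the NUL simply yields an empty header dict,
which is the backward compatibility Shawn describes.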

sp> The NUL at the end of the host name is not strictly required, but
sp> must be present if the client were to ever pass additional options
sp> to the server.

Actually, both git.kernel.org and github.com failed (deadlocked / hung)
when I tried to add an extra key=value parameter at the end of the request:

  003bgit-upload-pack /project.git\0host=myserver.com\0user=me\0

Hmmmm...
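For checking such length prefixes by hand, a minimal Python sketch that
assembles the request (`pkt_line` and `daemon_request` are made-up
helper names, not Git functions).  It only reproduces the bytes sent,
0033 for the host-only form and 003b with user=me; it says nothing about
why the servers hung.

```python
def pkt_line(payload: bytes) -> bytes:
    """Prefix payload with its pkt-line length: 4 hex digits that
    count the prefix itself."""
    return b"%04x" % (len(payload) + 4) + payload


def daemon_request(service, path, host, extra=()):
    """Hypothetical helper: build a git:// request line where the
    command/path, host= and any extra key=value fields each end in NUL."""
    fields = [f"{service} {path}", f"host={host}", *extra]
    return pkt_line(b"".join(f.encode() + b"\0" for f in fields))


print(daemon_request("git-upload-pack", "/project.git", "myserver.com", ["user=me"]))
```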


jn>> Hmmm... the communication between server and client is not entirely
jn>> clean. Do I understand correctly that this NAK is the response to
jn>> the client's flush after all those "want" lines?
sp>
sp> Yes.
sp>
jn>> And that "0009done" from client
jn>> tells server that it should send everything it has?
sp>
sp> Yes.  It means the client will not issue any more "have" lines,
sp> as it has nothing further in its history, so the server just has
sp> to give up and start generating a pack based on what it knows.

Here we were talking about the following part of the exchange
(I have added a "C:" prefix to mark what the client, git-clone here,
sends; I have also added an explicit "\n" to mark the LF characters
terminating lines, and put each pkt-line on a separate line):

gb>  C: 0054want 74730d410fcb6603ace96f1dc55ea6196122532d multi_ack side-band-64k ofs-delta\n
gb>  C: 0032want 7d1665144a3a975c05f1f43902ddaf084e784dbe\n
gb>  C: 0032want 5a3f6be755bbb7deae50065988cbfa1ffa9ab68a\n
gb>  C: 0032want 7e47fe2bd8d01d481f44d7af0531bd93d3b21c01\n
gb>  C: 0032want 74730d410fcb6603ace96f1dc55ea6196122532d\n
gb>  C: 0000
gb>  C: 0009done\n

and the server response is (again the quote from the "Git Community Book"
was modified, here by removing the double quotes and the doubled backslashes):

gb>  S: 0008NAK\n
gb>  S: 0023\002Counting objects: 2797, done.\n
gb>  [...]
gb>  S: 2004\001PACK\000\000\000\002 [...]
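A minimal Python sketch of both halves of this excerpt, assuming only
what is shown above: the client's want lines with capabilities on the
first line only, and a demultiplexer for the side-band reply where the
first payload byte selects the channel (1 pack data, 2 progress,
3 error).  `upload_request`, `split_pkt_lines` and `demux_side_band`
are hypothetical names, and the reply in the usage lines is made up to
mimic the excerpt, not captured from a server.

```python
def pkt_line(payload: bytes) -> bytes:
    # 4 hex digits counting the prefix itself, then the payload
    return b"%04x" % (len(payload) + 4) + payload


FLUSH = b"0000"  # flush-pkt


def upload_request(wants, capabilities, done=True):
    """Client side: want lines (capabilities only on the first one),
    a flush-pkt, then optionally "done"."""
    out = []
    for i, sha in enumerate(wants):
        line = "want " + sha
        if i == 0 and capabilities:
            line += " " + " ".join(capabilities)
        out.append(pkt_line((line + "\n").encode("ascii")))
    out.append(FLUSH)
    if done:
        out.append(pkt_line(b"done\n"))
    return b"".join(out)


def split_pkt_lines(data: bytes):
    """Yield each payload from a buffer of pkt-lines; None marks a flush-pkt."""
    pos = 0
    while pos < len(data):
        size = int(data[pos:pos + 4].decode("ascii"), 16)
        if size == 0:
            yield None
            pos += 4
        elif size < 4:
            raise ValueError(f"bad pkt-line length {size}")
        else:
            yield data[pos + 4:pos + size]
            pos += size


def demux_side_band(data: bytes):
    """Sort payloads by side-band channel (first byte): 1 = pack data,
    2 = progress text, 3 = fatal error; anything else is kept as-is."""
    pack, progress, other = bytearray(), bytearray(), []
    for payload in split_pkt_lines(data):
        if not payload:  # flush-pkt or empty packet
            continue
        band = payload[0]
        if band == 1:
            pack += payload[1:]
        elif band == 2:
            progress += payload[1:]
        elif band == 3:
            raise RuntimeError(payload[1:].decode(errors="replace"))
        else:  # e.g. the plain "NAK\n" sent before the side-band data
            other.append(payload)
    return bytes(pack), bytes(progress), other


request = upload_request(
    ["74730d410fcb6603ace96f1dc55ea6196122532d",
     "7d1665144a3a975c05f1f43902ddaf084e784dbe"],
    ["multi_ack", "side-band-64k", "ofs-delta"])

# Made-up reply shaped like the excerpt above (not captured from a server).
reply = (pkt_line(b"NAK\n")
         + pkt_line(b"\x02Counting objects: 2797, done.\n")
         + pkt_line(b"\x01PACK\x00\x00\x00\x02")
         + FLUSH)
pack, progress, other = demux_side_band(reply)
```

Encoding the lines this way reproduces the 0054, 0032, 0000, 0009done,
0008NAK and 0023 prefixes from the quoted exchange.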

I thought that after sending the "0000" flush line the client could wait
for a NAK or ACK response from the server... but that is not the case.
When I tried to read from the server after the "0000" flush and before
"0009done\n", my client (or netcat instance) deadlocked (hung) waiting
for a server response.  Either I made a mistake in my fake client, or I
don't understand the git pack protocol correctly.  Should the client
wait for NAK or ACK from the server _only_ after sending the maximum
number of want/have lines (256, if I remember correctly?)?

When I stopped sending the "0000" flush line, my fake client again hung
(deadlocked?) waiting for the server.
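This does not settle when the server is supposed to send NAK/ACK.  One
thing worth ruling out in a fake client (an assumption, not a diagnosis)
is a line-oriented read: the flush-pkt "0000" carries no LF, so a
readline-style read can block on data the server did send.  A minimal
Python sketch of a reader that goes by the length prefix only, plus a
socket timeout so a wrong guess about whose turn it is shows up as an
exception rather than a hang; `recv_exact` and `read_pkt_line` are
hypothetical names.

```python
import socket


def recv_exact(sock, n):
    """recv() may return fewer bytes than asked; loop until we have n."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise EOFError("connection closed in the middle of a pkt-line")
        buf += chunk
    return bytes(buf)


def read_pkt_line(sock):
    """Read one pkt-line by its length prefix, never by looking for LF.
    Returns the payload, or None for a flush-pkt ("0000")."""
    size = int(recv_exact(sock, 4).decode("ascii"), 16)
    if size == 0:
        return None
    if size < 4:
        raise ValueError(f"bad pkt-line length {size}")
    return recv_exact(sock, size - 4)


# Usage against a daemon (host and timeout are placeholders):
#   sock = socket.create_connection(("git.kernel.org", 9418), timeout=10)
#   ...send the request and want lines...
#   reply = read_pkt_line(sock)  # raises socket.timeout instead of hanging
```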


jn>> P.S. By the way, is the pkt-line format an original invention, or was it
jn>> 'borrowed' from some other standard or protocol?
sp>
sp> No clue.  I find it f'king odd that the length is in hex.  There
sp> isn't much value to the protocol being human readable.  The PACK
sp> part of the stream sure as hell ain't.  You aren't going to type
sp> out a sequence of "have" lines against the remote, like you could
sp> with say an HTTP GET.  *shrug*

"git gui blame pkt-line.c" shows that the pkt-line format is Linus's invention.

It looks quite a bit like the 'chunked' transfer encoding[1] in HTTP; there
each non-empty chunk starts with the number of octets of the data it
embeds (the size written in hexadecimal) followed by a CRLF (carriage
return and linefeed), then the data itself; the chunk is then closed with
a CRLF.  Some implementations pad white space (0x20) between the
chunk-size and the CRLF.  In the pkt-line format the length has a fixed
width (4 hexadecimal digits, 0-padded), and we use CRLF neither to
terminate the chunk/packet length nor to terminate the chunk/packet itself.

In HTTP 'chunked' transfer encoding the last chunk is a single line
made of just the chunk-size (0).  In the pkt-line format we use the
special size "0000" for a flush packet.
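A side-by-side Python sketch of the two framings for the same data
(the function names are illustrative only).  It also shows one more
difference: the pkt-line length counts its own four header bytes,
whereas an HTTP chunk-size counts only the data, so the same 6 bytes
are framed as "6" in one and "000a" in the other.

```python
def http_chunked(chunks):
    """HTTP/1.1 chunked body: hex size (data only), CRLF, data, CRLF;
    ends with a zero-size last-chunk and the empty trailer's CRLF."""
    out = bytearray()
    for data in chunks:
        out += b"%x\r\n" % len(data) + data + b"\r\n"
    out += b"0\r\n\r\n"
    return bytes(out)


def pkt_lines(packets):
    """pkt-line stream: 4 zero-padded hex digits that include the 4-byte
    prefix itself, then the data; ends here with a "0000" flush-pkt."""
    out = bytearray()
    for data in packets:
        out += b"%04x" % (len(data) + 4) + data
    out += b"0000"
    return bytes(out)


print(http_chunked([b"hello\n"]))  # b'6\r\nhello\n\r\n0\r\n\r\n'
print(pkt_lines([b"hello\n"]))     # b'000ahello\n0000'
```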

[1] http://en.wikipedia.org/wiki/Chunked_transfer_encoding

-- 
Jakub Narebski
Poland
