git@vger.kernel.org mailing list mirror (one of many)
* How to determine when to stop receiving pack content
@ 2019-08-10 23:47 Farhan Khan
  2019-08-11 15:04 ` Pratyush Yadav
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Farhan Khan @ 2019-08-10 23:47 UTC
  To: git

Hi,

I am trying to write an implementation of git clone over ssh and am a little confused about how to determine when a server response has ended. Specifically, after the client sends its 'want' lines, the server sends the pack content. How does the client know when to stop reading? If I simply read() from the file descriptor:

A. With blocking reads, the client will wait until new data is available, potentially forever.
B. With non-blocking reads, the client might stop reading while new data is still in transit.

I do not see a mechanism to specify the size or to indicate the end of the pack content. Am I missing something?

Thanks
---
Farhan Khan
PGP Fingerprint: 1312 89CE 663E 1EB2 179C  1C83 C41D 2281 F8DA C0DE

* Re: How to determine when to stop receiving pack content
  2019-08-10 23:47 How to determine when to stop receiving pack content Farhan Khan
@ 2019-08-11 15:04 ` Pratyush Yadav
  2019-08-11 22:38 ` Junio C Hamano
  2019-08-11 23:31 ` Farhan Khan
  2 siblings, 0 replies; 5+ messages in thread
From: Pratyush Yadav @ 2019-08-11 15:04 UTC
  To: Farhan Khan; +Cc: Git

On 10/08/19 11:47PM, Farhan Khan wrote:
> Hi,
> 
> I am trying to write an implementation of git clone over ssh and am a little confused how to determine a server response has ended. Specifically, after a client sends its requested 'want', the server sends the pack content over. However, how does the client know to stop reading data? If I run a simple read() of the file descriptor:
>
> A. If I use reading blocking, the client will wait until new data is available, potentially forever.
> B. If I use non-blocking, the client might terminate reading for new data, when in reality new data is in transit.
> 
> I do not see a mechanism to specify the size or to indicate the end of the pack content. Am I missing something?

Well, I am not very familiar with git-clone internals, but I did some 
digging around, and I think I know what the answer to your problem is.

According to Documentation/technical/protocol-v2.txt:34, a flush packet
indicates the end of a message. The output section of the fetch command
(protocol-v2.txt:342) says the server sends some optional sections,
then the packfile, and then a flush packet.

So your reader should stop reading data when it sees the flush packet.
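A minimal Python sketch of what "stop at the flush packet" means, assuming the response is framed as pkt-lines (as protocol v2 responses are): each pkt-line starts with a 4-hex-digit length that counts the 4 length bytes themselves, "0000" is a flush-pkt, and "0001" is the v2 delim-pkt. It also assumes a buffered binary stream (for example the stdout of an ssh child process opened with subprocess) whose read(n) returns fewer than n bytes only at EOF. The function names are hypothetical, not Git's own. As later messages in this thread discuss, a pack sent without pkt-line framing is not terminated this way.

```python
import io

# Sentinel returned for the v2 delim-pkt ("0001").
DELIM_PKT = object()


def read_pkt_line(stream):
    """Read one pkt-line from a binary stream.

    Returns the payload bytes, None for a flush-pkt ("0000"),
    or DELIM_PKT for a delim-pkt ("0001").
    """
    header = stream.read(4)
    if len(header) < 4:
        raise EOFError("stream ended inside a pkt-line header")
    length = int(header, 16)  # the length includes the 4 header bytes
    if length == 0:
        return None
    if length == 1:
        return DELIM_PKT
    if length < 4:
        raise ValueError(f"invalid pkt-line length {length}")
    payload = stream.read(length - 4)
    if len(payload) < length - 4:
        raise EOFError("stream ended inside a pkt-line payload")
    return payload


def read_until_flush(stream):
    """Collect pkt-line payloads up to (not including) the next flush-pkt."""
    packets = []
    while True:
        pkt = read_pkt_line(stream)
        if pkt is None:
            return packets
        packets.append(pkt)


if __name__ == "__main__":
    demo = io.BytesIO(b"000ahello\n0009world0000trailing")
    print(read_until_flush(demo))  # [b'hello\n', b'world']
```

Anything after the flush-pkt stays unread in the stream, so the caller can go on to parse whatever comes next.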

Another way would be to look at the packfile contents. According to
Documentation/technical/pack-format.txt, the packfile header contains
the number of objects in the packfile, and each object entry header
records the object's uncompressed size. So you can stop reading after
you have received the last object in the packfile (plus the 20-byte
trailer).
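A Python sketch of reading the two pieces of information mentioned above, based on pack-format.txt: the 12-byte header is the signature "PACK", a 4-byte version and a 4-byte object count, all in network byte order; each entry then begins with a variable-length header holding a 3-bit type and the object's uncompressed size. Because only the uncompressed size is recorded, a reader still has to inflate each entry's zlib data to learn where it ends. The helper names and the OBJ_TYPES table are illustrative, not Git's API.

```python
import struct

# Object type numbers from pack-format.txt (5 is reserved).
OBJ_TYPES = {1: "commit", 2: "tree", 3: "blob", 4: "tag",
             6: "ofs_delta", 7: "ref_delta"}


def parse_pack_header(data):
    """Return the object count from the 12-byte pack header."""
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"PACK":
        raise ValueError("not a packfile")
    if version not in (2, 3):
        raise ValueError(f"unsupported pack version {version}")
    return count


def parse_object_header(data, pos):
    """Decode one entry's type+size header starting at `pos`.

    Returns (type_name, uncompressed_size, position_after_header).
    """
    byte = data[pos]
    pos += 1
    obj_type = (byte >> 4) & 0x7  # bits 6..4 hold the type
    size = byte & 0x0F            # low 4 bits of the size
    shift = 4
    while byte & 0x80:            # MSB set: another size byte follows
        byte = data[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7
    return OBJ_TYPES.get(obj_type, "invalid"), size, pos
```

For example, the bytes 0xBC 0x12 decode to a blob of 300 bytes: 0xC supplies the low 4 bits and 0x12 the next 7.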

I don't know which is the better way, but the former seems like a better 
choice to me.

-- 
Regards,
Pratyush Yadav

* Re: How to determine when to stop receiving pack content
  2019-08-10 23:47 How to determine when to stop receiving pack content Farhan Khan
  2019-08-11 15:04 ` Pratyush Yadav
@ 2019-08-11 22:38 ` Junio C Hamano
  2019-08-11 23:31 ` Farhan Khan
  2 siblings, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2019-08-11 22:38 UTC
  To: Farhan Khan; +Cc: git

"Farhan Khan" <farhan@farhan.codes> writes:

> I am trying to write an implementation of git clone over ssh and
> am a little confused how to determine a server response has
> ended. Specifically, after a client sends its requested 'want',
> the server sends the pack content over. However, how does the
> client know to stop reading data? If I run a simple read() of the
> file descriptor:
>
> A. If I use reading blocking, the client will wait until new data is available, potentially forever.
> B. If I use non-blocking, the client might terminate reading for new data, when in reality new data is in transit.

It's a TCP stream, so a blocking read will tell you when the other
side finishes talking to you and disconnects: your read() will
signal EOF.  If you are paranoid and want to protect your reader
against a malicious writer, then you cannot trust anything the other
side says (including possibly any "I have N megabytes of data" kind
of length information), so you'd need to set up a timeout to get
yourself out of a stuck read, but that is neither news nor
rocket surgery ;-)
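A blocking read until EOF, with a timeout as protection against a peer that stalls without disconnecting, could look like this in Python on a Unix-like system (select() on pipes is not supported on Windows). The function name, the 30-second default and the chunk size are arbitrary choices for illustration.

```python
import os
import select


def read_until_eof(fd, timeout=30.0, chunk_size=65536):
    """Read from a file descriptor until EOF, giving up if the peer stalls.

    os.read() returning b"" means the writer closed its end; select()
    with a timeout keeps a silent but connected peer from hanging us.
    """
    chunks = []
    while True:
        ready, _, _ = select.select([fd], [], [], timeout)
        if not ready:
            raise TimeoutError(f"no data for {timeout} seconds")
        data = os.read(fd, chunk_size)
        if not data:  # EOF: the other side finished and disconnected
            return b"".join(chunks)
        chunks.append(data)
```

The timeout applies to each wait for data, not to the whole transfer, so a slow but steady server is not cut off.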

The "upload-pack" (the component that talks with your "fetch" and
"clone"), after negotiating what objects to include in the data
transfer with the program on your side, produces a pack data stream,
and is allowed to send additional "garbage" after that.

The receiving end, after finishing the negotiation, reads the pack
data stream (it contains exactly one packfile) and parses it
according to the packfile format so that it can find the end
(cf. Documentation/technical/pack-format.txt).

After seeing the end of the pack stream, anything that follows is
"garbage" and is generally passed through to the standard output.
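The receiving side described above (parse the pack according to its format to find the end, treat the rest as garbage) might be sketched in Python as follows. This is not how unpack-objects or index-pack are written; it assumes the whole stream is already in memory and SHA-1 object names (a 20-byte trailer and 20-byte REF_DELTA base names), and it uses zlib's `unused_data` to learn where each compressed object stops. The function name is hypothetical.

```python
import hashlib
import zlib

OBJ_OFS_DELTA = 6
OBJ_REF_DELTA = 7


def find_pack_end(data):
    """Return the offset just past the pack stream at the start of `data`.

    Bytes after that offset are trailing "garbage", not pack data.
    Assumes SHA-1 object names: 20-byte trailer and REF_DELTA bases.
    """
    if data[:4] != b"PACK":
        raise ValueError("stream does not start with a pack header")
    count = int.from_bytes(data[8:12], "big")  # number of objects
    pos = 12
    for _ in range(count):
        # Type+size header: keep reading while the MSB is set.
        byte = data[pos]
        pos += 1
        obj_type = (byte >> 4) & 0x7
        while byte & 0x80:
            byte = data[pos]
            pos += 1
        if obj_type == OBJ_OFS_DELTA:
            # Base offset is a varint; its last byte has the MSB clear.
            while data[pos] & 0x80:
                pos += 1
            pos += 1
        elif obj_type == OBJ_REF_DELTA:
            pos += 20  # name of the base object
        # The compressed length is not recorded, so inflate until zlib
        # reports the end of the stream; unused_data is what follows it.
        inflater = zlib.decompressobj()
        inflater.decompress(data[pos:])
        if not inflater.eof:
            raise ValueError("truncated object data")
        pos = len(data) - len(inflater.unused_data)
    if hashlib.sha1(data[:pos]).digest() != data[pos:pos + 20]:
        raise ValueError("pack trailer checksum mismatch")
    return pos + 20
```

With `end = find_pack_end(data)`, `data[:end]` is the packfile and `data[end:]` is the trailing data that is normally passed through to standard output.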

There are two codepaths on the receiving end ("unpack-objects" and
"index-pack --stdin").  Most likely an initial "clone" would end up
following the latter, but for educational purposes, unpack-objects
may be easier to follow.  These two codepaths are morally equivalent
at the higher conceptual levels.

* Re: How to determine when to stop receiving pack content
  2019-08-10 23:47 How to determine when to stop receiving pack content Farhan Khan
  2019-08-11 15:04 ` Pratyush Yadav
  2019-08-11 22:38 ` Junio C Hamano
@ 2019-08-11 23:31 ` Farhan Khan
  2019-08-12  0:22   ` Pratyush Yadav
  2 siblings, 1 reply; 5+ messages in thread
From: Farhan Khan @ 2019-08-11 23:31 UTC
  To: Pratyush Yadav; +Cc: Git

August 11, 2019 11:04 AM, "Pratyush Yadav" <me@yadavpratyush.com> wrote:

> Well, I am not very familiar with git-clone internals, but I did some
> digging around, and I think I know what answer to your problem is.
> 
> Looking at Documentation/technical/protocol-v2.txt:34, the flush packet
> indicates the end of a message. Looking in the output section of the
> fetch command (protocol-v2.txt:342), it sends you some optional
> sections, and then the packfile and then sends a flush packet.
> 
> So your read should stop reading data when it sees the flush packet.
> 
> Another way would be to look at the packfile contents. Looking at
> Documentation/technical/pack-format.txt, the packfile contains the
> number of objects in the packfile, and each object entry has the object
> size. So you can stop reading after you have received the last object in
> the packfile (plus the 20-byte trailer).
> 
> I don't know which is the better way, but the former seems like a better
> choice to me.

Hi Pratyush,

Thanks for your reply!

Unless I am mistaken, a pack file does not end in a flush-pkt. I ran some tests and did not see the stream end in "0000". Is there a mistake somewhere on my end?

---
Farhan Khan
PGP Fingerprint: 1312 89CE 663E 1EB2 179C  1C83 C41D 2281 F8DA C0DE

* Re: How to determine when to stop receiving pack content
  2019-08-11 23:31 ` Farhan Khan
@ 2019-08-12  0:22   ` Pratyush Yadav
  0 siblings, 0 replies; 5+ messages in thread
From: Pratyush Yadav @ 2019-08-12  0:22 UTC
  To: Farhan Khan; +Cc: Git

On 11/08/19 11:31PM, Farhan Khan wrote:
> Hi Pratyush,
> 
> Thanks for your reply!
> 
> Unless I am mistaken, a pack file does not end in a flush_pkt. I ran some tests and did not see the stream end in "0000". Is there is a mistake somewhere on my end?
 
Hm, it turns out I was on the pu branch, not master, when I looked at
protocol-v2.txt. The file was updated about 3 days ago in a merge that
is not in master yet (7ee4ab7e8c3310fc28d9dd7d47da26e497e73556), and
the new text seems to imply that a flush-pkt will be sent after the
packfile (see excerpt below).

--- protocol-v2.txt ---
    output = acknowledgements flush-pkt |
	     [acknowledgments delim-pkt] [shallow-info delim-pkt]
	     [wanted-refs delim-pkt] [packfile-uris delim-pkt]
	     packfile flush-pkt
---

So either something changed in the protocol with that merge, or there is 
a discrepancy in the documentation, because the above output format 
seems to imply the packfile will be followed by a flush packet. I 
haven't looked at the full contents of the merge, but none of the commit 
messages mention changing this behaviour.
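Under the grammar in the excerpt above, a protocol v2 client could locate the packfile section and read it up to the final flush-pkt. In v2 the packfile section is multiplexed: the first byte of each pkt-line names the side-band channel (1 pack data, 2 progress, 3 fatal error). This Python sketch assumes the whole response arrives on one buffered binary stream and that the other sections can simply be skipped; all names are hypothetical. It does not apply to a protocol v0 transfer without side-band, where the raw pack bytes are not framed as pkt-lines at all.

```python
import io
import sys

FLUSH_PKT = object()  # "0000"
DELIM_PKT = object()  # "0001"


def read_pkt(stream):
    """Read one pkt-line; return payload bytes or a sentinel."""
    header = stream.read(4)
    if len(header) < 4:
        raise EOFError("stream ended inside a pkt-line header")
    length = int(header, 16)
    if length == 0:
        return FLUSH_PKT
    if length == 1:
        return DELIM_PKT
    return stream.read(length - 4)


def read_v2_packfile(stream, progress=sys.stderr):
    """Return the pack bytes carried in a v2 fetch response."""
    # Skip optional sections (acknowledgments, shallow-info, wanted-refs,
    # ...) until the "packfile" section header line.
    while True:
        pkt = read_pkt(stream)
        if pkt is FLUSH_PKT:
            raise ValueError("response ended without a packfile section")
        if isinstance(pkt, bytes) and pkt.rstrip(b"\n") == b"packfile":
            break
    pack = bytearray()
    while True:
        pkt = read_pkt(stream)
        if pkt is FLUSH_PKT:  # the flush-pkt that follows the packfile
            return bytes(pack)
        if pkt is DELIM_PKT or not pkt:
            raise ValueError("unexpected packet inside the packfile section")
        band, payload = pkt[0], pkt[1:]
        if band == 1:  # pack data
            pack += payload
        elif band == 2:  # progress messages
            progress.write(payload.decode("utf-8", errors="replace"))
        elif band == 3:  # fatal error from the server
            raise RuntimeError(payload.decode("utf-8", errors="replace"))
        else:
            raise ValueError(f"unknown side-band channel {band}")


if __name__ == "__main__":
    def pkt_line(payload):
        return b"%04x" % (len(payload) + 4) + payload

    response = pkt_line(b"packfile\n") + pkt_line(b"\x01PACK...") + b"0000"
    print(read_v2_packfile(io.BytesIO(response)))  # b'PACK...'
```

The bytes returned can then be handed to a pack parser; the final flush-pkt is what tells the client the response is complete.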

Either way, you can probably parse the packfile to learn how many
objects you will get, and stop after the last one. Or, as Junio said,
just wait for EOF.

Sorry for the wrong information.

-- 
Regards,
Pratyush Yadav

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git
