git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* upload-pack timing issue on windows?
@ 2010-02-05 23:51 Erik Faye-Lund
  2010-02-06 10:06 ` Johannes Sixt
  0 siblings, 1 reply; 11+ messages in thread
From: Erik Faye-Lund @ 2010-02-05 23:51 UTC (permalink / raw)
  To: Git Mailing List, msysGit

As some of you might know, I've been working on porting git-daemon to
Windows for quite some time now. As it stands now, there's really only
one known issue that is blocking on my end here:

Something weird happens *sometimes* when upload-pack is exiting,
leading to a client dying with a "fatal: read error: Invalid
argument\nfatal: early EOF"-error. If I place a sleep(1) at some place
after exiting the while(1)-loop in create_pack() in upload-pack.c, the
symptom goes away. create_pack() contains some async-code, but this
doesn't seem to be triggered in my minimal case at all. I've tried
flushing stdout and stderr explicitly, no luck.

How often the issue triggers seems to depend on two things, the size
of the repo and the connection speed. If I clone from localhost, I
can't get it to trigger at all. If the repo is of some size, it
triggers rarely. However if I have a repo with only one commit, it
seems to trigger every single time for me.

I've noticed that one of the last things that happens is a call to
poll with nfds=1. This triggers a special case in our poll-emulation
on Windows; but removing that special case hasn't given me any
positive results.

Does anyone have a hunch about what might trigger this issue?

-- 
Erik "kusma" Faye-Lund

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: upload-pack timing issue on windows?
  2010-02-05 23:51 upload-pack timing issue on windows? Erik Faye-Lund
@ 2010-02-06 10:06 ` Johannes Sixt
  2010-02-06 12:01   ` [msysGit] " Erik Faye-Lund
  0 siblings, 1 reply; 11+ messages in thread
From: Johannes Sixt @ 2010-02-06 10:06 UTC (permalink / raw)
  To: kusmabite; +Cc: msysgit, Git Mailing List

On Samstag, 6. Februar 2010, Erik Faye-Lund wrote:
> As some of you might know, I've been working on porting git-daemon to
> Windows for quite some time now. As it stands now, there's really only
> one known issue that is blocking on my end here:
>
> Something weird happens *sometimes* when upload-pack is exiting,
> leading to a client dying with a "fatal: read error: Invalid
> argument\nfatal: early EOF"-error. If I place a sleep(1) at some place
> after exiting the while(1)-loop in create_pack() in upload-pack.c, the
> symptom goes away. create_pack() contains some async-code, but this
> doesn't seem to be triggered in my minimal case at all. I've tried
> flushing stdout and stderr explicitly, no luck.

I've observed timing related issues in upload-pack as well, but only in the 
case where the die() is called from the async thread. This is the reason why 
t5530 does not pass.

But your case seems to be different - i.e. there is no die() involved. Sorry, 
can't help more...

Perhaps use Procmon to analyse differences among the different successful and 
failing cases.

Try hacking fetch-pack so that it does not announce side-band(-64k). Perhaps 
it makes a difference.

-- Hannes

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [msysGit] upload-pack timing issue on windows?
  2010-02-06 10:06 ` Johannes Sixt
@ 2010-02-06 12:01   ` Erik Faye-Lund
  2010-02-06 22:18     ` Johannes Sixt
  0 siblings, 1 reply; 11+ messages in thread
From: Erik Faye-Lund @ 2010-02-06 12:01 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: msysgit, Git Mailing List

On Sat, Feb 6, 2010 at 11:06 AM, Johannes Sixt <j6t@kdbg.org> wrote:
> On Samstag, 6. Februar 2010, Erik Faye-Lund wrote:
>> As some of you might know, I've been working on porting git-daemon to
>> Windows for quite some time now. As it stands now, there's really only
>> one known issue that is blocking on my end here:
>>
>> Something weird happens *sometimes* when upload-pack is exiting,
>> leading to a client dying with a "fatal: read error: Invalid
>> argument\nfatal: early EOF"-error. If I place a sleep(1) at some place
>> after exiting the while(1)-loop in create_pack() in upload-pack.c, the
>> symptom goes away. create_pack() contains some async-code, but this
>> doesn't seem to be triggered in my minimal case at all. I've tried
>> flushing stdout and stderr explicitly, no luck.
>
> I've observed timing related issues in upload-pack as well, but only in the
> case where the die() is called from the async thread. This is the reason why
> t5530 does not pass.
>
> But your case seems to be different - i.e. there is no die() involved. Sorry,
> can't help more...
>

Yeah, it's probably not the same case, but I certainly do find it
interesting that we seemingly have two separate timing-related around
here somewhere...

> Perhaps use Procmon to analyse differences among the different successful and
> failing cases.
>

I'm not entirely sure what to look for, but I do see that there's
difference. There's about 3.5k lines of logging from git.exe,
git-daemon.exe and git-upload-pack.exe for the failure case versus
2.5k for the successful case. And the last sequence of TCP Send in the
success case is a send of 8 bytes, followed by a send of 212 bytes,
followed again by a send of 1 byte. In the failure case, there's only
a send of 8 bytes in the end. This sequence is reported as sent by
git-daemon.exe. In fact, all TCP actions are reported from
git-daemon.exe, and apart from the last sequence the lengths are
reported as identical.

> Try hacking fetch-pack so that it does not announce side-band(-64k). Perhaps
> it makes a difference.
>

This didn't make any difference. I removed "side-band" and
"side-band-64k" from capabilities in send_ref() in upload-pack.c, as
well as the "if (server_supports("side-band<...>"-lines in
builtin-fetch-pack.c.

While I was at it, I also tried to disable all other capabilities; no luck.

However, I have tracked down a bit of what goes on in the client.
There's a call to read_in_full, called from pack-write.c, line 246
that fails in the failure-case, but not in the success-case. This is
where the client expects "pack\tSHA-1" or "keep\tSHA-1". There "fatal:
early EOF"-messages seems to originate from index-pack.c, line 197.
This is the first line of code in parse_pack_header(), it's also
AFAICT the first call-site for any read(0, <...>) (though fill()).

-- 
Erik "kusma" Faye-Lund

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [msysGit] upload-pack timing issue on windows?
  2010-02-06 12:01   ` [msysGit] " Erik Faye-Lund
@ 2010-02-06 22:18     ` Johannes Sixt
  2010-02-08 11:18       ` Erik Faye-Lund
  0 siblings, 1 reply; 11+ messages in thread
From: Johannes Sixt @ 2010-02-06 22:18 UTC (permalink / raw)
  To: kusmabite; +Cc: msysgit, Git Mailing List

On Samstag, 6. Februar 2010, Erik Faye-Lund wrote:
> However, I have tracked down a bit of what goes on in the client.
> There's a call to read_in_full, called from pack-write.c, line 246
> that fails in the failure-case, but not in the success-case. This is
> where the client expects "pack\tSHA-1" or "keep\tSHA-1". There "fatal:
> early EOF"-messages seems to originate from index-pack.c, line 197.
> This is the first line of code in parse_pack_header(), it's also
> AFAICT the first call-site for any read(0, <...>) (though fill()).

This looks like upload-pack died without sending enough to fill a pack header.

Try merging this branch:

  git://repo.or.cz/git/mingw/j6t.git async-in-thread

It contains your changes to start_async plus a refinement of die() when it is 
called from the async procedure (it passes t5530, for example). It is also 
converted to pthreads, and therefore also works on Unix. The new 
implementation of start_async is more careful about the file handles, though 
not so much on Windows.

If there's no change for you, then you could look into implementing 
fcntl(F_GETFD/SETFD, FD_CLOEXEC), which are currently ignored, on top of this 
branch, using Get/SetHandleInformation().

Background: On Unix, we need FD_CLOEXEC so that the fds that are meant for the 
async thread do not remain open in an unrelated child process; on Windows, we 
are just lucky and can get away without FD_CLOEXEC because our pipe()s are 
non-inheritable and async only work with pipes. But once we pass other fds to 
the async procedure, we need a working FD_CLOEXEC. Perhaps something in this 
direction is related to your problem.

You could push out your current state of the git-daemon and a recipe to 
reproduce the problem. Perhaps I find some time to look into it.

-- Hannes

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [msysGit] upload-pack timing issue on windows?
  2010-02-06 22:18     ` Johannes Sixt
@ 2010-02-08 11:18       ` Erik Faye-Lund
  2010-02-10 20:41         ` Jay Soffian
  2010-08-22 23:27         ` Erik Faye-Lund
  0 siblings, 2 replies; 11+ messages in thread
From: Erik Faye-Lund @ 2010-02-08 11:18 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: msysgit, Git Mailing List

On Sat, Feb 6, 2010 at 11:18 PM, Johannes Sixt <j6t@kdbg.org> wrote:
> On Samstag, 6. Februar 2010, Erik Faye-Lund wrote:
>> However, I have tracked down a bit of what goes on in the client.
>> There's a call to read_in_full, called from pack-write.c, line 246
>> that fails in the failure-case, but not in the success-case. This is
>> where the client expects "pack\tSHA-1" or "keep\tSHA-1". There "fatal:
>> early EOF"-messages seems to originate from index-pack.c, line 197.
>> This is the first line of code in parse_pack_header(), it's also
>> AFAICT the first call-site for any read(0, <...>) (though fill()).
>
> This looks like upload-pack died without sending enough to fill a pack header.
>
> Try merging this branch:
>
>  git://repo.or.cz/git/mingw/j6t.git async-in-thread
>
> It contains your changes to start_async plus a refinement of die() when it is
> called from the async procedure (it passes t5530, for example). It is also
> converted to pthreads, and therefore also works on Unix. The new
> implementation of start_async is more careful about the file handles, though
> not so much on Windows.
>
> If there's no change for you, then you could look into implementing
> fcntl(F_GETFD/SETFD, FD_CLOEXEC), which are currently ignored, on top of this
> branch, using Get/SetHandleInformation().
>

Thanks a lot. I tried merging it, but the issue still pops up. I also
tried to implement fcntl(F_GETFD/SETFD, FD_CLOEXEC), still no dice.
I'm not entirely sure if I did it correctly, though.

> Background: On Unix, we need FD_CLOEXEC so that the fds that are meant for the
> async thread do not remain open in an unrelated child process; on Windows, we
> are just lucky and can get away without FD_CLOEXEC because our pipe()s are
> non-inheritable and async only work with pipes. But once we pass other fds to
> the async procedure, we need a working FD_CLOEXEC. Perhaps something in this
> direction is related to your problem.
>
> You could push out your current state of the git-daemon and a recipe to
> reproduce the problem. Perhaps I find some time to look into it.
>

Sure. You can find my current version at
git://repo.or.cz/git/kusma.git work/daemon-fcntl

This branch includes your branch and my fcntl-attempt, as well as an
almost-fixed-up version of the last daemon-win32 series I sent out
(still lacking critical sections when saving process ids, as you
suggested).

-- 
Erik "kusma" Faye-Lund

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [msysGit] upload-pack timing issue on windows?
  2010-02-08 11:18       ` Erik Faye-Lund
@ 2010-02-10 20:41         ` Jay Soffian
  2010-08-22 23:27         ` Erik Faye-Lund
  1 sibling, 0 replies; 11+ messages in thread
From: Jay Soffian @ 2010-02-10 20:41 UTC (permalink / raw)
  To: kusmabite; +Cc: Johannes Sixt, msysgit, Git Mailing List

On Mon, Feb 8, 2010 at 6:18 AM, Erik Faye-Lund <kusmabite@googlemail.com> wrote:
> On Sat, Feb 6, 2010 at 11:18 PM, Johannes Sixt <j6t@kdbg.org> wrote:
>> On Samstag, 6. Februar 2010, Erik Faye-Lund wrote:
>>> However, I have tracked down a bit of what goes on in the client.
>>> There's a call to read_in_full, called from pack-write.c, line 246
>>> that fails in the failure-case, but not in the success-case. This is
>>> where the client expects "pack\tSHA-1" or "keep\tSHA-1". There "fatal:
>>> early EOF"-messages seems to originate from index-pack.c, line 197.
>>> This is the first line of code in parse_pack_header(), it's also
>>> AFAICT the first call-site for any read(0, <...>) (though fill()).
>>
>> This looks like upload-pack died without sending enough to fill a pack header.
>>
>> Try merging this branch:
>>
>>  git://repo.or.cz/git/mingw/j6t.git async-in-thread
>>
>> It contains your changes to start_async plus a refinement of die() when it is
>> called from the async procedure (it passes t5530, for example). It is also
>> converted to pthreads, and therefore also works on Unix. The new
>> implementation of start_async is more careful about the file handles, though
>> not so much on Windows.
>>
>> If there's no change for you, then you could look into implementing
>> fcntl(F_GETFD/SETFD, FD_CLOEXEC), which are currently ignored, on top of this
>> branch, using Get/SetHandleInformation().
>>
>
> Thanks a lot. I tried merging it, but the issue still pops up. I also
> tried to implement fcntl(F_GETFD/SETFD, FD_CLOEXEC), still no dice.
> I'm not entirely sure if I did it correctly, though.

I have no idea if it's related, but a similar thing seems to happen
with git under cygwin-1.7.1.

http://article.gmane.org/gmane.os.cygwin/114032

This is when cloning/fetching over ssh. I've not personally seen the
problem, but I compile git from source. Coworkers who are using the
cygwin-1.7.1 provided git see the problem consistently.

j.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [msysGit] upload-pack timing issue on windows?
  2010-02-08 11:18       ` Erik Faye-Lund
  2010-02-10 20:41         ` Jay Soffian
@ 2010-08-22 23:27         ` Erik Faye-Lund
  2010-08-24 19:24           ` Johannes Sixt
  1 sibling, 1 reply; 11+ messages in thread
From: Erik Faye-Lund @ 2010-08-22 23:27 UTC (permalink / raw)
  To: kusmabite; +Cc: Johannes Sixt, msysgit, Git Mailing List

On Mon, Feb 8, 2010 at 1:18 PM, Erik Faye-Lund <kusmabite@googlemail.com> wrote:
> On Sat, Feb 6, 2010 at 11:18 PM, Johannes Sixt <j6t@kdbg.org> wrote:
>> On Samstag, 6. Februar 2010, Erik Faye-Lund wrote:
>>> However, I have tracked down a bit of what goes on in the client.
>>> There's a call to read_in_full, called from pack-write.c, line 246
>>> that fails in the failure-case, but not in the success-case. This is
>>> where the client expects "pack\tSHA-1" or "keep\tSHA-1". There "fatal:
>>> early EOF"-messages seems to originate from index-pack.c, line 197.
>>> This is the first line of code in parse_pack_header(), it's also
>>> AFAICT the first call-site for any read(0, <...>) (though fill()).
>>
>> This looks like upload-pack died without sending enough to fill a pack header.
>>
>> Try merging this branch:
>>
>>  git://repo.or.cz/git/mingw/j6t.git async-in-thread
>>
>> It contains your changes to start_async plus a refinement of die() when it is
>> called from the async procedure (it passes t5530, for example). It is also
>> converted to pthreads, and therefore also works on Unix. The new
>> implementation of start_async is more careful about the file handles, though
>> not so much on Windows.
>>
>> If there's no change for you, then you could look into implementing
>> fcntl(F_GETFD/SETFD, FD_CLOEXEC), which are currently ignored, on top of this
>> branch, using Get/SetHandleInformation().
>>
>
> Thanks a lot. I tried merging it, but the issue still pops up. I also
> tried to implement fcntl(F_GETFD/SETFD, FD_CLOEXEC), still no dice.
> I'm not entirely sure if I did it correctly, though.
>

More than 6 months later, and I've finally bothered to debug this further:
- The culprit seems to be our poll-emulation. My understanding is that
poll() was called by create_pack_file() in upload-pack.c with nfds=1
(it's 2 until one of the fds are closed) when there's no data
available in the pipe. Since our poll() always returns POLLIN when
nfds=1, the check for xread(...) == 0 further down in
create_pack_file() cause the fd to be closed, leading to an error on
the client-side.
-  Just removing the nfds=1-hack works for me, but I'm suspecting the
nfds=1-hack is there for some socket-reason. So instead I've replaced
our poll-emulation with gnulib's in my branch (with a couple of
patches on top), and it seems to do the trick for me. I still haven't
tested it heavily, though.
- The easiest way I've found to debug the issue, is to use git-fetch
from localhost with a repo with a single commit with a single, empty
file. Doing this triggers the bug every time for me, and doesn't hide
what's going on as much as git-clone does.

The latest version of my branch can be found here:
http://repo.or.cz/w/git/kusma.git/shortlog/refs/heads/work/daemon-win32-process

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [msysGit] upload-pack timing issue on windows?
  2010-08-22 23:27         ` Erik Faye-Lund
@ 2010-08-24 19:24           ` Johannes Sixt
  2010-08-25 17:40             ` Erik Faye-Lund
  0 siblings, 1 reply; 11+ messages in thread
From: Johannes Sixt @ 2010-08-24 19:24 UTC (permalink / raw)
  To: kusmabite; +Cc: msysgit, Git Mailing List

On Montag, 23. August 2010, Erik Faye-Lund wrote:
> - The culprit seems to be our poll-emulation. My understanding is that
> poll() was called by create_pack_file() in upload-pack.c with nfds=1
> (it's 2 until one of the fds are closed) when there's no data
> available in the pipe. Since our poll() always returns POLLIN when
> nfds=1, the check for xread(...) == 0 further down in
> create_pack_file() cause the fd to be closed, leading to an error on
> the client-side.
> -  Just removing the nfds=1-hack works for me, but I'm suspecting the
> nfds=1-hack is there for some socket-reason. So instead I've replaced
> our poll-emulation with gnulib's in my branch (with a couple of
> patches on top), and it seems to do the trick for me. I still haven't
> tested it heavily, though.

The nfds == 1 hack is an "optimization": When only one channel must be 
observed, then we can let (x)read() wait for data instead of doing it inside 
poll() in some way.

I'm not happy with our poll emulation because it contains a busy-loop. 
Gnulib's version looks quite capable, but I haven't studied it in detail. 
Until then, I trust that it does the right thing.

--Hannes

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [msysGit] upload-pack timing issue on windows?
  2010-08-24 19:24           ` Johannes Sixt
@ 2010-08-25 17:40             ` Erik Faye-Lund
  2010-08-25 20:53               ` Johannes Sixt
  0 siblings, 1 reply; 11+ messages in thread
From: Erik Faye-Lund @ 2010-08-25 17:40 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: msysgit, Git Mailing List

On Tue, Aug 24, 2010 at 9:24 PM, Johannes Sixt <j6t@kdbg.org> wrote:
> On Montag, 23. August 2010, Erik Faye-Lund wrote:
>> - The culprit seems to be our poll-emulation. My understanding is that
>> poll() was called by create_pack_file() in upload-pack.c with nfds=1
>> (it's 2 until one of the fds are closed) when there's no data
>> available in the pipe. Since our poll() always returns POLLIN when
>> nfds=1, the check for xread(...) == 0 further down in
>> create_pack_file() cause the fd to be closed, leading to an error on
>> the client-side.
>> -  Just removing the nfds=1-hack works for me, but I'm suspecting the
>> nfds=1-hack is there for some socket-reason. So instead I've replaced
>> our poll-emulation with gnulib's in my branch (with a couple of
>> patches on top), and it seems to do the trick for me. I still haven't
>> tested it heavily, though.
>
> The nfds == 1 hack is an "optimization": When only one channel must be
> observed, then we can let (x)read() wait for data instead of doing it inside
> poll() in some way.
>

OK, thanks for the clarification. Unfortunately, this optimization
breaks down in some cases (like the one I described).

> I'm not happy with our poll emulation because it contains a busy-loop.
> Gnulib's version looks quite capable, but I haven't studied it in detail.
> Until then, I trust that it does the right thing.
>

Well, I've found (and supplied patches for) a couple of bugs in it,
but apart from that it seems quite sane. So yeah, I think it might be
the best way forward.

But I'm curious, what's the best way of import a couple of foreign
source files, while maintaining a couple of patches on top of them?
I'm thinking that perhaps a import-commit followed by the patches
would make it easier to merge in changes than to just import the
patched version, but I'm not entirely sure how to do such a merge
without merging a full subtree...

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [msysGit] upload-pack timing issue on windows?
  2010-08-25 17:40             ` Erik Faye-Lund
@ 2010-08-25 20:53               ` Johannes Sixt
  2010-08-25 20:57                 ` Erik Faye-Lund
  0 siblings, 1 reply; 11+ messages in thread
From: Johannes Sixt @ 2010-08-25 20:53 UTC (permalink / raw)
  To: kusmabite; +Cc: msysgit, Git Mailing List

On Mittwoch, 25. August 2010, Erik Faye-Lund wrote:
> But I'm curious, what's the best way of import a couple of foreign
> source files, while maintaining a couple of patches on top of them?
> I'm thinking that perhaps a import-commit followed by the patches
> would make it easier to merge in changes than to just import the
> patched version, but I'm not entirely sure how to do such a merge
> without merging a full subtree...

This is about only two files. When a new version is available from upstream, 
just branch from the import-commit, apply the two new files, and merge the 
result.

-- Hannes

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [msysGit] upload-pack timing issue on windows?
  2010-08-25 20:53               ` Johannes Sixt
@ 2010-08-25 20:57                 ` Erik Faye-Lund
  0 siblings, 0 replies; 11+ messages in thread
From: Erik Faye-Lund @ 2010-08-25 20:57 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: msysgit, Git Mailing List

On Wed, Aug 25, 2010 at 10:53 PM, Johannes Sixt <j6t@kdbg.org> wrote:
> On Mittwoch, 25. August 2010, Erik Faye-Lund wrote:
>> But I'm curious, what's the best way of import a couple of foreign
>> source files, while maintaining a couple of patches on top of them?
>> I'm thinking that perhaps a import-commit followed by the patches
>> would make it easier to merge in changes than to just import the
>> patched version, but I'm not entirely sure how to do such a merge
>> without merging a full subtree...
>
> This is about only two files. When a new version is available from upstream,
> just branch from the import-commit, apply the two new files, and merge the
> result.
>

Thanks, makes sense.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2010-08-25 20:58 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-02-05 23:51 upload-pack timing issue on windows? Erik Faye-Lund
2010-02-06 10:06 ` Johannes Sixt
2010-02-06 12:01   ` [msysGit] " Erik Faye-Lund
2010-02-06 22:18     ` Johannes Sixt
2010-02-08 11:18       ` Erik Faye-Lund
2010-02-10 20:41         ` Jay Soffian
2010-08-22 23:27         ` Erik Faye-Lund
2010-08-24 19:24           ` Johannes Sixt
2010-08-25 17:40             ` Erik Faye-Lund
2010-08-25 20:53               ` Johannes Sixt
2010-08-25 20:57                 ` Erik Faye-Lund

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).