* reg. fatal: The remote end hung up unexpectedly on NFS
@ 2018-04-06 18:25 Satya Prakash GS
2018-04-06 18:46 ` Ævar Arnfjörð Bjarmason
2018-04-06 19:48 ` Jeff King
0 siblings, 2 replies; 4+ messages in thread
From: Satya Prakash GS @ 2018-04-06 18:25 UTC (permalink / raw)
To: git
Hi,
We have a distributed filesystem with NFS access. On the NFS mount, I
was doing a git-clone and if NFS server crashed and came back up while
the clone is going on, clone fails with the below message:
git clone https://satgs@github.com/fs/private-qa.git
remote: Counting objects: 139419, done.
remote: Compressing objects: 100% (504/504), done.
Receiving objects: 7% (9760/139419), 5.32 MiB | 5.27 MiB/s
error: RPC failed; result=18, HTTP code = 200 MiB | 96.00 KiB/s
fatal: The remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed
On NFS server crash, it usually takes a minute or two for our
filesystem to failover to new NFS server. Initially I suspected it had
something to do with the filesystem, like attributes of the file
written by git weren't matching what it was expecting but the same
test fails on open source NFS server. While clone is going on, if I
stopped the open source NFS server for 2 minutes and restarted it, git
clone fails.
Another interesting thing is, if the restart happens within a few
seconds, git clone succeeds.
Sideband_demux fails while trying to read from the pipe. Read size
doesn't match what is expected. If there are 2 parts to git clone
which is fetching data and writing to local filesystem, is this error
happening while trying to fetch ? Since it succeeds if the restart is
done immediately, has this got something to do with the protocol
timeouts.
Please advise on how to debug this further.
Thanks,
Satya.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: reg. fatal: The remote end hung up unexpectedly on NFS
2018-04-06 18:25 reg. fatal: The remote end hung up unexpectedly on NFS Satya Prakash GS
@ 2018-04-06 18:46 ` Ævar Arnfjörð Bjarmason
2018-04-06 19:48 ` Jeff King
1 sibling, 0 replies; 4+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-04-06 18:46 UTC (permalink / raw)
To: Satya Prakash GS; +Cc: git
On Fri, Apr 06 2018, Satya Prakash wrote:
> Hi,
>
> We have a distributed filesystem with NFS access. On the NFS mount, I
> was doing a git-clone and if NFS server crashed and came back up while
> the clone is going on, clone fails with the below message:
>
> git clone https://satgs@github.com/fs/private-qa.git
>
> remote: Counting objects: 139419, done.
> remote: Compressing objects: 100% (504/504), done.
> Receiving objects: 7% (9760/139419), 5.32 MiB | 5.27 MiB/s
> error: RPC failed; result=18, HTTP code = 200 MiB | 96.00 KiB/s
> fatal: The remote end hung up unexpectedly
> fatal: early EOF
> fatal: index-pack failed
>[...]
> Please advise on how to debug this further.
I don't know what this could be, but have you tried turning on the
various GIT_TRACE env variables document in "man git"? Some of those
might hold a clue to what's going on here.
Have you tried to do other git-independent tests to see if in general
the NFS failover isn't resulting in data being written as applications
expect? E.g. doing a recursive wget of some data with known SHA1-sums,
or doing a stess test of some sort on a filesystem that uses checksums
(such as btrfs), which could be loop-mounted to a file stored on NFS.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: reg. fatal: The remote end hung up unexpectedly on NFS
2018-04-06 18:25 reg. fatal: The remote end hung up unexpectedly on NFS Satya Prakash GS
2018-04-06 18:46 ` Ævar Arnfjörð Bjarmason
@ 2018-04-06 19:48 ` Jeff King
2018-04-08 19:31 ` Satya Prakash GS
1 sibling, 1 reply; 4+ messages in thread
From: Jeff King @ 2018-04-06 19:48 UTC (permalink / raw)
To: Satya Prakash GS; +Cc: git
On Fri, Apr 06, 2018 at 11:55:51PM +0530, Satya Prakash GS wrote:
> We have a distributed filesystem with NFS access. On the NFS mount, I
> was doing a git-clone and if NFS server crashed and came back up while
> the clone is going on, clone fails with the below message:
>
> git clone https://satgs@github.com/fs/private-qa.git
>
> remote: Counting objects: 139419, done.
> remote: Compressing objects: 100% (504/504), done.
> Receiving objects: 7% (9760/139419), 5.32 MiB | 5.27 MiB/s
> error: RPC failed; result=18, HTTP code = 200 MiB | 96.00 KiB/s
> fatal: The remote end hung up unexpectedly
> fatal: early EOF
> fatal: index-pack failed
Curl's result=18 is CURLE_PARTIAL_FILE. Usually that means the other
side hung up partway through. But given the NFS symptoms you describe, I
wonder if fwrite() to the file simply returned an error, and curl gave
up.
> On NFS server crash, it usually takes a minute or two for our
> filesystem to failover to new NFS server. Initially I suspected it had
> something to do with the filesystem, like attributes of the file
> written by git weren't matching what it was expecting but the same
> test fails on open source NFS server. While clone is going on, if I
> stopped the open source NFS server for 2 minutes and restarted it, git
> clone fails.
>
> Another interesting thing is, if the restart happens within a few
> seconds, git clone succeeds.
>
> Sideband_demux fails while trying to read from the pipe. Read size
> doesn't match what is expected. If there are 2 parts to git clone
> which is fetching data and writing to local filesystem, is this error
> happening while trying to fetch ? Since it succeeds if the restart is
> done immediately, has this got something to do with the protocol
> timeouts.
>
> Please advise on how to debug this further.
If you're on Linux, strace could show you the write error. Unfortunately
it's a little tricky because the http bits happen in a sub-process. But
try:
cat >/in/your/$PATH/git-remote-strace <<\EOF
#!/bin/sh
protocol=$(echo "$2" | cut -d: -f1)
exec strace -f -o /tmp/foo.out git-remote-$protocol "$@"
EOF
chmod +x /in/your/$PATH/git-remote-strace
git clone strace::https://github.com/whatever
My guess is you may find a failed `write()` in there.
-Peff
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: reg. fatal: The remote end hung up unexpectedly on NFS
2018-04-06 19:48 ` Jeff King
@ 2018-04-08 19:31 ` Satya Prakash GS
0 siblings, 0 replies; 4+ messages in thread
From: Satya Prakash GS @ 2018-04-08 19:31 UTC (permalink / raw)
To: Jeff King, avarab; +Cc: git
Avarab,
I enabled GIT_TRACE, packet, etc. I could not make much sense of the
output. I downloaded git source and started looking at the code.
Yeah, kernel compilation with failovers worked on our filesystem. We
tried xfs test suite with failovers, it worked. Since it's failing
with open source NFS, I am wondering if this even has anything to do
with filesystem.
Jeff,
Thank you for the clear instructions on strace.
Looks like 5 and 6 are two ends of a pipe.
read(5, "\27\3\3\5r", 5) = 5
read(5, "S\10n\230i\257\27\2\333&\335jou~\3036}S\273\212\250\33\310\216b\253\3505\332\266X"...,
1394) = 1394
write(6, "\2530\253x\27\247iz\246\321*\v\33\306.\221I\36\4\304D\\\360\361\306I\213\304\245m\247\30"...,
1370) = 1370
<... read resumed>
"\2530\253x\27\247iz\246\321*\v\33\306.\221I\36\4\304D\\\360\361\306I\213\304\245m\247\30"...,
5071) = 1370
rt_sigaction(SIGPIPE, {SIG_IGN, [PIPE], SA_RESTORER|SA_RESTART,
0x7f038e8bb890}, <unfinished ...>
read(0, <unfinished ...>
<... rt_sigaction resumed> NULL, 8) = 0
poll([{fd=5, events=POLLIN}], 1, 1000) = 1 ([{fd=5, revents=POLLIN}])
rt_sigaction(SIGPIPE, NULL, {SIG_IGN, [PIPE], SA_RESTORER|SA_RESTART,
0x7f038e8bb890}, 8) = 0
rt_sigaction(SIGPIPE, {SIG_IGN, [PIPE], SA_RESTORER|SA_RESTART,
0x7f038e8bb890}, NULL, 8) = 0
poll([{fd=5, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 1
([{fd=5, revents=POLLIN|POLLRDNORM}])
read(5, "\27\3\3\5r", 5) = 5
read(5, "S\10n\230i\257\27\3]T\0{\232\253\377$\265\277\211o.\314T\315\257\335j\322\367\31o\262"...,
1394) = 1394
>>>>>>>>>>>>> Write preceding the last read has succeeded
write(6, "\237%\204W$\236\177\305@\210+\236\227.\316\226\250\t\256:\27?};\270^A\317\204\222\35]"...,
1370) = 1370
>>>>>>>>
<... read resumed>
"\237%\204W$\236\177\305@\210+\236\227.\316\226\250\t\256:\27?};\270^A\317\204\222\35]"...,
3701) = 1370
rt_sigaction(SIGPIPE, {SIG_IGN, [PIPE], SA_RESTORER|SA_RESTART,
0x7f038e8bb890}, <unfinished ...>
read(0, <unfinished ...>
<... rt_sigaction resumed> NULL, 8) = 0
poll([{fd=5, events=POLLIN}], 1, 1000) = 1 ([{fd=5, revents=POLLIN}])
rt_sigaction(SIGPIPE, NULL, {SIG_IGN, [PIPE], SA_RESTORER|SA_RESTART,
0x7f038e8bb890}, 8) = 0
rt_sigaction(SIGPIPE, {SIG_IGN, [PIPE], SA_RESTORER|SA_RESTART,
0x7f038e8bb890}, NULL, 8) = 0
poll([{fd=5, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 1
([{fd=5, revents=POLLIN|POLLRDNORM}])
read(5, "\27\3\3\5r", 5) = 5
>>>>> Problematic reads
read(5, "S\10n\230i\257\27\4\220\243\375\3772\213?\267\356V\246r\226`\223\253^\310\314\207\222\22%\376"...,
1394) = 363
read(5, "", 1031) = 0
>>>>>>>>>>>>>>>
write(5, "\25\3\3\0\32\265(Nk5\330Y\4!\374S\204\377\304\0166j\rV\305;e3\347&w",
31) = 31
close(5)
rt_sigaction(SIGPIPE, {SIG_IGN, [PIPE], SA_RESTORER|SA_RESTART,
0x7f038e8bb890}, NULL, 8) = 0
rt_sigaction(SIGPIPE, {0x441ed0, [PIPE], SA_RESTORER|SA_RESTART,
0x7f038e8bb890}, NULL, 8) = 0
write(2, "error: RPC failed; result=18, HT"..., 46) = 46
brk(0x1597000) = 0x1597000
close(6 <unfinished ...>
<... read resumed> "", 2331) = 0
<... close resumed> ) = 0
read(7, "0000", 4096) = 4
write(2, "fatal: The remote end hung up un"..., 43 <unfinished ...>
read(7, <unfinished ...>
<... write resumed> ) = 43
close(4 <unfinished ...>
<... read resumed> "", 4096) = 0
<... close resumed> )
Last read from pipe (fd 5) was a partial read. Write has succeeded on
the pipe but read failed, that's strange. In this case, it doesn't
even look like the write was interleaved.
Thanks,
Satya.
On Sat, Apr 7, 2018 at 1:18 AM, Jeff King <peff@peff.net> wrote:
> On Fri, Apr 06, 2018 at 11:55:51PM +0530, Satya Prakash GS wrote:
>
>> We have a distributed filesystem with NFS access. On the NFS mount, I
>> was doing a git-clone and if NFS server crashed and came back up while
>> the clone is going on, clone fails with the below message:
>>
>> git clone https://satgs@github.com/fs/private-qa.git
>>
>> remote: Counting objects: 139419, done.
>> remote: Compressing objects: 100% (504/504), done.
>> Receiving objects: 7% (9760/139419), 5.32 MiB | 5.27 MiB/s
>> error: RPC failed; result=18, HTTP code = 200 MiB | 96.00 KiB/s
>> fatal: The remote end hung up unexpectedly
>> fatal: early EOF
>> fatal: index-pack failed
>
> Curl's result=18 is CURLE_PARTIAL_FILE. Usually that means the other
> side hung up partway through. But given the NFS symptoms you describe, I
> wonder if fwrite() to the file simply returned an error, and curl gave
> up.
>
>> On NFS server crash, it usually takes a minute or two for our
>> filesystem to failover to new NFS server. Initially I suspected it had
>> something to do with the filesystem, like attributes of the file
>> written by git weren't matching what it was expecting but the same
>> test fails on open source NFS server. While clone is going on, if I
>> stopped the open source NFS server for 2 minutes and restarted it, git
>> clone fails.
>>
>> Another interesting thing is, if the restart happens within a few
>> seconds, git clone succeeds.
>>
>> Sideband_demux fails while trying to read from the pipe. Read size
>> doesn't match what is expected. If there are 2 parts to git clone
>> which is fetching data and writing to local filesystem, is this error
>> happening while trying to fetch ? Since it succeeds if the restart is
>> done immediately, has this got something to do with the protocol
>> timeouts.
>>
>> Please advise on how to debug this further.
>
> If you're on Linux, strace could show you the write error. Unfortunately
> it's a little tricky because the http bits happen in a sub-process. But
> try:
>
> cat >/in/your/$PATH/git-remote-strace <<\EOF
> #!/bin/sh
> protocol=$(echo "$2" | cut -d: -f1)
> exec strace -f -o /tmp/foo.out git-remote-$protocol "$@"
> EOF
> chmod +x /in/your/$PATH/git-remote-strace
>
> git clone strace::https://github.com/whatever
>
> My guess is you may find a failed `write()` in there.
>
> -Peff
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2018-04-08 19:31 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-06 18:25 reg. fatal: The remote end hung up unexpectedly on NFS Satya Prakash GS
2018-04-06 18:46 ` Ævar Arnfjörð Bjarmason
2018-04-06 19:48 ` Jeff King
2018-04-08 19:31 ` Satya Prakash GS
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).