* Periodic hang during git index-pack
From: Sitsofe Wheeler @ 2018-12-19 22:59 UTC
  To: git

Hi,

While trying to use git to clone a remote repository, git
index-pack occasionally hangs:

export GIT_TRACE=1
git clone -n remotehost:diffusion/LIBEDIT/libedit
22:43:48.948378 git.c:418               trace: built-in: git clone -n
remotehost:diffusion/LIBEDIT/libedit
Cloning into 'libedit'...
22:43:48.951196 run-command.c:643       trace: run_command: unset
GIT_DIR; ssh remotehost 'git-upload-pack
'\''diffusion/LIBEDIT/libedit'\'''
22:43:49.130517 run-command.c:643       trace: run_command: git
index-pack --stdin -v --fix-thin '--keep=fetch-pack 13914 on machine'
--check-self-contained-and-connected
remote: Enumerating objects: 178, done.
22:43:49.132535 git.c:418               trace: built-in: git
index-pack --stdin -v --fix-thin '--keep=fetch-pack 13914 on machine'
--check-self-contained-and-connected
remote: Counting objects: 100% (178/178), done.
remote: Compressing objects: 100% (81/81), done.
remote: Total 178 (delta 95), reused 178 (delta 95)
Receiving objects: 100% (178/178), 539.36 KiB | 16.85 MiB/s, done.
Resolving deltas: 100% (95/95), done.
22:43:49.209193 run-command.c:643       trace: run_command: git
rev-list --objects --stdin --not --all --quiet '--progress=Checking
connectivity'
22:43:49.211276 git.c:418               trace: built-in: git rev-list
--objects --stdin --not --all --quiet '--progress=Checking
connectivity'
Wed 19 Dec 22:43:49 GMT 2018
Wed 19 Dec 22:43:49 GMT 2018
22:43:49.220996 git.c:418               trace: built-in: git clone -n
remotehost:diffusion/LIBEDIT/libedit
Cloning into 'libedit'...
22:43:49.223462 run-command.c:643       trace: run_command: unset
GIT_DIR; ssh remotehost 'git-upload-pack
'\''diffusion/LIBEDIT/libedit'\'''
22:43:49.491004 run-command.c:643       trace: run_command: git
index-pack --stdin -v --fix-thin '--keep=fetch-pack 13925 on machine'
--check-self-contained-and-connected
remote: Enumerating objects: 178, done.
22:43:49.492988 git.c:418               trace: built-in: git
index-pack --stdin -v --fix-thin '--keep=fetch-pack 13925 on machine'
--check-self-contained-and-connected
remote: Counting objects: 100% (178/178), done.
remote: Compressing objects: 100% (81/81), done.
remote: Total 178 (delta 95), reused 178 (delta 95)
Receiving objects: 100% (178/178), 539.36 KiB | 17.40 MiB/s, done.
Resolving deltas: 100% (95/95), done.
22:43:49.569331 run-command.c:643       trace: run_command: git
rev-list --objects --stdin --not --all --quiet '--progress=Checking
connectivity'
22:43:49.571431 git.c:418               trace: built-in: git rev-list
--objects --stdin --not --all --quiet '--progress=Checking
connectivity'
Wed 19 Dec 22:43:49 GMT 2018
Wed 19 Dec 22:43:49 GMT 2018
22:43:49.581282 git.c:418               trace: built-in: git clone -n
remotehost:diffusion/LIBEDIT/libedit
Cloning into 'libedit'...
22:43:49.584019 run-command.c:643       trace: run_command: unset
GIT_DIR; ssh remotehost 'git-upload-pack
'\''diffusion/LIBEDIT/libedit'\'''
22:43:49.818490 run-command.c:643       trace: run_command: git
index-pack --stdin -v --fix-thin '--keep=fetch-pack 13936 on machine'
--check-self-contained-and-connected
remote: Enumerating objects: 178, done.
22:43:49.820477 git.c:418               trace: built-in: git
index-pack --stdin -v --fix-thin '--keep=fetch-pack 13936 on machine'
--check-self-contained-and-connected
remote: Counting objects: 100% (178/178), done.
remote: Compressing objects: 100% (81/81), done.
Receiving objects:  64% (114/178)

Most of the time cloning completes fine, but every now and then it will
hang indefinitely in the "Receiving objects" stage. This happens with
git version 2.17.1 from Ubuntu 18.04 and with a hand-compiled git
version 2.20.1. The remote end is running through Phabricator sshd
scripts (both 2.17.1 and 2.20.1 have been tried on the remote end). ssh
multiplexing is on:

ControlMaster auto
ControlPersist 120
ControlPath /tmp/ssh-%r@%h:%p
ServerAliveInterval 60

Looking at where it is stuck, git is doing a read on a pipe:

#0  0x00007fd1b845034e in __libc_read (fd=fd@entry=0,
buf=buf@entry=0x55ab81e19d40 <input_buffer>, nbytes=nbytes@entry=4096)
    at ../sysdeps/unix/sysv/linux/read.c:27
#1  0x000055ab81b51b23 in read (__nbytes=4096, __buf=0x55ab81e19d40
<input_buffer>, __fd=0)
    at /usr/include/x86_64-linux-gnu/bits/unistd.h:44
#2  xread (fd=0, buf=0x55ab81e19d40 <input_buffer>, len=4096) at wrapper.c:260
#3  0x000055ab819d1069 in fill (min=min@entry=1) at builtin/index-pack.c:255
#4  0x000055ab819d123a in unpack_entry_data (offset=268408,
size=size@entry=14675, type=<optimised out>,
oid=oid@entry=0x55ab837cff40)
    at builtin/index-pack.c:445
#5  0x000055ab819d1bc7 in unpack_raw_entry (oid=0x55ab837cff40,
ref_oid=0x7fffee137440, ofs_offset=0x55ab837ce350, obj=0x55ab837cff40)
    at builtin/index-pack.c:526
#6  parse_pack_objects (hash=0x7fffee1374f0 "\001") at builtin/index-pack.c:1113
#7  cmd_index_pack (argc=<optimised out>, argv=<optimised out>,
prefix=<optimised out>) at builtin/index-pack.c:1775
#8  0x000055ab819934e1 in run_builtin (argv=<optimised out>,
argc=<optimised out>, p=<optimised out>) at git.c:421
#9  handle_builtin (argc=<optimised out>, argv=<optimised out>) at git.c:647
#10 0x000055ab81994515 in run_argv (argv=0x7fffee137690,
argcp=0x7fffee13769c) at git.c:701
#11 cmd_main (argc=<optimised out>, argv=<optimised out>) at git.c:798
#12 0x000055ab8199318f in main (argc=7, argv=0x7fffee137908) at common-main.c:45

Does anyone know what might be wrong and/or how to debug this?

-- 
Sitsofe | http://sucs.org/~sits/


* Re: Periodic hang during git index-pack
From: Jeff King @ 2018-12-19 23:22 UTC
  To: Sitsofe Wheeler; +Cc: git

On Wed, Dec 19, 2018 at 10:59:30PM +0000, Sitsofe Wheeler wrote:

> While trying to use git to clone a remote repository, git
> index-pack occasionally hangs:
> [...]
> Looking at where it is stuck, git is doing a read on a pipe:
> 
> #0  0x00007fd1b845034e in __libc_read (fd=fd@entry=0,
> buf=buf@entry=0x55ab81e19d40 <input_buffer>, nbytes=nbytes@entry=4096)
>     at ../sysdeps/unix/sysv/linux/read.c:27
> #1  0x000055ab81b51b23 in read (__nbytes=4096, __buf=0x55ab81e19d40
> <input_buffer>, __fd=0)
>     at /usr/include/x86_64-linux-gnu/bits/unistd.h:44

Index-pack is reading the pack on stdin, so it's expecting more bytes.
Those bytes should be coming from the git-clone process, which is
relaying the bytes from the other side.

Check the backtrace of git-clone to see why it isn't feeding more data
(but note that it will generally have two threads -- one processing the
data from the remote, and one wait()ing for index-pack to finish).
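
If it helps, something along these lines (with the actual pid of the
hung clone substituted, e.g. from ps or pgrep) should dump all of its
threads in one go:

  gdb -p <pid-of-git-clone> -batch -ex 'thread apply all bt'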

My guess, though, is that you'll find that git-clone is simply waiting
on another pipe: the one from ssh.

-Peff


* Re: Periodic hang during git index-pack
From: Sitsofe Wheeler @ 2018-12-20 10:03 UTC
  To: Jeff King; +Cc: git

On Wed, 19 Dec 2018 at 23:22, Jeff King <peff@peff.net> wrote:
>
> On Wed, Dec 19, 2018 at 10:59:30PM +0000, Sitsofe Wheeler wrote:
>
> > While trying to use git to clone a remote repository, git
> > index-pack occasionally hangs:
> > [...]
> > Looking at where it is stuck, git is doing a read on a pipe:
> >
> > #0  0x00007fd1b845034e in __libc_read (fd=fd@entry=0,
> > buf=buf@entry=0x55ab81e19d40 <input_buffer>, nbytes=nbytes@entry=4096)
> >     at ../sysdeps/unix/sysv/linux/read.c:27
> > #1  0x000055ab81b51b23 in read (__nbytes=4096, __buf=0x55ab81e19d40
> > <input_buffer>, __fd=0)
> >     at /usr/include/x86_64-linux-gnu/bits/unistd.h:44
>
> Index-pack is reading the pack on stdin, so it's expecting more bytes.
> Those bytes should be coming from the git-clone process, which is
> relaying the bytes from the other side.
>
> Check the backtrace of git-clone to see why it isn't feeding more data
> (but note that it will generally have two threads -- one processing the
> data from the remote, and one wait()ing for index-pack to finish).
>
> My guess, though, is that you'll find that git-clone is simply waiting
> on another pipe: the one from ssh.

Ok here are backtraces from another run (with SSH multiplexing disabled):

(gdb) thread apply all bt

Thread 2 (Thread 0x7faafbf1c700 (LWP 36586)):
#0  0x00007faafc805384 in __libc_read (fd=fd@entry=5,
    buf=buf@entry=0x7faafbf0ddec, nbytes=nbytes@entry=5)
    at ../sysdeps/unix/sysv/linux/read.c:27
#1  0x000055c8ca2f5b23 in read (__nbytes=5, __buf=0x7faafbf0ddec, __fd=5)
    at /usr/include/x86_64-linux-gnu/bits/unistd.h:44
#2  xread (fd=fd@entry=5, buf=buf@entry=0x7faafbf0ddec, len=len@entry=5)
    at wrapper.c:260
#3  0x000055c8ca2f5cdb in read_in_full (fd=5, buf=buf@entry=0x7faafbf0bdf0,
    count=5, count@entry=8193) at wrapper.c:318
#4  0x000055c8ca27222b in get_packet_data (fd=fd@entry=5,
    src_buf=src_buf@entry=0x0, src_size=src_size@entry=0x0,
    dst=dst@entry=0x7faafbf0bdf0, size=size@entry=8193,
    options=options@entry=0) at pkt-line.c:289
#5  0x000055c8ca272ed8 in packet_read_with_status (fd=fd@entry=5,
    src_buffer=src_buffer@entry=0x0, src_len=src_len@entry=0x0,
    buffer=buffer@entry=0x7faafbf0bdf0
"\001\344\305\066JJj\341q@\243\225ժ\350\026M\fkM9:-ƀ\253\206\336\001\275\070\325\372\250\204\232aM\221\213(\320B%\a\275\233\261g\321A\245\n
\247\374\326\b'\v\252\277rA\211\312l\212j\352\177\260\317j\aT\252&t2\256\254\360\002\217V\024\061k\201ڲ;;\017`\361\020:*b\n5\222\036i\272\067}\360,\323\345Y\314ir\311\034b\232F\267\364\016]",
size=size@entry=65520,
    pktlen=pktlen@entry=0x7faafbf0bd94, options=0) at pkt-line.c:344
#6  0x000055c8ca273078 in packet_read (fd=fd@entry=5,
    src_buffer=src_buffer@entry=0x0, src_len=src_len@entry=0x0,
    buffer=buffer@entry=0x7faafbf0bdf0
"\001\344\305\066JJj\341q@\243\225ժ\350\026M\fkM9:-ƀ\253\206\336\001\275\070\325\372\250\204\232aM\221\213(\320B%\a\275\233\261g\321A\245\n
\247\374\326\b'\v\252\277rA\211\312l\212j\352\177\260\317j\aT\252&t2\256\254\360\002\217V\024\061k\201ڲ;;\017`\361\020:*b\n5\222\036i\272\067}\360,\323\345Y\314ir\311\034b\232F\267\364\016]",
size=size@entry=65520,
    options=options@entry=0) at pkt-line.c:364
#7  0x000055c8ca2cbf73 in recv_sideband (
    me=me@entry=0x55c8ca3466ed "fetch-pack", in_stream=5, out=out@entry=6)
    at sideband.c:143
#8  0x000055c8ca22d6cb in sideband_demux (in=<optimised out>, out=6,
    data=<optimised out>) at fetch-pack.c:776
#9  0x000055c8ca2a8458 in run_thread (data=0x7ffda489ef90)
    at run-command.c:1032
#10 0x00007faafc7fb6db in start_thread (arg=0x7faafbf1c700)
    at pthread_create.c:463
#11 0x00007faafc31c88f in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7faafce3bb80 (LWP 36584)):
#0  0x00007faafc805384 in __libc_read (fd=fd@entry=7,
    buf=buf@entry=0x7ffda489ef10, nbytes=nbytes@entry=46)
    at ../sysdeps/unix/sysv/linux/read.c:27
#1  0x000055c8ca2f5b23 in read (__nbytes=46, __buf=0x7ffda489ef10, __fd=7)
    at /usr/include/x86_64-linux-gnu/bits/unistd.h:44
#2  xread (fd=fd@entry=7, buf=buf@entry=0x7ffda489ef10, len=len@entry=46)
    at wrapper.c:260
#3  0x000055c8ca2f5cdb in read_in_full (fd=7, buf=buf@entry=0x7ffda489ef10,
    count=count@entry=46) at wrapper.c:318
#4  0x000055c8ca26a54f in index_pack_lockfile (ip_out=<optimised out>)
    at pack-write.c:297
#5  0x000055c8ca22e18c in get_pack (args=args@entry=0x7ffda489f2e0,
    xd=xd@entry=0x55c8cb4966f8,
    pack_lockfile=pack_lockfile@entry=0x55c8cb496660) at fetch-pack.c:881
#6  0x000055c8ca22fbfb in do_fetch_pack (pack_lockfile=<optimised out>,
    si=0x7ffda489f1f0, nr_sought=<optimised out>, sought=<optimised out>,
    orig_ref=<optimised out>, fd=<optimised out>, args=0x7ffda489f2e0)
    at fetch-pack.c:1019
#7  fetch_pack (args=args@entry=0x7ffda489f2e0, fd=<optimised out>,
    conn=<optimised out>, ref=<optimised out>,
    dest=dest@entry=0x55c8cb498820
"remotehost:diffusion/LIBEDIT/libedit",
sought=sought@entry=0x55c8cb498580,
    nr_sought=<optimised out>, shallow=<optimised out>,
    pack_lockfile=<optimised out>, version=<optimised out>)
    at fetch-pack.c:1649
#8  0x000055c8ca2dddf8 in fetch_refs_via_pack (transport=0x55c8cb496620,
    nr_heads=2, to_fetch=0x55c8cb498580) at transport.c:365
#9  0x000055c8ca2df246 in transport_fetch_refs (
    transport=transport@entry=0x55c8cb496620, refs=refs@entry=0x55c8cb4959e0)
    at transport.c:1295
#10 0x000055c8ca1529ab in cmd_clone (argc=<optimised out>,
    argv=<optimised out>, prefix=<optimised out>) at builtin/clone.c:1212
#11 0x000055c8ca1374e1 in run_builtin (argv=<optimised out>,
    argc=<optimised out>, p=<optimised out>) at git.c:421
#12 handle_builtin (argc=<optimised out>, argv=<optimised out>) at git.c:647
#13 0x000055c8ca138515 in run_argv (argv=0x7ffda489fa00, argcp=0x7ffda489fa0c)
    at git.c:701
#14 cmd_main (argc=<optimised out>, argv=<optimised out>) at git.c:798
#15 0x000055c8ca13718f in main (argc=4, argv=0x7ffda489fc78)
    at common-main.c:45

(gdb) thread apply all bt

Thread 1 (Thread 0x7ff8a03dab80 (LWP 36587)):
#0  0x00007ff89fda434e in __libc_read (fd=fd@entry=0,
    buf=buf@entry=0x5604bea43d40 <input_buffer>, nbytes=nbytes@entry=4096)
    at ../sysdeps/unix/sysv/linux/read.c:27
#1  0x00005604be77bb23 in read (__nbytes=4096,
    __buf=0x5604bea43d40 <input_buffer>, __fd=0)
    at /usr/include/x86_64-linux-gnu/bits/unistd.h:44
#2  xread (fd=0, buf=0x5604bea43d40 <input_buffer>, len=4096) at wrapper.c:260
#3  0x00005604be5fb069 in fill (min=min@entry=1) at builtin/index-pack.c:255
#4  0x00005604be5fb23a in unpack_entry_data (offset=528312,
    size=size@entry=19448, type=<optimised out>, oid=oid@entry=0x5604c0605ea0)
    at builtin/index-pack.c:445
#5  0x00005604be5fbbc7 in unpack_raw_entry (oid=0x5604c0605ea0,
    ref_oid=0x7ffc74cd73f0, ofs_offset=0x5604c0603580, obj=0x5604c0605ea0)
    at builtin/index-pack.c:526
#6  parse_pack_objects (hash=0x7ffc74cd74a0 "\001")
    at builtin/index-pack.c:1113
#7  cmd_index_pack (argc=<optimised out>, argv=<optimised out>,
    prefix=<optimised out>) at builtin/index-pack.c:1775
#8  0x00005604be5bd4e1 in run_builtin (argv=<optimised out>,
    argc=<optimised out>, p=<optimised out>) at git.c:421
#9  handle_builtin (argc=<optimised out>, argv=<optimised out>) at git.c:647
#10 0x00005604be5be515 in run_argv (argv=0x7ffc74cd7640, argcp=0x7ffc74cd764c)
    at git.c:701
#11 cmd_main (argc=<optimised out>, argv=<optimised out>) at git.c:798
#12 0x00005604be5bd18f in main (argc=7, argv=0x7ffc74cd78b8)
    at common-main.c:45

As before, any hints on how to debug this are gratefully accepted!

-- 
Sitsofe | http://sucs.org/~sits/


* Re: Periodic hang during git index-pack
From: Jeff King @ 2018-12-20 15:10 UTC
  To: Sitsofe Wheeler; +Cc: git

On Thu, Dec 20, 2018 at 10:03:45AM +0000, Sitsofe Wheeler wrote:

> > Check the backtrace of git-clone to see why it isn't feeding more data
> > (but note that it will generally have two threads -- one processing the
> > data from the remote, and one wait()ing for index-pack to finish).
> >
> > My guess, though, is that you'll find that git-clone is simply waiting
> > on another pipe: the one from ssh.
> 
> Ok here are backtraces from another run (with SSH multiplexing disabled):

OK, that's about what I expected. Here we have clone's sideband-demux
thread waiting to pull more packfile data from the remote:

> (gdb) thread apply all bt
> 
> Thread 2 (Thread 0x7faafbf1c700 (LWP 36586)):
> #0  0x00007faafc805384 in __libc_read (fd=fd@entry=5,
>     buf=buf@entry=0x7faafbf0ddec, nbytes=nbytes@entry=5)
>     at ../sysdeps/unix/sysv/linux/read.c:27
> #1  0x000055c8ca2f5b23 in read (__nbytes=5, __buf=0x7faafbf0ddec, __fd=5)
>     at /usr/include/x86_64-linux-gnu/bits/unistd.h:44
> [...coming from packet_read / recv_sideband / sideband_demux...]

I assume fd=5 there is a pipe connected to ssh. You could double check
with "lsof" or similar, but I don't think it would ever be reading from
anywhere else.
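
For example, something like either of these (again with the clone's pid
substituted) should show what fd 5 actually is:

  lsof -p <pid-of-git-clone>
  readlink /proc/<pid-of-git-clone>/fd/5    # e.g. 'pipe:[123456]'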

And then this thread:

> Thread 1 (Thread 0x7faafce3bb80 (LWP 36584)):
> #0  0x00007faafc805384 in __libc_read (fd=fd@entry=7,
>     buf=buf@entry=0x7ffda489ef10, nbytes=nbytes@entry=46)
>     at ../sysdeps/unix/sysv/linux/read.c:27
> #1  0x000055c8ca2f5b23 in read (__nbytes=46, __buf=0x7ffda489ef10, __fd=7)
>     at /usr/include/x86_64-linux-gnu/bits/unistd.h:44
> #2  xread (fd=fd@entry=7, buf=buf@entry=0x7ffda489ef10, len=len@entry=46)
>     at wrapper.c:260
> #3  0x000055c8ca2f5cdb in read_in_full (fd=7, buf=buf@entry=0x7ffda489ef10,
>     count=count@entry=46) at wrapper.c:318
> #4  0x000055c8ca26a54f in index_pack_lockfile (ip_out=<optimised out>)
>     at pack-write.c:297

...is us waiting on index-pack to signal completion by printing out the
name of the pack's .keep file.

And then this is just index-pack:

> (gdb) thread apply all bt
> 
> Thread 1 (Thread 0x7ff8a03dab80 (LWP 36587)):
> #0  0x00007ff89fda434e in __libc_read (fd=fd@entry=0,
>     buf=buf@entry=0x5604bea43d40 <input_buffer>, nbytes=nbytes@entry=4096)
>     at ../sysdeps/unix/sysv/linux/read.c:27
> #1  0x00005604be77bb23 in read (__nbytes=4096,
>     __buf=0x5604bea43d40 <input_buffer>, __fd=0)
>     at /usr/include/x86_64-linux-gnu/bits/unistd.h:44
> #2  xread (fd=0, buf=0x5604bea43d40 <input_buffer>, len=4096) at wrapper.c:260
> #3  0x00005604be5fb069 in fill (min=min@entry=1) at builtin/index-pack.c:255

...waiting for more data.

So though it goes back and forth between two processes, the chain of
blocked reads is linear due to the threads:

  ssh -> git-clone        -> index-pack -> git-clone
         (sideband_demux)                  (index_pack_lockfile)

with each blocking on read() from its predecessor. So you need to find
out why "ssh" is blocking. Unfortunately, short of a bug in ssh, the
likely cause is either:

  1. The git-upload-pack on the remote side stopped generating data for
     some reason. You may or may not have access on the remotehost to
     dig into that.

     It's certainly possible there's a deadlock bug between the server
     and client side of a Git conversation. But I'd find it extremely
     unlikely to find such a deadlock bug at this point in the
     conversation, because at this point the client side has nothing
     left to say to the server. The server should just be streaming out
     the packfile bytes and then closing the descriptor.

     You mentioned "Phabricator sshd scripts" running on the server.
     I don't know what Phabricator might be sticking in the middle of
     the connection, but that could be the source of the stall.

  2. It's possible the network connection dropped but ssh did not
     notice. Maybe try turning on ssh keepalives and seeing if it
     eventually times out?
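
     Something like this in ~/.ssh/config for the remote host (numbers
     just for illustration) should make a truly dead connection error
     out after a few missed probes rather than hanging forever:

       ServerAliveInterval 15
       ServerAliveCountMax 3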

-Peff


* Re: Periodic hang during git index-pack
From: Sitsofe Wheeler @ 2018-12-20 16:48 UTC
  To: Jeff King; +Cc: git

On Thu, 20 Dec 2018 at 15:11, Jeff King <peff@peff.net> wrote:
>
> OK, that's about what I expected. Here we have clone's sideband-demux
> thread waiting to pull more packfile data from the remote:
>
> > (gdb) thread apply all bt
> >
> > Thread 2 (Thread 0x7faafbf1c700 (LWP 36586)):
> > #0  0x00007faafc805384 in __libc_read (fd=fd@entry=5,
> >     buf=buf@entry=0x7faafbf0ddec, nbytes=nbytes@entry=5)
> >     at ../sysdeps/unix/sysv/linux/read.c:27
> > #1  0x000055c8ca2f5b23 in read (__nbytes=5, __buf=0x7faafbf0ddec, __fd=5)
> >     at /usr/include/x86_64-linux-gnu/bits/unistd.h:44
> > [...coming from packet_read / recv_sideband / sideband_demux...]
>
> I assume fd=5 there is a pipe connected to ssh. You could double check
> with "lsof" or similar, but I don't think it would ever be reading from
> anywhere else.

I checked and in all cases git was reading from a pipe.

[snip]

> with each blocking on read() from its predecessor. So you need to find
> out why "ssh" is blocking. Unfortunately, short of a bug in ssh, the
> likely cause is either:
>
>   1. The git-upload-pack on the remote side stopped generating data for
>      some reason. You may or may not have access on the remotehost to
>      dig into that.
>
>      It's certainly possible there's a deadlock bug between the server
>      and client side of a Git conversation. But I'd find it extremely
>      unlikely to find such a deadlock bug at this point in the
>      conversation, because at this point the client side has nothing
>      left to say to the server. The server should just be streaming out
>      the packfile bytes and then closing the descriptor.

I think it's highly unlikely too given how many good runs we generally have.

>      You mentioned "Phabricator sshd scripts" running on the server.
>      I don't know what Phabricator might be sticking in the middle of
>      the connection, but that could be the source of the stall.

I think you're right. I set up a separate sshd on a different port on
the same machine where there were no Phabricator callouts, and the
problem never manifested...
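
For reference, the comparison boiled down to roughly the following,
where the port, sshd config file and on-disk repository path are
illustrative (a stock sshd with no Phabricator hooks serves the bare
repository directly):

  sudo /usr/sbin/sshd -p 2222 -f /etc/ssh/sshd_config.stock
  git clone -n ssh://remotehost:2222/path/to/libedit.git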

>   2. It's possible the network connection dropped but ssh did not
>      notice. Maybe try turning on ssh keepalives and seeing if it
>      eventually times out?

I had already done this, but the problem still manifested. I went so
far as checking whether the problem would happen over the loopback
device on the same machine (via localhost); it continued to happen, so
I'm fairly sure that rules out networking issues.

-- 
Sitsofe | http://sucs.org/~sits/


* Re: Periodic hang during git index-pack
From: Sitsofe Wheeler @ 2019-02-03  8:16 UTC
  To: Jeff King; +Cc: git

On Wed, 19 Dec 2018 at 22:59, Sitsofe Wheeler <sitsofe@gmail.com> wrote:
>
> While trying to use git to clone a remote repository, git
> index-pack occasionally hangs:

On Thu, 20 Dec 2018 at 16:48, Sitsofe Wheeler <sitsofe@gmail.com> wrote:
>
> On Thu, 20 Dec 2018 at 15:11, Jeff King <peff@peff.net> wrote:
> >
[snip]
> >
> > with each blocking on read() from its predecessor. So you need to find
> > out why "ssh" is blocking. Unfortunately, short of a bug in ssh, the
> > likely cause is either:
> >
> >   1. The git-upload-pack on the remote side stopped generating data for
> >      some reason. You may or may not have access on the remotehost to
> >      dig into that.
> >
> >      It's certainly possible there's a deadlock bug between the server
> >      and client side of a Git conversation. But I'd find it extremely
> >      unlikely to find such a deadlock bug at this point in the
> >      conversation, because at this point the client side has nothing
> >      left to say to the server. The server should just be streaming out
> >      the packfile bytes and then closing the descriptor.
>
> I think it's highly unlikely too given how many good runs we generally have.
>
> >      You mentioned "Phabricator sshd scripts" running on the server.
> >      I don't know what Phabricator might be sticking in the middle of
> >      the connection, but that could be the source of the stall.
>
> I think you're right. I set up a separate sshd on a different port on
> the same machine where there were no Phabricator callouts, and the
> problem never manifested...

Just to finally follow up - this was confirmed to be a bug in the
Phabricator server-side git+ssh wrapper, and the issue exists in at
least Phabricator versions from 2016 Week 28 through 2019 Week 3. The
Phabricator ssh-exec PHP script was doing a select() that would very
occasionally fail with errno set to EINTR, which then caused subsequent
git data not to be sent to the git client. It's unclear why this
started happening more frequently for us, but we had recently changed
to a faster/larger AWS instance type with more cores and we had also
switched to a later Ubuntu 14.04 kernel...
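
The standard fix for that class of bug is to retry the interrupted
call. As a rough illustration only (this is a minimal C sketch of the
general retry-on-EINTR pattern, not the actual Phabricator PHP code,
which is linked below):

  #include <errno.h>
  #include <sys/select.h>

  /*
   * Keep calling select() while it is merely interrupted by a signal
   * (EINTR) instead of treating that as a hard error.  A production
   * version would also re-populate the fd sets and the timeout before
   * each retry, since select() may modify them.
   */
  static int select_retrying(int nfds, fd_set *rfds, fd_set *wfds,
                             fd_set *efds, struct timeval *tv)
  {
          int ret;
          do {
                  ret = select(nfds, rfds, wfds, efds, tv);
          } while (ret < 0 && errno == EINTR);
          return ret;
  }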

See https://discourse.phabricator-community.org/t/sporadic-git-cloning-hang-over-ssh/2233
for a longer description of the issue as reported to the Phabricator
developers. That page contains a patch of mine that resolves the issue,
a link to an upstream Phabricator task opened by the Phabricator devs,
and a link to an upstream Phabricator diff that works around the issue
in a looser manner than my final patch.

Thanks for your rapid reply and guidance, Jeff!

--
Sitsofe | http://sucs.org/~sits/

