git@vger.kernel.org mailing list mirror (one of many)
* git push over HTTP; long delay with no progress, then hang?
From: Bryan Turner @ 2020-05-16  4:09 UTC
  To: Git Users

When running a huge "git push" via protocol v0/v1 over HTTP
(repository is ~10GB, with ~104,000 refs), I observe that:
* Git makes an initial connection for a ref advertisement. This
  completes almost instantly because the remote repository is empty.
* "git push" then sits in absolute silence for ~10 minutes.

The process chain looks like:
git push <URL>
    git-remote-http <URL> <URL>
        git send-pack --stateless-rpc --helper-status --thin --progress <URL> --stdin

The "git send-pack" process runs at 100% usage for a single CPU core
for this entire duration. Does anyone have any insight into what Git
might be doing during this long delay? Whatever it is, is it perhaps
something Git should actually print some sort of status for? (I've
reproduced this long silence with both Git 2.20.1 and the new Git
2.27.0-rc0.)

After the long delay, I see another HTTP request to the server and
then the "git push" process finally goes into "Enumerating objects",
"Counting objects", "Compressing objects" and finally "Writing
objects".

Another thing I've noticed is that, during this latter part, it
_appears_ Git opens a connection to the remote server and _then_
starts enumerating/counting/compressing, without having actually said
anything to the server. For this huge push, that actually results in
the server aborting the connection due to a read timeout. Why would
"git push" open a server connection and _then_ do all the work
necessary to create a pack to actually send? (Perhaps this is really
an HTTP keepalive issue, where the connection had been used for a
previous request?) As with the long silence, I reproduced this with
both Git 2.20.1 and 2.27.0-rc0.

Lastly, after the timeout, I observed that my "git push" hung forever
with this output displayed:
bturner$ /opt/git/2.27.0-rc0/bin/git push <URL> --all
Enumerating objects: 13135246, done.
Counting objects: 100% (13135246/13135246), done.
Delta compression using up to 20 threads
Compressing objects: 100% (3867748/3867748), done.
Writing objects: 0% (1/13135246)

An lsof on the "git-remote-http" process showed:
bturner$ lsof -p 64855
COMMAND     PID    USER   FD    TYPE             DEVICE  SIZE/OFF                NODE NAME
git-remot 64855 bturner  cwd     DIR                1,4       352           168415036 <CWD>
git-remot 64855 bturner  txt     REG                1,4   2305984           170771507 /opt/git/2.27.0-rc0/libexec/git-core/git-remote-http
git-remot 64855 bturner  txt     REG                1,4     59156           165457100 /usr/local/Cellar/gettext/0.20.2_1/lib/libintl.8.dylib
git-remot 64855 bturner  txt     REG                1,4     28420           167195923 /Library/Preferences/Logging/.plist-cache.hR8QH5S4
git-remot 64855 bturner  txt     REG                1,4   1568368 1152921500312496125 /usr/lib/dyld
git-remot 64855 bturner    0    PIPE 0x9a358149ffa29591     65536                     ->0xa5ef9caffef8462c
git-remot 64855 bturner    1    PIPE 0x2dce7cc3b04ce8b8     16384                     ->0x3f62dce60d4a355
git-remot 64855 bturner    2u    CHR               16,8 0t1903241                2841 /dev/ttys008
git-remot 64855 bturner    3u  systm 0xb078b23901649d47       0t0                     [ctl com.apple.netsrc id 7 unit 48]
git-remot 64855 bturner    4u   unix 0xb078b238edae81ff       0t0                     ->0xb078b238edae8777
git-remot 64855 bturner    5u   IPv6 0xb078b238f873ac47       0t0                 TCP localhost:58980-><host>:<port> (CLOSED)
git-remot 64855 bturner    7    PIPE 0x8e4a8565c7dd32eb     65536                     ->0xaec041bd7407b713
git-remot 64855 bturner    8    PIPE 0x6ed7631819b64580     65536                     ->0x52e7cf331dfd1de7

So at some level it was known that the remote host had closed the
socket, but "git-remote-http" was still sitting there.

I can readily reproduce all of this, but unfortunately can't readily
share the repository. I'm happy to do anything I can to contribute to
debugging, if anyone has any thoughts to share!

Best regards,
Bryan Turner


* Re: git push over HTTP; long delay with no progress, then hang?
From: SZEDER Gábor @ 2020-05-16  6:37 UTC
  To: Bryan Turner; +Cc: Git Users

On Fri, May 15, 2020 at 09:09:27PM -0700, Bryan Turner wrote:
> When running a huge "git push" via protocol v0/v1 over HTTP

By huge push you mean a lot of refs?

> (repository is ~10GB, with ~104,000 refs), I observe that:
> * Git makes an initial connection for a ref advertisement. This
> completes almost instantly because the repository is empty
> * "git push" then sits in absolute silence for ~10 minutes

I ran into this a few years ago; I remember waiting for 57 minutes ;)

> The process chain looks like:
> git push <URL>
>     git-remote-http <URL> <URL>
>         git send-pack --stateless-rpc --helper-status --thin --progress <URL> --stdin
> 
> The "git send-pack" process runs at 100% usage for a single CPU core
> for this entire duration. Does anyone have any insight into what Git
> might be doing during this long delay?

Refspec matching is, if I recall correctly,

  O(nr of refspecs * (nr of local refs + nr of remote refs))

with remote.c:count_refspec_match() responsible for the "nr of remote +
local refs" part and remote.c:match_explicit_refs() for the "nr of
refspecs" part.
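
To make that cost concrete, here is a rough Python model of the shape of
the loop. It is only a sketch, not Git's actual C code: scan() and
match_refspecs() are made-up names, and the suffix test is a simplified
stand-in for the real matching rules. The point is just that every
refspec triggers a full scan of both ref lists.

```python
def scan(name, refs):
    """Linear scan: return (matching refs, number of refs examined)."""
    hits = [r for r in refs if r == name or r.endswith("/" + name)]
    return hits, len(refs)


def match_refspecs(refspecs, local_refs, remote_refs):
    """Match each (src, dst) refspec by scanning both ref lists in full."""
    examined = 0
    pairs = []
    for src, dst in refspecs:
        src_hits, n = scan(src, local_refs)    # O(nr of local refs)
        dst_hits, m = scan(dst, remote_refs)   # O(nr of remote refs)
        examined += n + m
        # Keep the pair only if the source is unambiguous; an unknown
        # destination is kept as given (it would be created by the push).
        if len(src_hits) == 1 and len(dst_hits) <= 1:
            pairs.append((src_hits[0], dst_hits[0] if dst_hits else dst))
    return pairs, examined


if __name__ == "__main__":
    refs = [f"refs/heads/b{i}" for i in range(1000)]
    specs = [(r, r) for r in refs]
    pairs, examined = match_refspecs(specs, refs, [])
    print(len(pairs), examined)
```

With 1,000 refspecs against 1,000 local refs and an empty remote, the
model examines 1,000 * 1,000 refs. If something close to all ~104,000
refs ended up as individual refspecs, the same arithmetic would give on
the order of 10^10 ref comparisons, which would be consistent with
minutes of single-core CPU time.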

This is particularly bad for the http/https protocols, because 'git push'
expands your refspecs to fully qualified refspecs and passes them to 'git
send-pack', which then performs refspec matching _again_.  So if you
have a single refspec with globbing, 'git push' can still do its
matching fairly quickly, even if there are a lot of local and remote
refs and that single globbing refspec happens to match a lot of them.
But then the refspec matching in 'git send-pack' has a whole lot to do,
spins the CPU like crazy, and there you are writing a bug report on
Friday evening.

This is less of an issue with other protocols, because they perform
refspec matching only once, but of course all protocols suffer if you
pass a lot of refspecs to 'git push' or 'git send-pack'.

> Whatever it is, is it perhaps
> something Git should actually print some sort of status for? (I've
> reproduced this long silence with both Git 2.20.1 and the new Git
> 2.27.0-rc0.)

An immediate band-aid might be to teach 'git push' to pass on the
original refspecs to 'git send-pack', as this would reduce the
complexity of that second round of refspec matching.  This, of course,
wouldn't help if someone scripted around 'git push' and invoked it
with a lot of refspecs or fed a lot of refspecs directly to 'git
send-pack's stdin.

Alternatively, teach 'git send-pack' a new option, e.g.
'--only-fully-qualified-refspecs', and teach 'git push' to use it, so
that 'git send-pack' doesn't have to perform that second round of
refspec matching; it would only have to verify that the refspecs it got
are indeed all fully qualified.
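
As a rough illustration of how cheap that verification could be, here is
a Python sketch. The option above does not exist in Git, and
is_fully_qualified() is a made-up helper; it assumes "fully qualified"
means both sides of "src:dst" start with "refs/", with an empty source
allowed for deletions.

```python
def is_fully_qualified(refspec):
    """Check one 'src:dst' refspec; a leading '+' (force) is allowed."""
    spec = refspec[1:] if refspec.startswith("+") else refspec
    src, sep, dst = spec.partition(":")
    if not sep:
        return False  # no explicit destination, so matching would be needed
    # An empty source means "delete dst"; treated as acceptable here.
    src_ok = src == "" or src.startswith("refs/")
    return src_ok and dst.startswith("refs/")


def all_fully_qualified(refspecs):
    """One pass over the refspecs, no scan of local or remote refs."""
    return all(is_fully_qualified(s) for s in refspecs)
```

The check is linear in the number of refspecs and never looks at the
ref lists, which is the whole attraction compared to matching again.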

Or build the remote refs index earlier and sort refspecs and local
refs, so we could match the lhs of fully qualified refspecs to local
refs in one go while looking up their rhs in the remote ref index,
resulting in O((nr of refspecs + nr of local refs) * log(nr of remote
refs)) complexity.  Dunno, it was a long time ago when I last thought
about this.
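
A minimal Python sketch of that idea, assuming fully qualified refspecs
given as (src, dst) pairs. match_sorted() is a made-up name, and a
sorted list searched with bisect stands in for the "remote ref index";
none of this is Git's actual code.

```python
from bisect import bisect_left


def match_sorted(refspecs, local_refs, remote_refs):
    """Merge sorted refspecs against sorted local refs; binary-search remotes.

    Returns (src, dst, dst_exists_on_remote) for every refspec whose
    source is an existing local ref.
    """
    specs = sorted(refspecs)
    local = sorted(local_refs)
    remote = sorted(remote_refs)   # stand-in for the remote ref index
    out = []
    i = 0
    for src, dst in specs:
        # Advance through local refs in step with the sorted refspecs, so
        # the local list is walked at most once overall.
        while i < len(local) and local[i] < src:
            i += 1
        if i == len(local) or local[i] != src:
            continue  # no such local ref
        k = bisect_left(remote, dst)  # O(log(nr of remote refs))
        out.append((src, dst, k < len(remote) and remote[k] == dst))
    return out
```

The up-front sorts add their own n log n cost, but that is still far
from the product of the list sizes that the repeated linear scans pay.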

All this assumes that if there are a lot of refspecs, then they are
fully qualified.  I'd assume that if there are so many refspecs to
cause trouble, then they were generated programmatically, and I'd
(naively? :) assume that if something generates refspecs, then it's
careful and generates fully qualified refspecs.  Anyway, all bets are
off if there are a lot of non-fully-qualified refspecs...

