git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* Git 2.26 fetches many times more objects than it should, wasting gigabytes
@ 2020-04-22  8:42 Lubomir Rintel
  2020-04-22  9:57 ` Jeff King
  0 siblings, 1 reply; 18+ messages in thread
From: Lubomir Rintel @ 2020-04-22  8:42 UTC (permalink / raw)
  To: git

Hi,

my git repository with Linux grows several gigabytes each time I fetch:

  [lkundrak@furthur linux]$ git fetch --all
  Fetching origin
  Receiving objects: 100% (431/431), 72.19 KiB | 345.00 KiB/s, done.
  Fetching stable
  Receiving objects: 100% (127228/127228), 22.01 MiB | 1.93 MiB/s, done.
  Fetching next
  Receiving objects: 100% (31113/31113), 6.51 MiB | 1.11 MiB/s, done.
  Fetching net
  Receiving objects: 100% (7331963/7331963), 1.20 GiB | 2.48 MiB/s, done.
  Fetching tip
  Receiving objects: 100% (7334643/7334643), 1.20 GiB | 2.44 MiB/s, done.
  Fetching irqchip
  Receiving objects: 100% (7333669/7333669), 1.20 GiB | 2.44 MiB/s, done.
  Fetching drm
  Receiving objects:  26% (1931483/7336388), 687.05 MiB | 1.55 MiB/s
  ...

Note the 1.2 GiB fetches from irqchip, tip, drm, net, etc. It looks like
the whole history gets fetched instead of the few changes that were
added since the fetch.

When I've first noticed this happening I've thrown away the repository,
initialized a new one with Git 2.26.0 and fetched everything anew, but
that didn't help.

I have very little clue about how to debug this. I'd be thankful for
suggestions about how to provide more details if necessary. I'm using
git from a Fedora package with this version number:

  [lkundrak@furthur linux]$ rpm -q git
  git-2.26.0-1.fc32.x86_64

Here's a full log of my today's unfortunate fetch (still running...)

  [lkundrak@furthur linux]$ git fetch --all
  Fetching origin
  remote: Enumerating objects: 766, done.
  remote: Counting objects: 100% (636/636), done.
  remote: Compressing objects: 100% (154/154), done.
  remote: Total 431 (delta 355), reused 335 (delta 275)
  Receiving objects: 100% (431/431), 72.19 KiB | 345.00 KiB/s, done.
  Resolving deltas: 100% (355/355), completed with 120 local objects.
  From git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux
     ae83d0b416db00..18bf34080c4c3b  master     -> origin/master
  Fetching stable
  remote: Enumerating objects: 164963, done.
  remote: Counting objects: 100% (156255/156255), done.
  remote: Compressing objects: 100% (35062/35062), done.
  remote: Total 127228 (delta 109912), reused 101371 (delta 91954)
  Receiving objects: 100% (127228/127228), 22.01 MiB | 1.93 MiB/s, done.
  Resolving deltas: 100% (109912/109912), completed with 12616 local objects.
  From git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux
     8488c3f3bc867e..8e2406c8518775  linux-4.19.y -> stable/linux-4.19.y
     dc4059d21d87e8..6ccc74c083c0d4  linux-5.4.y  -> stable/linux-5.4.y
     0634aa9416af81..937381741d02cc  linux-5.5.y  -> stable/linux-5.5.y
     f07f08b09f05e3..7c572703216073  linux-5.6.y  -> stable/linux-5.6.y
   * [new tag]                       v4.19.117    -> v4.19.117
   * [new tag]                       v5.4.34      -> v5.4.34
   * [new tag]                       v5.5.19      -> v5.5.19
   * [new tag]                       v5.6.6       -> v5.6.6
  Fetching next
  remote: Enumerating objects: 75381, done.
  remote: Counting objects: 100% (38266/38266), done.
  remote: Compressing objects: 100% (9052/9052), done.
  remote: Total 31113 (delta 26421), reused 26574 (delta 22000)
  Receiving objects: 100% (31113/31113), 6.51 MiB | 1.11 MiB/s, done.
  Resolving deltas: 100% (26421/26421), completed with 4591 local objects.
  From git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next
   + e98ff732aff9f7...d7b7d5f7953a08 akpm          -> next/akpm  (forced update)
   + 29d027a641745b...8e32e79bb64e7e akpm-base     -> next/akpm-base  (forced update)
   + 6735c84f78e417...a5840f9618a90e master        -> next/master  (forced update)
   + f507be28f9e551...aa411fab3c05eb pending-fixes -> next/pending-fixes  (forced update)
     ae83d0b416db00..18bf34080c4c3b  stable        -> next/stable
   * [new tag]                       next-20200422 -> next-20200422
  Fetching xo
  From github.com:hackerspace/olpc-xo175-linux
   * [new branch]                    lr/8250-json-schema-v2 -> xo/lr/8250-json-schema-v2
   + 3942092b6c20ea...483c7451896cff lr/ariel               -> xo/lr/ariel  (forced update)
   * [new branch]                    lr/ch7033-v4           -> xo/lr/ch7033-v4
     0472b4080244b7..d2339c1aeb192d  lr/mmp-adma            -> xo/lr/mmp-adma
   * [new branch]                    lr/mmp-dts             -> xo/lr/mmp-dts
   + 3dc167b0785d17...2a1d6d9af30e19 lr/mmp2-clk-audio-gpu  -> xo/lr/mmp2-clk-audio-gpu  (forced update)
  Fetching olpc
  Fetching spi
  From git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
     0dadde344d9655..0392727c261bab  for-5.7    -> spi/for-5.7
     59fc9ad5cb108b..2f5f5302c569f7  for-5.8    -> spi/for-5.8
   + 5e60c07c8615e8...bedad93ec5f83a for-linus  -> spi/for-linus  (forced update)
   + 36792a4aa66c21...c5a7b42434ff12 for-next   -> spi/for-next  (forced update)
  Fetching arm-soc
  From git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
   + 512e8d40f91d7e...e9801213465aa8 arm/fixes  -> arm-soc/arm/fixes  (forced update)
   + 512e8d40f91d7e...e9801213465aa8 for-next   -> arm-soc/for-next  (forced update)
  Fetching net
  remote: Enumerating objects: 7331963, done.
  remote: Counting objects: 100% (7331963/7331963), done.
  remote: Compressing objects: 100% (1114459/1114459), done.
  remote: Total 7331963 (delta 6171286), reused 7329526 (delta 6169706)
  Receiving objects: 100% (7331963/7331963), 1.20 GiB | 2.48 MiB/s, done.
  Resolving deltas: 100% (6171286/6171286), done.
  From git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
     9bacd256f13548..b9663b7ca6ff78  master     -> net/master
  Fetching net-next
  From git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
     0fde6e3b55a15a..44dd5efc97dae0  master     -> net-next/master
  Fetching arm
  From git://git.armlinux.org.uk/~rmk/linux-arm
     89604523a76eb3..8f3d9f35428674  fixes      -> arm/fixes
   + 52d3b2f98483c3...365a6327cd643e for-next   -> arm/for-next  (forced update)
     8632e9b5645bbc..ae83d0b416db00  master     -> arm/master
  Fetching tip
  remote: Enumerating objects: 7334643, done.
  remote: Counting objects: 100% (7334643/7334643), done.
  remote: Compressing objects: 100% (1115060/1115060), done.
  Receiving objects: 100% (7334643/7334643), 1.20 GiB | 2.44 MiB/s, done.
  remote: Total 7334643 (delta 6173502), reused 7332019 (delta 6171782)
  Resolving deltas: 100% (6173502/6173502), done.
  From git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
   + 36d1b5ecb41546...f091a1f17abdba auto-latest   -> tip/auto-latest  (forced update)
     11e31f608b499f..6e0d6ac5f3d9d9  core/core     -> tip/core/core
   + 37d2e30492116b...22105aa089a7f5 master        -> tip/master  (forced update)
     94d440d6184678..ac84bac4062e7f  timers/urgent -> tip/timers/urgent
     4caffe6a28d315..675a59b7dec6e0  x86/build     -> tip/x86/build
     aa61ee7b9ee3cb..a85573f7e74191  x86/mm        -> tip/x86/mm
     79a3aaa7b82e31..cd2f45b7514cdd  x86/vdso      -> tip/x86/vdso
  Fetching irqchip
  remote: Enumerating objects: 7333669, done.
  remote: Counting objects: 100% (7333669/7333669), done.
  remote: Compressing objects: 100% (1114570/1114570), done.
  Receiving objects: 100% (7333669/7333669), 1.20 GiB | 2.44 MiB/s, done.
  remote: Total 7333669 (delta 6172952), reused 7331159 (delta 6171299)
  Resolving deltas: 100% (6172952/6172952), done.
  From git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms
   + a2ffd41b964081...41932c0b588f27 hack/vim3l-crap          -> irqchip/hack/vim3l-crap  (forced update)
   + a97bd23c9e3762...751551c45552e3 kvm-arm64/nv-5.7-rc1-WIP -> irqchip/kvm-arm64/nv-5.7-rc1-WIP  (forced update)
  Fetching gpio
  Fetching dvhart
  remote: Enumerating objects: 23, done.
  remote: Counting objects: 100% (23/23), done.
  remote: Total 28 (delta 23), reused 23 (delta 23), pack-reused 5
  Unpacking objects: 100% (28/28), 5.19 KiB | 13.00 KiB/s, done.
  From https://github.com/dvhart/linux-pdx86
     8f3d9f35428674..f7ea285b626682  for-next            -> dvhart/for-next
   * [new branch]                    ib-pdx86-properties -> dvhart/ib-pdx86-properties
     00086336a8d96a..ae83d0b416db00  master              -> dvhart/master
   + 79777a3891c69e...5a93adbdbb42e9 review-andy         -> dvhart/review-andy  (forced update)
  Fetching power-supply
  Fetching mmp
  Fetching usb
  From git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
     be34a5854b4606..8f97250c21f0cf  usb-linus  -> usb/usb-linus
  Fetching pinchartl
  Fetching drm
  remote: Enumerating objects: 7336388, done.
  remote: Counting objects: 100% (7336388/7336388), done.
  remote: Compressing objects: 100% (1091614/1091614), done.
  Receiving objects:  26% (1931483/7336388), 687.05 MiB | 1.55 MiB/s

Here's my .git/config:

  [lkundrak@furthur linux]$ cat .git/config 
  [core]
  	repositoryformatversion = 0
  	filemode = true
  	bare = false
  	logallrefupdates = true
  [gui]
  	wmstate = normal
  	geometry = 1400x954+-1+26 453 376
  [merge]
  	renamelimit = 65535
  [remote "origin"]
  	url = git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
  	fetch = +refs/heads/*:refs/remotes/origin/*
  [branch "master"]
  	remote = origin
  	merge = refs/heads/master
  [remote "stable"]
  	url = git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
  	fetch = +refs/heads/*:refs/remotes/stable/*
  [remote "next"]
  	url = git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
  	fetch = +refs/heads/*:refs/remotes/next/*
  [remote "xo"]
  	url = git@github.com:hackerspace/olpc-xo175-linux.git
  	fetch = +refs/heads/*:refs/remotes/xo/*
  [branch "lr/olpc-xo175"]
  	remote = xo
  	merge = refs/heads/lr/olpc-xo175
  [remote "olpc"]
  	url = http://dev.laptop.org/git/olpc-kernel/
  	fetch = +refs/heads/*:refs/remotes/olpc/*
  [remote "spi"]
  	url = git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git
  	fetch = +refs/heads/*:refs/remotes/spi/*
  [branch "lr/olpc-xo175-fixes5-mmp"]
  	remote = xo
  	merge = refs/heads/lr/olpc-xo175-fixes5-mmp
  [branch "lr/olpc-xo175-fixes4-ap-sp"]
  	remote = xo
  	merge = refs/heads/lr/olpc-xo175-fixes4-ap-sp
  [branch "lr/olpc-xo175-fixes4-ec"]
  	remote = xo
  	merge = refs/heads/lr/olpc-xo175-fixes4-ec
  [branch "lr/olpc-xo175-fixes2-trivial"]
  	remote = xo
  	merge = refs/heads/lr/olpc-xo175-fixes2-trivial
  [branch "lr/olpc-xo175-fixes3-mmp-camera"]
  	remote = xo
  	merge = refs/heads/lr/olpc-xo175-fixes3-mmp-camera
  [branch "lr/olpc-xo175-drm"]
  	remote = xo
  	merge = refs/heads/lr/olpc-xo175-drm
  [remote "arm-soc"]
  	url = git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc.git
  	fetch = +refs/heads/*:refs/remotes/arm-soc/*
  [branch "lr/olpc-xo175-fixes5-ec"]
  	remote = xo
  	merge = refs/heads/lr/olpc-xo175-fixes5-ec
  [branch "lr/olpc-xo175-fixes7-ec"]
  	remote = xo
  	merge = refs/heads/lr/olpc-xo175-fixes7-ec
  [branch "lr/olpc-xo175-fixes7-battery"]
  	remote = xo
  	merge = refs/heads/lr/olpc-xo175-fixes7-battery
  [branch "lr/olpc-xo175-fixes4-mmp-camera"]
  	remote = xo
  	merge = refs/heads/lr/olpc-xo175-fixes4-mmp-camera
  [branch "lr/olpc-xo175-fixes6"]
  	remote = xo
  	merge = refs/heads/lr/olpc-xo175-fixes6
  [branch "lr/olpc-xo175-fixes1-drm"]
  	remote = xo
  	merge = refs/heads/lr/olpc-xo175-fixes1-drm
  [branch "lr/olpc-xo175-fixes1-drm-dt"]
  	remote = xo
  	merge = refs/heads/lr/olpc-xo175-fixes1-drm-dt
  [branch "lr/olpc-xo175-fixes2-drm"]
  	remote = xo
  	merge = refs/heads/lr/olpc-xo175-fixes2-drm
  [remote "net"]
  	url = git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git
  	fetch = +refs/heads/*:refs/remotes/net/*
  [remote "net-next"]
  	url = git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git
  	fetch = +refs/heads/*:refs/remotes/net-next/*
  #[remote "olpc-wifi"]
  #	url = git://dev.laptop.org/users/javier/wireless-testing
  #	fetch = +refs/heads/*:refs/remotes/olpc-wifi/*
  [remote "arm"]
  	url = git://git.armlinux.org.uk/~rmk/linux-arm.git
  	fetch = +refs/heads/*:refs/remotes/arm/*
  [branch "lr/olpc-xo175-fixes3-drm"]
  	remote = xo
  	merge = refs/heads/lr/olpc-xo175-fixes3-drm
  [remote "tip"]
  	url = git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
  	fetch = +refs/heads/*:refs/remotes/tip/*
  [remote "irqchip"]
  	url = git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git
  	fetch = +refs/heads/*:refs/remotes/irqchip/*
  [branch "lr/olpc-xo175-fixes6-next"]
  	remote = xo
  	merge = refs/heads/lr/olpc-xo175-fixes6-next
  [remote "gpio"]
  	url = git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio.git
  	fetch = +refs/heads/*:refs/remotes/gpio/*
  [branch "lr/olpc-xo175-fixes8-battery"]
  	remote = xo
  	merge = refs/heads/lr/olpc-xo175-fixes8-battery
  [remote "dvhart"]
  	url = https://github.com/dvhart/linux-pdx86.git
  	fetch = +refs/heads/*:refs/remotes/dvhart/*
  [remote "power-supply"]
  	url = git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply.git
  	fetch = +refs/heads/*:refs/remotes/power-supply/*
  [branch "lr/olpc-xo175-battery-v6"]
  	remote = xo
  	merge = refs/heads/lr/olpc-xo175-battery-v6
  [branch "lr/olpc-xo175-fixes2-drm-dt"]
  	remote = xo
  	merge = refs/heads/lr/olpc-xo175-fixes2-drm-dt
  [branch "lr/olpc-xo175-fixes4-drm"]
  	remote = xo
  	merge = refs/heads/lr/olpc-xo175-fixes4-drm
  [branch "olpc-5.0"]
  	remote = xo
  	merge = refs/heads/olpc-5.0
  [branch "lr/olpc-xo175-fixes8-ec"]
  	remote = xo
  	merge = refs/heads/lr/olpc-xo175-fixes8-ec
  [branch "lr/olpc-xo175-drm-dt-v3"]
  	remote = xo
  	merge = refs/heads/lr/olpc-xo175-drm-dt-v3
  [branch "lr/olpc-xo175-mmp-camera-v5"]
  	remote = xo
  	merge = refs/heads/lr/olpc-xo175-mmp-camera-v5
  [branch "lr/olpc-xo175-drm-dt-v4"]
  	remote = xo
  	merge = refs/heads/lr/olpc-xo175-drm-dt-v4
  [branch "lr/olpc-xo175-ec-v6"]
  	remote = xo
  	merge = refs/heads/lr/olpc-xo175-ec-v6
  [branch "lr/olpc-xo175-ec-v7"]
  	remote = xo
  	merge = refs/heads/lr/olpc-xo175-ec-v7
  [branch "lr/olpc-xo175-galcore-v1"]
  	remote = xo
  	merge = refs/heads/lr/olpc-xo175-galcore-v1
  [branch "lr/mmp3-1"]
  	remote = xo
  	merge = refs/heads/lr/mmp3-1
  [branch "lr/mmp3-2"]
  	remote = xo
  	merge = refs/heads/lr/mmp3-2
  [branch "lr/mmp3-4"]
  	remote = xo
  	merge = refs/heads/lr/mmp3-4
  [branch "lr/ch7033"]
  	remote = xo
  	merge = refs/heads/lr/ch7033
  [branch "lr/ch7033-1"]
  	remote = xo
  	merge = refs/heads/lr/ch7033-1
  [branch "lr/w000000t"]
  	remote = xo
  	merge = refs/heads/lr/w000000t
  [branch "lr/ariel"]
  	remote = xo
  	merge = refs/heads/lr/ariel
  [remote "mmp"]
  	url = git://git.kernel.org/pub/scm/linux/kernel/git/lkundrak/linux-mmp
  	pushurl = git@gitolite.kernel.org:pub/scm/linux/kernel/git/lkundrak/linux-mmp
  	fetch = +refs/heads/*:refs/remotes/mmp/*
  [branch "lr/mmp3-hsic"]
  	remote = xo
  	merge = refs/heads/lr/mmp3-hsic
  [branch "lr/mmp3-hsic-v2"]
  	remote = xo
  	merge = refs/heads/lr/mmp3-hsic-v2
  [branch "lr/mmp2-galcore-v1"]
  	remote = xo
  	merge = refs/heads/lr/olpc-xo175-galcore-v1
  [remote "usb"]
  	url = git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
  	fetch = +refs/heads/*:refs/remotes/usb/*
  [remote "pinchartl"]
  	url = git://linuxtv.org/pinchartl/media.git
  	fetch = +refs/heads/*:refs/remotes/pinchartl/*
  [remote "drm"]
  	url = git://anongit.freedesktop.org/drm/drm
  	fetch = +refs/heads/*:refs/remotes/drm/*
  [branch "lr/marvell-dt-validation-v2"]
  	remote = xo
  	merge = refs/heads/lr/marvell-dt-validation-v2
  [branch "lr/mmp3-thermal-v1"]
  	remote = xo
  	merge = refs/heads/lr/mmp3-thermal-v1
  [remote "clk"]
  	url = git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux.git
  	fetch = +refs/heads/*:refs/remotes/clk/*
  [remote "drm-misc"]
  	url = git://anongit.freedesktop.org/drm/drm-misc
  	fetch = +refs/heads/*:refs/remotes/drm-misc/*

Thank you
Lubo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Git 2.26 fetches many times more objects than it should, wasting gigabytes
  2020-04-22  8:42 Git 2.26 fetches many times more objects than it should, wasting gigabytes Lubomir Rintel
@ 2020-04-22  9:57 ` Jeff King
  2020-04-22 10:30   ` Jeff King
                     ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Jeff King @ 2020-04-22  9:57 UTC (permalink / raw)
  To: Lubomir Rintel; +Cc: Jonathan Nieder, git

[-- Attachment #1: Type: text/plain, Size: 2366 bytes --]

On Wed, Apr 22, 2020 at 10:42:54AM +0200, Lubomir Rintel wrote:

> my git repository with Linux grows several gigabytes each time I fetch:

Thanks for this report. We've been tracking the issue but have had
trouble reproducing it.

To get you unstuck, the immediate workaround is to drop back to the
older protocol, like:

  git -c protocol.version=0 fetch --all

>   [lkundrak@furthur linux]$ git fetch --all

Here's a recipe based on your fetches that shows the problem.

  # start with an up-to-date regular clone of linus's tree; I had one
  # lying around from https://github.com/torvalds/linux, but the source
  # shouldn't matter
  rm -rf repo.git
  git clone --bare /path/to/linux repo.git
  cd repo.git

  git remote add next git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next
  git remote add xo git@github.com:hackerspace/olpc-xo175-linux
  git fetch --all

The "next" fetch grabs about 30MB of objects. But the xo one downloads
1.5GB from 7.4M objects. That's using v2.26.2, so protocol 2.

If I switch to the v0 protocol like:

  git -c protocol.version=0 fetch --all

then the xo fetch is only 48k objects, 23MB. So this is definitely
exhibiting the problem.

There are a few data points we've been wanting to collect:

 - does setting fetch.negotiationAlgorithm=skipping help? Yes, but not
   as much as the v0 protocol does. It sends 84k objects, 33MB.

 - does the same fetch over v0 stateless-http have similar problems? No,
   swapping out the second "remote add" for:

     git remote add xo https://github.com/hackerspace/olpc-xo175-linux

   results in the same 48k, 32MB fetch. The v0 conversation involved 10
   POST requests. The v2 conversation only took 6 (and generates the
   same big response as the ssh session, unsurprisingly).

So it really does seem like something in v2 is not trying as hard to
negotiate as v0 did, even when using stateless-http.

I'm attaching for-each-ref output before and after the xo fetch. That
should be sufficient to recreate the situation synthetically even once
these repos have moved on.

I have GIT_TRACE_PACKET output showing the whole negotiation, but it's
pretty hard to look at. I _think_ a lot more is said in the v0
conversation, but it's difficult to sort out because there's a lot of
extra packet framing as we shuttle bits back and forth between
remote-curl and fetch-pack.

-Peff

[-- Attachment #2: refs.before.gz --]
[-- Type: application/gzip, Size: 18588 bytes --]

[-- Attachment #3: refs.after.gz --]
[-- Type: application/gzip, Size: 23471 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Git 2.26 fetches many times more objects than it should, wasting gigabytes
  2020-04-22  9:57 ` Jeff King
@ 2020-04-22 10:30   ` Jeff King
  2020-04-22 10:40     ` Jeff King
  2020-04-22 15:40   ` Jonathan Nieder
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Jeff King @ 2020-04-22 10:30 UTC (permalink / raw)
  To: Lubomir Rintel; +Cc: Jonathan Nieder, git

On Wed, Apr 22, 2020 at 05:57:02AM -0400, Jeff King wrote:

> I'm attaching for-each-ref output before and after the xo fetch. That
> should be sufficient to recreate the situation synthetically even once
> these repos have moved on.
> 
> I have GIT_TRACE_PACKET output showing the whole negotiation, but it's
> pretty hard to look at. I _think_ a lot more is said in the v0
> conversation, but it's difficult to sort out because there's a lot of
> extra packet framing as we shuttle bits back and forth between
> remote-curl and fetch-pack.

Here's what I did manage to pull out if it. In both versions we send a
bunch of "have" lines, and get a bunch of NAKs back. Which makes sense.
We have a bunch of refs that the other side doesn't know about.

In the v0 http session, we finally get an ACK on
8f3d9f354286745c751374f5f1fcafee6b3f3136, which is Linus's 5.7-rc1. We
send that commit as a "have" in the 9th batch.

In the v2 session, we never get an ACK at all (which unsurprisingly
leads to the other side sending the full history). But that commit id is
nowhere to be found in the trace! We appear to give up after 5 rounds.

So it really just seems like v2 does not try hard enough. I think the
culprit is the MAX_IN_VAIN setting. If I do this:

diff --git a/fetch-pack.c b/fetch-pack.c
index 1734a573b0..016a413d49 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -46,7 +46,7 @@ static struct strbuf fsck_msg_types = STRBUF_INIT;
  * After sending this many "have"s if we do not get any new ACK , we
  * give up traversing our history.
  */
-#define MAX_IN_VAIN 256
+#define MAX_IN_VAIN 20000
 
 static int multi_ack, use_sideband;
 /* Allow specifying sha1 if it is a ref tip. */

then I get that same 48k objects, 23MB fetch that v0 does.

-Peff

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: Git 2.26 fetches many times more objects than it should, wasting gigabytes
  2020-04-22 10:30   ` Jeff King
@ 2020-04-22 10:40     ` Jeff King
  2020-04-22 15:33       ` Junio C Hamano
  2020-04-23 21:37       ` Jonathan Tan
  0 siblings, 2 replies; 18+ messages in thread
From: Jeff King @ 2020-04-22 10:40 UTC (permalink / raw)
  To: Lubomir Rintel; +Cc: Jonathan Nieder, git

On Wed, Apr 22, 2020 at 06:30:11AM -0400, Jeff King wrote:

> So it really just seems like v2 does not try hard enough. I think the
> culprit is the MAX_IN_VAIN setting. If I do this:
> 
> diff --git a/fetch-pack.c b/fetch-pack.c
> index 1734a573b0..016a413d49 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -46,7 +46,7 @@ static struct strbuf fsck_msg_types = STRBUF_INIT;
>   * After sending this many "have"s if we do not get any new ACK , we
>   * give up traversing our history.
>   */
> -#define MAX_IN_VAIN 256
> +#define MAX_IN_VAIN 20000
>  
>  static int multi_ack, use_sideband;
>  /* Allow specifying sha1 if it is a ref tip. */
> 
> then I get that same 48k objects, 23MB fetch that v0 does.

I don't quite think that's the solution, though. Both old and new are
supposed to be respecting MAX_IN_VAIN. So it's not at all clear to me
why it restricts the number of haves we'll send in v2, but not in v0.

Maybe somebody more familiar with the negotiation code can comment
further.

-Peff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Git 2.26 fetches many times more objects than it should, wasting gigabytes
  2020-04-22 10:40     ` Jeff King
@ 2020-04-22 15:33       ` Junio C Hamano
  2020-04-22 19:33         ` Jeff King
  2020-04-23 21:37       ` Jonathan Tan
  1 sibling, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2020-04-22 15:33 UTC (permalink / raw)
  To: Jeff King; +Cc: Lubomir Rintel, Jonathan Nieder, git

Jeff King <peff@peff.net> writes:

> On Wed, Apr 22, 2020 at 06:30:11AM -0400, Jeff King wrote:
>
>> So it really just seems like v2 does not try hard enough. I think the
>> culprit is the MAX_IN_VAIN setting. If I do this:
>> 
>> diff --git a/fetch-pack.c b/fetch-pack.c
>> index 1734a573b0..016a413d49 100644
>> --- a/fetch-pack.c
>> +++ b/fetch-pack.c
>> @@ -46,7 +46,7 @@ static struct strbuf fsck_msg_types = STRBUF_INIT;
>>   * After sending this many "have"s if we do not get any new ACK , we
>>   * give up traversing our history.
>>   */
>> -#define MAX_IN_VAIN 256
>> +#define MAX_IN_VAIN 20000
>>  
>>  static int multi_ack, use_sideband;
>>  /* Allow specifying sha1 if it is a ref tip. */
>> 
>> then I get that same 48k objects, 23MB fetch that v0 does.
>
> I don't quite think that's the solution, though. Both old and new are
> supposed to be respecting MAX_IN_VAIN. So it's not at all clear to me
> why it restricts the number of haves we'll send in v2, but not in v0.

Thanks for digging.  I tend to agree with your assessment that the
setting should not make a difference, if v0 find the common out of
the exchange within the same number of "have"s.

I am guilty of introducing the hardcoded "give up after this many
naks", which I admit I was never fond of, back in the days there was
only one original protocol.  In retrospect, I probably should have
done "after this many naks, stop sending each and every commit but
start skipping exponentially (or fibonacci)" instead.  After all,
this was meant to prevent walking all the way down to a different
root commit when you have more of them than the repository you are
fetching from---but (1) skipping exponentially down to root is way
less expensive, even if it is a bit more expensive than not walking
at all, and (2) if we find a common tree, even if it is distant, it
is way better than not having any common tree at all.

If we had such a code, however, it would probably have swept the
real cause of the issue people are reporting under the rug, though.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Git 2.26 fetches many times more objects than it should, wasting gigabytes
  2020-04-22  9:57 ` Jeff King
  2020-04-22 10:30   ` Jeff King
@ 2020-04-22 15:40   ` Jonathan Nieder
  2020-04-22 19:36     ` Jeff King
  2020-04-22 15:50   ` [PATCH] Revert "fetch: default to protocol version 2" Jonathan Nieder
  2020-04-22 16:53   ` Git 2.26 fetches many times more objects than it should, wasting gigabytes Jonathan Nieder
  3 siblings, 1 reply; 18+ messages in thread
From: Jonathan Nieder @ 2020-04-22 15:40 UTC (permalink / raw)
  To: Jeff King; +Cc: Lubomir Rintel, git, Jonathan Tan

(+cc: Jonathan Tan)
Hi,

Jeff King wrote:

> Here's a recipe based on your fetches that shows the problem.
>
>   # start with an up-to-date regular clone of linus's tree; I had one
>   # lying around from https://github.com/torvalds/linux, but the source
>   # shouldn't matter
>   rm -rf repo.git
>   git clone --bare /path/to/linux repo.git
>   cd repo.git
>
>   git remote add next git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next
>   git remote add xo git@github.com:hackerspace/olpc-xo175-linux
>   git fetch --all
>
> The "next" fetch grabs about 30MB of objects. But the xo one downloads
> 1.5GB from 7.4M objects. That's using v2.26.2, so protocol 2.

Thanks!  I'll give it a try.

[...]
> There are a few data points we've been wanting to collect:
>
>  - does setting fetch.negotiationAlgorithm=skipping help? Yes, but not
>    as much as the v0 protocol does. It sends 84k objects, 33MB.

That's pretty good.  Tightening it further would require changing the
protocol to allow the client to say "please don't send me a pack; I want
to continue with negotiation".

>  - does the same fetch over v0 stateless-http have similar problems? No,
>    swapping out the second "remote add" for:
>
>      git remote add xo https://github.com/hackerspace/olpc-xo175-linux
>
>    results in the same 48k, 32MB fetch. The v0 conversation involved 10
>    POST requests. The v2 conversation only took 6 (and generates the
>    same big response as the ssh session, unsurprisingly).
>
> So it really does seem like something in v2 is not trying as hard to
> negotiate as v0 did, even when using stateless-http.

Interesting!  So it sounds like some refs that are not being fetched
are important here to the negotiation.  And the default (non-skipping)
negotiation algorithm is doing a bad job of exploring that part of
history.

Will take a closer look.

I think this still suggests that we should go ahead and switch
negotiation algorithms, both because it avoids this MAX_IN_VAIN and
because it reduces the number of rounds needed to make progress.

I'd also be tempted to get rid of MAX_IN_VAIN.  If we're at the point
of giving up, shouldn't we error out instead of having the server send
a copy of the entirety of history?

Jonathan

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] Revert "fetch: default to protocol version 2"
  2020-04-22  9:57 ` Jeff King
  2020-04-22 10:30   ` Jeff King
  2020-04-22 15:40   ` Jonathan Nieder
@ 2020-04-22 15:50   ` Jonathan Nieder
  2020-04-22 18:23     ` Junio C Hamano
  2020-04-22 19:40     ` Jeff King
  2020-04-22 16:53   ` Git 2.26 fetches many times more objects than it should, wasting gigabytes Jonathan Nieder
  3 siblings, 2 replies; 18+ messages in thread
From: Jonathan Nieder @ 2020-04-22 15:50 UTC (permalink / raw)
  To: Jeff King
  Cc: Lubomir Rintel, git, Jonathan Tan, Dixit, Ashutosh, Jiri Slaby,
	Konstantin Ryabitsev, Junio C Hamano

This reverts commit 684ceae32dae726c6a5c693b257b156926aba8b7.

Users fetching from linux-next and other kernel remotes are reporting
that the limited ref advertisement causes negotiation to reach
MAX_IN_VAIN, resulting in too-large fetches.

Reported-by: Lubomir Rintel <lkundrak@v3.sk>
Reported-by: "Dixit, Ashutosh" <ashutosh.dixit@intel.com>
Reported-by: Jiri Slaby <jslaby@suse.cz>
Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
Hi again,

Jeff King wrote:

> Thanks for this report. We've been tracking the issue but have had
> trouble reproducing it.
>
> To get you unstuck, the immediate workaround is to drop back to the
> older protocol, like:
>
>   git -c protocol.version=0 fetch --all

By the way, I'd recommend the immediate workaround of

	git fetch --negotiation-tip=refs/remotes/xo/* xo

instead.  But that's a separate subject.

[...]
> There are a few data points we've been wanting to collect:
[...]
> I'm attaching for-each-ref output before and after the xo fetch. That
> should be sufficient to recreate the situation synthetically even once
> these repos have moved on.

Excellent --- I think this is enough for us to have something to use
to investigate, switching users to protocol v0 in the meantime.

Thanks,
Jonathan

 Documentation/config/protocol.txt | 2 +-
 protocol.c                        | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/config/protocol.txt b/Documentation/config/protocol.txt
index 756591d77b0..0b40141613e 100644
--- a/Documentation/config/protocol.txt
+++ b/Documentation/config/protocol.txt
@@ -48,7 +48,7 @@ protocol.version::
 	If set, clients will attempt to communicate with a server
 	using the specified protocol version.  If the server does
 	not support it, communication falls back to version 0.
-	If unset, the default is `2`.
+	If unset, the default is `0`.
 	Supported versions:
 +
 --
diff --git a/protocol.c b/protocol.c
index 803bef5c87e..d390391ebac 100644
--- a/protocol.c
+++ b/protocol.c
@@ -39,7 +39,7 @@ enum protocol_version get_protocol_version_config(void)
 		return env;
 	}
 
-	return protocol_v2;
+	return protocol_v0;
 }
 
 enum protocol_version determine_protocol_version_server(void)
-- 
2.26.2.303.gf8c07b1a785


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: Git 2.26 fetches many times more objects than it should, wasting gigabytes
  2020-04-22  9:57 ` Jeff King
                     ` (2 preceding siblings ...)
  2020-04-22 15:50   ` [PATCH] Revert "fetch: default to protocol version 2" Jonathan Nieder
@ 2020-04-22 16:53   ` Jonathan Nieder
  2020-04-22 17:32     ` Junio C Hamano
  2020-04-22 19:18     ` Jeff King
  3 siblings, 2 replies; 18+ messages in thread
From: Jonathan Nieder @ 2020-04-22 16:53 UTC (permalink / raw)
  To: Jeff King; +Cc: Lubomir Rintel, git, Jonathan Tan

Jeff King wrote:

> So it really does seem like something in v2 is not trying as hard to
> negotiate as v0 did, even when using stateless-http.
>
> I'm attaching for-each-ref output before and after the xo fetch. That
> should be sufficient to recreate the situation synthetically even once
> these repos have moved on.

Thanks again.  Looked a little closer and the advertised refs shouldn't
matter.  We just have two completely different code paths for
negotiation, one for v0 and one for v2.  In v0, we do

	fetch_negotiator_init(r, negotiator);
	mark_complete_and_common_ref(negotiator, args, &ref);
	filter_refs(argrs, &ref, sought, nr_sought);
	find_common(negotiator, args, &ref, sought, nr_sought);

find_common does

	count = 0;
	while ((oid = negotiator->next(negotiator)) {
		write "have %s\n", oid
		if (++count >= flush_at) {
			... flush ...
			flush_at = next_flush(args->stateless_rpc, count);
			...

In v2, the corresponding code is in add_haves:

	while ((oid = negotiator->next(negotiator))) {
		write "have %s\n", oid
		if (++haves_added >= *haves_to_send)
			break;
	}
	*in_vain += haves_added;
	if (!haves_added || *in_vain >= MAX_IN_VAIN)
		write "done"
	*haves_to_send = next_flush(1, *haves_to_send);

In other words, in v2 we reset the counter on each round whereas in v0
we keep a running total.  That would be expected to produce larger
negotiate requests in v2 than v0 (which looks like an unintentional
difference, but not the one producing this bug).

So much for flush_at handling.  in_vain seems like another area to
compare: in v2, the driving logic is in do_fetch_pack_v2:

	haves_to_send = INITIAL_FLUSH;
 negotiate_loop:
	while (!send_fetch_request(negotiator, fd, args, ref,
				   &common, &haves_to_send,
				   &in_vain, reader.use_sideband))
		switch (process_acks(negotiator, &reader, &common)) {
		case 1:
			in_vain = 0; /* renewed hope!
			continue;
		case 2:
			break negotiate_loop; /* time to move on */
		default:
			continue;
		}

When process_acks sees an ACK, it passes it on to the negotiator.
It wants to record that it received an ack to reset in_vain, but
it forgets to!  The function is initialized and read but never
written to.

So I'd expect the following to help:

diff --git i/fetch-pack.c w/fetch-pack.c
index 1734a573b01..a1d743e1f61 100644
--- i/fetch-pack.c
+++ w/fetch-pack.c
@@ -1287,6 +1287,8 @@ static int process_acks(struct fetch_negotiator *negotiator,
 			struct object_id oid;
 			if (!get_oid_hex(arg, &oid)) {
 				struct commit *commit;
+
+				received_ack = 1;
 				oidset_insert(common, &oid);
 				commit = lookup_commit(the_repository, &oid);
 				if (negotiator)

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: Git 2.26 fetches many times more objects than it should, wasting gigabytes
  2020-04-22 16:53   ` Git 2.26 fetches many times more objects than it should, wasting gigabytes Jonathan Nieder
@ 2020-04-22 17:32     ` Junio C Hamano
  2020-04-22 19:18     ` Jeff King
  1 sibling, 0 replies; 18+ messages in thread
From: Junio C Hamano @ 2020-04-22 17:32 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Jeff King, Lubomir Rintel, git, Jonathan Tan

Jonathan Nieder <jrnieder@gmail.com> writes:

> When process_acks sees an ACK, it passes it on to the negotiator.
> It wants to record that it received an ack to reset in_vain, but
> it forgets to!  The function is initialized and read but never
> written to.

Yeah, this smells like the right solution ;-)

s/function/variable/ though.

>
> So I'd expect the following to help:
>
> diff --git i/fetch-pack.c w/fetch-pack.c
> index 1734a573b01..a1d743e1f61 100644
> --- i/fetch-pack.c
> +++ w/fetch-pack.c
> @@ -1287,6 +1287,8 @@ static int process_acks(struct fetch_negotiator *negotiator,
>  			struct object_id oid;
>  			if (!get_oid_hex(arg, &oid)) {
>  				struct commit *commit;
> +
> +				received_ack = 1;
>  				oidset_insert(common, &oid);
>  				commit = lookup_commit(the_repository, &oid);
>  				if (negotiator)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Revert "fetch: default to protocol version 2"
  2020-04-22 15:50   ` [PATCH] Revert "fetch: default to protocol version 2" Jonathan Nieder
@ 2020-04-22 18:23     ` Junio C Hamano
  2020-04-22 19:40     ` Jeff King
  1 sibling, 0 replies; 18+ messages in thread
From: Junio C Hamano @ 2020-04-22 18:23 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Jeff King, Lubomir Rintel, git, Jonathan Tan, Dixit, Ashutosh,
	Jiri Slaby, Konstantin Ryabitsev

Jonathan Nieder <jrnieder@gmail.com> writes:

> This reverts commit 684ceae32dae726c6a5c693b257b156926aba8b7.
>
> Users fetching from linux-next and other kernel remotes are reporting
> that the limited ref advertisement causes negotiation to reach
> MAX_IN_VAIN, resulting in too-large fetches.
>
> Reported-by: Lubomir Rintel <lkundrak@v3.sk>
> Reported-by: "Dixit, Ashutosh" <ashutosh.dixit@intel.com>
> Reported-by: Jiri Slaby <jslaby@suse.cz>
> Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
> ---

With the one-liner fix for resetting the "we haven't heard an Ack"
timer, I am no longer so worried by hurting users by keeping them on
v2 by default, but it will take some time for the fix to trickle
down from the 'master' track to a maintenance release, so I am OK to
apply this patch in the meantime.  But let's flip the default back
to v2 after the one-liner fix lands to see if (1) the fix truly
helps and (2) people hit other issues in the difference between v0
and v2.

Thanks.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Git 2.26 fetches many times more objects than it should, wasting gigabytes
  2020-04-22 16:53   ` Git 2.26 fetches many times more objects than it should, wasting gigabytes Jonathan Nieder
  2020-04-22 17:32     ` Junio C Hamano
@ 2020-04-22 19:18     ` Jeff King
  1 sibling, 0 replies; 18+ messages in thread
From: Jeff King @ 2020-04-22 19:18 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Lubomir Rintel, git, Jonathan Tan

On Wed, Apr 22, 2020 at 09:53:58AM -0700, Jonathan Nieder wrote:

> When process_acks sees an ACK, it passes it on to the negotiator.
> It wants to record that it received an ack to reset in_vain, but
> it forgets to!  The function is initialized and read but never
> written to.

I wondered if it might be something like this, too (and this might well
be an independent bug), but...

> So I'd expect the following to help:
> 
> diff --git i/fetch-pack.c w/fetch-pack.c
> index 1734a573b01..a1d743e1f61 100644
> --- i/fetch-pack.c
> +++ w/fetch-pack.c
> @@ -1287,6 +1287,8 @@ static int process_acks(struct fetch_negotiator *negotiator,
>  			struct object_id oid;
>  			if (!get_oid_hex(arg, &oid)) {
>  				struct commit *commit;
> +
> +				received_ack = 1;
>  				oidset_insert(common, &oid);
>  				commit = lookup_commit(the_repository, &oid);
>  				if (negotiator)

It doesn't. We never get any ACK from the server at all, because we give
up on sending haves before hitting any common commit.

-Peff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Git 2.26 fetches many times more objects than it should, wasting gigabytes
  2020-04-22 15:33       ` Junio C Hamano
@ 2020-04-22 19:33         ` Jeff King
  0 siblings, 0 replies; 18+ messages in thread
From: Jeff King @ 2020-04-22 19:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Lubomir Rintel, Jonathan Nieder, git

On Wed, Apr 22, 2020 at 08:33:48AM -0700, Junio C Hamano wrote:

> > I don't quite think that's the solution, though. Both old and new are
> > supposed to be respecting MAX_IN_VAIN. So it's not at all clear to me
> > why it restricts the number of haves we'll send in v2, but not in v0.
> 
> Thanks for digging.  I tend to agree with your assessment that the
> setting should not make a difference, if v0 find the common out of
> the exchange within the same number of "have"s.

I think v0 sends many more haves. Again, it's hard to compare the
protocol traces because of the framing, but if I simplify each one like:

  perl -lne '/fetch-pack([<>] .*)/ and print "fetch$1"' <packet-v0.trace  >small-v0.trace
  perl -lne '/fetch([<>] .*)/ and print "fetch$1"' <packet-v2.trace  >small-v2.trace

I think we can get an apples-to-apples-ish comparison. And the results
are quite different:

  $ grep -c have small-v0.trace
  11342
  $ grep -c have small-v2.trace
  496

So I think the two protocols are treating MAX_IN_VAIN quite differently.

It looks like v0 only respects it after seeing a "continue" (or maybe
any non-common ACK; they all seem to trigger got_continue), but v2 will
use it to limit the haves we send when the other side is just NAKing.

> I am guilty of introducing the hardcoded "give up after this many
> naks", which I admit I was never fond of, back in the days there was
> only one original protocol.  In retrospect, I probably should have
> done "after this many naks, stop sending each and every commit but
> start skipping exponentially (or fibonacci)" instead.  After all,
> this was meant to prevent walking all the way down to a different
> root commit when you have more of them than the repository you are
> fetching from---but (1) skipping exponentially down to root is way
> less expensive, even if it is a bit more expensive than not walking
> at all, and (2) if we find a common tree, even if it is distant, it
> is way better than not having any common tree at all.

I think fetch.negotiationAlgorithm=skipping is that thing. And it _does_
paper over the problem (the most horrific case goes away, but you end up
with twice as many objects as v2 finds).

Limiting the amount of work we're willing to spend digging in history
does make sense, but it seems like we'd always want to at least dig a
little on each ref. For example, imagine a pathological case like this:

  - the client has 10,001 refs; the first 10,000 (sorted alphabetically)
    point to commit graph X. The last one points to some disjoint commit
    graph Y.

  - the server only cares about Y, and it has some Y' that adds one
    commit on top

We _should_ be able to serve that fetch with a single commit (Y->Y').
And we could find it trivially by feeding all of the ref tips as "have"
lines. But I suspect we wouldn't with v2, as we'd feed the first couple
hundred haves and then give up.

-Peff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Git 2.26 fetches many times more objects than it should, wasting gigabytes
  2020-04-22 15:40   ` Jonathan Nieder
@ 2020-04-22 19:36     ` Jeff King
  0 siblings, 0 replies; 18+ messages in thread
From: Jeff King @ 2020-04-22 19:36 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Lubomir Rintel, git, Jonathan Tan

On Wed, Apr 22, 2020 at 08:40:25AM -0700, Jonathan Nieder wrote:

> I think this still suggests that we should go ahead and switch
> negotiation algorithms, both because it avoids this MAX_IN_VAIN and
> because it reduces the number of rounds needed to make progress.

I'd be happier about that if the other algorithm turned up the same pack
that v0 does. But it has twice as many objects.

It looks to me like v0 is just more aggressive about digging in the
history. That _does_ cost more rounds, but this example shows that
there's a benefit to doing so for real-world cases.

> I'd also be tempted to get rid of MAX_IN_VAIN.  If we're at the point
> of giving up, shouldn't we error out instead of having the server send
> a copy of the entirety of history?

How would you fetch in cases where the client and server legitimately
don't have any common commits?

You could add a flag to force it, but I don't know that you're really
making users any happier. Fetching the whole history is annoying, but
refusing to fetch at all is perhaps more so. From the user's perspective
either the full fetch is what they want, or Git is broken.

-Peff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Revert "fetch: default to protocol version 2"
  2020-04-22 15:50   ` [PATCH] Revert "fetch: default to protocol version 2" Jonathan Nieder
  2020-04-22 18:23     ` Junio C Hamano
@ 2020-04-22 19:40     ` Jeff King
  2020-04-22 19:47       ` Jeff King
  1 sibling, 1 reply; 18+ messages in thread
From: Jeff King @ 2020-04-22 19:40 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Lubomir Rintel, git, Jonathan Tan, Dixit, Ashutosh, Jiri Slaby,
	Konstantin Ryabitsev, Junio C Hamano

On Wed, Apr 22, 2020 at 08:50:47AM -0700, Jonathan Nieder wrote:

> This reverts commit 684ceae32dae726c6a5c693b257b156926aba8b7.
> 
> Users fetching from linux-next and other kernel remotes are reporting
> that the limited ref advertisement causes negotiation to reach
> MAX_IN_VAIN, resulting in too-large fetches.

OK, now that we have data I think this strategy is reasonable.

That said, it will take a while to make it to a release, so we very well
may have brought v2 and v0 to parity in the meantime.

> > To get you unstuck, the immediate workaround is to drop back to the
> > older protocol, like:
> >
> >   git -c protocol.version=0 fetch --all
> 
> By the way, I'd recommend the immediate workaround of
> 
> 	git fetch --negotiation-tip=refs/remotes/xo/* xo
> 
> instead.  But that's a separate subject.

It seems like if we are fetching with refspec X/*:Y/* that we should
perhaps automatically select our local Y/* negotiation tips.

That said, neither it (nor the manual version above) would help the case
I've been testing with. It's a first fetch from "xo", which can reuse
history we already have from other remotes.

I agree it's a good workaround for folks doing their daily fetches,
though.

-Peff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Revert "fetch: default to protocol version 2"
  2020-04-22 19:40     ` Jeff King
@ 2020-04-22 19:47       ` Jeff King
  0 siblings, 0 replies; 18+ messages in thread
From: Jeff King @ 2020-04-22 19:47 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Lubomir Rintel, git, Jonathan Tan, Dixit, Ashutosh, Jiri Slaby,
	Konstantin Ryabitsev, Junio C Hamano

On Wed, Apr 22, 2020 at 03:40:47PM -0400, Jeff King wrote:

> > By the way, I'd recommend the immediate workaround of
> > 
> > 	git fetch --negotiation-tip=refs/remotes/xo/* xo
> > 
> > instead.  But that's a separate subject.
> 
> It seems like if we are fetching with refspec X/*:Y/* that we should
> perhaps automatically select our local Y/* negotiation tips.

Actually, I guess that option is not "please prioritize these tips", but
rather "use only these tips". So using it can sometimes hurt.

> That said, neither it (nor the manual version above) would help the case
> I've been testing with. It's a first fetch from "xo", which can reuse
> history we already have from other remotes.
> 
> I agree it's a good workaround for folks doing their daily fetches,
> though.

I did confirm that re-running my test with --negotiation-tip=master (to
point to Linus's tree, avoiding the clogging of "next" commits) results
in the usual good pack in a reasonable time.

-Peff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Git 2.26 fetches many times more objects than it should, wasting gigabytes
  2020-04-22 10:40     ` Jeff King
  2020-04-22 15:33       ` Junio C Hamano
@ 2020-04-23 21:37       ` Jonathan Tan
  2020-04-23 21:54         ` Junio C Hamano
  2020-04-24  5:32         ` Jeff King
  1 sibling, 2 replies; 18+ messages in thread
From: Jonathan Tan @ 2020-04-23 21:37 UTC (permalink / raw)
  To: peff; +Cc: lkundrak, jrnieder, git, Jonathan Tan

> On Wed, Apr 22, 2020 at 06:30:11AM -0400, Jeff King wrote:
> 
> > So it really just seems like v2 does not try hard enough. I think the
> > culprit is the MAX_IN_VAIN setting. If I do this:
> > 
> > diff --git a/fetch-pack.c b/fetch-pack.c
> > index 1734a573b0..016a413d49 100644
> > --- a/fetch-pack.c
> > +++ b/fetch-pack.c
> > @@ -46,7 +46,7 @@ static struct strbuf fsck_msg_types = STRBUF_INIT;
> >   * After sending this many "have"s if we do not get any new ACK , we
> >   * give up traversing our history.
> >   */
> > -#define MAX_IN_VAIN 256
> > +#define MAX_IN_VAIN 20000
> >  
> >  static int multi_ack, use_sideband;
> >  /* Allow specifying sha1 if it is a ref tip. */
> > 
> > then I get that same 48k objects, 23MB fetch that v0 does.
> 
> I don't quite think that's the solution, though. Both old and new are
> supposed to be respecting MAX_IN_VAIN. So it's not at all clear to me
> why it restricts the number of haves we'll send in v2, but not in v0.
> 
> Maybe somebody more familiar with the negotiation code can comment
> further.

Thanks for the reproduction recipe (in [1]) and your analysis. I took a
look, and it's because the check for in_vain is done differently. In v0:

  if (got_continue && MAX_IN_VAIN < in_vain) {

reflecting the documentation in pack-protocol.txt:

  However, the 256 limit *only* turns on in the canonical client
  implementation if we have received at least one "ACK %s continue"
  during a prior round.  This helps to ensure that at least one common
  ancestor is found before we give up entirely.

(Note that both the code and the documentation call it "continue", but
the code also correctly handles multi_ack_detailed, which instructs the
server to send "ACK common" and "ACK ready" in lieu of "ACK continue".)

When debugging, I noticed that in_vain was increasing far in excess of
MAX_IN_VAIN, but because got_continue was false, the client did not give
up.

But in v2:

  if (!haves_added || *in_vain >= MAX_IN_VAIN) {

("haves_added" is irrelevant to this discussion. It is another
termination condition - when we have run out of "have"s to send.)

So there is no check that "continue" was sent. We probably should change
v2 to match v0. I can start writing a patch unless someone else would
like to take a further look at it.

[1] https://lore.kernel.org/git/20200422095702.GA475060@coredump.intra.peff.net/

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Git 2.26 fetches many times more objects than it should, wasting gigabytes
  2020-04-23 21:37       ` Jonathan Tan
@ 2020-04-23 21:54         ` Junio C Hamano
  2020-04-24  5:32         ` Jeff King
  1 sibling, 0 replies; 18+ messages in thread
From: Junio C Hamano @ 2020-04-23 21:54 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: peff, lkundrak, jrnieder, git

Jonathan Tan <jonathantanmy@google.com> writes:

> But in v2:
>
>   if (!haves_added || *in_vain >= MAX_IN_VAIN) {
>
> ("haves_added" is irrelevant to this discussion. It is another
> termination condition - when we have run out of "have"s to send.)
>
> So there is no check that "continue" was sent. We probably should change
> v2 to match v0.

That sounds like a good change.

Thanks.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Git 2.26 fetches many times more objects than it should, wasting gigabytes
  2020-04-23 21:37       ` Jonathan Tan
  2020-04-23 21:54         ` Junio C Hamano
@ 2020-04-24  5:32         ` Jeff King
  1 sibling, 0 replies; 18+ messages in thread
From: Jeff King @ 2020-04-24  5:32 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Junio C Hamano, lkundrak, jrnieder, git

On Thu, Apr 23, 2020 at 02:37:35PM -0700, Jonathan Tan wrote:

> Thanks for the reproduction recipe (in [1]) and your analysis. I took a
> look, and it's because the check for in_vain is done differently. In v0:
> 
>   if (got_continue && MAX_IN_VAIN < in_vain) {
> 
> reflecting the documentation in pack-protocol.txt:
> 
>   However, the 256 limit *only* turns on in the canonical client
>   implementation if we have received at least one "ACK %s continue"
>   during a prior round.  This helps to ensure that at least one common
>   ancestor is found before we give up entirely.

Ah, thanks for that; I hadn't though to look in that file for more
clues.

> When debugging, I noticed that in_vain was increasing far in excess of
> MAX_IN_VAIN, but because got_continue was false, the client did not give
> up.
> 
> But in v2:
> 
>   if (!haves_added || *in_vain >= MAX_IN_VAIN) {
> 
> ("haves_added" is irrelevant to this discussion. It is another
> termination condition - when we have run out of "have"s to send.)
> 
> So there is no check that "continue" was sent. We probably should change
> v2 to match v0. I can start writing a patch unless someone else would
> like to take a further look at it.

Yeah, this fills in the final pieces of the puzzle I was chasing in:

 https://lore.kernel.org/git/20200422193324.GB558336@coredump.intra.peff.net/

And the patch you suggest sounds like the best solution.

I think there's some room for discussion about what the optimal
strategies are (e.g., v0 does send a lot more haves than v2 in this
instance, and it wouldn't always be helpful). But it makes sense to me
to put v2 and v0 on the same footing for now, especially given the
regressions people have mentioned, and then we can explore new options
at our convenience (like switching on the skipping negotiation
algorithm).

-Peff

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-04-24  5:32 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-22  8:42 Git 2.26 fetches many times more objects than it should, wasting gigabytes Lubomir Rintel
2020-04-22  9:57 ` Jeff King
2020-04-22 10:30   ` Jeff King
2020-04-22 10:40     ` Jeff King
2020-04-22 15:33       ` Junio C Hamano
2020-04-22 19:33         ` Jeff King
2020-04-23 21:37       ` Jonathan Tan
2020-04-23 21:54         ` Junio C Hamano
2020-04-24  5:32         ` Jeff King
2020-04-22 15:40   ` Jonathan Nieder
2020-04-22 19:36     ` Jeff King
2020-04-22 15:50   ` [PATCH] Revert "fetch: default to protocol version 2" Jonathan Nieder
2020-04-22 18:23     ` Junio C Hamano
2020-04-22 19:40     ` Jeff King
2020-04-22 19:47       ` Jeff King
2020-04-22 16:53   ` Git 2.26 fetches many times more objects than it should, wasting gigabytes Jonathan Nieder
2020-04-22 17:32     ` Junio C Hamano
2020-04-22 19:18     ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).