git@vger.kernel.org mailing list mirror (one of many)
* [PATCH 0/6] Speed up mirror-fetches with many refs
@ 2021-08-20 10:08 Patrick Steinhardt
  2021-08-20 10:08 ` [PATCH 1/6] fetch: speed up lookup of want refs via commit-graph Patrick Steinhardt
                   ` (9 more replies)
  0 siblings, 10 replies; 48+ messages in thread
From: Patrick Steinhardt @ 2021-08-20 10:08 UTC
  To: git; +Cc: Jeff King, Ævar Arnfjörð Bjarmason, Junio C Hamano

Hi,

I've taken another look at fetches in repos with a huge number of refs.
This time around, I've focused on mirror fetches: in our notorious repo
with 2.3M refs, these mirror fetches can take up to several minutes even
if there are no changes at all.

As it turns out, many of the issues are again caused by loading and
dereferencing refs. This patch series thus mostly focuses on optimizing
the patterns there, where the biggest win is to opportunistically load
refs via the commit-graph. The following numbers were all measured for a
mirror-fetch of the above 2.3M-ref repo on local disk:

    - Patch 1 uses the commit-graph to speed up the lookup of commits
      when appending to FETCH_HEAD, resulting in a ~40% speedup.

    - Patch 2 optimizes the way we check for object existence for a 7%
      speedup.

    - Patch 3 is a cleanup patch which changes the iterator functions
      passed to our connectivity checks. I was hoping for a speedup
      given that we can now avoid copying objects (which could have an
      effect with 2.3M copied OIDs), but unfortunately there was none.
      In any case, I still think that the end result is much cleaner.

    - Patch 4 optimizes git-fetch-pack(1) to use the commit-graph. This
      is a small win of about 2%. It's debatable whether this patch is
      worth it.

    - Patch 5 is a preparatory commit which refactors `fetch_refs()` to
      be more readily extendable.

    - Patch 6 optimizes an edge case where we perform two connectivity
      checks even though the first one already determined that all
      objects are available locally and the fetch was skipped. This
      brings a 15% speedup.
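
The edge case addressed by patch 6 can be sketched as a toy model. This is
not the C code from builtin/fetch.c: every name below is hypothetical, and
a plain set-membership test stands in for git's real connectivity check,
which walks the object graph.

```python
# Toy model of the control flow described for patch 6. All names here are
# hypothetical; a set-membership test stands in for git's real connectivity
# check, which walks the object graph.

def check_connected(wanted_oids, local_objects):
    """Return True if every wanted object ID is already present locally."""
    return all(oid in local_objects for oid in wanted_oids)


def fetch_and_verify(wanted_oids, local_objects, fetch_missing):
    """Fetch what is missing and return how many connectivity checks ran."""
    checks = 1
    if check_connected(wanted_oids, local_objects):
        # Everything is already here: skip the fetch, and with it the
        # second check that would only repeat the first one.
        return checks

    missing = [oid for oid in wanted_oids if oid not in local_objects]
    local_objects.update(fetch_missing(missing))

    # Objects were transferred, so the result has to be verified again.
    checks += 1
    if not check_connected(wanted_oids, local_objects):
        raise RuntimeError("remote did not send all necessary objects")
    return checks
```

In the no-change mirror-fetch scenario measured here, the first branch is
the one taken, so only a single check runs.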

In combination with my previous optimizations for git-fetch-pack(1) and
the connectivity check, this improves performance from 71s
(ps/fetch-pack-load-refs-optim) to 54s (ps/connectivity-optim) to 26s
(this series).
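
To put the timings quoted above into relative terms, the following small
sketch computes how much runtime each step saves. The helper names are made
up for illustration; the only inputs are the three timings from this mail.

```python
# Wall-clock timings quoted above, in seconds.
TIMINGS = [
    ("ps/fetch-pack-load-refs-optim", 71),
    ("ps/connectivity-optim", 54),
    ("this series", 26),
]


def reduction(before, after):
    """Fraction of runtime saved when going from `before` to `after`."""
    return 1 - after / before


def summarize(timings):
    """Return (label, saved vs. previous step, saved vs. first entry) rows."""
    baseline = timings[0][1]
    rows = []
    for (_, prev), (label, cur) in zip(timings, timings[1:]):
        rows.append((label, reduction(prev, cur), reduction(baseline, cur)))
    return rows


for label, step, total in summarize(TIMINGS):
    print(f"{label}: {step:.0%} less than previous step, "
          f"{total:.0%} less than baseline")
```

By this arithmetic, going from 54s to 26s removes roughly half of the
remaining runtime, and 71s to 26s is a reduction of a bit under two thirds.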

Note that this series depends on ps/connectivity-optim and thus only
applies on top of next.

Patrick

[1]: <08519b8ab6f395cffbcd5e530bfba6aaf64241a2.1628085347.git.ps@pks.im>


Patrick Steinhardt (6):
  fetch: speed up lookup of want refs via commit-graph
  fetch: avoid unpacking headers in object existence check
  connected: refactor iterator to return next object ID directly
  fetch-pack: optimize loading of refs via commit graph
  fetch: refactor fetch refs to be more extendable
  fetch: avoid second connectivity check if we already have all objects

 builtin/clone.c        |  8 ++--
 builtin/fetch.c        | 84 +++++++++++++++++++++++-------------------
 builtin/receive-pack.c | 17 ++++-----
 connected.c            | 15 ++++----
 connected.h            |  2 +-
 fetch-pack.c           | 14 ++++---
 6 files changed, 74 insertions(+), 66 deletions(-)

-- 
2.33.0



Thread overview: 48+ messages
2021-08-20 10:08 [PATCH 0/6] Speed up mirror-fetches with many refs Patrick Steinhardt
2021-08-20 10:08 ` [PATCH 1/6] fetch: speed up lookup of want refs via commit-graph Patrick Steinhardt
2021-08-20 14:27   ` Derrick Stolee
2021-08-20 17:18     ` Junio C Hamano
2021-08-23  6:46       ` Patrick Steinhardt
2021-08-25 14:12         ` Derrick Stolee
2021-08-20 10:08 ` [PATCH 2/6] fetch: avoid unpacking headers in object existence check Patrick Steinhardt
2021-08-25 23:44   ` Ævar Arnfjörð Bjarmason
2021-08-20 10:08 ` [PATCH 3/6] connected: refactor iterator to return next object ID directly Patrick Steinhardt
2021-08-20 14:32   ` Derrick Stolee
2021-08-20 17:43     ` Junio C Hamano
2021-08-20 17:43   ` René Scharfe
2021-08-23  6:47     ` Patrick Steinhardt
2021-08-20 10:08 ` [PATCH 4/6] fetch-pack: optimize loading of refs via commit graph Patrick Steinhardt
2021-08-20 14:37   ` Derrick Stolee
2021-08-20 10:08 ` [PATCH 5/6] fetch: refactor fetch refs to be more extendable Patrick Steinhardt
2021-08-20 14:41   ` Derrick Stolee
2021-08-20 10:08 ` [PATCH 6/6] fetch: avoid second connectivity check if we already have all objects Patrick Steinhardt
2021-08-20 14:47   ` Derrick Stolee
2021-08-23  6:52     ` Patrick Steinhardt
2021-08-20 14:50 ` [PATCH 0/6] Speed up mirror-fetches with many refs Derrick Stolee
2021-08-21  0:09 ` Junio C Hamano
2021-08-24 10:36 ` [PATCH v2 0/7] " Patrick Steinhardt
2021-08-24 10:36   ` [PATCH v2 1/7] fetch: speed up lookup of want refs via commit-graph Patrick Steinhardt
2021-08-25 14:16     ` Derrick Stolee
2021-08-24 10:37   ` [PATCH v2 2/7] fetch: avoid unpacking headers in object existence check Patrick Steinhardt
2021-08-24 10:37   ` [PATCH v2 3/7] connected: refactor iterator to return next object ID directly Patrick Steinhardt
2021-08-24 10:37   ` [PATCH v2 4/7] fetch-pack: optimize loading of refs via commit graph Patrick Steinhardt
2021-08-24 10:37   ` [PATCH v2 5/7] fetch: refactor fetch refs to be more extendable Patrick Steinhardt
2021-08-25 14:19     ` Derrick Stolee
2021-09-01 12:48       ` Patrick Steinhardt
2021-08-24 10:37   ` [PATCH v2 6/7] fetch: merge fetching and consuming refs Patrick Steinhardt
2021-08-25 14:26     ` Derrick Stolee
2021-09-01 12:49       ` Patrick Steinhardt
2021-08-24 10:37   ` [PATCH v2 7/7] fetch: avoid second connectivity check if we already have all objects Patrick Steinhardt
2021-08-24 22:48   ` [PATCH v2 0/7] Speed up mirror-fetches with many refs Junio C Hamano
2021-08-25  6:04     ` Patrick Steinhardt
2021-08-25 14:27   ` Derrick Stolee
2021-09-01 13:09 ` [PATCH v3 " Patrick Steinhardt
2021-09-01 13:09   ` [PATCH v3 1/7] fetch: speed up lookup of want refs via commit-graph Patrick Steinhardt
2021-09-01 13:09   ` [PATCH v3 2/7] fetch: avoid unpacking headers in object existence check Patrick Steinhardt
2021-09-01 13:09   ` [PATCH v3 3/7] connected: refactor iterator to return next object ID directly Patrick Steinhardt
2021-09-01 13:09   ` [PATCH v3 4/7] fetch-pack: optimize loading of refs via commit graph Patrick Steinhardt
2021-09-01 13:09   ` [PATCH v3 5/7] fetch: refactor fetch refs to be more extendable Patrick Steinhardt
2021-09-01 13:10   ` [PATCH v3 6/7] fetch: merge fetching and consuming refs Patrick Steinhardt
2021-09-01 13:10   ` [PATCH v3 7/7] fetch: avoid second connectivity check if we already have all objects Patrick Steinhardt
2021-09-01 19:58   ` [PATCH v3 0/7] Speed up mirror-fetches with many refs Junio C Hamano
2021-09-08  0:08     ` Junio C Hamano
