* Fetching new refs gets progressively slower @ 2017-07-09 10:25 s 2017-07-09 11:29 ` Jeff King 0 siblings, 1 reply; 6+ messages in thread From: s @ 2017-07-09 10:25 UTC (permalink / raw) To: git I have a weird issue where fetching a large number of refs will start off with lines like these: * [new ref] refs/pull/10000/head -> origin/pr/10000 * [new ref] refs/pull/10001/head -> origin/pr/10001 going fairly fast, and then progressively getting slower and slower. By the time git is working on 40 thousandth such ref, it seems like it is only handling about 3-5 such “new ref”s. These are the steps I used to reproduce: $ git clone git@github.com:rust-lang/rust $ # edit .git/config to add $ # `fetch = +refs/pull/*/head:refs/remotes/origin/pr/*` under origin remote $ git fetch I tried this on three distinct file systems: zfs, ext4 and tmpfs, and on two distinct systems (both with different SSDs in it). Both exhibit the approximately the same behaviour. Both on a fairly recent version of Linux. Here’s some timings: System 1 (ext4) 97% (1047.74 real, 700.11 kernel, 319.89 user); 599476k resident System 1 (tmpf) 97% (963.78 real, 647.51 kernel, 292.77 user); 600052k resident System 1 (zfs) 98% (2116.66 real, 1715.86 kernel, 370.32 user); 531232k resident System 2 (ext4) 97% (1036.56 real, 710.54 kernel, 300.81 user); 602160k resident Git version is same on both systems: 2.13.1 I did not investigate the issue more throughoutly, but I suspect that git will end up doing something resembling listing the contents of the directory of refs for each “new ref” it is creating. S. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Fetching new refs gets progressively slower 2017-07-09 10:25 Fetching new refs gets progressively slower s @ 2017-07-09 11:29 ` Jeff King 2017-08-17 15:12 ` [PATCH] files-backend: cheapen refname_available check when locking refs Michael Haggerty 0 siblings, 1 reply; 6+ messages in thread From: Jeff King @ 2017-07-09 11:29 UTC (permalink / raw) To: s; +Cc: Michael Haggerty, git On Sun, Jul 09, 2017 at 01:25:06PM +0300, s@kazlauskas.me wrote: > I have a weird issue where fetching a large number of refs will start off > with lines like these: > > * [new ref] refs/pull/10000/head -> origin/pr/10000 > * [new ref] refs/pull/10001/head -> origin/pr/10001 > > going fairly fast, and then progressively getting slower and slower. By the > time git is working on 40 thousandth such ref, it seems like it is only > handling about 3-5 such “new ref”s. > > These are the steps I used to reproduce: > > $ git clone git@github.com:rust-lang/rust > $ # edit .git/config to add $ # `fetch = > +refs/pull/*/head:refs/remotes/origin/pr/*` under origin remote > $ git fetch Interesting. The CPU is definitely going to user-space, and it does look like a quadratic case, where we end up re-reading the loose ref directory for each new ref we write. The backtrace looks something like: #1 0x0000555d86bcaf62 in files_read_raw_ref (ref_store=0x555d88d63a60, refname=0x555d8a4791b0 "refs/remotes/origin/pr/10936", sha1=0x7ffd29cbe1a0 "\003ŕX[\212*\037\064xE\fU\362!z44!<]U", referent=0x555d86efc630 <sb_refname>, type=0x7ffd29cbe0c4) at refs/files-backend.c:686 #2 0x0000555d86bc845f in refs_read_raw_ref (ref_store=0x555d88d63a60, refname=0x555d8a4791b0 "refs/remotes/origin/pr/10936", sha1=0x7ffd29cbe1a0 "\003ŕX[\212*\037\064xE\fU\362!z44!<]U", referent=0x555d86efc630 <sb_refname>, type=0x7ffd29cbe0c4) at refs.c:1391 #3 0x0000555d86bc851f in refs_resolve_ref_unsafe (refs=0x555d88d63a60, refname=0x555d8a4791b0 "refs/remotes/origin/pr/10936", resolve_flags=1, sha1=0x7ffd29cbe1a0 "\003ŕX[\212*\037\064xE\fU\362!z44!<]U", flags=0x7ffd29cbe19c) at refs.c:1430 #4 0x0000555d86bca9de in loose_fill_ref_dir (ref_store=0x555d88d63a60, dir=0x555d8a478cc8, dirname=0x555d8a478cf0 "refs/remotes/origin/pr/") at refs/files-backend.c:485 #5 0x0000555d86bd1444 in get_ref_dir (entry=0x555d8a478cc0) at refs/ref-cache.c:28 #6 0x0000555d86bd18d4 in search_for_subdir (dir=0x555d8a478c78, subdirname=0x555d8a414330 "refs/remotes/origin/pr/13832/", len=23, mkdir=0) at refs/ref-cache.c:172 #7 0x0000555d86bd192d in find_containing_dir (dir=0x555d8a478c78, refname=0x555d8a414330 "refs/remotes/origin/pr/13832/", mkdir=0) at refs/ref-cache.c:191 #8 0x0000555d86bd23a4 in cache_ref_iterator_begin (cache=0x555d8a418380, prefix=0x555d8a414330 "refs/remotes/origin/pr/13832/", prime_dir=1) at refs/ref-cache.c:567 #9 0x0000555d86bcb8eb in files_ref_iterator_begin (ref_store=0x555d88d63a60, prefix=0x555d8a414330 "refs/remotes/origin/pr/13832/", flags=1) at refs/files-backend.c:1120 #10 0x0000555d86bc801a in refs_ref_iterator_begin (refs=0x555d88d63a60, prefix=0x555d8a414330 "refs/remotes/origin/pr/13832/", trim=0, flags=1) at refs.c:1268 #11 0x0000555d86bc9289 in refs_verify_refname_available (refs=0x555d88d63a60, refname=0x555d88d64b50 "refs/remotes/origin/pr/13832", extras=0x7ffd29cbe620, skip=0x0, err=0x7ffd29cbe710) at refs.c:1887 #12 0x0000555d86bcb553 in lock_raw_ref (refs=0x555d88d63a60, refname=0x555d88d64b50 "refs/remotes/origin/pr/13832", mustexist=0, extras=0x7ffd29cbe620, skip=0x0, lock_p=0x7ffd29cbe588, referent=0x7ffd29cbe590, type=0x555d88d64b38, err=0x7ffd29cbe710) at refs/files-backend.c:956 #13 0x0000555d86bcf1bc in lock_ref_for_update (refs=0x555d88d63a60, update=0x555d88d64b00, transaction=0x555d88d64d20, head_ref=0x555d8a435630 "refs/heads/master", affected_refnames=0x7ffd29cbe620, err=0x7ffd29cbe710) at refs/files-backend.c:2764 The issue seems recent. Bisecting leads to Michael's 524a9fdb5 (refs_verify_refname_available(): use function in more places, 2017-04-16), which makes sense. -Peff ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH] files-backend: cheapen refname_available check when locking refs 2017-07-09 11:29 ` Jeff King @ 2017-08-17 15:12 ` Michael Haggerty 2017-08-17 15:22 ` Jeff King 0 siblings, 1 reply; 6+ messages in thread From: Michael Haggerty @ 2017-08-17 15:12 UTC (permalink / raw) To: Junio C Hamano; +Cc: s, Jeff King, git, Michael Haggerty When locking references in preparation for updating them, we need to check that none of the newly added references D/F conflict with existing references (e.g., we don't allow `refs/foo` to be added if `refs/foo/bar` already exists, or vice versa). Prior to 524a9fdb51 (refs_verify_refname_available(): use function in more places, 2017-04-16), conflicts with existing loose references were checked by looking directly in the filesystem, and then conflicts with existing packed references were checked by running `verify_refname_available_dir()` against the packed-refs cache. But that commit changed the final check to call `refs_verify_refname_available()` against the *whole* files ref-store, including both loose and packed references, with the following comment: > This means that those callsites now check for conflicts with all > references rather than just packed refs, but the performance cost > shouldn't be significant (and will be regained later). That comment turned out to be too sanguine. User s@kazlauskas.me reported that fetches involving a very large number of references in neighboring directories were slowed down by that change. The problem is that when fetching, each reference is updated individually, within its own reference transaction. This is done because some reference updates might succeed even though others fail. But every time a reference update transaction is finished, `clear_loose_ref_cache()` is called. So when it is time to update the next reference, part of the loose ref cache has to be repopulated for the `refs_verify_refname_available()` call. If the references are all in neighboring directories, then the cost of repopulating the reference cache increases with the number of references, resulting in O(N²) effort. The comment above also claims that the performance cost "will be regained later". The idea was that once the packed-refs were finished being split out into a separate ref-store, we could limit the `refs_verify_refname_available()` call to the packed references again. That is what we do now. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> --- This patch applies on top of branch mh/packed-ref-store. It can also be obtained from my fork [1] as branch "faster-refname-available-check". I was testing this using the reporter's recipe (but fetching from a local clone), and found the following surprising timing numbers: b05855b5bc (before the slowdown): 22.7 s 524a9fdb51 (immediately after the slowdown): 13 minutes 4e81f1ecf1 (after this fix): 14.5 s The fact that the fetch is now significantly *faster* than before the slowdown seems not to have anything to do with the reference code. refs/files-backend.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/refs/files-backend.c b/refs/files-backend.c index e9b95592b6..f2a420c611 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -631,11 +631,11 @@ static int lock_raw_ref(struct files_ref_store *refs, /* * If the ref did not exist and we are creating it, - * make sure there is no existing ref that conflicts - * with refname: + * make sure there is no existing packed ref that + * conflicts with refname: */ if (refs_verify_refname_available( - &refs->base, refname, + refs->packed_ref_store, refname, extras, skip, err)) goto error_return; } @@ -938,7 +938,7 @@ static struct ref_lock *lock_ref_sha1_basic(struct files_ref_store *refs, * our refname. */ if (is_null_oid(&lock->old_oid) && - refs_verify_refname_available(&refs->base, refname, + refs_verify_refname_available(refs->packed_ref_store, refname, extras, skip, err)) { last_errno = ENOTDIR; goto error_return; -- 2.11.0 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] files-backend: cheapen refname_available check when locking refs 2017-08-17 15:12 ` [PATCH] files-backend: cheapen refname_available check when locking refs Michael Haggerty @ 2017-08-17 15:22 ` Jeff King 2017-08-17 17:56 ` Brandon Williams 2017-08-17 21:37 ` Junio C Hamano 0 siblings, 2 replies; 6+ messages in thread From: Jeff King @ 2017-08-17 15:22 UTC (permalink / raw) To: Michael Haggerty; +Cc: Brandon Williams, Junio C Hamano, s, git On Thu, Aug 17, 2017 at 05:12:50PM +0200, Michael Haggerty wrote: > I was testing this using the reporter's recipe (but fetching from a > local clone), and found the following surprising timing numbers: > > b05855b5bc (before the slowdown): 22.7 s > 524a9fdb51 (immediately after the slowdown): 13 minutes > 4e81f1ecf1 (after this fix): 14.5 s > > The fact that the fetch is now significantly *faster* than before the > slowdown seems not to have anything to do with the reference code. I bisected this (with some hackery, since the commits in the middle all take 13 minutes to run). The other speedup is indeed unrelated, and is due to Brandon's aacc5c1a81 (submodule: refactor logic to determine changed submodules, 2017-05-01). The commit message doesn't mention performance (it's mostly about code reduction). I think the speedup comes from using diff_tree_combined_merge() instead of manually diffing each commit against its parents. But I didn't do further timings to verify that (I'm reporting it here mostly as an interesting curiosity for submodule folks). > diff --git a/refs/files-backend.c b/refs/files-backend.c > index e9b95592b6..f2a420c611 100644 > --- a/refs/files-backend.c > +++ b/refs/files-backend.c > @@ -631,11 +631,11 @@ static int lock_raw_ref(struct files_ref_store *refs, > > /* > * If the ref did not exist and we are creating it, > - * make sure there is no existing ref that conflicts > - * with refname: > + * make sure there is no existing packed ref that > + * conflicts with refname: > */ > if (refs_verify_refname_available( > - &refs->base, refname, > + refs->packed_ref_store, refname, > extras, skip, err)) > goto error_return; > } This seems too easy to be true. :) But I think it matches what we were doing before 524a9fdb51 (so it's correct), and the performance numbers don't lie. -Peff ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] files-backend: cheapen refname_available check when locking refs 2017-08-17 15:22 ` Jeff King @ 2017-08-17 17:56 ` Brandon Williams 2017-08-17 21:37 ` Junio C Hamano 1 sibling, 0 replies; 6+ messages in thread From: Brandon Williams @ 2017-08-17 17:56 UTC (permalink / raw) To: Jeff King; +Cc: Michael Haggerty, Junio C Hamano, s, git On 08/17, Jeff King wrote: > On Thu, Aug 17, 2017 at 05:12:50PM +0200, Michael Haggerty wrote: > > > I was testing this using the reporter's recipe (but fetching from a > > local clone), and found the following surprising timing numbers: > > > > b05855b5bc (before the slowdown): 22.7 s > > 524a9fdb51 (immediately after the slowdown): 13 minutes > > 4e81f1ecf1 (after this fix): 14.5 s > > > > The fact that the fetch is now significantly *faster* than before the > > slowdown seems not to have anything to do with the reference code. > > I bisected this (with some hackery, since the commits in the middle all > take 13 minutes to run). The other speedup is indeed unrelated, and is > due to Brandon's aacc5c1a81 (submodule: refactor logic to determine > changed submodules, 2017-05-01). > > The commit message doesn't mention performance (it's mostly about code > reduction). I think the speedup comes from using > diff_tree_combined_merge() instead of manually diffing each commit > against its parents. But I didn't do further timings to verify that (I'm > reporting it here mostly as an interesting curiosity for submodule > folks). Haha always great to see an unintended improvement in performance! Yeah that commit was mostly about removing duplicate code but I'm glad that it ended up being a benefit to perf too. > > > diff --git a/refs/files-backend.c b/refs/files-backend.c > > index e9b95592b6..f2a420c611 100644 > > --- a/refs/files-backend.c > > +++ b/refs/files-backend.c > > @@ -631,11 +631,11 @@ static int lock_raw_ref(struct files_ref_store *refs, > > > > /* > > * If the ref did not exist and we are creating it, > > - * make sure there is no existing ref that conflicts > > - * with refname: > > + * make sure there is no existing packed ref that > > + * conflicts with refname: > > */ > > if (refs_verify_refname_available( > > - &refs->base, refname, > > + refs->packed_ref_store, refname, > > extras, skip, err)) > > goto error_return; > > } > > This seems too easy to be true. :) But I think it matches what we were > doing before 524a9fdb51 (so it's correct), and the performance numbers > don't lie. > > -Peff -- Brandon Williams ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] files-backend: cheapen refname_available check when locking refs 2017-08-17 15:22 ` Jeff King 2017-08-17 17:56 ` Brandon Williams @ 2017-08-17 21:37 ` Junio C Hamano 1 sibling, 0 replies; 6+ messages in thread From: Junio C Hamano @ 2017-08-17 21:37 UTC (permalink / raw) To: Jeff King; +Cc: Michael Haggerty, Brandon Williams, s, git Jeff King <peff@peff.net> writes: >> diff --git a/refs/files-backend.c b/refs/files-backend.c >> index e9b95592b6..f2a420c611 100644 >> --- a/refs/files-backend.c >> +++ b/refs/files-backend.c >> @@ -631,11 +631,11 @@ static int lock_raw_ref(struct files_ref_store *refs, >> >> /* >> * If the ref did not exist and we are creating it, >> - * make sure there is no existing ref that conflicts >> - * with refname: >> + * make sure there is no existing packed ref that >> + * conflicts with refname: >> */ >> if (refs_verify_refname_available( >> - &refs->base, refname, >> + refs->packed_ref_store, refname, >> extras, skip, err)) >> goto error_return; >> } > > This seems too easy to be true. :) But I think it matches what we were > doing before 524a9fdb51 (so it's correct), and the performance numbers > don't lie. Thanks, all. The log message explained the change very well, even though I agree that the patch text does indeed look too easy to be true ;-). Will queue. ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2017-08-17 21:38 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-07-09 10:25 Fetching new refs gets progressively slower s 2017-07-09 11:29 ` Jeff King 2017-08-17 15:12 ` [PATCH] files-backend: cheapen refname_available check when locking refs Michael Haggerty 2017-08-17 15:22 ` Jeff King 2017-08-17 17:56 ` Brandon Williams 2017-08-17 21:37 ` Junio C Hamano
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).