git@vger.kernel.org mailing list mirror (one of many)
* cherry-pick very slow on big repository
@ 2017-11-10  9:39 Peter Krefting
  2017-11-10 10:20 ` Jeff King
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Krefting @ 2017-11-10  9:39 UTC (permalink / raw)
  To: Git Mailing List

Hi!

On a big repository (57000 files, 2,5 gigabytes in .git/objects), git 
cherry-pick is very slow for me (v2.15.0). This is cherry-picking a 
one-file change, where the file is in the same place on both branches, 
and which applies cleanly (I am backporting a few fixes to a 
maintenance version):

$ time git cherry-pick -x 717eb328940ca2e33f14ed27576e656327854b7b
[redacted 391454f16d] Redacted
  Author: Redacted <redacted>
  Date: Mon Oct 16 15:58:05 2017 +0200
  1 file changed, 2 insertions(+), 2 deletions(-)

real    6m9,054s
user    5m49,432s
sys     0m2,292s

Something is not how it should be here. The repo shares objects 
(.git/objects/info/alternates) with another repository (I have run 
"git gc" on both repositories).

Running strace, it seems like it is doing lstat(), open(), mmap(), 
close() and munmap() on every single file in the repository, which 
takes a lot of time.

I thought it was just updating the status, but "git status" returns 
immediately, while cherry-picking takes several minutes for every 
cherry-pick I do.

-- 
\\// Peter - http://www.softwolves.pp.se/


* Re: cherry-pick very slow on big repository
  2017-11-10  9:39 cherry-pick very slow on big repository Peter Krefting
@ 2017-11-10 10:20 ` Jeff King
  2017-11-10 12:37   ` Peter Krefting
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff King @ 2017-11-10 10:20 UTC (permalink / raw)
  To: Peter Krefting; +Cc: Git Mailing List

On Fri, Nov 10, 2017 at 10:39:39AM +0100, Peter Krefting wrote:

> Running strace, it seems like it is doing lstat(), open(), mmap(), close()
> and munmap() on every single file in the repository, which takes a lot of
> time.
> 
> I thought it was just updating the status, but "git status" returns
> immediately, while cherry-picking takes several minutes for every
> cherry-pick I do.

It kind of sounds like a temporary index is being refreshed that doesn't
have the proper stat information.

Can you get a backtrace? I'd do something like:

  - gdb --args git cherry-pick ...
  - 'r' to run
  - give it a few seconds to hit the CPU heavy part, then ^C
  - 'bt' to generate the backtrace

which should give a sense of which code path is leading to the slowdown
(or of course use real profiling tools, but if the slow path is taking 6
minutes, you'll be likely to stop in the middle of it ;) ).

-Peff


* Re: cherry-pick very slow on big repository
  2017-11-10 10:20 ` Jeff King
@ 2017-11-10 12:37   ` Peter Krefting
  2017-11-10 12:59     ` Derrick Stolee
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Krefting @ 2017-11-10 12:37 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List

Jeff King:

> Can you get a backtrace? I'd do something like:

It seems to spend most of its time in diffcore_count_changes(); that 
is where it stops whenever I hit Ctrl+C (at various lines between 199 
and 207 in diffcore-delta.c; this is on the v2.15.0 tag).

(gdb) bt
#0  diffcore_count_changes (src=src@entry=0x55555db99970,
     dst=dst@entry=0x55555d6a4810,
     src_count_p=src_count_p@entry=0x55555db99998,
     dst_count_p=dst_count_p@entry=0x55555d6a4838,
     src_copied=src_copied@entry=0x7fffffffd3e0,
     literal_added=literal_added@entry=0x7fffffffd3f0)
     at diffcore-delta.c:203
#1  0x00005555556dee1a in estimate_similarity (minimum_score=30000,
     dst=0x55555d6a4810, src=0x55555db99970) at diffcore-rename.c:193
#2  diffcore_rename (options=options@entry=0x7fffffffd4f0)
     at diffcore-rename.c:560
#3  0x0000555555623d83 in diffcore_std (
     options=options@entry=0x7fffffffd4f0) at diff.c:5846
#4  0x000055555564ab46 in get_renames (o=o@entry=0x7fffffffd850,
     tree=tree@entry=0x5555559d1b98,
     o_tree=o_tree@entry=0x5555559d1bc0,
     a_tree=a_tree@entry=0x5555559d1b98,
     b_tree=b_tree@entry=0x5555559d1b70,
     entries=entries@entry=0x555559351d20) at merge-recursive.c:554
#5  0x000055555564e7d9 in merge_trees (o=o@entry=0x7fffffffd850,
     head=head@entry=0x5555559d1b98, merge=<optimized out>,
     merge@entry=0x5555559d1b70, common=<optimized out>,
     common@entry=0x5555559d1bc0, result=result@entry=0x7fffffffd830)
     at merge-recursive.c:1985
#6  0x000055555569b2cc in do_recursive_merge (opts=0x7fffffffdf70,
     msgbuf=0x7fffffffd810, head=0x7fffffffd7f0,
     next_label=<optimized out>, base_label=<optimized out>,
     next=<optimized out>, base=0x5555559c1ba0) at sequencer.c:459
#7  do_pick_commit (command=TODO_PICK,
     commit=commit@entry=0x5555559c1b60,
     opts=opts@entry=0x7fffffffdf70, final_fixup=final_fixup@entry=0)
     at sequencer.c:1088
#8  0x000055555569e324 in single_pick (opts=0x7fffffffdf70,
     cmit=0x5555559c1b60) at sequencer.c:2306
#9  sequencer_pick_revisions (opts=0x7fffffffdf70)
     at sequencer.c:2355
#10 0x00005555555d4097 in run_sequencer (argc=1, argc@entry=3,
     argv=argv@entry=0x7fffffffe320, opts=<optimized out>,
     opts@entry=0x7fffffffdf70) at builtin/revert.c:200
#11 0x00005555555d449a in cmd_cherry_pick (argc=3,
     argv=0x7fffffffe320, prefix=<optimized out>)
     at builtin/revert.c:225
#12 0x0000555555567a38 in run_builtin (argv=<optimized out>,
     argc=<optimized out>, p=<optimized out>) at git.c:346
#13 handle_builtin (argc=3, argv=0x7fffffffe320) at git.c:554
#14 0x0000555555567cf6 in run_argv (argv=0x7fffffffe0e0,
     argcp=0x7fffffffe0ec) at git.c:606
#15 cmd_main (argc=<optimized out>, argv=<optimized out>)
     at git.c:683
#16 0x0000555555566e01 in main (argc=4, argv=0x7fffffffe318)
     at common-main.c:43

-- 
\\// Peter - http://www.softwolves.pp.se/


* Re: cherry-pick very slow on big repository
  2017-11-10 12:37   ` Peter Krefting
@ 2017-11-10 12:59     ` Derrick Stolee
  2017-11-10 14:05       ` Peter Krefting
  0 siblings, 1 reply; 13+ messages in thread
From: Derrick Stolee @ 2017-11-10 12:59 UTC (permalink / raw)
  To: Peter Krefting, Jeff King; +Cc: Git Mailing List

On 11/10/2017 7:37 AM, Peter Krefting wrote:
> Jeff King:
>
>> Can you get a backtrace? I'd do something like:
>
> Seems that it spends most time in diffcore_count_changes(), that is 
> where it hits whenever I hit Ctrl+C (various line numbers 199-207 in 
> diffcore-delta.c; this is on the v2.15.0 tag).
>
> (gdb) bt
> #0  diffcore_count_changes (src=src@entry=0x55555db99970,
>     dst=dst@entry=0x55555d6a4810,
>     src_count_p=src_count_p@entry=0x55555db99998,
>     dst_count_p=dst_count_p@entry=0x55555d6a4838,
>     src_copied=src_copied@entry=0x7fffffffd3e0,
>     literal_added=literal_added@entry=0x7fffffffd3f0)
>     at diffcore-delta.c:203
> #1  0x00005555556dee1a in estimate_similarity (minimum_score=30000,
>     dst=0x55555d6a4810, src=0x55555db99970) at diffcore-rename.c:193
> #2  diffcore_rename (options=options@entry=0x7fffffffd4f0)
>     at diffcore-rename.c:560
> #3  0x0000555555623d83 in diffcore_std (
>     options=options@entry=0x7fffffffd4f0) at diff.c:5846
> ...

Git is spending time detecting renames, which implies you probably 
renamed a folder or added and deleted a large number of files. This 
rename detection is quadratic (# adds times # deletes).

You can remove this rename detection by running your cherry-pick with 
`git -c diff.renameLimit=1 cherry-pick ...`

See https://git-scm.com/docs/diff-config#diff-config-diffrenameLimit

Thanks,
-Stolee
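
To get a feel for the quadratic cost, the following sketch counts the
added and deleted paths between two commits and multiplies them. The
helper names count_status and rename_pairs, and the branch names in the
usage comment, are made up for illustration. The product is only a rough
upper bound, since files with identical content are paired up cheaply
before the expensive similarity scoring starts.

```shell
#!/bin/sh
# Sketch only: count_status and rename_pairs are hypothetical helpers,
# not part of git.

# Count paths with the given status letter (A = added, D = deleted)
# between two commits, with rename detection switched off so that
# renamed files show up as one delete plus one add.
count_status() {
    git diff --no-renames --diff-filter="$1" --name-only "$2" "$3" | wc -l
}

# Rough upper bound on the number of file pairs that inexact rename
# detection may have to compare: adds times deletes.
rename_pairs() {
    echo $(( $1 * $2 ))
}

# Usage, inside a repository (branch names are placeholders):
#   adds=$(count_status A maint master)
#   dels=$(count_status D maint master)
#   rename_pairs "$adds" "$dels"
```

With, say, 300 added and 400 deleted template files, that is already
120000 candidate pairs, each of which may need a content comparison.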


* Re: cherry-pick very slow on big repository
  2017-11-10 12:59     ` Derrick Stolee
@ 2017-11-10 14:05       ` Peter Krefting
  2017-11-10 17:04         ` Kevin Willford
                           ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Peter Krefting @ 2017-11-10 14:05 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Jeff King, Git Mailing List

Derrick Stolee:

> Git is spending time detecting renames, which implies you probably 
> renamed a folder or added and deleted a large number of files. This 
> rename detection is quadratic (# adds times # deletes).

Yes, a couple of directories with a lot of template files have been 
renamed (and some removed, some added) between the current development 
branch and this old maintenance branch. I get the "Performing inexact 
rename detection" a lot when merging changes in the other direction.

However, none of them apply to these particular commits, which only 
touch files that are in the exact same location on both branches.

> You can remove this rename detection by running your cherry-pick 
> with `git -c diff.renameLimit=1 cherry-pick ...`

That didn't work; in fact, it failed to finish with this setting in 
effect. It hangs in such a way that I can't stop it with Ctrl+C 
(neither when running from the command line nor when running inside 
gdb). It didn't finish in the 20 minutes I gave it.

I also tried with diff.renames=false, which also seemed to fail.

-- 
\\// Peter - http://www.softwolves.pp.se/


* RE: cherry-pick very slow on big repository
  2017-11-10 14:05       ` Peter Krefting
@ 2017-11-10 17:04         ` Kevin Willford
  2017-11-13 11:19           ` Peter Krefting
  2017-11-10 17:37         ` Elijah Newren
  2017-11-10 23:32         ` Elijah Newren
  2 siblings, 1 reply; 13+ messages in thread
From: Kevin Willford @ 2017-11-10 17:04 UTC (permalink / raw)
  To: Peter Krefting, Derrick Stolee; +Cc: Jeff King, Git Mailing List

Since this is happening during a merge, you might need to use merge.renameLimit
or the merge strategy option -Xno-renames.  The code does fall back to
diff.renameLimit, but a lot of work is still done before the rename limit is even
checked, so I would first try turning renames off.

Thanks,
Kevin
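
A rough sketch of the two approaches mentioned above. The wrapper name
pick_no_renames and its DRY_RUN switch are made up for illustration;
-X (--strategy-option) and no-renames are the documented cherry-pick and
recursive-strategy options, and -x is simply the flag used earlier in
this thread.

```shell
#!/bin/sh
# Hypothetical wrapper: cherry-pick with rename detection turned off in
# the merge machinery. With DRY_RUN=1 it only prints the command it
# would run instead of executing it.
pick_no_renames() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "git cherry-pick -Xno-renames -x $*"
    else
        git cherry-pick -Xno-renames -x "$@"
    fi
}

# Alternative: keep rename detection enabled but cap it for a single
# invocation (replace <commit> with the commit to pick):
#   git -c merge.renameLimit=1 cherry-pick -x <commit>
```

Turning renames off entirely means a pick that really does touch a
renamed file will no longer follow the rename, so this is best kept to
cases like the one here, where the paths match on both branches.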

> -----Original Message-----
> From: git-owner@vger.kernel.org [mailto:git-owner@vger.kernel.org] On Behalf
> Of Peter Krefting
> Sent: Friday, November 10, 2017 7:05 AM
> To: Derrick Stolee <stolee@gmail.com>
> Cc: Jeff King <peff@peff.net>; Git Mailing List <git@vger.kernel.org>
> Subject: Re: cherry-pick very slow on big repository
> 
> Derrick Stolee:
> 
> > Git is spending time detecting renames, which implies you probably
> > renamed a folder or added and deleted a large number of files. This
> > rename detection is quadratic (# adds times # deletes).
> 
> Yes, a couple of directories with a lot of template files have been
> renamed (and some removed, some added) between the current development
> branch and this old maintenance branch. I get the "Performing inexact
> rename detection" a lot when merging changes in the other direction.
> 
> However, none of them applies to these particular commits, which only
> touches files that are in the exact same location on both branches.
> 
> > You can remove this rename detection by running your cherry-pick
> > with `git -c diff.renameLimit=1 cherry-pick ...`
> 
> That didn't work, actually it failed to finish with this setting in
> effect, it hangs in such a way that I can't stop it with Ctrl+C
> (neither when running from the command line, nor when running inside
> gdb). It didn't finish in the 20 minutes I gave it.
> 
> I also tried with diff.renames=false, which also seemed to fail.
> 
> --
> \\// Peter - http://www.softwolves.pp.se/


* Re: cherry-pick very slow on big repository
  2017-11-10 14:05       ` Peter Krefting
  2017-11-10 17:04         ` Kevin Willford
@ 2017-11-10 17:37         ` Elijah Newren
  2017-11-10 23:32         ` Elijah Newren
  2 siblings, 0 replies; 13+ messages in thread
From: Elijah Newren @ 2017-11-10 17:37 UTC (permalink / raw)
  To: Peter Krefting; +Cc: Derrick Stolee, Jeff King, Git Mailing List

Interesting timing.  I have some performance patches specifically
developed because rename detection during merges made a small
cherry-pick in a large repo rather slow...in my case, I dropped the
time for the cherry pick by a factor of about 30 (no guarantees you'll
see the same; it's very history-specific).  I was just about to start
sending my three series of patches, the performance one being the
third...

On Fri, Nov 10, 2017 at 6:05 AM, Peter Krefting <peter@softwolves.pp.se> wrote:
> Derrick Stolee:
>
>> Git is spending time detecting renames, which implies you probably renamed
>> a folder or added and deleted a large number of files. This rename detection
>> is quadratic (# adds times # deletes).
>
>
> Yes, a couple of directories with a lot of template files have been renamed
> (and some removed, some added) between the current development branch and
> this old maintenance branch. I get the "Performing inexact rename detection"
> a lot when merging changes in the other direction.
>
> However, none of them applies to these particular commits, which only
> touches files that are in the exact same location on both branches.
>
>> You can remove this rename detection by running your cherry-pick with `git
>> -c diff.renameLimit=1 cherry-pick ...`
>
>
> That didn't work, actually it failed to finish with this setting in effect,
> it hangs in such a way that I can't stop it with Ctrl+C (neither when
> running from the command line, nor when running inside gdb). It didn't
> finish in the 20 minutes I gave it.
>
> I also tried with diff.renames=false, which also seemed to fail.
>
>
> --
> \\// Peter - http://www.softwolves.pp.se/


* Re: cherry-pick very slow on big repository
  2017-11-10 14:05       ` Peter Krefting
  2017-11-10 17:04         ` Kevin Willford
  2017-11-10 17:37         ` Elijah Newren
@ 2017-11-10 23:32         ` Elijah Newren
  2017-11-13 11:22           ` Peter Krefting
  2 siblings, 1 reply; 13+ messages in thread
From: Elijah Newren @ 2017-11-10 23:32 UTC (permalink / raw)
  To: Peter Krefting; +Cc: Derrick Stolee, Jeff King, Git Mailing List

On Fri, Nov 10, 2017 at 6:05 AM, Peter Krefting <peter@softwolves.pp.se> wrote:
> Derrick Stolee:
>
>> Git is spending time detecting renames, which implies you probably renamed
>> a folder or added and deleted a large number of files. This rename detection
>> is quadratic (# adds times # deletes).
>
> Yes, a couple of directories with a lot of template files have been renamed
> (and some removed, some added) between the current development branch and
> this old maintenance branch. I get the "Performing inexact rename detection"
> a lot when merging changes in the other direction.
>
> However, none of them applies to these particular commits, which only
> touches files that are in the exact same location on both branches.

I would be very interested to hear how my rename detection performance
patches work for you; this kind of usecase was the exact one it was
designed to help the most.  See
https://public-inbox.org/git/20171110222156.23221-1-newren@gmail.com/


* RE: cherry-pick very slow on big repository
  2017-11-10 17:04         ` Kevin Willford
@ 2017-11-13 11:19           ` Peter Krefting
  0 siblings, 0 replies; 13+ messages in thread
From: Peter Krefting @ 2017-11-13 11:19 UTC (permalink / raw)
  To: Kevin Willford; +Cc: Derrick Stolee, Jeff King, Git Mailing List

Kevin Willford:

> Since this is happening during a merge, you might need to use merge.renameLimit
> or the merge strategy option of -Xno-renames.  Although the code does fallback
> to use the diff.renameLimit but there is still a lot that is done before even checking
> the rename limit so I would first try getting renames turned off.

That makes quite a large difference. With this setting it finishes in 
about 15 seconds:

   $ time git -c merge.renameLimit=1 cherry-pick -x 717eb328940ca2e33f14ed27576e656327854b7b
   [redacted 0576fbaf89] Redacted
    Author: Redacted <redacted>
    Date: Mon Oct 16 15:58:05 2017 +0200
    1 file changed, 2 insertions(+), 2 deletions(-)

   real    0m15,473s
   user    0m14,904s
   sys     0m0,488s

I'll add this setting for the repository for the future, thank you!
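
Persisting the setting could look like the sketch below. It assumes the
current directory is inside the affected repository and uses only
"git config"; the value 1 is the one tried above.

```shell
# Store the limit in this repository's .git/config rather than in the
# per-user or system-wide configuration.
git config --local merge.renameLimit 1

# Read the value back to confirm it was written.
git config --local --get merge.renameLimit
```

Because the setting lives in the repository's own configuration, it does
not affect other clones or the repository that shares objects with it.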

-- 
\\// Peter - http://www.softwolves.pp.se/


* Re: cherry-pick very slow on big repository
  2017-11-10 23:32         ` Elijah Newren
@ 2017-11-13 11:22           ` Peter Krefting
  2017-11-13 18:09             ` Elijah Newren
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Krefting @ 2017-11-13 11:22 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Derrick Stolee, Jeff King, Git Mailing List

Elijah Newren:

> I would be very interested to hear how my rename detection 
> performance patches work for you; this kind of usecase was the exact 
> one it was designed to help the most.  See 
> https://public-inbox.org/git/20171110222156.23221-1-newren@gmail.com/

I'd be happy to try them out. Is there a public repo where I can pull 
these patches from instead of trying to apply them manually, as there 
are several patch series involved here?

-- 
\\// Peter - http://www.softwolves.pp.se/


* Re: cherry-pick very slow on big repository
  2017-11-13 11:22           ` Peter Krefting
@ 2017-11-13 18:09             ` Elijah Newren
  2017-11-21 12:07               ` Peter Krefting
  0 siblings, 1 reply; 13+ messages in thread
From: Elijah Newren @ 2017-11-13 18:09 UTC (permalink / raw)
  To: Peter Krefting; +Cc: Derrick Stolee, Jeff King, Git Mailing List

On Mon, Nov 13, 2017 at 3:22 AM, Peter Krefting <peter@softwolves.pp.se> wrote:
> Elijah Newren:
>
>> I would be very interested to hear how my rename detection performance
>> patches work for you; this kind of usecase was the exact one it was designed
>> to help the most.  See
>> https://public-inbox.org/git/20171110222156.23221-1-newren@gmail.com/
>
> I'd be happy to try them out. Is there a public repo where I can pull these
> patches from instead of trying to apply them manually, as there are several
> patch series involved here?

Sure, take a look at the big-repo-small-cherry-pick branch of
https://github.com/newren/git


* Re: cherry-pick very slow on big repository
  2017-11-13 18:09             ` Elijah Newren
@ 2017-11-21 12:07               ` Peter Krefting
  2017-11-21 17:14                 ` Elijah Newren
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Krefting @ 2017-11-21 12:07 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Derrick Stolee, Jeff King, Git Mailing List

Elijah Newren:

> Sure, take a look at the big-repo-small-cherry-pick branch of
> https://github.com/newren/git

With those changes, the time usage is the same as if I set 
merge.renameLimit=1 for the repository, and the end result is identical:

$ time /usr/local/stow/git-v2.15.0-323-g31fe956618/bin/git cherry-pick -x 717eb328940ca2e33f14ed27576e656327854b7b
[redacted 19be3551bc] Redacted
  Author: Redacted <redacted>
  Date: Mon Oct 16 15:58:05 2017 +0200
  1 file changed, 2 insertions(+), 2 deletions(-)

real    0m15,345s
user    0m14,908s
sys     0m0,528s

Thanks!

-- 
\\// Peter - http://www.softwolves.pp.se/


* Re: cherry-pick very slow on big repository
  2017-11-21 12:07               ` Peter Krefting
@ 2017-11-21 17:14                 ` Elijah Newren
  0 siblings, 0 replies; 13+ messages in thread
From: Elijah Newren @ 2017-11-21 17:14 UTC (permalink / raw)
  To: Peter Krefting; +Cc: Derrick Stolee, Jeff King, Git Mailing List

On Tue, Nov 21, 2017 at 4:07 AM, Peter Krefting <peter@softwolves.pp.se> wrote:
> Elijah Newren:
>
>> Sure, take a look at the big-repo-small-cherry-pick branch of
>> https://github.com/newren/git
>
>
> With those changes, the time usage is the same as if I set
> merge.renameLimit=1 for the repository, and the end result is identical:
>
> $ time /usr/local/stow/git-v2.15.0-323-g31fe956618/bin/git cherry-pick -x
> 717eb328940ca2e33f14ed27576e656327854b7b
> [redacted 19be3551bc] Redacted
>  Author: Redacted <redacted>
>  Date: Mon Oct 16 15:58:05 2017 +0200
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> real    0m15,345s
> user    0m14,908s
> sys     0m0,528s
>
> Thanks!


Cool, glad it worked for you.  Thanks for testing it out.
