* [PATCH v4 0/2] string-list: use ALLOC_GROW macro when reallocing
@ 2017-04-11 20:08 git
2017-04-11 20:08 ` [PATCH v4 1/2] string-list: use ALLOC_GROW macro when reallocing string_list git
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: git @ 2017-04-11 20:08 UTC (permalink / raw)
To: git; +Cc: gitster, peff, Jeff Hostetler
From: Jeff Hostetler <jeffhost@microsoft.com>
Version 4 greatly simplifies the p0005 perf test. It now uses an existing
repo -- either real-world or artificial from t/perf/repos/many-files.sh.
Jeff Hostetler (2):
string-list: use ALLOC_GROW macro when reallocing string_list
p0005-status: time status on very large repo
string-list.c | 5 +----
t/perf/p0005-status.sh | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 52 insertions(+), 4 deletions(-)
create mode 100755 t/perf/p0005-status.sh
--
2.9.3
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v4 1/2] string-list: use ALLOC_GROW macro when reallocing string_list
2017-04-11 20:08 [PATCH v4 0/2] string-list: use ALLOC_GROW macro when reallocing git
@ 2017-04-11 20:08 ` git
2017-04-11 21:29 ` Jonathan Nieder
2017-04-11 20:08 ` [PATCH v4 2/2] p0005-status: time status on very large repo git
2017-04-11 20:11 ` [PATCH v4 0/2] string-list: use ALLOC_GROW macro when reallocing Jeff King
2 siblings, 1 reply; 6+ messages in thread
From: git @ 2017-04-11 20:08 UTC (permalink / raw)
To: git; +Cc: gitster, peff, Jeff Hostetler
From: Jeff Hostetler <jeffhost@microsoft.com>
Use ALLOC_GROW() macro when reallocing a string_list array
rather than simply increasing it by 32. This is a performance
optimization.
During status on a very large repo and there are many changes,
a significant percentage of the total run time is spent
reallocing the wt_status.changes array.
This change decreases the time in wt_status_collect_changes_worktree()
from 125 seconds to 45 seconds on my very large repository.
This produced a modest gain on my 1M file artificial repo, but
broke even on linux.git.
Test HEAD^^ HEAD
---------------------------------------------------------------------------------------
0005.2: read-tree status br_ballast (1000001) 8.29(5.62+2.62) 8.22(5.57+2.63) -0.8%
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
string-list.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/string-list.c b/string-list.c
index 45016ad..003ca18 100644
--- a/string-list.c
+++ b/string-list.c
@@ -41,10 +41,7 @@ static int add_entry(int insert_at, struct string_list *list, const char *string
if (exact_match)
return -1 - index;
- if (list->nr + 1 >= list->alloc) {
- list->alloc += 32;
- REALLOC_ARRAY(list->items, list->alloc);
- }
+ ALLOC_GROW(list->items, list->nr+1, list->alloc);
if (index < list->nr)
memmove(list->items + index + 1, list->items + index,
(list->nr - index)
--
2.9.3
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH v4 2/2] p0005-status: time status on very large repo
2017-04-11 20:08 [PATCH v4 0/2] string-list: use ALLOC_GROW macro when reallocing git
2017-04-11 20:08 ` [PATCH v4 1/2] string-list: use ALLOC_GROW macro when reallocing string_list git
@ 2017-04-11 20:08 ` git
2017-04-11 21:35 ` Jonathan Nieder
2017-04-11 20:11 ` [PATCH v4 0/2] string-list: use ALLOC_GROW macro when reallocing Jeff King
2 siblings, 1 reply; 6+ messages in thread
From: git @ 2017-04-11 20:08 UTC (permalink / raw)
To: git; +Cc: gitster, peff, Jeff Hostetler
From: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
t/perf/p0005-status.sh | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 51 insertions(+)
create mode 100755 t/perf/p0005-status.sh
diff --git a/t/perf/p0005-status.sh b/t/perf/p0005-status.sh
new file mode 100755
index 0000000..0469aee
--- /dev/null
+++ b/t/perf/p0005-status.sh
@@ -0,0 +1,51 @@
+#!/bin/sh
+##
+## This test measures the performance of various read-tree
+## and status operations. It is primarily interested in
+## the algorithmic costs of index operations and recursive
+## tree traversal -- and NOT disk I/O on thousands of files.
+
+test_description="Tests performance of read-tree"
+
+. ./perf-lib.sh
+
+test_perf_default_repo
+
+## If the test repo was generated by ./repos/many-files.sh
+## then we know something about the data shape and branches,
+## so we can isolate testing to the ballast-related commits
+## and setup sparse-checkout so we don't have to populate
+## the ballast files and directories.
+##
+## Otherwise, we make some general assumptions about the
+## repo and consider the entire history of the current
+## branch to be the ballast.
+
+git branch | grep p0006-ballast >/dev/null 2>&1
+synthetic=$?
+if test "$synthetic" = 0
+then
+ echo Assuming synthetic repo from many-files.sh
+ git branch br_base master
+ git branch br_ballast p0006-ballast
+ echo '/*' >.git/info/sparse-checkout
+ echo '!ballast/*' >>.git/info/sparse-checkout
+ git config --local core.sparsecheckout 1
+else
+ echo Assuming non-synthetic repo...
+ git branch br_base $(git rev-list HEAD | tail -n 1)
+ git branch br_ballast HEAD
+fi
+
+test_expect_success 'setup' '
+ git checkout -q br_ballast
+'
+
+nr_files=$(git ls-files | wc -l)
+
+test_perf "read-tree status br_ballast ($nr_files)" '
+ git read-tree HEAD &&
+ git status
+'
+
+test_done
--
2.9.3
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v4 0/2] string-list: use ALLOC_GROW macro when reallocing
2017-04-11 20:08 [PATCH v4 0/2] string-list: use ALLOC_GROW macro when reallocing git
2017-04-11 20:08 ` [PATCH v4 1/2] string-list: use ALLOC_GROW macro when reallocing string_list git
2017-04-11 20:08 ` [PATCH v4 2/2] p0005-status: time status on very large repo git
@ 2017-04-11 20:11 ` Jeff King
2 siblings, 0 replies; 6+ messages in thread
From: Jeff King @ 2017-04-11 20:11 UTC (permalink / raw)
To: git; +Cc: git, gitster, Jeff Hostetler
On Tue, Apr 11, 2017 at 08:08:00PM +0000, git@jeffhostetler.com wrote:
> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Version 4 greatly simplifies the p0005 perf test. It now uses an existing
> repo -- either real-world or artificial from t/perf/repos/many-files.sh.
>
> Jeff Hostetler (2):
> string-list: use ALLOC_GROW macro when reallocing string_list
> p0005-status: time status on very large repo
Looks fine, though we may want to re-order when applying to introduce
p0005 before we show its output in the commit message of the other.
commit.
-Peff
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v4 1/2] string-list: use ALLOC_GROW macro when reallocing string_list
2017-04-11 20:08 ` [PATCH v4 1/2] string-list: use ALLOC_GROW macro when reallocing string_list git
@ 2017-04-11 21:29 ` Jonathan Nieder
0 siblings, 0 replies; 6+ messages in thread
From: Jonathan Nieder @ 2017-04-11 21:29 UTC (permalink / raw)
To: git; +Cc: git, gitster, peff, Jeff Hostetler
git@jeffhostetler.com wrote:
> During status on a very large repo and there are many changes,
> a significant percentage of the total run time is spent
> reallocing the wt_status.changes array.
>
> This change decreases the time in wt_status_collect_changes_worktree()
> from 125 seconds to 45 seconds on my very large repository.
>
> This produced a modest gain on my 1M file artificial repo, but
> broke even on linux.git.
>
> Test HEAD^^ HEAD
> ---------------------------------------------------------------------------------------
> 0005.2: read-tree status br_ballast (1000001) 8.29(5.62+2.62) 8.22(5.57+2.63) -0.8%
>
> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> ---
> string-list.c | 5 +----
> 1 file changed, 1 insertion(+), 4 deletions(-)
Nice. After rebasing (or just squashing together) with the p0005
patch as peff suggested,
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v4 2/2] p0005-status: time status on very large repo
2017-04-11 20:08 ` [PATCH v4 2/2] p0005-status: time status on very large repo git
@ 2017-04-11 21:35 ` Jonathan Nieder
0 siblings, 0 replies; 6+ messages in thread
From: Jonathan Nieder @ 2017-04-11 21:35 UTC (permalink / raw)
To: git; +Cc: git, gitster, peff, Jeff Hostetler
Hi,
Jeff Hostetler <jeffhost@microsoft.com> wrote:
> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Usually the commit message is a place to put some of the motivation
behind the patch or why I should want to apply it. In this example,
everything is answered by the previous patch, which suggests that it
would be easier to understand if they were squashed into a single
patch.
[...]
> +++ b/t/perf/p0005-status.sh
> @@ -0,0 +1,51 @@
> +#!/bin/sh
> +##
> +## This test measures the performance of various read-tree
> +## and status operations. It is primarily interested in
> +## the algorithmic costs of index operations and recursive
> +## tree traversal -- and NOT disk I/O on thousands of files.
Nit: git usually uses a single # to start comments in shell scripts
(and likewise below).
[...]
> +## If the test repo was generated by ./repos/many-files.sh
> +## then we know something about the data shape and branches,
> +## so we can isolate testing to the ballast-related commits
> +## and setup sparse-checkout so we don't have to populate
> +## the ballast files and directories.
> +##
> +## Otherwise, we make some general assumptions about the
> +## repo and consider the entire history of the current
> +## branch to be the ballast.
> +
> +git branch | grep p0006-ballast >/dev/null 2>&1
> +synthetic=$?
> +if test "$synthetic" = 0
nit: no need to run a command and read $? when using the
command directly from if would do:
if git branch | grep p0006-ballast
This can be simplified further by using a lower-level command
for the test:
if git rev-parse --verify refs/heads/p0006-ballast^{commit}
> +then
> + echo Assuming synthetic repo from many-files.sh
> + git branch br_base master
> + git branch br_ballast p0006-ballast
> + echo '/*' >.git/info/sparse-checkout
> + echo '!ballast/*' >>.git/info/sparse-checkout
> + git config --local core.sparsecheckout 1
> +else
> + echo Assuming non-synthetic repo...
> + git branch br_base $(git rev-list HEAD | tail -n 1)
> + git branch br_ballast HEAD
> +fi
This kind of setup instructions usually go in a test_expect_success
block so that the perf harness can detect whether they failed.
> +
> +test_expect_success 'setup' '
> + git checkout -q br_ballast
nit: no need to use "-q" --- most output from test_expect_success
is suppressed unless the script is run with "-v", at which point
it becomes useful for debugging.
> +'
> +
> +nr_files=$(git ls-files | wc -l)
This also can go in the test_expect_success block.
> +
> +test_perf "read-tree status br_ballast ($nr_files)" '
> + git read-tree HEAD &&
> + git status
> +'
Looks nice and simple. Thanks for writing it.
Hope that helps,
Jonathan
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2017-04-11 21:35 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-11 20:08 [PATCH v4 0/2] string-list: use ALLOC_GROW macro when reallocing git
2017-04-11 20:08 ` [PATCH v4 1/2] string-list: use ALLOC_GROW macro when reallocing string_list git
2017-04-11 21:29 ` Jonathan Nieder
2017-04-11 20:08 ` [PATCH v4 2/2] p0005-status: time status on very large repo git
2017-04-11 21:35 ` Jonathan Nieder
2017-04-11 20:11 ` [PATCH v4 0/2] string-list: use ALLOC_GROW macro when reallocing Jeff King
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).