git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v4 0/2] string-list: use ALLOC_GROW macro when reallocing
@ 2017-04-11 20:08 git
  2017-04-11 20:08 ` [PATCH v4 1/2] string-list: use ALLOC_GROW macro when reallocing string_list git
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: git @ 2017-04-11 20:08 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Version 4 greatly simplifies the p0005 perf test. It now uses an existing
repo -- either real-world or artificial from t/perf/repos/many-files.sh.

Jeff Hostetler (2):
  string-list: use ALLOC_GROW macro when reallocing string_list
  p0005-status: time status on very large repo

 string-list.c          |  5 +----
 t/perf/p0005-status.sh | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 52 insertions(+), 4 deletions(-)
 create mode 100755 t/perf/p0005-status.sh

-- 
2.9.3


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v4 1/2] string-list: use ALLOC_GROW macro when reallocing string_list
  2017-04-11 20:08 [PATCH v4 0/2] string-list: use ALLOC_GROW macro when reallocing git
@ 2017-04-11 20:08 ` git
  2017-04-11 21:29   ` Jonathan Nieder
  2017-04-11 20:08 ` [PATCH v4 2/2] p0005-status: time status on very large repo git
  2017-04-11 20:11 ` [PATCH v4 0/2] string-list: use ALLOC_GROW macro when reallocing Jeff King
  2 siblings, 1 reply; 6+ messages in thread
From: git @ 2017-04-11 20:08 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Use ALLOC_GROW() macro when reallocing a string_list array
rather than simply increasing it by 32.  This is a performance
optimization.

During status on a very large repo and there are many changes,
a significant percentage of the total run time is spent
reallocing the wt_status.changes array.

This change decreases the time in wt_status_collect_changes_worktree()
from 125 seconds to 45 seconds on my very large repository.

This produced a modest gain on my 1M file artificial repo, but
broke even on linux.git.

Test                                            HEAD^^            HEAD
---------------------------------------------------------------------------------------
0005.2: read-tree status br_ballast (1000001)   8.29(5.62+2.62)   8.22(5.57+2.63) -0.8%

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 string-list.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/string-list.c b/string-list.c
index 45016ad..003ca18 100644
--- a/string-list.c
+++ b/string-list.c
@@ -41,10 +41,7 @@ static int add_entry(int insert_at, struct string_list *list, const char *string
 	if (exact_match)
 		return -1 - index;
 
-	if (list->nr + 1 >= list->alloc) {
-		list->alloc += 32;
-		REALLOC_ARRAY(list->items, list->alloc);
-	}
+	ALLOC_GROW(list->items, list->nr+1, list->alloc);
 	if (index < list->nr)
 		memmove(list->items + index + 1, list->items + index,
 				(list->nr - index)
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v4 2/2] p0005-status: time status on very large repo
  2017-04-11 20:08 [PATCH v4 0/2] string-list: use ALLOC_GROW macro when reallocing git
  2017-04-11 20:08 ` [PATCH v4 1/2] string-list: use ALLOC_GROW macro when reallocing string_list git
@ 2017-04-11 20:08 ` git
  2017-04-11 21:35   ` Jonathan Nieder
  2017-04-11 20:11 ` [PATCH v4 0/2] string-list: use ALLOC_GROW macro when reallocing Jeff King
  2 siblings, 1 reply; 6+ messages in thread
From: git @ 2017-04-11 20:08 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 t/perf/p0005-status.sh | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)
 create mode 100755 t/perf/p0005-status.sh

diff --git a/t/perf/p0005-status.sh b/t/perf/p0005-status.sh
new file mode 100755
index 0000000..0469aee
--- /dev/null
+++ b/t/perf/p0005-status.sh
@@ -0,0 +1,51 @@
+#!/bin/sh
+##
+## This test measures the performance of various read-tree
+## and status operations.  It is primarily interested in
+## the algorithmic costs of index operations and recursive
+## tree traversal -- and NOT disk I/O on thousands of files.
+
+test_description="Tests performance of read-tree"
+
+. ./perf-lib.sh
+
+test_perf_default_repo
+
+## If the test repo was generated by ./repos/many-files.sh
+## then we know something about the data shape and branches,
+## so we can isolate testing to the ballast-related commits
+## and setup sparse-checkout so we don't have to populate
+## the ballast files and directories.
+##
+## Otherwise, we make some general assumptions about the
+## repo and consider the entire history of the current
+## branch to be the ballast.
+
+git branch | grep p0006-ballast >/dev/null 2>&1
+synthetic=$?
+if test "$synthetic" = 0
+then
+    echo Assuming synthetic repo from many-files.sh
+    git branch br_base            master
+    git branch br_ballast         p0006-ballast
+    echo '/*'          >.git/info/sparse-checkout
+    echo '!ballast/*' >>.git/info/sparse-checkout
+    git config --local core.sparsecheckout 1
+else
+    echo Assuming non-synthetic repo...
+    git branch br_base            $(git rev-list HEAD | tail -n 1)
+    git branch br_ballast         HEAD
+fi
+
+test_expect_success 'setup' '
+	git checkout -q br_ballast
+'
+
+nr_files=$(git ls-files | wc -l)
+
+test_perf "read-tree status br_ballast ($nr_files)" '
+	git read-tree HEAD &&
+	git status
+'
+
+test_done
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v4 0/2] string-list: use ALLOC_GROW macro when reallocing
  2017-04-11 20:08 [PATCH v4 0/2] string-list: use ALLOC_GROW macro when reallocing git
  2017-04-11 20:08 ` [PATCH v4 1/2] string-list: use ALLOC_GROW macro when reallocing string_list git
  2017-04-11 20:08 ` [PATCH v4 2/2] p0005-status: time status on very large repo git
@ 2017-04-11 20:11 ` Jeff King
  2 siblings, 0 replies; 6+ messages in thread
From: Jeff King @ 2017-04-11 20:11 UTC (permalink / raw)
  To: git; +Cc: git, gitster, Jeff Hostetler

On Tue, Apr 11, 2017 at 08:08:00PM +0000, git@jeffhostetler.com wrote:

> From: Jeff Hostetler <jeffhost@microsoft.com>
> 
> Version 4 greatly simplifies the p0005 perf test. It now uses an existing
> repo -- either real-world or artificial from t/perf/repos/many-files.sh.
> 
> Jeff Hostetler (2):
>   string-list: use ALLOC_GROW macro when reallocing string_list
>   p0005-status: time status on very large repo

Looks fine, though we may want to re-order when applying to introduce
p0005 before we show its output in the commit message of the other.
commit.

-Peff

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v4 1/2] string-list: use ALLOC_GROW macro when reallocing string_list
  2017-04-11 20:08 ` [PATCH v4 1/2] string-list: use ALLOC_GROW macro when reallocing string_list git
@ 2017-04-11 21:29   ` Jonathan Nieder
  0 siblings, 0 replies; 6+ messages in thread
From: Jonathan Nieder @ 2017-04-11 21:29 UTC (permalink / raw)
  To: git; +Cc: git, gitster, peff, Jeff Hostetler

git@jeffhostetler.com wrote:

> During status on a very large repo and there are many changes,
> a significant percentage of the total run time is spent
> reallocing the wt_status.changes array.
>
> This change decreases the time in wt_status_collect_changes_worktree()
> from 125 seconds to 45 seconds on my very large repository.
>
> This produced a modest gain on my 1M file artificial repo, but
> broke even on linux.git.
>
> Test                                            HEAD^^            HEAD
> ---------------------------------------------------------------------------------------
> 0005.2: read-tree status br_ballast (1000001)   8.29(5.62+2.62)   8.22(5.57+2.63) -0.8%
>
> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> ---
>  string-list.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)

Nice.  After rebasing (or just squashing together) with the p0005
patch as peff suggested,

Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v4 2/2] p0005-status: time status on very large repo
  2017-04-11 20:08 ` [PATCH v4 2/2] p0005-status: time status on very large repo git
@ 2017-04-11 21:35   ` Jonathan Nieder
  0 siblings, 0 replies; 6+ messages in thread
From: Jonathan Nieder @ 2017-04-11 21:35 UTC (permalink / raw)
  To: git; +Cc: git, gitster, peff, Jeff Hostetler

Hi,

Jeff Hostetler <jeffhost@microsoft.com> wrote:

> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>

Usually the commit message is a place to put some of the motivation
behind the patch or why I should want to apply it.  In this example,
everything is answered by the previous patch, which suggests that it
would be easier to understand if they were squashed into a single
patch.

[...]
> +++ b/t/perf/p0005-status.sh
> @@ -0,0 +1,51 @@
> +#!/bin/sh
> +##
> +## This test measures the performance of various read-tree
> +## and status operations.  It is primarily interested in
> +## the algorithmic costs of index operations and recursive
> +## tree traversal -- and NOT disk I/O on thousands of files.

Nit: git usually uses a single # to start comments in shell scripts
(and likewise below).

[...]
> +## If the test repo was generated by ./repos/many-files.sh
> +## then we know something about the data shape and branches,
> +## so we can isolate testing to the ballast-related commits
> +## and setup sparse-checkout so we don't have to populate
> +## the ballast files and directories.
> +##
> +## Otherwise, we make some general assumptions about the
> +## repo and consider the entire history of the current
> +## branch to be the ballast.
> +
> +git branch | grep p0006-ballast >/dev/null 2>&1
> +synthetic=$?
> +if test "$synthetic" = 0

nit: no need to run a command and read $? when using the
command directly from if would do:

	if git branch | grep p0006-ballast

This can be simplified further by using a lower-level command
for the test:

	if git rev-parse --verify refs/heads/p0006-ballast^{commit}

> +then
> +    echo Assuming synthetic repo from many-files.sh
> +    git branch br_base            master
> +    git branch br_ballast         p0006-ballast
> +    echo '/*'          >.git/info/sparse-checkout
> +    echo '!ballast/*' >>.git/info/sparse-checkout
> +    git config --local core.sparsecheckout 1
> +else
> +    echo Assuming non-synthetic repo...
> +    git branch br_base            $(git rev-list HEAD | tail -n 1)
> +    git branch br_ballast         HEAD
> +fi

This kind of setup instructions usually go in a test_expect_success
block so that the perf harness can detect whether they failed.

> +
> +test_expect_success 'setup' '
> +	git checkout -q br_ballast

nit: no need to use "-q" --- most output from test_expect_success
is suppressed unless the script is run with "-v", at which point
it becomes useful for debugging.

> +'
> +
> +nr_files=$(git ls-files | wc -l)

This also can go in the test_expect_success block.

> +
> +test_perf "read-tree status br_ballast ($nr_files)" '
> +	git read-tree HEAD &&
> +	git status
> +'

Looks nice and simple.  Thanks for writing it.

Hope that helps,
Jonathan

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-04-11 21:35 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-11 20:08 [PATCH v4 0/2] string-list: use ALLOC_GROW macro when reallocing git
2017-04-11 20:08 ` [PATCH v4 1/2] string-list: use ALLOC_GROW macro when reallocing string_list git
2017-04-11 21:29   ` Jonathan Nieder
2017-04-11 20:08 ` [PATCH v4 2/2] p0005-status: time status on very large repo git
2017-04-11 21:35   ` Jonathan Nieder
2017-04-11 20:11 ` [PATCH v4 0/2] string-list: use ALLOC_GROW macro when reallocing Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).