git@vger.kernel.org mailing list mirror (one of many)
* [PATCH 0/2] commit-graph: add progress output
@ 2018-09-04 20:27 Ævar Arnfjörð Bjarmason
  2018-09-04 20:27 ` [PATCH 1/2] commit-graph write: " Ævar Arnfjörð Bjarmason
                   ` (5 more replies)
  0 siblings, 6 replies; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-04 20:27 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Ævar Arnfjörð Bjarmason

This series adds progress output to the commit-graph command, so that
when it's called by "git gc" or "git fsck" we can see what's going on
with it.

Ævar Arnfjörð Bjarmason (2):
  commit-graph write: add progress output
  commit-graph verify: add progress output

 commit-graph.c | 44 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 43 insertions(+), 1 deletion(-)

-- 
2.19.0.rc1.350.ge57e33dbd1



* [PATCH 1/2] commit-graph write: add progress output
  2018-09-04 20:27 [PATCH 0/2] commit-graph: add progress output Ævar Arnfjörð Bjarmason
@ 2018-09-04 20:27 ` " Ævar Arnfjörð Bjarmason
  2018-09-04 21:16   ` Eric Sunshine
                     ` (3 more replies)
  2018-09-04 20:27 ` [PATCH 2/2] commit-graph verify: " Ævar Arnfjörð Bjarmason
                   ` (4 subsequent siblings)
  5 siblings, 4 replies; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-04 20:27 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Ævar Arnfjörð Bjarmason

Before this change the "commit-graph write" command didn't report any
progress. On my machine this command takes more than 10 seconds to
write the graph for linux.git, and around 1m30s on the
2015-04-03-1M-git.git[1] test repository, which is a test case for
larger monorepos.

Furthermore, since the gc.writeCommitGraph setting was added in
d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27),
there was no indication at all from a "git gc" run that anything was
different. This why one of the progress bars being added here uses
start_progress() instead of start_delayed_progress(), so that it's
guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git
repository:

    $ git -c gc.writeCommitGraph=true gc
    Enumerating objects: 2821, done.
    [...]
    Total 2821 (delta 1670), reused 2821 (delta 1670)
    Computing commit graph generation numbers: 100% (867/867), done.

On larger repositories, such as linux.git the delayed progress bar(s)
will kick in, and we'll show what's going on instead of, as was
previously happening, printing nothing while we write the graph:

    $ git -c gc.writeCommitGraph=true gc
    Annotating commits in commit graph: 1565573, done.
    Computing commit graph generation numbers: 100% (782484/782484), done.

Note that here we don't show "Finding commits for commit graph", this
is because under "git gc" we seed the search with the commit
references in the repository, and that set is too small to show any
progress, but would e.g. on a smaller repo such as git.git with
--stdin-commits:

    $ git rev-list --all | git -c gc.writeCommitGraph=true commit-graph write --stdin-commits
    Finding commits for commit graph: 100% (162576/162576), done.
    Computing commit graph generation numbers: 100% (162576/162576), done.

With --stdin-packs we don't show any estimation of how much is left to
do. This is because we might be processing more than one pack. We
could be less lazy here and show progress, either detect by detecting
that we're only processing one pack, or by first looping over the
packs to discover how many commits they have. I don't see the point in
doing that work. So instead we get (on 2015-04-03-1M-git.git):

    $ echo pack-<HASH>.idx | git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs
    Finding commits for commit graph: 13064614, done.
    Annotating commits in commit graph: 3001341, done.
    Computing commit graph generation numbers: 100% (1000447/1000447), done.
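
The two output shapes above (a bare count when the total passed in is
0, a percentage when it is known) can be mimicked with a small helper.
This is a sketch under assumptions: toy_format_progress is a
hypothetical name, and the format strings are copied from the output
quoted in this message rather than from git's source.

```c
#include <stdio.h>

/* Hypothetical helper mimicking the two output shapes quoted above. */
static int toy_format_progress(char *buf, size_t len, const char *title,
			       unsigned long done, unsigned long total)
{
	/* Known total: show a percentage and "done/total". */
	if (total)
		return snprintf(buf, len, "%s: %3lu%% (%lu/%lu), done.",
				title, done * 100 / total, done, total);
	/* Total of 0: no estimate is possible, show the running count. */
	return snprintf(buf, len, "%s: %lu, done.", title, done);
}
```

Calling it with a total of 0 yields the "13064614, done." form, and
with a total equal to the count it yields the "100% (867/867), done."
form.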

1. https://github.com/avar/2015-04-03-1M-git

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit-graph.c | 38 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 37 insertions(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index 8a1bec7b8a..74889dc90a 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -13,6 +13,7 @@
 #include "commit-graph.h"
 #include "object-store.h"
 #include "alloc.h"
+#include "progress.h"
 
 #define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */
 #define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
@@ -548,6 +549,8 @@ struct packed_oid_list {
 	struct object_id *list;
 	int nr;
 	int alloc;
+	struct progress *progress;
+	int progress_done;
 };
 
 static int add_packed_commits(const struct object_id *oid,
@@ -560,6 +563,9 @@ static int add_packed_commits(const struct object_id *oid,
 	off_t offset = nth_packed_object_offset(pack, pos);
 	struct object_info oi = OBJECT_INFO_INIT;
 
+	if (list->progress)
+		display_progress(list->progress, ++list->progress_done);
+
 	oi.typep = &type;
 	if (packed_object_info(the_repository, pack, offset, &oi) < 0)
 		die(_("unable to get type of object %s"), oid_to_hex(oid));
@@ -591,8 +597,13 @@ static void close_reachable(struct packed_oid_list *oids)
 {
 	int i;
 	struct commit *commit;
+	struct progress *progress = NULL;
+	int j = 0;
 
+	progress = start_delayed_progress(
+		_("Annotating commits in commit graph"), 0);
 	for (i = 0; i < oids->nr; i++) {
+		display_progress(progress, ++j);
 		commit = lookup_commit(the_repository, &oids->list[i]);
 		if (commit)
 			commit->object.flags |= UNINTERESTING;
@@ -604,6 +615,7 @@ static void close_reachable(struct packed_oid_list *oids)
 	 * closure.
 	 */
 	for (i = 0; i < oids->nr; i++) {
+		display_progress(progress, ++j);
 		commit = lookup_commit(the_repository, &oids->list[i]);
 
 		if (commit && !parse_commit(commit))
@@ -611,19 +623,25 @@ static void close_reachable(struct packed_oid_list *oids)
 	}
 
 	for (i = 0; i < oids->nr; i++) {
+		display_progress(progress, ++j);
 		commit = lookup_commit(the_repository, &oids->list[i]);
 
 		if (commit)
 			commit->object.flags &= ~UNINTERESTING;
 	}
+	stop_progress(&progress);
 }
 
 static void compute_generation_numbers(struct packed_commit_list* commits)
 {
 	int i;
 	struct commit_list *list = NULL;
+	struct progress *progress = NULL;
 
+	progress = start_progress(
+		_("Computing commit graph generation numbers"), commits->nr);
 	for (i = 0; i < commits->nr; i++) {
+		display_progress(progress, i);
 		if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY &&
 		    commits->list[i]->generation != GENERATION_NUMBER_ZERO)
 			continue;
@@ -655,6 +673,8 @@ static void compute_generation_numbers(struct packed_commit_list* commits)
 			}
 		}
 	}
+	display_progress(progress, i);
+	stop_progress(&progress);
 }
 
 static int add_ref_to_list(const char *refname,
@@ -692,9 +712,12 @@ void write_commit_graph(const char *obj_dir,
 	int num_chunks;
 	int num_extra_edges;
 	struct commit_list *parent;
+	struct progress *progress = NULL;
 
 	oids.nr = 0;
 	oids.alloc = approximate_object_count() / 4;
+	oids.progress = NULL;
+	oids.progress_done = 0;
 
 	if (append) {
 		prepare_commit_graph_one(the_repository, obj_dir);
@@ -721,6 +744,9 @@ void write_commit_graph(const char *obj_dir,
 		int dirlen;
 		strbuf_addf(&packname, "%s/pack/", obj_dir);
 		dirlen = packname.len;
+		oids.progress = start_delayed_progress(
+			_("Finding commits for commit graph"), 0);
+		oids.progress_done = 0;
 		for (i = 0; i < pack_indexes->nr; i++) {
 			struct packed_git *p;
 			strbuf_setlen(&packname, dirlen);
@@ -733,15 +759,19 @@ void write_commit_graph(const char *obj_dir,
 			for_each_object_in_pack(p, add_packed_commits, &oids, 0);
 			close_pack(p);
 		}
+		stop_progress(&oids.progress);
 		strbuf_release(&packname);
 	}
 
 	if (commit_hex) {
+		progress = start_delayed_progress(
+			_("Finding commits for commit graph"), commit_hex->nr);
 		for (i = 0; i < commit_hex->nr; i++) {
 			const char *end;
 			struct object_id oid;
 			struct commit *result;
 
+			display_progress(progress, i);
 			if (commit_hex->items[i].string &&
 			    parse_oid_hex(commit_hex->items[i].string, &oid, &end))
 				continue;
@@ -754,10 +784,16 @@ void write_commit_graph(const char *obj_dir,
 				oids.nr++;
 			}
 		}
+		display_progress(progress, i);
+		stop_progress(&progress);
 	}
 
-	if (!pack_indexes && !commit_hex)
+	if (!pack_indexes && !commit_hex) {
+		oids.progress = start_delayed_progress(
+			_("Finding commits for commit graph"), 0);
 		for_each_packed_object(add_packed_commits, &oids, 0);
+		stop_progress(&oids.progress);
+	}
 
 	close_reachable(&oids);
 
-- 
2.19.0.rc1.350.ge57e33dbd1



* [PATCH 2/2] commit-graph verify: add progress output
  2018-09-04 20:27 [PATCH 0/2] commit-graph: add progress output Ævar Arnfjörð Bjarmason
  2018-09-04 20:27 ` [PATCH 1/2] commit-graph write: " Ævar Arnfjörð Bjarmason
@ 2018-09-04 20:27 ` " Ævar Arnfjörð Bjarmason
  2018-09-04 22:10   ` Junio C Hamano
  2018-09-05 12:07 ` [PATCH 0/2] commit-graph: " Derrick Stolee
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-04 20:27 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Ævar Arnfjörð Bjarmason

For the reasons explained in the "commit-graph write: add progress
output" commit leading up to this one, emit progress on "commit-graph
verify". Since e0fd51e1d7 ("fsck: verify commit-graph", 2018-06-27)
"git fsck" has called this command if core.commitGraph=true, but
there's been no progress output to indicate that anything was
different. Now there is (on my tiny dotfiles.git repository):

    $ git -c core.commitGraph=true -C ~/ fsck
    Checking object directories: 100% (256/256), done.
    Checking objects: 100% (2821/2821), done.
    dangling blob 5b8bbdb9b788ed90459f505b0934619c17cc605b
    Verifying commits in commit graph: 100% (867/867), done.

And on a larger repository, such as the 2015-04-03-1M-git.git test
repository:

    $ time git -c core.commitGraph=true -C ~/g/2015-04-03-1M-git/ commit-graph verify
    Verifying commits in commit graph: 100% (1000447/1000447), done.
    real    0m7.813s
    [...]

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit-graph.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 74889dc90a..1a02fe019a 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -914,6 +914,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 	int generation_zero = 0;
 	struct hashfile *f;
 	int devnull;
+	struct progress *progress = NULL;
 
 	if (!g) {
 		graph_report("no commit-graph file loaded");
@@ -981,11 +982,14 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 	if (verify_commit_graph_error & ~VERIFY_COMMIT_GRAPH_ERROR_HASH)
 		return verify_commit_graph_error;
 
+	progress = start_progress("Verifying commits in commit graph",
+				  g->num_commits);
 	for (i = 0; i < g->num_commits; i++) {
 		struct commit *graph_commit, *odb_commit;
 		struct commit_list *graph_parents, *odb_parents;
 		uint32_t max_generation = 0;
 
+		display_progress(progress, i);
 		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
 
 		graph_commit = lookup_commit(r, &cur_oid);
@@ -1062,6 +1066,8 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 				     graph_commit->date,
 				     odb_commit->date);
 	}
+	display_progress(progress, i);
+	stop_progress(&progress);
 
 	return verify_commit_graph_error;
 }
-- 
2.19.0.rc1.350.ge57e33dbd1



* Re: [PATCH 1/2] commit-graph write: add progress output
  2018-09-04 20:27 ` [PATCH 1/2] commit-graph write: " Ævar Arnfjörð Bjarmason
@ 2018-09-04 21:16   ` Eric Sunshine
  2018-09-04 22:07   ` Junio C Hamano
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 88+ messages in thread
From: Eric Sunshine @ 2018-09-04 21:16 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git List, Junio C Hamano, Derrick Stolee

On Tue, Sep 4, 2018 at 4:27 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> With --stdin-packs we don't show any estimation of how much is left to
> do. This is because we might be processing more than one pack. We
> could be less lazy here and show progress, either detect by detecting

s/detect//

> that we're only processing one pack, or by first looping over the
> packs to discover how many commits they have. I don't see the point in
> doing that work. So instead we get (on 2015-04-03-1M-git.git):
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>


* Re: [PATCH 1/2] commit-graph write: add progress output
  2018-09-04 20:27 ` [PATCH 1/2] commit-graph write: " Ævar Arnfjörð Bjarmason
  2018-09-04 21:16   ` Eric Sunshine
@ 2018-09-04 22:07   ` Junio C Hamano
  2018-09-05 11:58     ` Derrick Stolee
  2018-09-05 12:06   ` Derrick Stolee
  2018-09-07 12:40   ` Ævar Arnfjörð Bjarmason
  3 siblings, 1 reply; 88+ messages in thread
From: Junio C Hamano @ 2018-09-04 22:07 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Derrick Stolee

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Before this change the "commit-graph write" command didn't report any

Please describe the pre-patch state in present tense without "Before
this change".

> progress. On my machine this command takes more than 10 seconds to
> write the graph for linux.git, and around 1m30s on the
> 2015-04-03-1M-git.git[1] test repository, which is a test case for
> larger monorepos.
>
> Furthermore, since the gc.writeCommitGraph setting was added in
> d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27),
> there was no indication at all from a "git gc" run that anything was
> different. This why one of the progress bars being added here uses

"This is why", I guess.

> start_progress() instead of start_delayed_progress(), so that it's
> guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git
> repository:
>
>     $ git -c gc.writeCommitGraph=true gc
>     Enumerating objects: 2821, done.
>     [...]
>     Total 2821 (delta 1670), reused 2821 (delta 1670)
>     Computing commit graph generation numbers: 100% (867/867), done.
>
> On larger repositories, such as linux.git the delayed progress bar(s)

"such as linux.git, the delayed ..."

> With --stdin-packs we don't show any estimation of how much is left to
> do. This is because we might be processing more than one pack. We
> could be less lazy here and show progress, either detect by detecting
> that we're only processing one pack, or by first looping over the
> packs to discover how many commits they have. I don't see the point in

I do not know if there is no point, but if we were to do it, I think
slurping the list of packs and computing the number of objects is
not all that bad.

>  static void compute_generation_numbers(struct packed_commit_list* commits)
>  {
>  	int i;
>  	struct commit_list *list = NULL;
> +	struct progress *progress = NULL;
>  
> +	progress = start_progress(
> +		_("Computing commit graph generation numbers"), commits->nr);
>  	for (i = 0; i < commits->nr; i++) {
> +		display_progress(progress, i);
>  		if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY &&
>  		    commits->list[i]->generation != GENERATION_NUMBER_ZERO)
>  			continue;

I am wondering if the progress call should be moved after this
conditional continue; would we want to count the entry whose
generation is already known here?  Of course, as we give commits->nr
as the 100% ceiling, we cannot avoid doing so, but it somehow smells
wrong.


* Re: [PATCH 2/2] commit-graph verify: add progress output
  2018-09-04 20:27 ` [PATCH 2/2] commit-graph verify: " Ævar Arnfjörð Bjarmason
@ 2018-09-04 22:10   ` Junio C Hamano
  0 siblings, 0 replies; 88+ messages in thread
From: Junio C Hamano @ 2018-09-04 22:10 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Derrick Stolee

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> For the reasons explained in the "commit-graph write: add progress
> output" commit leading up to this one, emit progress on "commit-graph
> verify". Since e0fd51e1d7 ("fsck: verify commit-graph", 2018-06-27)
> "git fsck" has called this command if core.commitGraph=true, but
> there's been no progress output to indicate that anything was
> different. Now there is (on my tiny dotfiles.git repository):
>
>     $ git -c core.commitGraph=true -C ~/ fsck
>     Checking object directories: 100% (256/256), done.
>     Checking objects: 100% (2821/2821), done.
>     dangling blob 5b8bbdb9b788ed90459f505b0934619c17cc605b
>     Verifying commits in commit graph: 100% (867/867), done.
>
> And on a larger repository, such as the 2015-04-03-1M-git.git test
> repository:
>
>     $ time git -c core.commitGraph=true -C ~/g/2015-04-03-1M-git/ commit-graph verify
>     Verifying commits in commit graph: 100% (1000447/1000447), done.
>     real    0m7.813s
>     [...]
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  commit-graph.c | 6 ++++++
>  1 file changed, 6 insertions(+)

Yup.  The verification side knows the total number of things, so it
is much easier to give the percentage progress with a very simple
addition like this, which is very nice.



>
> diff --git a/commit-graph.c b/commit-graph.c
> index 74889dc90a..1a02fe019a 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -914,6 +914,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
>  	int generation_zero = 0;
>  	struct hashfile *f;
>  	int devnull;
> +	struct progress *progress = NULL;
>  
>  	if (!g) {
>  		graph_report("no commit-graph file loaded");
> @@ -981,11 +982,14 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
>  	if (verify_commit_graph_error & ~VERIFY_COMMIT_GRAPH_ERROR_HASH)
>  		return verify_commit_graph_error;
>  
> +	progress = start_progress("Verifying commits in commit graph",
> +				  g->num_commits);
>  	for (i = 0; i < g->num_commits; i++) {
>  		struct commit *graph_commit, *odb_commit;
>  		struct commit_list *graph_parents, *odb_parents;
>  		uint32_t max_generation = 0;
>  
> +		display_progress(progress, i);
>  		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
>  
>  		graph_commit = lookup_commit(r, &cur_oid);
> @@ -1062,6 +1066,8 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
>  				     graph_commit->date,
>  				     odb_commit->date);
>  	}
> +	display_progress(progress, i);
> +	stop_progress(&progress);
>  
>  	return verify_commit_graph_error;
>  }


* Re: [PATCH 1/2] commit-graph write: add progress output
  2018-09-04 22:07   ` Junio C Hamano
@ 2018-09-05 11:58     ` Derrick Stolee
  2018-09-05 12:07       ` Ævar Arnfjörð Bjarmason
                         ` (2 more replies)
  0 siblings, 3 replies; 88+ messages in thread
From: Derrick Stolee @ 2018-09-05 11:58 UTC (permalink / raw)
  To: Junio C Hamano, Ævar Arnfjörð Bjarmason; +Cc: git

On 9/4/2018 6:07 PM, Junio C Hamano wrote:
> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> With --stdin-packs we don't show any estimation of how much is left to
>> do. This is because we might be processing more than one pack. We
>> could be less lazy here and show progress, either detect by detecting
>> that we're only processing one pack, or by first looping over the
>> packs to discover how many commits they have. I don't see the point in
> I do not know if there is no point, but if we were to do it, I think
> slurping the list of packs and computing the number of objects is
> not all that bad.

If you want to do that, I have nothing against it. However, I don't 
expect users to use that option directly. That option is used by VFS for 
Git to compute the commit-graph in the background after receiving a pack 
of commits and trees, but not by 'git gc' which I expect is how most 
users will compute commit-graphs.

>>   static void compute_generation_numbers(struct packed_commit_list* commits)
>>   {
>>   	int i;
>>   	struct commit_list *list = NULL;
>> +	struct progress *progress = NULL;
>>   
>> +	progress = start_progress(
>> +		_("Computing commit graph generation numbers"), commits->nr);
>>   	for (i = 0; i < commits->nr; i++) {
>> +		display_progress(progress, i);
>>   		if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY &&
>>   		    commits->list[i]->generation != GENERATION_NUMBER_ZERO)
>>   			continue;
> I am wondering if the progress call should be moved after this
> conditional continue; would we want to count the entry whose
> generation is already known here?  Of course, as we give commits->nr
> as the 100% ceiling, we cannot avoid doing so, but it somehow smells
> wrong.

If we wanted to be completely right, we would count the commits in the 
list that do not have a generation number and report that as the 100% 
ceiling.
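
The idea in isolation is a two-pass scheme: count the entries that
still need work, use that as the ceiling, then tick only when work is
done. The sketch below is a standalone toy, not git code. The toy_*
names are hypothetical, the sentinel values are assumed, and the real
computation (which also walks parents) is replaced by a placeholder
assignment.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the sentinels named in the patch. */
#define TOY_GEN_INFINITY 0xFFFFFFFFu
#define TOY_GEN_ZERO 0u

/* First pass: how many entries still need a generation number? */
static size_t toy_count_uncomputed(const unsigned *gen, size_t nr)
{
	size_t i, count = 0;

	for (i = 0; i < nr; i++)
		if (gen[i] == TOY_GEN_INFINITY || gen[i] == TOY_GEN_ZERO)
			count++;
	return count;
}

/* Second pass: do the work and return the number of progress ticks. */
static size_t toy_compute(unsigned *gen, size_t nr)
{
	size_t i, ticks = 0;

	for (i = 0; i < nr; i++) {
		if (gen[i] != TOY_GEN_INFINITY && gen[i] != TOY_GEN_ZERO)
			continue; /* already known: no tick */
		gen[i] = 1; /* placeholder for the real computation */
		ticks++;    /* where display_progress() would be called */
	}
	return ticks;
}
```

The number of ticks from the second pass equals the count from the
first, so the meter ends at exactly 100%.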

Something like the diff below would work. I tested it in Linux by first 
deleting my commit-graph and running the following:

stolee@stolee-linux:~/linux$ rm .git/objects/info/commit-graph
stolee@stolee-linux:~/linux$ git rev-parse v4.6 | ~/git/git commit-graph write --stdin-commits
Annotating commits in commit graph: 1180333, done.
Computing commit graph generation numbers: 100% (590166/590166), done.
stolee@stolee-linux:~/linux$ ~/git/git commit-graph write --reachable
Annotating commits in commit graph: 1564087, done.
Computing commit graph generation numbers: 100% (191590/191590), done.

-->8--

From: Derrick Stolee <dstolee@microsoft.com>
Date: Wed, 5 Sep 2018 11:55:42 +0000
Subject: [PATCH] fixup! commit-graph write: add progress output

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
  commit-graph.c | 15 +++++++++++----
  1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 1a02fe019a..b933bc9f00 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -634,14 +634,20 @@ static void close_reachable(struct packed_oid_list *oids)

  static void compute_generation_numbers(struct packed_commit_list* commits)
  {
-       int i;
+       int i, count_uncomputed = 0;
        struct commit_list *list = NULL;
        struct progress *progress = NULL;

+       for (i = 0; i < commits->nr; i++)
+               if (commits->list[i]->generation == GENERATION_NUMBER_INFINITY ||
+                   commits->list[i]->generation == GENERATION_NUMBER_ZERO)
+                       count_uncomputed++;
+
        progress = start_progress(
-               _("Computing commit graph generation numbers"), commits->nr);
+               _("Computing commit graph generation numbers"), count_uncomputed);
+       count_uncomputed = 0;
+
        for (i = 0; i < commits->nr; i++) {
-               display_progress(progress, i);
                if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY &&
                    commits->list[i]->generation != GENERATION_NUMBER_ZERO)
                        continue;
@@ -670,10 +676,11 @@ static void compute_generation_numbers(struct packed_commit_list* commits)

                                if (current->generation > GENERATION_NUMBER_MAX)
                                        current->generation = GENERATION_NUMBER_MAX;
+
+                               display_progress(progress, ++count_uncomputed);
                        }
                }
        }
-       display_progress(progress, i);
        stop_progress(&progress);
  }

--
2.19.0.rc2



* Re: [PATCH 1/2] commit-graph write: add progress output
  2018-09-04 20:27 ` [PATCH 1/2] commit-graph write: " Ævar Arnfjörð Bjarmason
  2018-09-04 21:16   ` Eric Sunshine
  2018-09-04 22:07   ` Junio C Hamano
@ 2018-09-05 12:06   ` Derrick Stolee
  2018-09-07 12:40   ` Ævar Arnfjörð Bjarmason
  3 siblings, 0 replies; 88+ messages in thread
From: Derrick Stolee @ 2018-09-05 12:06 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git; +Cc: Junio C Hamano

On 9/4/2018 4:27 PM, Ævar Arnfjörð Bjarmason wrote:
> @@ -591,8 +597,13 @@ static void close_reachable(struct packed_oid_list *oids)
>   {
>   	int i;
>   	struct commit *commit;
> +	struct progress *progress = NULL;
> +	int j = 0;

The change below over-counts the number of commits we are processing (by 
at least double, possibly triple).


> +	progress = start_delayed_progress(
> +		_("Annotating commits in commit graph"), 0);
>   	for (i = 0; i < oids->nr; i++) {
> +		display_progress(progress, ++j);
>   		commit = lookup_commit(the_repository, &oids->list[i]);
>   		if (commit)
>   			commit->object.flags |= UNINTERESTING;
This count is the number of oids given to the method. For 'git 
commit-graph write --reachable', this will be the number of refs.
> @@ -604,6 +615,7 @@ static void close_reachable(struct packed_oid_list *oids)
>   	 * closure.
>   	 */
>   	for (i = 0; i < oids->nr; i++) {
> +		display_progress(progress, ++j);
>   		commit = lookup_commit(the_repository, &oids->list[i]);
>   
>   		if (commit && !parse_commit(commit))

This is the important count, since we will be parsing commits and adding 
their parents to the list. The bulk of the work happens here.

> @@ -611,19 +623,25 @@ static void close_reachable(struct packed_oid_list *oids)
>   	}
>   
>   	for (i = 0; i < oids->nr; i++) {
> +		display_progress(progress, ++j);
>   		commit = lookup_commit(the_repository, &oids->list[i]);
This iterates through the commits a second time and removes the 
UNINTERESTING flag.
>   
>   		if (commit)
>   			commit->object.flags &= ~UNINTERESTING;
>   	}
> +	stop_progress(&progress);
>   }

I think it is good to have the progress start before the first loop and 
end after the third loop, but the middle loop has the important count.

I tried deleting the first and third display_progress() methods and 
re-ran the process on the Linux repo and did not notice a delay at the 
0% and 100% progress spots. The count matches the number of commits.
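
The size of the over-count can be checked with simple arithmetic. The
sketch below is a toy model, not git code: toy_three_loop_count is a
hypothetical name, and it assumes the first loop sees only the n0 oids
passed in while the second and third loops each see the final list of
n commits.

```c
/* Toy arithmetic only; models the three loops of close_reachable(). */
static unsigned long toy_three_loop_count(unsigned long n0, unsigned long n)
{
	unsigned long j = 0;

	j += n0; /* loop 1: flags the oids passed in */
	j += n;  /* loop 2: the list has grown to n by the time it ends */
	j += n;  /* loop 3: clears the flag on all n */
	return j;
}
```

Under that assumption the figures reported in this thread are
consistent: n0 = n = 1000447 gives 3001341 (the --stdin-packs run),
and n0 = 1, n = 590166 gives 1180333 (the single-commit
--stdin-commits run).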

Thanks,

-Stolee



* Re: [PATCH 0/2] commit-graph: add progress output
  2018-09-04 20:27 [PATCH 0/2] commit-graph: add progress output Ævar Arnfjörð Bjarmason
  2018-09-04 20:27 ` [PATCH 1/2] commit-graph write: " Ævar Arnfjörð Bjarmason
  2018-09-04 20:27 ` [PATCH 2/2] commit-graph verify: " Ævar Arnfjörð Bjarmason
@ 2018-09-05 12:07 ` " Derrick Stolee
  2018-09-07 18:29 ` [PATCH v2 " Ævar Arnfjörð Bjarmason
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 88+ messages in thread
From: Derrick Stolee @ 2018-09-05 12:07 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git; +Cc: Junio C Hamano

On 9/4/2018 4:27 PM, Ævar Arnfjörð Bjarmason wrote:
> This series adds progress output to the commit-graph command, so that
> when it's called by "git gc" or "git fsck" we can see what's going on
> with it.
>
> Ævar Arnfjörð Bjarmason (2):
>    commit-graph write: add progress output
>    commit-graph verify: add progress output
>
>   commit-graph.c | 44 +++++++++++++++++++++++++++++++++++++++++++-
>   1 file changed, 43 insertions(+), 1 deletion(-)

Thanks for writing this, Ævar. I appreciate that you took the time to 
fill in an important UX gap.

I had a couple of comments, but generally this is a good change.

-Stolee



* Re: [PATCH 1/2] commit-graph write: add progress output
  2018-09-05 11:58     ` Derrick Stolee
@ 2018-09-05 12:07       ` Ævar Arnfjörð Bjarmason
  2018-09-05 21:46       ` Junio C Hamano
  2018-09-07 15:11       ` Ævar Arnfjörð Bjarmason
  2 siblings, 0 replies; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-05 12:07 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Junio C Hamano, git


On Wed, Sep 05 2018, Derrick Stolee wrote:

> On 9/4/2018 6:07 PM, Junio C Hamano wrote:
>> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>>
>>> With --stdin-packs we don't show any estimation of how much is left to
>>> do. This is because we might be processing more than one pack. We
>>> could be less lazy here and show progress, either detect by detecting
>>> that we're only processing one pack, or by first looping over the
>>> packs to discover how many commits they have. I don't see the point in
>> I do not know if there is no point, but if we were to do it, I think
>> slurping the list of packs and computing the number of objects is
>> not all that bad.
>
> If you want to do that, I have nothing against it. However, I don't
> expect users to use that option directly. That option is used by VFS
> for Git to compute the commit-graph in the background after receiving
> a pack of commits and trees, but not by 'git gc' which I expect is how
> most users will compute commit-graphs.

Yeah, I suspected only one guy at Microsoft would potentially benefit
from this, but added it just so we'd have progress regardless of entry
point :)

>>>   static void compute_generation_numbers(struct packed_commit_list* commits)
>>>   {
>>>   	int i;
>>>   	struct commit_list *list = NULL;
>>> +	struct progress *progress = NULL;
>>>
>>> +	progress = start_progress(
>>> +		_("Computing commit graph generation numbers"), commits->nr);
>>>   	for (i = 0; i < commits->nr; i++) {
>>> +		display_progress(progress, i);
>>>   		if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY &&
>>>   		    commits->list[i]->generation != GENERATION_NUMBER_ZERO)
>>>   			continue;
>> I am wondering if the progress call should be moved after this
>> conditional continue; would we want to count the entry whose
>> generation is already known here?  Of course, as we give commits->nr
>> as the 100% ceiling, we cannot avoid doing so, but it somehow smells
>> wrong.
>
> If we wanted to be completely right, we would count the commits in the
> list that do not have a generation number and report that as the 100%
> ceiling.
>
> Something like the diff below would work. I tested it in Linux by
> first deleting my commit-graph and running the following:
>
> stolee@stolee-linux:~/linux$ rm .git/objects/info/commit-graph
> stolee@stolee-linux:~/linux$ git rev-parse v4.6 | ~/git/git commit-graph write --stdin-commits
> Annotating commits in commit graph: 1180333, done.
> Computing commit graph generation numbers: 100% (590166/590166), done.
> stolee@stolee-linux:~/linux$ ~/git/git commit-graph write --reachable
> Annotating commits in commit graph: 1564087, done.
> Computing commit graph generation numbers: 100% (191590/191590), done.
>
> -->8--
>
> From: Derrick Stolee <dstolee@microsoft.com>
> Date: Wed, 5 Sep 2018 11:55:42 +0000
> Subject: [PATCH] fixup! commit-graph write: add progress output
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
> commit-graph.c | 15 +++++++++++----
> 1 file changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index 1a02fe019a..b933bc9f00 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -634,14 +634,20 @@ static void close_reachable(struct packed_oid_list *oids)
>
>   static void compute_generation_numbers(struct packed_commit_list* commits)
>   {
> -       int i;
> +       int i, count_uncomputed = 0;
>         struct commit_list *list = NULL;
>         struct progress *progress = NULL;
>
> +       for (i = 0; i < commits->nr; i++)
> +               if (commits->list[i]->generation == GENERATION_NUMBER_INFINITY ||
> +                   commits->list[i]->generation == GENERATION_NUMBER_ZERO)
> +                       count_uncomputed++;
> +
>         progress = start_progress(
> -               _("Computing commit graph generation numbers"), commits->nr);
> +               _("Computing commit graph generation numbers"), count_uncomputed);
> +       count_uncomputed = 0;
> +
>         for (i = 0; i < commits->nr; i++) {
> -               display_progress(progress, i);
>                 if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY &&
>                     commits->list[i]->generation != GENERATION_NUMBER_ZERO)
>                         continue;
> @@ -670,10 +676,11 @@ static void compute_generation_numbers(struct packed_commit_list* commits)
>
>                                 if (current->generation > GENERATION_NUMBER_MAX)
>                                         current->generation = GENERATION_NUMBER_MAX;
> +
> +                               display_progress(progress, ++count_uncomputed);
>                         }
>                 }
>         }
> -       display_progress(progress, i);
>         stop_progress(&progress);
>   }

Thanks! That looks good, and you obviously know this code a lot
better. I'll squash this into v2 pending further feedback I'll need to
address.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 1/2] commit-graph write: add progress output
  2018-09-05 11:58     ` Derrick Stolee
  2018-09-05 12:07       ` Ævar Arnfjörð Bjarmason
@ 2018-09-05 21:46       ` Junio C Hamano
  2018-09-05 22:12         ` Derrick Stolee
  2018-09-07 15:11       ` Ævar Arnfjörð Bjarmason
  2 siblings, 1 reply; 88+ messages in thread
From: Junio C Hamano @ 2018-09-05 21:46 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Ævar Arnfjörð Bjarmason, git

Derrick Stolee <stolee@gmail.com> writes:

>>>   	for (i = 0; i < commits->nr; i++) {
>>> +		display_progress(progress, i);
>>>   		if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY &&
>>>   		    commits->list[i]->generation != GENERATION_NUMBER_ZERO)
>>>   			continue;
>> I am wondering if the progress call should be moved after this
>> conditional continue; would we want to count the entry whose
>> generation is already known here?  Of course, as we give commits->nr
>> as the 100% ceiling, we cannot avoid doing so, but it somehow smells
>> wrong.
>
> If we wanted to be completely right, we would count the commits in the
> list that do not have a generation number and report that as the 100%
> ceiling.

Yeah, but I realize that the definition of "right" really depends on
what we consider a task being accomplished is in this loop.  If we
define the task to "we have some number of commits that lack
generation numbers and our task is to assign numbers to them.", then
yes counting the ones without generation number and culling the ones
that already have generation number is outside the work and we need
another loop to count them.  But the position the posted patch takes
is also a valid one: we have some commits and we are making sure
each and every one of them has assigned a generation number.

So I do not think it is necessary to introduce another loop just for
counting.

Thanks.
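
The two definitions of the task can be made concrete with a small
sketch. This is not git's code: `struct toy_commit`, the `TOY_GEN_*`
constants (whose values are picked only for illustration), the sample
list and both helpers are hypothetical stand-ins for `struct commit`
and the `GENERATION_NUMBER_*` constants. It only shows that the second
definition needs an extra pass over the list to find its 100% ceiling.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the values are illustrative, not git's. */
#define TOY_GEN_INFINITY UINT32_MAX
#define TOY_GEN_ZERO 0

struct toy_commit {
	uint32_t generation;
};

/* Sample list: two commits lack a generation number, two have one. */
static const struct toy_commit toy_sample[] = {
	{ TOY_GEN_INFINITY }, { 3 }, { TOY_GEN_ZERO }, { 7 },
};

static int toy_needs_generation(const struct toy_commit *c)
{
	return c->generation == TOY_GEN_INFINITY ||
	       c->generation == TOY_GEN_ZERO;
}

/* Ceiling when the task is "make sure every commit has a number". */
static size_t toy_ceiling_all(const struct toy_commit *list, size_t nr)
{
	(void)list; /* no pass over the list is needed */
	return nr;
}

/* Ceiling when the task is "assign numbers to commits lacking one". */
static size_t toy_ceiling_uncomputed(const struct toy_commit *list, size_t nr)
{
	size_t i, count = 0;

	for (i = 0; i < nr; i++)
		if (toy_needs_generation(&list[i]))
			count++;
	return count;
}
```

With the sample list, the first ceiling is the list length and the
second is the number of entries still at the "infinity" or "zero"
placeholder.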




^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 1/2] commit-graph write: add progress output
  2018-09-05 21:46       ` Junio C Hamano
@ 2018-09-05 22:12         ` Derrick Stolee
  0 siblings, 0 replies; 88+ messages in thread
From: Derrick Stolee @ 2018-09-05 22:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ævar Arnfjörð Bjarmason, git

On 9/5/2018 5:46 PM, Junio C Hamano wrote:
> Derrick Stolee <stolee@gmail.com> writes:
>
>>>>    	for (i = 0; i < commits->nr; i++) {
>>>> +		display_progress(progress, i);
>>>>    		if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY &&
>>>>    		    commits->list[i]->generation != GENERATION_NUMBER_ZERO)
>>>>    			continue;
>>> I am wondering if the progress call should be moved after this
>>> conditional continue; would we want to count the entry whose
>>> generation is already known here?  Of course, as we give commits->nr
>>> as the 100% ceiling, we cannot avoid doing so, but it somehow smells
>>> wrong.
>> If we wanted to be completely right, we would count the commits in the
>> list that do not have a generation number and report that as the 100%
>> ceiling.
> Yeah, but I realize that the definition of "right" really depends on
> what we consider a task being accomplished is in this loop.  If we
> define the task to "we have some number of commits that lack
> generation numbers and our task is to assign numbers to them.", then
> yes counting the ones without generation number and culling the ones
> that already have generation number is outside the work and we need
> another loop to count them.  But the position the posted patch takes
> is also a valid one: we have some commits and we are making sure
> each and every one of them has assigned a generation number.
>
> So I do not think it is necessary to introduce another loop just for
> counting.
>
> Thanks.
Makes sense to me. Thanks!

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 1/2] commit-graph write: add progress output
  2018-09-04 20:27 ` [PATCH 1/2] commit-graph write: " Ævar Arnfjörð Bjarmason
                     ` (2 preceding siblings ...)
  2018-09-05 12:06   ` Derrick Stolee
@ 2018-09-07 12:40   ` Ævar Arnfjörð Bjarmason
  2018-09-07 13:12     ` Derrick Stolee
  3 siblings, 1 reply; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-07 12:40 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Derrick Stolee


On Tue, Sep 04 2018, Ævar Arnfjörð Bjarmason wrote:

> Before this change the "commit-graph write" command didn't report any
> progress. On my machine this command takes more than 10 seconds to
> write the graph for linux.git, and around 1m30s on the
> 2015-04-03-1M-git.git[1] test repository, which is a test case for
> larger monorepos.

There's a fun issue with this code that I'll fix, but thought was
informative to send a mail about.

Because the graph verification happens in the main "git gc" process, as
opposed to everything else via external commands, all this progress
output gets written to .git/gc.log.

Then next time we do a "gc --auto" we vomit out a couple of KB of
progress bar output at the user, since we spot that the gc.log isn't empty.
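
A minimal sketch of one way to avoid this, assuming a hypothetical
`toy_report()` helper rather than git's progress API: the caller passes
a flag saying whether progress should be reported, and a detached run
passes zero so that nothing ends up in the log. The helper names and
the line format are made up for illustration.

```c
#include <stdio.h>

/*
 * Hypothetical helper: format a progress line into buf only when
 * report_progress is non-zero. Returns the length of the formatted
 * line, or 0 when reporting is disabled.
 */
static int toy_report(char *buf, size_t len, int report_progress,
		      const char *title, unsigned done, unsigned total)
{
	if (len)
		buf[0] = '\0';
	if (!report_progress)
		return 0;
	return snprintf(buf, len, "%s: %u/%u\n", title, done, total);
}

/* How many bytes would land in the "log" for a given run mode. */
static int toy_log_bytes(int daemonized)
{
	char log[128];

	/* A detached run reports nothing, so the log stays empty. */
	return toy_report(log, sizeof(log), !daemonized,
			  "Computing", 867, 867);
}
```

In this sketch a foreground run produces a progress line while a
detached run produces none, which is the property needed to keep
gc.log empty.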

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 1/2] commit-graph write: add progress output
  2018-09-07 12:40   ` Ævar Arnfjörð Bjarmason
@ 2018-09-07 13:12     ` Derrick Stolee
  0 siblings, 0 replies; 88+ messages in thread
From: Derrick Stolee @ 2018-09-07 13:12 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git; +Cc: Junio C Hamano

On 9/7/2018 8:40 AM, Ævar Arnfjörð Bjarmason wrote:
> On Tue, Sep 04 2018, Ævar Arnfjörð Bjarmason wrote:
>
>> Before this change the "commit-graph write" command didn't report any
>> progress. On my machine this command takes more than 10 seconds to
>> write the graph for linux.git, and around 1m30s on the
>> 2015-04-03-1M-git.git[1] test repository, which is a test case for
>> larger monorepos.
> There's a fun issue with this code that I'll fix, but thought was
> informative to send a mail about.
>
> Because the graph verification happens in the main "git gc" process, as
> opposed to everything else via external commands, all this progress
> output gets written to .git/gc.log.
>
> Then next time we do a "gc --auto" we vomit out a couple of KB of
> progress bar output at the user, since we spot that the gc.log isn't empty.
Good catch! (I do want to clarify that the graph _writing_ happens 
during 'git gc' since 'git commit-graph verify' is a different thing.)

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 1/2] commit-graph write: add progress output
  2018-09-05 11:58     ` Derrick Stolee
  2018-09-05 12:07       ` Ævar Arnfjörð Bjarmason
  2018-09-05 21:46       ` Junio C Hamano
@ 2018-09-07 15:11       ` Ævar Arnfjörð Bjarmason
  2018-09-07 15:23         ` Ævar Arnfjörð Bjarmason
  2 siblings, 1 reply; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-07 15:11 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Junio C Hamano, git


On Wed, Sep 05 2018, Derrick Stolee wrote:

> On 9/4/2018 6:07 PM, Junio C Hamano wrote:
>> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>>
>>> With --stdin-packs we don't show any estimation of how much is left to
>>> do. This is because we might be processing more than one pack. We
>>> could be less lazy here and show progress, either by detecting
>>> that we're only processing one pack, or by first looping over the
>>> packs to discover how many commits they have. I don't see the point in
>> I do not know if there is no point, but if we were to do it, I think
>> slurping the list of packs and computing the number of objects is
>> not all that bad.
>
> If you want to do that, I have nothing against it. However, I don't
> expect users to use that option directly. That option is used by VFS
> for Git to compute the commit-graph in the background after receiving
> a pack of commits and trees, but not by 'git gc' which I expect is how
> most users will compute commit-graphs.
>
>>>   static void compute_generation_numbers(struct packed_commit_list* commits)
>>>   {
>>>   	int i;
>>>   	struct commit_list *list = NULL;
>>> +	struct progress *progress = NULL;
>>>   +	progress = start_progress(
>>> +		_("Computing commit graph generation numbers"), commits->nr);
>>>   	for (i = 0; i < commits->nr; i++) {
>>> +		display_progress(progress, i);
>>>   		if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY &&
>>>   		    commits->list[i]->generation != GENERATION_NUMBER_ZERO)
>>>   			continue;
>> I am wondering if the progress call should be moved after this
>> conditional continue; would we want to count the entry whose
>> generation is already known here?  Of course, as we give commits->nr
>> as the 100% ceiling, we cannot avoid doing so, but it somehow smells
>> wrong.
>
> If we wanted to be completely right, we would count the commits in the
> list that do not have a generation number and report that as the 100%
> ceiling.
>
> Something like the diff below would work. I tested it in Linux by
> first deleting my commit-graph and running the following:
>
> stolee@stolee-linux:~/linux$ rm .git/objects/info/commit-graph
> stolee@stolee-linux:~/linux$ git rev-parse v4.6 | ~/git/git
> commit-graph write --stdin-commits
> Annotating commits in commit graph: 1180333, done.
> Computing commit graph generation numbers: 100% (590166/590166), done.
> stolee@stolee-linux:~/linux$ ~/git/git commit-graph write --reachable
> Annotating commits in commit graph: 1564087, done.
> Computing commit graph generation numbers: 100% (191590/191590), done.
>
> -->8--
>
> From: Derrick Stolee <dstolee@microsoft.com>
> Date: Wed, 5 Sep 2018 11:55:42 +0000
> Subject: [PATCH] fixup! commit-graph write: add progress output
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
> commit-graph.c | 15 +++++++++++----
> 1 file changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index 1a02fe019a..b933bc9f00 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -634,14 +634,20 @@ static void close_reachable(struct
> packed_oid_list *oids)
>
> static void compute_generation_numbers(struct packed_commit_list* commits)
> {
> - int i;
> + int i, count_uncomputed = 0;
>  struct commit_list *list = NULL;
>  struct progress *progress = NULL;
>
> + for (i = 0; i < commits->nr; i++)
> + if (commits->list[i]->generation ==
> GENERATION_NUMBER_INFINITY ||
> + commits->list[i]->generation == GENERATION_NUMBER_ZERO)
> + count_uncomputed++;
> +
>  progress = start_progress(
> - _("Computing commit graph generation numbers"),
> commits->nr);
> + _("Computing commit graph generation numbers"),
> count_uncomputed);
> + count_uncomputed = 0;
> +
>  for (i = 0; i < commits->nr; i++) {
> - display_progress(progress, i);
>  if (commits->list[i]->generation !=
> GENERATION_NUMBER_INFINITY &&
>  commits->list[i]->generation != GENERATION_NUMBER_ZERO)
>  continue;
> @@ -670,10 +676,11 @@ static void compute_generation_numbers(struct
> packed_commit_list* commits)
>
>  if (current->generation >
> GENERATION_NUMBER_MAX)
>  current->generation =
> GENERATION_NUMBER_MAX;
> +
> + display_progress(progress,
> ++count_uncomputed);
>  }
>  }
>  }
> - display_progress(progress, i);
>  stop_progress(&progress);
> }

One of the things I was trying to do with this series was to make sure
that whenever we run "git gc" there's always some indication that if you
set gc.writeCommitGraph=true that it's actually doing work.

This modifies that, which I think is actually fine, just something I
wanted to note. I.e. if you run "git commit-graph write" twice in a row,
the second time will have no output.

Unless that is, your repo is big enough that some of the delayed timers
kick in. So e.g. on git.git we get no output the second time around, but
do get output the first time around, and on linux.git we always get
output.

But in the common case people aren't running this in a loop, and it's
useful to see how many new things are being added to the graph, so I
think this is better. Just wanted to note the behavior difference (and
will change the commit message).

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 1/2] commit-graph write: add progress output
  2018-09-07 15:11       ` Ævar Arnfjörð Bjarmason
@ 2018-09-07 15:23         ` Ævar Arnfjörð Bjarmason
  2018-09-07 17:15           ` Jeff King
  0 siblings, 1 reply; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-07 15:23 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Junio C Hamano, git


On Fri, Sep 07 2018, Ævar Arnfjörð Bjarmason wrote:

> On Wed, Sep 05 2018, Derrick Stolee wrote:
>
>> On 9/4/2018 6:07 PM, Junio C Hamano wrote:
>>> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>>>
>>>> With --stdin-packs we don't show any estimation of how much is left to
>>>> do. This is because we might be processing more than one pack. We
>>>> could be less lazy here and show progress, either by detecting
>>>> that we're only processing one pack, or by first looping over the
>>>> packs to discover how many commits they have. I don't see the point in
>>> I do not know if there is no point, but if we were to do it, I think
>>> slurping the list of packs and computing the number of objects is
>>> not all that bad.
>>
>> If you want to do that, I have nothing against it. However, I don't
>> expect users to use that option directly. That option is used by VFS
>> for Git to compute the commit-graph in the background after receiving
>> a pack of commits and trees, but not by 'git gc' which I expect is how
>> most users will compute commit-graphs.
>>
>>>>   static void compute_generation_numbers(struct packed_commit_list* commits)
>>>>   {
>>>>   	int i;
>>>>   	struct commit_list *list = NULL;
>>>> +	struct progress *progress = NULL;
>>>>   +	progress = start_progress(
>>>> +		_("Computing commit graph generation numbers"), commits->nr);
>>>>   	for (i = 0; i < commits->nr; i++) {
>>>> +		display_progress(progress, i);
>>>>   		if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY &&
>>>>   		    commits->list[i]->generation != GENERATION_NUMBER_ZERO)
>>>>   			continue;
>>> I am wondering if the progress call should be moved after this
>>> conditional continue; would we want to count the entry whose
>>> generation is already known here?  Of course, as we give commits->nr
>>> as the 100% ceiling, we cannot avoid doing so, but it somehow smells
>>> wrong.
>>
>> If we wanted to be completely right, we would count the commits in the
>> list that do not have a generation number and report that as the 100%
>> ceiling.
>>
>> Something like the diff below would work. I tested it in Linux by
>> first deleting my commit-graph and running the following:
>>
>> stolee@stolee-linux:~/linux$ rm .git/objects/info/commit-graph
>> stolee@stolee-linux:~/linux$ git rev-parse v4.6 | ~/git/git
>> commit-graph write --stdin-commits
>> Annotating commits in commit graph: 1180333, done.
>> Computing commit graph generation numbers: 100% (590166/590166), done.
>> stolee@stolee-linux:~/linux$ ~/git/git commit-graph write --reachable
>> Annotating commits in commit graph: 1564087, done.
>> Computing commit graph generation numbers: 100% (191590/191590), done.
>>
>> -->8--
>>
>> From: Derrick Stolee <dstolee@microsoft.com>
>> Date: Wed, 5 Sep 2018 11:55:42 +0000
>> Subject: [PATCH] fixup! commit-graph write: add progress output
>>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>> commit-graph.c | 15 +++++++++++----
>> 1 file changed, 11 insertions(+), 4 deletions(-)
>>
>> diff --git a/commit-graph.c b/commit-graph.c
>> index 1a02fe019a..b933bc9f00 100644
>> --- a/commit-graph.c
>> +++ b/commit-graph.c
>> @@ -634,14 +634,20 @@ static void close_reachable(struct
>> packed_oid_list *oids)
>>
>> static void compute_generation_numbers(struct packed_commit_list* commits)
>> {
>> - int i;
>> + int i, count_uncomputed = 0;
>>  struct commit_list *list = NULL;
>>  struct progress *progress = NULL;
>>
>> + for (i = 0; i < commits->nr; i++)
>> + if (commits->list[i]->generation ==
>> GENERATION_NUMBER_INFINITY ||
>> + commits->list[i]->generation == GENERATION_NUMBER_ZERO)
>> + count_uncomputed++;
>> +
>>  progress = start_progress(
>> - _("Computing commit graph generation numbers"),
>> commits->nr);
>> + _("Computing commit graph generation numbers"),
>> count_uncomputed);
>> + count_uncomputed = 0;
>> +
>>  for (i = 0; i < commits->nr; i++) {
>> - display_progress(progress, i);
>>  if (commits->list[i]->generation !=
>> GENERATION_NUMBER_INFINITY &&
>>  commits->list[i]->generation != GENERATION_NUMBER_ZERO)
>>  continue;
>> @@ -670,10 +676,11 @@ static void compute_generation_numbers(struct
>> packed_commit_list* commits)
>>
>>  if (current->generation >
>> GENERATION_NUMBER_MAX)
>>  current->generation =
>> GENERATION_NUMBER_MAX;
>> +
>> + display_progress(progress,
>> ++count_uncomputed);
>>  }
>>  }
>>  }
>> - display_progress(progress, i);
>>  stop_progress(&progress);
>> }
>
> One of the things I was trying to do with this series was to make sure
> that whenever we run "git gc" there's always some indication that if you
> set gc.writeCommitGraph=true that it's actually doing work.
>
> This modifies that, which I think is actually fine, just something I
> wanted to note. I.e. if you run "git commit-graph write" twice in a row,
> the second time will have no output.
>
> Unless that is, your repo is big enough that some of the delayed timers
> kick in. So e.g. on git.git we get no output the second time around, but
> do get output the first time around, and on linux.git we always get
> output.
>
> But in the common case people aren't running this in a loop, and it's
> useful to see how many new things are being added to the graph, so I
> think this is better. Just wanted to note the behavior difference (and
> will change the commit message).

Hrm, no. I spoke too soon because I was conflating "commit-graph write"
with "gc". With this change, "gc" now spends e.g. 6 seconds on
2015-04-03-1M-git displaying nothing, because we're looping through the
commits and finding that we have no new work.

So I'm on the fence about this, but leaning towards just taking my
initial approach. I.e. it sucks if you're e.g. testing different "git gc"
options that we're churning in the background doing nothing, just
because we're trying to report how many *new* things we added to the
graph.

After all, the main point IMNSHO is not to show some diagnostic output
of exactly how much work we're doing, that I have 200 new commits with
generation numbers or whatever is just useless trivia, but rather to not
leave the user thinking the command is hanging.

So I think I'll just do what I was doing to begin with and change the
message to "Refreshing commit graph generation numbers" or something to
indicate that it's a find/verify/compute operation, not just a compute
operation.
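
The difference between the two approaches can be sketched as below,
under the assumption that a plain array of generation numbers stands in
for the commit list. `DEMO_GEN_INFINITY`, `DEMO_GEN_ZERO`, the sample
array and both functions are hypothetical and only count how many
progress updates each approach would emit; they do not call git's
progress API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical placeholders; values are illustrative only. */
#define DEMO_GEN_INFINITY UINT32_MAX
#define DEMO_GEN_ZERO 0

/* A list where every commit already has a generation number. */
static const uint32_t demo_all_computed[] = { 1, 2, 3, 4, 5 };

/* Tick on every iteration, before the "already known" check. */
static size_t demo_ticks_every_commit(const uint32_t *gen, size_t nr)
{
	size_t i, ticks = 0;

	for (i = 0; i < nr; i++) {
		ticks++;
		if (gen[i] != DEMO_GEN_INFINITY && gen[i] != DEMO_GEN_ZERO)
			continue;
		/* ... the generation number would be computed here ... */
	}
	return ticks;
}

/* Tick only when a commit actually needed new work. */
static size_t demo_ticks_new_work_only(const uint32_t *gen, size_t nr)
{
	size_t i, ticks = 0;

	for (i = 0; i < nr; i++) {
		if (gen[i] != DEMO_GEN_INFINITY && gen[i] != DEMO_GEN_ZERO)
			continue;
		ticks++;
	}
	return ticks;
}
```

On a list with nothing left to compute, the first loop still produces
one update per commit, while the second produces none, so the user
sees no sign of activity while the loop runs.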

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 1/2] commit-graph write: add progress output
  2018-09-07 15:23         ` Ævar Arnfjörð Bjarmason
@ 2018-09-07 17:15           ` Jeff King
  2018-09-07 17:25             ` Derrick Stolee
  0 siblings, 1 reply; 88+ messages in thread
From: Jeff King @ 2018-09-07 17:15 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Derrick Stolee, Junio C Hamano, git

On Fri, Sep 07, 2018 at 05:23:31PM +0200, Ævar Arnfjörð Bjarmason wrote:

> Hrm, no. I spoke too soon because I was conflating "commit-graph write"
> with "gc". With this change, "gc" now spends e.g. 6 seconds on
> 2015-04-03-1M-git displaying nothing, because we're looping through the
> commits and finding that we have no new work.
> 
> So I'm on the fence about this, but leaning towards just taking my
> initial approach. I.e. it sucks if you're e.g. testing different "git gc"
> options that we're churning in the background doing nothing, just
> because we're trying to report how many *new* things we added to the
> graph.
> 
> After all, the main point IMNSHO is not to show some diagnostic output
> of exactly how much work we're doing, that I have 200 new commits with
> generation numbers or whatever is just useless trivia, but rather to not
> leave the user thinking the command is hanging.

I think there's some precedent for your view of things, too. For
example, "writing objects" counts _all_ of the objects, even though many
of them are just copying bytes straight from disk, and some are actually
generating a delta and/or zlib-deflating content.

So it's not the most precise measurement we could give, but it shows
there's activity, and the "average" movement over many objects tends to
be reasonably smooth.

> So I think I'll just do what I was doing to begin with and change the
> message to "Refreshing commit graph generation numbers" or something to
> indicate that it's a find/verify/compute operation, not just a compute
> operation.

So basically yes, I agree with this. :)

-Peff

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 1/2] commit-graph write: add progress output
  2018-09-07 17:15           ` Jeff King
@ 2018-09-07 17:25             ` Derrick Stolee
  0 siblings, 0 replies; 88+ messages in thread
From: Derrick Stolee @ 2018-09-07 17:25 UTC (permalink / raw)
  To: Jeff King, Ævar Arnfjörð Bjarmason; +Cc: Junio C Hamano, git

On 9/7/2018 1:15 PM, Jeff King wrote:
> On Fri, Sep 07, 2018 at 05:23:31PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> Hrm, no. I spoke too soon because I was conflating "commit-graph write"
>> with "gc". With this change, "gc" now spends e.g. 6 seconds on
>> 2015-04-03-1M-git displaying nothing, because we're looping through the
>> commits and finding that we have no new work.
>>
>> So I'm on the fence about this, but leaning towards just taking my
>> initial approach. I.e. it sucks if you're e.g. testing different "git gc"
>> options that we're churning in the background doing nothing, just
>> because we're trying to report how many *new* things we added to the
>> graph.
>>
>> After all, the main point IMNSHO is not to show some diagnostic output
>> of exactly how much work we're doing, that I have 200 new commits with
>> generation numbers or whatever is just useless trivia, but rather to not
>> leave the user thinking the command is hanging.
> I think there's some precedent for your view of things, too. For
> example, "writing objects" counts _all_ of the objects, even though many
> of them are just copying bytes straight from disk, and some are actually
> generating a delta and/or zlib-deflating content.
>
> So it's not the most precise measurement we could give, but it shows
> there's activity, and the "average" movement over many objects tends to
> be reasonably smooth.
>
>> So I think I'll just do what I was doing to begin with and change the
>> message to "Refreshing commit graph generation numbers" or something to
>> indicate that it's a find/verify/compute operation, not just a compute
>> operation.
> So basically yes, I agree with this. :)

Same here. Thanks for the discussion.

-Stolee


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH v2 0/2] commit-graph: add progress output
  2018-09-04 20:27 [PATCH 0/2] commit-graph: add progress output Ævar Arnfjörð Bjarmason
                   ` (2 preceding siblings ...)
  2018-09-05 12:07 ` [PATCH 0/2] commit-graph: " Derrick Stolee
@ 2018-09-07 18:29 ` " Ævar Arnfjörð Bjarmason
  2018-09-11 20:26   ` Junio C Hamano
  2018-09-07 18:29 ` [PATCH v2 1/2] commit-graph write: " Ævar Arnfjörð Bjarmason
  2018-09-07 18:29 ` [PATCH v2 2/2] commit-graph verify: " Ævar Arnfjörð Bjarmason
  5 siblings, 1 reply; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-07 18:29 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Jeff King, Eric Sunshine,
	Nguyễn Thái Ngọc Duy,
	Ævar Arnfjörð Bjarmason

Based on feedback on v1, and the "this is yelling at my users through
gc.log" bug I found. Range-diff with v1:

1:  e0a09ad641 ! 1:  b2dcfa0f55 commit-graph write: add progress output
    @@ -5,8 +5,8 @@
         Before this change the "commit-graph write" command didn't report any
         progress. On my machine this command takes more than 10 seconds to
         write the graph for linux.git, and around 1m30s on the
    -    2015-04-03-1M-git.git[1] test repository, which is a test case for
    -    larger monorepos.
    +    2015-04-03-1M-git.git[1] test repository (a test case for a large
    +    monorepository).
     
         Furthermore, since the gc.writeCommitGraph setting was added in
         d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27),
    @@ -19,7 +19,6 @@
             $ git -c gc.writeCommitGraph=true gc
             Enumerating objects: 2821, done.
             [...]
    -        Total 2821 (delta 1670), reused 2821 (delta 1670)
             Computing commit graph generation numbers: 100% (867/867), done.
     
         On larger repositories, such as linux.git the delayed progress bar(s)
    @@ -27,6 +26,7 @@
         previously happening, printing nothing while we write the graph:
     
             $ git -c gc.writeCommitGraph=true gc
    +        [...]
             Annotating commits in commit graph: 1565573, done.
             Computing commit graph generation numbers: 100% (782484/782484), done.
     
    @@ -42,20 +42,90 @@
     
         With --stdin-packs we don't show any estimation of how much is left to
         do. This is because we might be processing more than one pack. We
    -    could be less lazy here and show progress, either detect by detecting
    -    that we're only processing one pack, or by first looping over the
    -    packs to discover how many commits they have. I don't see the point in
    -    doing that work. So instead we get (on 2015-04-03-1M-git.git):
    +    could be less lazy here and show progress, either by detecting that
    +    we're only processing one pack, or by first looping over the packs to
    +    discover how many commits they have. I don't see the point in doing
    +    that work. So instead we get (on 2015-04-03-1M-git.git):
     
             $ echo pack-<HASH>.idx | git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs
             Finding commits for commit graph: 13064614, done.
             Annotating commits in commit graph: 3001341, done.
             Computing commit graph generation numbers: 100% (1000447/1000447), done.
     
    +    No GC mode uses --stdin-packs. It's what they use at Microsoft to
    +    manually compute the generation numbers for their collection of large
    +    packs which are never coalesced.
    +
    +    The reason we need a "report_progress" variable passed down from "git
    +    gc" is so that we don't report this output when we're running in the
    +    process "git gc --auto" detaches from the terminal.
    +
    +    Since we write the commit graph from the "git gc" process itself (as
    +    opposed to what we do with say the "git repack" phase), we'd end up
    +    writing the output to .git/gc.log and reporting it to the user next
    +    time as part of the "The last gc run reported the following[...]"
    +    error, see 329e6e8794 ("gc: save log from daemonized gc --auto and
    +    print it next time", 2015-09-19).
    +
    +    So we must keep track of whether or not we're running in that
    +    daemonized mode, and if so print no progress.
    +
    +    See [2] and subsequent replies for a discussion of an approach not
    +    taken in compute_generation_numbers(). I.e. we're saying "Computing
    +    commit graph generation numbers", even though on an established
    +    history we're mostly skipping over all the work we did in the
    +    past. This is similar to the white lie we tell in the "Writing
    +    objects" phase (not all objects are being written).
    +
    +    Always showing progress is considered more important than
    +    accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang
    +    for 6 seconds with no output on the second "git gc" if no changes were
    +    made to any objects in the interim if we'd take the approach in [2].
    +
         1. https://github.com/avar/2015-04-03-1M-git
     
    +    2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com>
    +       (https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/)
    +
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    + diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
    + --- a/builtin/commit-graph.c
    + +++ b/builtin/commit-graph.c
    +@@
    + 		opts.obj_dir = get_object_directory();
    + 
    + 	if (opts.reachable) {
    +-		write_commit_graph_reachable(opts.obj_dir, opts.append);
    ++		write_commit_graph_reachable(opts.obj_dir, opts.append, 1);
    + 		return 0;
    + 	}
    + 
    +@@
    + 	write_commit_graph(opts.obj_dir,
    + 			   pack_indexes,
    + 			   commit_hex,
    +-			   opts.append);
    ++			   opts.append,
    ++			   1);
    + 
    + 	string_list_clear(&lines, 0);
    + 	return 0;
    +
    + diff --git a/builtin/gc.c b/builtin/gc.c
    + --- a/builtin/gc.c
    + +++ b/builtin/gc.c
    +@@
    + 		clean_pack_garbage();
    + 
    + 	if (gc_write_commit_graph)
    +-		write_commit_graph_reachable(get_object_directory(), 0);
    ++		write_commit_graph_reachable(get_object_directory(), 0,
    ++					     !daemonized);
    + 
    + 	if (auto_gc && too_many_loose_objects())
    + 		warning(_("There are too many unreachable loose objects; "
    +
      diff --git a/commit-graph.c b/commit-graph.c
      --- a/commit-graph.c
      +++ b/commit-graph.c
    @@ -87,14 +157,20 @@
      	if (packed_object_info(the_repository, pack, offset, &oi) < 0)
      		die(_("unable to get type of object %s"), oid_to_hex(oid));
     @@
    + 	}
    + }
    + 
    +-static void close_reachable(struct packed_oid_list *oids)
    ++static void close_reachable(struct packed_oid_list *oids, int report_progress)
      {
      	int i;
      	struct commit *commit;
     +	struct progress *progress = NULL;
     +	int j = 0;
      
    -+	progress = start_delayed_progress(
    -+		_("Annotating commits in commit graph"), 0);
    ++	if (report_progress)
    ++		progress = start_delayed_progress(
    ++			_("Annotating commits in commit graph"), 0);
      	for (i = 0; i < oids->nr; i++) {
     +		display_progress(progress, ++j);
      		commit = lookup_commit(the_repository, &oids->list[i]);
    @@ -121,16 +197,20 @@
     +	stop_progress(&progress);
      }
      
    - static void compute_generation_numbers(struct packed_commit_list* commits)
    +-static void compute_generation_numbers(struct packed_commit_list* commits)
    ++static void compute_generation_numbers(struct packed_commit_list* commits, 
    ++				       int report_progress)
      {
      	int i;
      	struct commit_list *list = NULL;
     +	struct progress *progress = NULL;
      
    -+	progress = start_progress(
    -+		_("Computing commit graph generation numbers"), commits->nr);
    ++	if (report_progress)
    ++		progress = start_progress(
    ++			_("Computing commit graph generation numbers"),
    ++			commits->nr);
      	for (i = 0; i < commits->nr; i++) {
    -+		display_progress(progress, i);
    ++		display_progress(progress, i + 1);
      		if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY &&
      		    commits->list[i]->generation != GENERATION_NUMBER_ZERO)
      			continue;
    @@ -138,11 +218,34 @@
      			}
      		}
      	}
    -+	display_progress(progress, i);
     +	stop_progress(&progress);
      }
      
      static int add_ref_to_list(const char *refname,
    +@@
    + 	return 0;
    + }
    + 
    +-void write_commit_graph_reachable(const char *obj_dir, int append)
    ++void write_commit_graph_reachable(const char *obj_dir, int append,
    ++				  int report_progress)
    + {
    + 	struct string_list list;
    + 
    + 	string_list_init(&list, 1);
    + 	for_each_ref(add_ref_to_list, &list);
    +-	write_commit_graph(obj_dir, NULL, &list, append);
    ++	write_commit_graph(obj_dir, NULL, &list, append, report_progress);
    + }
    + 
    + void write_commit_graph(const char *obj_dir,
    + 			struct string_list *pack_indexes,
    + 			struct string_list *commit_hex,
    +-			int append)
    ++			int append, int report_progress)
    + {
    + 	struct packed_oid_list oids;
    + 	struct packed_commit_list commits;
     @@
      	int num_chunks;
      	int num_extra_edges;
    @@ -160,9 +263,11 @@
      		int dirlen;
      		strbuf_addf(&packname, "%s/pack/", obj_dir);
      		dirlen = packname.len;
    -+		oids.progress = start_delayed_progress(
    -+			_("Finding commits for commit graph"), 0);
    -+		oids.progress_done = 0;
    ++		if (report_progress) {
    ++			oids.progress = start_delayed_progress(
    ++				_("Finding commits for commit graph"), 0);
    ++			oids.progress_done = 0;
    ++		}
      		for (i = 0; i < pack_indexes->nr; i++) {
      			struct packed_git *p;
      			strbuf_setlen(&packname, dirlen);
    @@ -175,14 +280,16 @@
      	}
      
      	if (commit_hex) {
    -+		progress = start_delayed_progress(
    -+			_("Finding commits for commit graph"), commit_hex->nr);
    ++		if (report_progress)
    ++			progress = start_delayed_progress(
    ++				_("Finding commits for commit graph"),
    ++				commit_hex->nr);
      		for (i = 0; i < commit_hex->nr; i++) {
      			const char *end;
      			struct object_id oid;
      			struct commit *result;
      
    -+			display_progress(progress, i);
    ++			display_progress(progress, i + 1);
      			if (commit_hex->items[i].string &&
      			    parse_oid_hex(commit_hex->items[i].string, &oid, &end))
      				continue;
    @@ -190,17 +297,48 @@
      				oids.nr++;
      			}
      		}
    -+		display_progress(progress, i);
     +		stop_progress(&progress);
      	}
      
     -	if (!pack_indexes && !commit_hex)
     +	if (!pack_indexes && !commit_hex) {
    -+		oids.progress = start_delayed_progress(
    -+			_("Finding commits for commit graph"), 0);
    ++		if (report_progress)
    ++			oids.progress = start_delayed_progress(
    ++				_("Finding commits for commit graph"), 0);
      		for_each_packed_object(add_packed_commits, &oids, 0);
     +		stop_progress(&oids.progress);
     +	}
      
    - 	close_reachable(&oids);
    +-	close_reachable(&oids);
    ++	close_reachable(&oids, report_progress);
    + 
    + 	QSORT(oids.list, oids.nr, commit_compare);
    + 
    +@@
    + 	if (commits.nr >= GRAPH_PARENT_MISSING)
    + 		die(_("too many commits to write graph"));
    + 
    +-	compute_generation_numbers(&commits);
    ++	compute_generation_numbers(&commits, report_progress);
    + 
    + 	graph_name = get_commit_graph_filename(obj_dir);
    + 	if (safe_create_leading_directories(graph_name))
    +
    + diff --git a/commit-graph.h b/commit-graph.h
    + --- a/commit-graph.h
    + +++ b/commit-graph.h
    +@@
    + 
    + struct commit_graph *load_commit_graph_one(const char *graph_file);
    + 
    +-void write_commit_graph_reachable(const char *obj_dir, int append);
    ++void write_commit_graph_reachable(const char *obj_dir, int append,
    ++				  int report_progress);
    + void write_commit_graph(const char *obj_dir,
    + 			struct string_list *pack_indexes,
    + 			struct string_list *commit_hex,
    +-			int append);
    ++			int append, int report_progress);
    + 
    + int verify_commit_graph(struct repository *r, struct commit_graph *g);
      
2:  a364297d15 ! 2:  775237cffb commit-graph verify: add progress output
    @@ -23,6 +23,10 @@
             real    0m7.813s
             [...]
     
    +    Since the "commit-graph verify" subcommand is never called from "git
    +    gc", we don't have to worry about passing some "report_progress"
    +    variable around for this codepath.
    +
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      diff --git a/commit-graph.c b/commit-graph.c
    @@ -47,7 +51,7 @@
      		struct commit_list *graph_parents, *odb_parents;
      		uint32_t max_generation = 0;
      
    -+		display_progress(progress, i);
    ++		display_progress(progress, i + 1);
      		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
      
      		graph_commit = lookup_commit(r, &cur_oid);
    @@ -55,7 +59,6 @@
      				     graph_commit->date,
      				     odb_commit->date);
      	}
    -+	display_progress(progress, i);
     +	stop_progress(&progress);
      
      	return verify_commit_graph_error;

Ævar Arnfjörð Bjarmason (2):
  commit-graph write: add progress output
  commit-graph verify: add progress output

 builtin/commit-graph.c |  5 ++--
 builtin/gc.c           |  3 +-
 commit-graph.c         | 65 ++++++++++++++++++++++++++++++++++++------
 commit-graph.h         |  5 ++--
 4 files changed, 65 insertions(+), 13 deletions(-)

-- 
2.19.0.rc1.350.ge57e33dbd1


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH v2 1/2] commit-graph write: add progress output
  2018-09-04 20:27 [PATCH 0/2] commit-graph: add progress output Ævar Arnfjörð Bjarmason
                   ` (3 preceding siblings ...)
  2018-09-07 18:29 ` [PATCH v2 " Ævar Arnfjörð Bjarmason
@ 2018-09-07 18:29 ` " Ævar Arnfjörð Bjarmason
  2018-09-21 20:01   ` Derrick Stolee
  2018-09-07 18:29 ` [PATCH v2 2/2] commit-graph verify: " Ævar Arnfjörð Bjarmason
  5 siblings, 1 reply; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-07 18:29 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Jeff King, Eric Sunshine,
	Nguyễn Thái Ngọc Duy,
	Ævar Arnfjörð Bjarmason

Before this change the "commit-graph write" command didn't report any
progress. On my machine this command takes more than 10 seconds to
write the graph for linux.git, and around 1m30s on the
2015-04-03-1M-git.git[1] test repository (a test case for a large
monorepository).

Furthermore, since the gc.writeCommitGraph setting was added in
d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27),
there was no indication at all from a "git gc" run that anything was
different. This is why one of the progress bars being added here uses
start_progress() instead of start_delayed_progress(), so that it's
guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git
repository:

    $ git -c gc.writeCommitGraph=true gc
    Enumerating objects: 2821, done.
    [...]
    Computing commit graph generation numbers: 100% (867/867), done.

On larger repositories, such as linux.git, the delayed progress bar(s)
will kick in, and we'll show what's going on instead of, as was
previously happening, printing nothing while we write the graph:

    $ git -c gc.writeCommitGraph=true gc
    [...]
    Annotating commits in commit graph: 1565573, done.
    Computing commit graph generation numbers: 100% (782484/782484), done.

Note that here we don't show "Finding commits for commit graph". This
is because under "git gc" we seed the search with the commit
references in the repository, and that set is too small to show any
progress. We would show it e.g. on a smaller repo such as git.git with
--stdin-commits:

    $ git rev-list --all | git -c gc.writeCommitGraph=true commit-graph write --stdin-commits
    Finding commits for commit graph: 100% (162576/162576), done.
    Computing commit graph generation numbers: 100% (162576/162576), done.

With --stdin-packs we don't show any estimation of how much is left to
do. This is because we might be processing more than one pack. We
could be less lazy here and show progress, either by detecting that
we're only processing one pack, or by first looping over the packs to
discover how many commits they have. I don't see the point in doing
that work. So instead we get (on 2015-04-03-1M-git.git):

    $ echo pack-<HASH>.idx | git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs
    Finding commits for commit graph: 13064614, done.
    Annotating commits in commit graph: 3001341, done.
    Computing commit graph generation numbers: 100% (1000447/1000447), done.

No GC mode uses --stdin-packs. It's what they use at Microsoft to
manually compute the generation numbers for their collection of large
packs which are never coalesced.

The reason we need a "report_progress" variable passed down from "git
gc" is so that we don't report this output when we're running in the
process that "git gc --auto" detaches from the terminal.

Since we write the commit graph from the "git gc" process itself (as
opposed to what we do with say the "git repack" phase), we'd end up
writing the output to .git/gc.log and reporting it to the user next
time as part of the "The last gc run reported the following[...]"
error, see 329e6e8794 ("gc: save log from daemonized gc --auto and
print it next time", 2015-09-19).

So we must keep track of whether or not we're running in that
daemonized mode, and if so print no progress.
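
As a rough illustration of the pattern (a minimal sketch, not the code
in this patch): the progress pointer stays NULL when "report_progress"
is 0, and the display/stop helpers tolerate NULL, so the loops need no
extra conditionals. Everything prefixed "sketch_" below is a
hypothetical stand-in for the functions in progress.h, which this
sketch assumes behave the same way on NULL.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for git's "struct progress"; not the real type. */
struct progress {
	const char *title;
	unsigned long total;
	unsigned long last;
};

static struct progress *sketch_start_progress(const char *title,
					      unsigned long total)
{
	struct progress *p = calloc(1, sizeof(*p));

	if (!p)
		return NULL; /* degrade to "no progress" on allocation failure */
	p->title = title;
	p->total = total;
	return p;
}

/* A NULL progress is a no-op, so callers can call this unconditionally. */
static void sketch_display_progress(struct progress *p, unsigned long n)
{
	if (!p)
		return;
	p->last = n;
}

/*
 * Prints the final count, frees the struct and resets the caller's
 * pointer. Returns the last displayed count, or 0 if nothing was shown.
 */
static unsigned long sketch_stop_progress(struct progress **pp)
{
	unsigned long last = 0;

	assert(pp);
	if (*pp) {
		last = (*pp)->last;
		fprintf(stderr, "%s: %lu, done.\n", (*pp)->title, last);
		free(*pp);
		*pp = NULL;
	}
	return last;
}

/* The caller decides once, up front, whether any output is wanted. */
static unsigned long sketch_walk(unsigned long nr, int report_progress)
{
	struct progress *progress = NULL;
	unsigned long i;

	if (report_progress)
		progress = sketch_start_progress("Annotating commits", 0);
	for (i = 0; i < nr; i++)
		sketch_display_progress(progress, i + 1);
	return sketch_stop_progress(&progress);
}
```

Calling sketch_walk(nr, !daemonized) would then stay silent in the
detached case, which is the shape of the builtin/gc.c call site.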

See [2] and subsequent replies for a discussion of an approach not
taken in compute_generation_numbers(). I.e. we're saying "Computing
commit graph generation numbers", even though on an established
history we're mostly skipping over all the work we did in the
past. This is similar to the white lie we tell in the "Writing
objects" phase (not all objects are being written).

Always showing progress is considered more important than
accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang
for 6 seconds with no output on the second "git gc", if no changes were
made to any objects in the interim, if we took the approach in [2].

1. https://github.com/avar/2015-04-03-1M-git

2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com>
   (https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/commit-graph.c |  5 ++--
 builtin/gc.c           |  3 ++-
 commit-graph.c         | 60 ++++++++++++++++++++++++++++++++++++------
 commit-graph.h         |  5 ++--
 4 files changed, 60 insertions(+), 13 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 0bf0c48657..bc0fa9ba52 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -151,7 +151,7 @@ static int graph_write(int argc, const char **argv)
 		opts.obj_dir = get_object_directory();
 
 	if (opts.reachable) {
-		write_commit_graph_reachable(opts.obj_dir, opts.append);
+		write_commit_graph_reachable(opts.obj_dir, opts.append, 1);
 		return 0;
 	}
 
@@ -171,7 +171,8 @@ static int graph_write(int argc, const char **argv)
 	write_commit_graph(opts.obj_dir,
 			   pack_indexes,
 			   commit_hex,
-			   opts.append);
+			   opts.append,
+			   1);
 
 	string_list_clear(&lines, 0);
 	return 0;
diff --git a/builtin/gc.c b/builtin/gc.c
index 57069442b0..06ddd3aea5 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -646,7 +646,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 		clean_pack_garbage();
 
 	if (gc_write_commit_graph)
-		write_commit_graph_reachable(get_object_directory(), 0);
+		write_commit_graph_reachable(get_object_directory(), 0,
+					     !daemonized);
 
 	if (auto_gc && too_many_loose_objects())
 		warning(_("There are too many unreachable loose objects; "
diff --git a/commit-graph.c b/commit-graph.c
index 8a1bec7b8a..2c5d996194 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -13,6 +13,7 @@
 #include "commit-graph.h"
 #include "object-store.h"
 #include "alloc.h"
+#include "progress.h"
 
 #define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */
 #define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
@@ -548,6 +549,8 @@ struct packed_oid_list {
 	struct object_id *list;
 	int nr;
 	int alloc;
+	struct progress *progress;
+	int progress_done;
 };
 
 static int add_packed_commits(const struct object_id *oid,
@@ -560,6 +563,9 @@ static int add_packed_commits(const struct object_id *oid,
 	off_t offset = nth_packed_object_offset(pack, pos);
 	struct object_info oi = OBJECT_INFO_INIT;
 
+	if (list->progress)
+		display_progress(list->progress, ++list->progress_done);
+
 	oi.typep = &type;
 	if (packed_object_info(the_repository, pack, offset, &oi) < 0)
 		die(_("unable to get type of object %s"), oid_to_hex(oid));
@@ -587,12 +593,18 @@ static void add_missing_parents(struct packed_oid_list *oids, struct commit *com
 	}
 }
 
-static void close_reachable(struct packed_oid_list *oids)
+static void close_reachable(struct packed_oid_list *oids, int report_progress)
 {
 	int i;
 	struct commit *commit;
+	struct progress *progress = NULL;
+	int j = 0;
 
+	if (report_progress)
+		progress = start_delayed_progress(
+			_("Annotating commits in commit graph"), 0);
 	for (i = 0; i < oids->nr; i++) {
+		display_progress(progress, ++j);
 		commit = lookup_commit(the_repository, &oids->list[i]);
 		if (commit)
 			commit->object.flags |= UNINTERESTING;
@@ -604,6 +616,7 @@ static void close_reachable(struct packed_oid_list *oids)
 	 * closure.
 	 */
 	for (i = 0; i < oids->nr; i++) {
+		display_progress(progress, ++j);
 		commit = lookup_commit(the_repository, &oids->list[i]);
 
 		if (commit && !parse_commit(commit))
@@ -611,19 +624,28 @@ static void close_reachable(struct packed_oid_list *oids)
 	}
 
 	for (i = 0; i < oids->nr; i++) {
+		display_progress(progress, ++j);
 		commit = lookup_commit(the_repository, &oids->list[i]);
 
 		if (commit)
 			commit->object.flags &= ~UNINTERESTING;
 	}
+	stop_progress(&progress);
 }
 
-static void compute_generation_numbers(struct packed_commit_list* commits)
+static void compute_generation_numbers(struct packed_commit_list* commits, 
+				       int report_progress)
 {
 	int i;
 	struct commit_list *list = NULL;
+	struct progress *progress = NULL;
 
+	if (report_progress)
+		progress = start_progress(
+			_("Computing commit graph generation numbers"),
+			commits->nr);
 	for (i = 0; i < commits->nr; i++) {
+		display_progress(progress, i + 1);
 		if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY &&
 		    commits->list[i]->generation != GENERATION_NUMBER_ZERO)
 			continue;
@@ -655,6 +677,7 @@ static void compute_generation_numbers(struct packed_commit_list* commits)
 			}
 		}
 	}
+	stop_progress(&progress);
 }
 
 static int add_ref_to_list(const char *refname,
@@ -667,19 +690,20 @@ static int add_ref_to_list(const char *refname,
 	return 0;
 }
 
-void write_commit_graph_reachable(const char *obj_dir, int append)
+void write_commit_graph_reachable(const char *obj_dir, int append,
+				  int report_progress)
 {
 	struct string_list list;
 
 	string_list_init(&list, 1);
 	for_each_ref(add_ref_to_list, &list);
-	write_commit_graph(obj_dir, NULL, &list, append);
+	write_commit_graph(obj_dir, NULL, &list, append, report_progress);
 }
 
 void write_commit_graph(const char *obj_dir,
 			struct string_list *pack_indexes,
 			struct string_list *commit_hex,
-			int append)
+			int append, int report_progress)
 {
 	struct packed_oid_list oids;
 	struct packed_commit_list commits;
@@ -692,9 +716,12 @@ void write_commit_graph(const char *obj_dir,
 	int num_chunks;
 	int num_extra_edges;
 	struct commit_list *parent;
+	struct progress *progress = NULL;
 
 	oids.nr = 0;
 	oids.alloc = approximate_object_count() / 4;
+	oids.progress = NULL;
+	oids.progress_done = 0;
 
 	if (append) {
 		prepare_commit_graph_one(the_repository, obj_dir);
@@ -721,6 +748,11 @@ void write_commit_graph(const char *obj_dir,
 		int dirlen;
 		strbuf_addf(&packname, "%s/pack/", obj_dir);
 		dirlen = packname.len;
+		if (report_progress) {
+			oids.progress = start_delayed_progress(
+				_("Finding commits for commit graph"), 0);
+			oids.progress_done = 0;
+		}
 		for (i = 0; i < pack_indexes->nr; i++) {
 			struct packed_git *p;
 			strbuf_setlen(&packname, dirlen);
@@ -733,15 +765,21 @@ void write_commit_graph(const char *obj_dir,
 			for_each_object_in_pack(p, add_packed_commits, &oids, 0);
 			close_pack(p);
 		}
+		stop_progress(&oids.progress);
 		strbuf_release(&packname);
 	}
 
 	if (commit_hex) {
+		if (report_progress)
+			progress = start_delayed_progress(
+				_("Finding commits for commit graph"),
+				commit_hex->nr);
 		for (i = 0; i < commit_hex->nr; i++) {
 			const char *end;
 			struct object_id oid;
 			struct commit *result;
 
+			display_progress(progress, i + 1);
 			if (commit_hex->items[i].string &&
 			    parse_oid_hex(commit_hex->items[i].string, &oid, &end))
 				continue;
@@ -754,12 +792,18 @@ void write_commit_graph(const char *obj_dir,
 				oids.nr++;
 			}
 		}
+		stop_progress(&progress);
 	}
 
-	if (!pack_indexes && !commit_hex)
+	if (!pack_indexes && !commit_hex) {
+		if (report_progress)
+			oids.progress = start_delayed_progress(
+				_("Finding commits for commit graph"), 0);
 		for_each_packed_object(add_packed_commits, &oids, 0);
+		stop_progress(&oids.progress);
+	}
 
-	close_reachable(&oids);
+	close_reachable(&oids, report_progress);
 
 	QSORT(oids.list, oids.nr, commit_compare);
 
@@ -799,7 +843,7 @@ void write_commit_graph(const char *obj_dir,
 	if (commits.nr >= GRAPH_PARENT_MISSING)
 		die(_("too many commits to write graph"));
 
-	compute_generation_numbers(&commits);
+	compute_generation_numbers(&commits, report_progress);
 
 	graph_name = get_commit_graph_filename(obj_dir);
 	if (safe_create_leading_directories(graph_name))
diff --git a/commit-graph.h b/commit-graph.h
index eea62f8c0e..f50712a973 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -52,11 +52,12 @@ struct commit_graph {
 
 struct commit_graph *load_commit_graph_one(const char *graph_file);
 
-void write_commit_graph_reachable(const char *obj_dir, int append);
+void write_commit_graph_reachable(const char *obj_dir, int append,
+				  int report_progress);
 void write_commit_graph(const char *obj_dir,
 			struct string_list *pack_indexes,
 			struct string_list *commit_hex,
-			int append);
+			int append, int report_progress);
 
 int verify_commit_graph(struct repository *r, struct commit_graph *g);
 
-- 
2.19.0.rc1.350.ge57e33dbd1


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH v2 2/2] commit-graph verify: add progress output
  2018-09-04 20:27 [PATCH 0/2] commit-graph: add progress output Ævar Arnfjörð Bjarmason
                   ` (4 preceding siblings ...)
  2018-09-07 18:29 ` [PATCH v2 1/2] commit-graph write: " Ævar Arnfjörð Bjarmason
@ 2018-09-07 18:29 ` " Ævar Arnfjörð Bjarmason
  2018-09-16  6:55   ` Duy Nguyen
  5 siblings, 1 reply; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-07 18:29 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Jeff King, Eric Sunshine,
	Nguyễn Thái Ngọc Duy,
	Ævar Arnfjörð Bjarmason

For the reasons explained in the "commit-graph write: add progress
output" commit leading up to this one, emit progress on "commit-graph
verify". Since e0fd51e1d7 ("fsck: verify commit-graph", 2018-06-27)
"git fsck" has called this command if core.commitGraph=true, but
there's been no progress output to indicate that anything was
different. Now there is (on my tiny dotfiles.git repository):

    $ git -c core.commitGraph=true -C ~/ fsck
    Checking object directories: 100% (256/256), done.
    Checking objects: 100% (2821/2821), done.
    dangling blob 5b8bbdb9b788ed90459f505b0934619c17cc605b
    Verifying commits in commit graph: 100% (867/867), done.

And on a larger repository, such as the 2015-04-03-1M-git.git test
repository:

    $ time git -c core.commitGraph=true -C ~/g/2015-04-03-1M-git/ commit-graph verify
    Verifying commits in commit graph: 100% (1000447/1000447), done.
    real    0m7.813s
    [...]

Since the "commit-graph verify" subcommand is never called from "git
gc", we don't have to worry about passing some "report_progress"
variable around for this codepath.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit-graph.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 2c5d996194..0bfb8c180e 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -922,6 +922,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 	int generation_zero = 0;
 	struct hashfile *f;
 	int devnull;
+	struct progress *progress = NULL;
 
 	if (!g) {
 		graph_report("no commit-graph file loaded");
@@ -989,11 +990,14 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 	if (verify_commit_graph_error & ~VERIFY_COMMIT_GRAPH_ERROR_HASH)
 		return verify_commit_graph_error;
 
+	progress = start_progress("Verifying commits in commit graph",
+				  g->num_commits);
 	for (i = 0; i < g->num_commits; i++) {
 		struct commit *graph_commit, *odb_commit;
 		struct commit_list *graph_parents, *odb_parents;
 		uint32_t max_generation = 0;
 
+		display_progress(progress, i + 1);
 		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
 
 		graph_commit = lookup_commit(r, &cur_oid);
@@ -1070,6 +1074,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 				     graph_commit->date,
 				     odb_commit->date);
 	}
+	stop_progress(&progress);
 
 	return verify_commit_graph_error;
 }
-- 
2.19.0.rc1.350.ge57e33dbd1


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 0/2] commit-graph: add progress output
  2018-09-07 18:29 ` [PATCH v2 " Ævar Arnfjörð Bjarmason
@ 2018-09-11 20:26   ` Junio C Hamano
  0 siblings, 0 replies; 88+ messages in thread
From: Junio C Hamano @ 2018-09-11 20:26 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Derrick Stolee, Jeff King, Eric Sunshine,
	Nguyễn Thái Ngọc Duy

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Based on feedback on v1, and the "this is yelling at my users through
> gc.log" bug I found.

I notice that between 'master' and 'pu' there already is one new
callsite of the write_commit_graph_reachable() function; because I
suspect that we will discover more places that give us better
trade-off to have a call to the function, I would not be surprised
if this will conflict with more in-flight topics soonish.

Let me try to queue it and see what happens.

Thanks.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 2/2] commit-graph verify: add progress output
  2018-09-07 18:29 ` [PATCH v2 2/2] commit-graph verify: " Ævar Arnfjörð Bjarmason
@ 2018-09-16  6:55   ` Duy Nguyen
  2018-09-17 15:33     ` [PATCH v3 0/2] commit-graph: " Ævar Arnfjörð Bjarmason
                       ` (2 more replies)
  0 siblings, 3 replies; 88+ messages in thread
From: Duy Nguyen @ 2018-09-16  6:55 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Jeff King,
	Eric Sunshine

On Fri, Sep 7, 2018 at 8:30 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> @@ -989,11 +990,14 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
>         if (verify_commit_graph_error & ~VERIFY_COMMIT_GRAPH_ERROR_HASH)
>                 return verify_commit_graph_error;
>
> +       progress = start_progress("Verifying commits in commit graph",

_()

> +                                 g->num_commits);
-- 
Duy

^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH v3 0/2] commit-graph: add progress output
  2018-09-16  6:55   ` Duy Nguyen
@ 2018-09-17 15:33     ` " Ævar Arnfjörð Bjarmason
  2018-09-17 15:33     ` [PATCH v3 1/2] commit-graph write: " Ævar Arnfjörð Bjarmason
  2018-09-17 15:33     ` [PATCH v3 2/2] commit-graph verify: add " Ævar Arnfjörð Bjarmason
  2 siblings, 0 replies; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-17 15:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Jeff King, Eric Sunshine,
	Nguyễn Thái Ngọc Duy,
	Ævar Arnfjörð Bjarmason

Micro change since v2: Missing _() in 2/2 pointed out by Duy.

Ævar Arnfjörð Bjarmason (2):
  commit-graph write: add progress output
  commit-graph verify: add progress output

 builtin/commit-graph.c |  5 ++--
 builtin/gc.c           |  3 +-
 commit-graph.c         | 65 ++++++++++++++++++++++++++++++++++++------
 commit-graph.h         |  5 ++--
 4 files changed, 65 insertions(+), 13 deletions(-)

-- 
2.19.0.rc2.392.g5ba43deb5a


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH v3 1/2] commit-graph write: add progress output
  2018-09-16  6:55   ` Duy Nguyen
  2018-09-17 15:33     ` [PATCH v3 0/2] commit-graph: " Ævar Arnfjörð Bjarmason
@ 2018-09-17 15:33     ` " Ævar Arnfjörð Bjarmason
  2018-10-10 20:37       ` SZEDER Gábor
  2018-10-15 16:54       ` SZEDER Gábor
  2018-09-17 15:33     ` [PATCH v3 2/2] commit-graph verify: add " Ævar Arnfjörð Bjarmason
  2 siblings, 2 replies; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-17 15:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Jeff King, Eric Sunshine,
	Nguyễn Thái Ngọc Duy,
	Ævar Arnfjörð Bjarmason

Before this change the "commit-graph write" command didn't report any
progress. On my machine this command takes more than 10 seconds to
write the graph for linux.git, and around 1m30s on the
2015-04-03-1M-git.git[1] test repository (a test case for a large
monorepository).

Furthermore, since the gc.writeCommitGraph setting was added in
d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27),
there was no indication at all from a "git gc" run that anything was
different. This is why one of the progress bars being added here uses
start_progress() instead of start_delayed_progress(), so that it's
guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git
repository:

    $ git -c gc.writeCommitGraph=true gc
    Enumerating objects: 2821, done.
    [...]
    Computing commit graph generation numbers: 100% (867/867), done.

On larger repositories, such as linux.git, the delayed progress bar(s)
will kick in, and we'll show what's going on instead of, as was
previously happening, printing nothing while we write the graph:

    $ git -c gc.writeCommitGraph=true gc
    [...]
    Annotating commits in commit graph: 1565573, done.
    Computing commit graph generation numbers: 100% (782484/782484), done.

Note that here we don't show "Finding commits for commit graph". This
is because under "git gc" we seed the search with the commit
references in the repository, and that set is too small to show any
progress. We would show it e.g. on a smaller repo such as git.git with
--stdin-commits:

    $ git rev-list --all | git -c gc.writeCommitGraph=true commit-graph write --stdin-commits
    Finding commits for commit graph: 100% (162576/162576), done.
    Computing commit graph generation numbers: 100% (162576/162576), done.

With --stdin-packs we don't show any estimation of how much is left to
do. This is because we might be processing more than one pack. We
could be less lazy here and show progress, either by detecting that
we're only processing one pack, or by first looping over the packs to
discover how many commits they have. I don't see the point in doing
that work. So instead we get (on 2015-04-03-1M-git.git):

    $ echo pack-<HASH>.idx | git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs
    Finding commits for commit graph: 13064614, done.
    Annotating commits in commit graph: 3001341, done.
    Computing commit graph generation numbers: 100% (1000447/1000447), done.
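
To illustrate why the first two lines above carry a bare count while
the last one carries a percentage, here is a minimal sketch of the
formatting decision, assuming that a total of 0 means "unknown". The
helper name sketch_format_progress() is hypothetical and this is not
the code in progress.c.

```c
#include <stdio.h>

/*
 * Hypothetical helper: formats one progress line into "buf".
 * With a known total we can show a percentage and "n/total";
 * with total == 0 all we can show is the running count.
 */
static int sketch_format_progress(char *buf, size_t len, const char *title,
				  unsigned long n, unsigned long total)
{
	if (total) {
		/* Widen before multiplying so n * 100 cannot overflow. */
		unsigned percent =
			(unsigned)((unsigned long long)n * 100 / total);

		return snprintf(buf, len, "%s: %3u%% (%lu/%lu)",
				title, percent, n, total);
	}
	return snprintf(buf, len, "%s: %lu", title, n);
}
```

Passing commit_hex->nr as the total gives the percentage form, while
the pack-walking callers pass 0 and get the count-only form.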

No GC mode uses --stdin-packs. It's what they use at Microsoft to
manually compute the generation numbers for their collection of large
packs which are never coalesced.

The reason we need a "report_progress" variable passed down from "git
gc" is so that we don't report this output when we're running in the
process that "git gc --auto" detaches from the terminal.

Since we write the commit graph from the "git gc" process itself (as
opposed to what we do with say the "git repack" phase), we'd end up
writing the output to .git/gc.log and reporting it to the user next
time as part of the "The last gc run reported the following[...]"
error, see 329e6e8794 ("gc: save log from daemonized gc --auto and
print it next time", 2015-09-19).

So we must keep track of whether or not we're running in that
daemonized mode, and if so print no progress.

See [2] and subsequent replies for a discussion of an approach not
taken in compute_generation_numbers(). I.e. we're saying "Computing
commit graph generation numbers", even though on an established
history we're mostly skipping over all the work we did in the
past. This is similar to the white lie we tell in the "Writing
objects" phase (not all objects are being written).

Always showing progress is considered more important than
accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang
for 6 seconds with no output on the second "git gc", if no changes were
made to any objects in the interim, if we took the approach in [2].

1. https://github.com/avar/2015-04-03-1M-git

2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com>
   (https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/commit-graph.c |  5 ++--
 builtin/gc.c           |  3 ++-
 commit-graph.c         | 60 ++++++++++++++++++++++++++++++++++++------
 commit-graph.h         |  5 ++--
 4 files changed, 60 insertions(+), 13 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 0bf0c48657..bc0fa9ba52 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -151,7 +151,7 @@ static int graph_write(int argc, const char **argv)
 		opts.obj_dir = get_object_directory();
 
 	if (opts.reachable) {
-		write_commit_graph_reachable(opts.obj_dir, opts.append);
+		write_commit_graph_reachable(opts.obj_dir, opts.append, 1);
 		return 0;
 	}
 
@@ -171,7 +171,8 @@ static int graph_write(int argc, const char **argv)
 	write_commit_graph(opts.obj_dir,
 			   pack_indexes,
 			   commit_hex,
-			   opts.append);
+			   opts.append,
+			   1);
 
 	string_list_clear(&lines, 0);
 	return 0;
diff --git a/builtin/gc.c b/builtin/gc.c
index 57069442b0..06ddd3aea5 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -646,7 +646,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 		clean_pack_garbage();
 
 	if (gc_write_commit_graph)
-		write_commit_graph_reachable(get_object_directory(), 0);
+		write_commit_graph_reachable(get_object_directory(), 0,
+					     !daemonized);
 
 	if (auto_gc && too_many_loose_objects())
 		warning(_("There are too many unreachable loose objects; "
diff --git a/commit-graph.c b/commit-graph.c
index 8a1bec7b8a..2c5d996194 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -13,6 +13,7 @@
 #include "commit-graph.h"
 #include "object-store.h"
 #include "alloc.h"
+#include "progress.h"
 
 #define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */
 #define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
@@ -548,6 +549,8 @@ struct packed_oid_list {
 	struct object_id *list;
 	int nr;
 	int alloc;
+	struct progress *progress;
+	int progress_done;
 };
 
 static int add_packed_commits(const struct object_id *oid,
@@ -560,6 +563,9 @@ static int add_packed_commits(const struct object_id *oid,
 	off_t offset = nth_packed_object_offset(pack, pos);
 	struct object_info oi = OBJECT_INFO_INIT;
 
+	if (list->progress)
+		display_progress(list->progress, ++list->progress_done);
+
 	oi.typep = &type;
 	if (packed_object_info(the_repository, pack, offset, &oi) < 0)
 		die(_("unable to get type of object %s"), oid_to_hex(oid));
@@ -587,12 +593,18 @@ static void add_missing_parents(struct packed_oid_list *oids, struct commit *com
 	}
 }
 
-static void close_reachable(struct packed_oid_list *oids)
+static void close_reachable(struct packed_oid_list *oids, int report_progress)
 {
 	int i;
 	struct commit *commit;
+	struct progress *progress = NULL;
+	int j = 0;
 
+	if (report_progress)
+		progress = start_delayed_progress(
+			_("Annotating commits in commit graph"), 0);
 	for (i = 0; i < oids->nr; i++) {
+		display_progress(progress, ++j);
 		commit = lookup_commit(the_repository, &oids->list[i]);
 		if (commit)
 			commit->object.flags |= UNINTERESTING;
@@ -604,6 +616,7 @@ static void close_reachable(struct packed_oid_list *oids)
 	 * closure.
 	 */
 	for (i = 0; i < oids->nr; i++) {
+		display_progress(progress, ++j);
 		commit = lookup_commit(the_repository, &oids->list[i]);
 
 		if (commit && !parse_commit(commit))
@@ -611,19 +624,28 @@ static void close_reachable(struct packed_oid_list *oids)
 	}
 
 	for (i = 0; i < oids->nr; i++) {
+		display_progress(progress, ++j);
 		commit = lookup_commit(the_repository, &oids->list[i]);
 
 		if (commit)
 			commit->object.flags &= ~UNINTERESTING;
 	}
+	stop_progress(&progress);
 }
 
-static void compute_generation_numbers(struct packed_commit_list* commits)
+static void compute_generation_numbers(struct packed_commit_list* commits, 
+				       int report_progress)
 {
 	int i;
 	struct commit_list *list = NULL;
+	struct progress *progress = NULL;
 
+	if (report_progress)
+		progress = start_progress(
+			_("Computing commit graph generation numbers"),
+			commits->nr);
 	for (i = 0; i < commits->nr; i++) {
+		display_progress(progress, i + 1);
 		if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY &&
 		    commits->list[i]->generation != GENERATION_NUMBER_ZERO)
 			continue;
@@ -655,6 +677,7 @@ static void compute_generation_numbers(struct packed_commit_list* commits)
 			}
 		}
 	}
+	stop_progress(&progress);
 }
 
 static int add_ref_to_list(const char *refname,
@@ -667,19 +690,20 @@ static int add_ref_to_list(const char *refname,
 	return 0;
 }
 
-void write_commit_graph_reachable(const char *obj_dir, int append)
+void write_commit_graph_reachable(const char *obj_dir, int append,
+				  int report_progress)
 {
 	struct string_list list;
 
 	string_list_init(&list, 1);
 	for_each_ref(add_ref_to_list, &list);
-	write_commit_graph(obj_dir, NULL, &list, append);
+	write_commit_graph(obj_dir, NULL, &list, append, report_progress);
 }
 
 void write_commit_graph(const char *obj_dir,
 			struct string_list *pack_indexes,
 			struct string_list *commit_hex,
-			int append)
+			int append, int report_progress)
 {
 	struct packed_oid_list oids;
 	struct packed_commit_list commits;
@@ -692,9 +716,12 @@ void write_commit_graph(const char *obj_dir,
 	int num_chunks;
 	int num_extra_edges;
 	struct commit_list *parent;
+	struct progress *progress = NULL;
 
 	oids.nr = 0;
 	oids.alloc = approximate_object_count() / 4;
+	oids.progress = NULL;
+	oids.progress_done = 0;
 
 	if (append) {
 		prepare_commit_graph_one(the_repository, obj_dir);
@@ -721,6 +748,11 @@ void write_commit_graph(const char *obj_dir,
 		int dirlen;
 		strbuf_addf(&packname, "%s/pack/", obj_dir);
 		dirlen = packname.len;
+		if (report_progress) {
+			oids.progress = start_delayed_progress(
+				_("Finding commits for commit graph"), 0);
+			oids.progress_done = 0;
+		}
 		for (i = 0; i < pack_indexes->nr; i++) {
 			struct packed_git *p;
 			strbuf_setlen(&packname, dirlen);
@@ -733,15 +765,21 @@ void write_commit_graph(const char *obj_dir,
 			for_each_object_in_pack(p, add_packed_commits, &oids, 0);
 			close_pack(p);
 		}
+		stop_progress(&oids.progress);
 		strbuf_release(&packname);
 	}
 
 	if (commit_hex) {
+		if (report_progress)
+			progress = start_delayed_progress(
+				_("Finding commits for commit graph"),
+				commit_hex->nr);
 		for (i = 0; i < commit_hex->nr; i++) {
 			const char *end;
 			struct object_id oid;
 			struct commit *result;
 
+			display_progress(progress, i + 1);
 			if (commit_hex->items[i].string &&
 			    parse_oid_hex(commit_hex->items[i].string, &oid, &end))
 				continue;
@@ -754,12 +792,18 @@ void write_commit_graph(const char *obj_dir,
 				oids.nr++;
 			}
 		}
+		stop_progress(&progress);
 	}
 
-	if (!pack_indexes && !commit_hex)
+	if (!pack_indexes && !commit_hex) {
+		if (report_progress)
+			oids.progress = start_delayed_progress(
+				_("Finding commits for commit graph"), 0);
 		for_each_packed_object(add_packed_commits, &oids, 0);
+		stop_progress(&oids.progress);
+	}
 
-	close_reachable(&oids);
+	close_reachable(&oids, report_progress);
 
 	QSORT(oids.list, oids.nr, commit_compare);
 
@@ -799,7 +843,7 @@ void write_commit_graph(const char *obj_dir,
 	if (commits.nr >= GRAPH_PARENT_MISSING)
 		die(_("too many commits to write graph"));
 
-	compute_generation_numbers(&commits);
+	compute_generation_numbers(&commits, report_progress);
 
 	graph_name = get_commit_graph_filename(obj_dir);
 	if (safe_create_leading_directories(graph_name))
diff --git a/commit-graph.h b/commit-graph.h
index eea62f8c0e..f50712a973 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -52,11 +52,12 @@ struct commit_graph {
 
 struct commit_graph *load_commit_graph_one(const char *graph_file);
 
-void write_commit_graph_reachable(const char *obj_dir, int append);
+void write_commit_graph_reachable(const char *obj_dir, int append,
+				  int report_progress);
 void write_commit_graph(const char *obj_dir,
 			struct string_list *pack_indexes,
 			struct string_list *commit_hex,
-			int append);
+			int append, int report_progress);
 
 int verify_commit_graph(struct repository *r, struct commit_graph *g);
 
-- 
2.19.0.rc2.392.g5ba43deb5a



* [PATCH v3 2/2] commit-graph verify: add progress output
  2018-09-16  6:55   ` Duy Nguyen
  2018-09-17 15:33     ` [PATCH v3 0/2] commit-graph: " Ævar Arnfjörð Bjarmason
  2018-09-17 15:33     ` [PATCH v3 1/2] commit-graph write: " Ævar Arnfjörð Bjarmason
@ 2018-09-17 15:33     ` " Ævar Arnfjörð Bjarmason
  2 siblings, 0 replies; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-17 15:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Jeff King, Eric Sunshine,
	Nguyễn Thái Ngọc Duy,
	Ævar Arnfjörð Bjarmason

For the reasons explained in the "commit-graph write: add progress
output" commit leading up to this one, emit progress on "commit-graph
verify". Since e0fd51e1d7 ("fsck: verify commit-graph", 2018-06-27)
"git fsck" has called this command if core.commitGraph=true, but
there's been no progress output to indicate that anything was
different. Now there is (on my tiny dotfiles.git repository):

    $ git -c core.commitGraph=true -C ~/ fsck
    Checking object directories: 100% (256/256), done.
    Checking objects: 100% (2821/2821), done.
    dangling blob 5b8bbdb9b788ed90459f505b0934619c17cc605b
    Verifying commits in commit graph: 100% (867/867), done.

And on a larger repository, such as the 2015-04-03-1M-git.git test
repository:

    $ time git -c core.commitGraph=true -C ~/g/2015-04-03-1M-git/ commit-graph verify
    Verifying commits in commit graph: 100% (1000447/1000447), done.
    real    0m7.813s
    [...]

Since the "commit-graph verify" subcommand is never called from "git
gc", we don't have to worry about passing a "report_progress"
variable around for this codepath.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit-graph.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 2c5d996194..e6e4c03986 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -922,6 +922,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 	int generation_zero = 0;
 	struct hashfile *f;
 	int devnull;
+	struct progress *progress = NULL;
 
 	if (!g) {
 		graph_report("no commit-graph file loaded");
@@ -989,11 +990,14 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 	if (verify_commit_graph_error & ~VERIFY_COMMIT_GRAPH_ERROR_HASH)
 		return verify_commit_graph_error;
 
+	progress = start_progress(_("Verifying commits in commit graph"),
+				  g->num_commits);
 	for (i = 0; i < g->num_commits; i++) {
 		struct commit *graph_commit, *odb_commit;
 		struct commit_list *graph_parents, *odb_parents;
 		uint32_t max_generation = 0;
 
+		display_progress(progress, i + 1);
 		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
 
 		graph_commit = lookup_commit(r, &cur_oid);
@@ -1070,6 +1074,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 				     graph_commit->date,
 				     odb_commit->date);
 	}
+	stop_progress(&progress);
 
 	return verify_commit_graph_error;
 }
-- 
2.19.0.rc2.392.g5ba43deb5a



* Re: [PATCH v2 1/2] commit-graph write: add progress output
  2018-09-07 18:29 ` [PATCH v2 1/2] commit-graph write: " Ævar Arnfjörð Bjarmason
@ 2018-09-21 20:01   ` Derrick Stolee
  2018-09-21 21:43     ` Junio C Hamano
  0 siblings, 1 reply; 88+ messages in thread
From: Derrick Stolee @ 2018-09-21 20:01 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git
  Cc: Junio C Hamano, Jeff King, Eric Sunshine,
	Nguyễn Thái Ngọc Duy

On 9/7/2018 2:29 PM, Ævar Arnfjörð Bjarmason wrote:
> -void write_commit_graph_reachable(const char *obj_dir, int append);
> +void write_commit_graph_reachable(const char *obj_dir, int append,
> +				  int report_progress);
>   void write_commit_graph(const char *obj_dir,
>   			struct string_list *pack_indexes,
>   			struct string_list *commit_hex,
> -			int append);
> +			int append, int report_progress);
>   
>   int verify_commit_graph(struct repository *r, struct commit_graph *g);
>   

Junio,

The above prototype change seems to have created a semantic conflict 
with ds/commit-graph-tests (859fdc "commit-graph: define 
GIT_TEST_COMMIT_GRAPH") because when GIT_TEST_COMMIT_GRAPH is set, we 
call write_commit_graph_reachable() but the final parameter was resolved 
to be "1" instead of "0".

This causes t3420-rebase-autostash.sh to fail, as that test watches the 
full output of the rebase command, including commit runs. The following 
patch fixes the problem, but could probably be squashed into a merge or 
other commit.

Thanks,

-Stolee

-->8--

From: Derrick Stolee <dstolee@microsoft.com>
Date: Fri, 21 Sep 2018 19:57:36 +0000
Subject: [PATCH] commit: quietly write commit-graph in tests

The GIT_TEST_COMMIT_GRAPH environment variable causes git-commit to
write a commit-graph file on every execution. Recently, we added
progress output when writing the commit-graph. This conflicts with
some expected output during some tests, so avoid writing progress
if writing a commit-graph this way.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
  builtin/commit.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 2a49ab4917..764664d832 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1660,7 +1660,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 		      "not exceeded, and then \"git reset HEAD\" to recover."));
 
 	if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0))
-		write_commit_graph_reachable(get_object_directory(), 0, 1);
+		write_commit_graph_reachable(get_object_directory(), 0, 0);
 
 	rerere(0);
 	run_command_v_opt(argv_gc_auto, RUN_GIT_CMD);
--
2.19.0



* Re: [PATCH v2 1/2] commit-graph write: add progress output
  2018-09-21 20:01   ` Derrick Stolee
@ 2018-09-21 21:43     ` Junio C Hamano
  2018-09-21 21:57       ` Junio C Hamano
  0 siblings, 1 reply; 88+ messages in thread
From: Junio C Hamano @ 2018-09-21 21:43 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Ævar Arnfjörð Bjarmason, git, Jeff King,
	Eric Sunshine, Nguyễn Thái Ngọc Duy

Derrick Stolee <stolee@gmail.com> writes:

> On 9/7/2018 2:29 PM, Ævar Arnfjörð Bjarmason wrote:
>> -void write_commit_graph_reachable(const char *obj_dir, int append);
>> +void write_commit_graph_reachable(const char *obj_dir, int append,
>> +				  int report_progress);
>>   void write_commit_graph(const char *obj_dir,
>>   			struct string_list *pack_indexes,
>>   			struct string_list *commit_hex,
>> -			int append);
>> +			int append, int report_progress);
>>   int verify_commit_graph(struct repository *r, struct commit_graph *g);
>>   
>
> Junio,
>
> The above prototype change seems to have created a semantic conflict
> with ds/commit-graph-tests (859fdc "commit-graph: define
> GIT_TEST_COMMIT_GRAPH") because when GIT_TEST_COMMIT_GRAPH is set, we
> call write_commit_graph_reachable() but the final parameter was
> resolved to be "1" instead of "0".

Hmph.  That's unfortunate.

Perhaps one of the topics should have yielded and waited until the
other one passes through.

As 859fdc0c ("commit-graph: define GIT_TEST_COMMIT_GRAPH",
2018-08-29) already is in 'master', the other "progress" topic
probably should be corrected to match.  The easiest and cleanest
would be to eject the ab/commit-graph-progress topic out of 'next'
and have it rerolled on top of 'master', as we are going to rewind
the tip of 'next' anyway.

While we are at it, I suspect that a saner evolution of the API into
the function would not append more parameters to the call, but would
make the "do we append?" bit into a flag word "unsigned flags" with
two bits, and such a clean-up can be done as a preliminary change.
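
As a rough illustration of that idea (a minimal sketch only; the flag
and struct names below are hypothetical and do not come from any patch
in this thread), the two "int" parameters could collapse into a single
flag word like this:

```c
#include <assert.h>

/* Hypothetical flag names, chosen for illustration only. */
#define CG_WRITE_APPEND    (1u << 0)
#define CG_WRITE_PROGRESS  (1u << 1)
#define CG_WRITE_ALL_FLAGS (CG_WRITE_APPEND | CG_WRITE_PROGRESS)

/* The two booleans that the current prototypes pass separately. */
struct cg_write_opts {
	int append;
	int report_progress;
};

/* Decode a flag word into the two booleans; unknown bits are a bug. */
static struct cg_write_opts cg_decode_flags(unsigned int flags)
{
	struct cg_write_opts opts;

	assert(!(flags & ~CG_WRITE_ALL_FLAGS));
	opts.append = !!(flags & CG_WRITE_APPEND);
	opts.report_progress = !!(flags & CG_WRITE_PROGRESS);
	return opts;
}
```

With something along these lines, a caller that must stay quiet passes 0
and an interactive caller passes CG_WRITE_PROGRESS, so adding a third
option later would not touch every call site.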

> This causes t3420-rebase-autostash.sh to fail, as that test watches
> the full output of the rebase command, including commit runs. The
> following patch fixes the problem, but could probably be squashed into
> a merge or other commit.
>
> Thanks,
>
> -Stolee
>
> -->8--
>
> From: Derrick Stolee <dstolee@microsoft.com>
> Date: Fri, 21 Sep 2018 19:57:36 +0000
> Subject: [PATCH] commit: quietly write commit-graph in tests
>
> The GIT_TEST_COMMIT_GRAPH environment variable causes git-commit to
> write a commit-graph file on every execution. Recently, we added
> progress output when writing the commit-graph. This conflicts with
> some expected output during some tests, so avoid writing progress
> if writing a commit-graph this way.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  builtin/commit.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/builtin/commit.c b/builtin/commit.c
> index 2a49ab4917..764664d832 100644
> --- a/builtin/commit.c
> +++ b/builtin/commit.c
> @@ -1660,7 +1660,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
>  		      "not exceeded, and then \"git reset HEAD\" to recover."));
>
>  	if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0))
> -		write_commit_graph_reachable(get_object_directory(), 0, 1);
> +		write_commit_graph_reachable(get_object_directory(), 0, 0);
>
>  	rerere(0);
>  	run_command_v_opt(argv_gc_auto, RUN_GIT_CMD);
> --
> 2.19.0


* Re: [PATCH v2 1/2] commit-graph write: add progress output
  2018-09-21 21:43     ` Junio C Hamano
@ 2018-09-21 21:57       ` Junio C Hamano
  0 siblings, 0 replies; 88+ messages in thread
From: Junio C Hamano @ 2018-09-21 21:57 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Ævar Arnfjörð Bjarmason, git, Jeff King,
	Eric Sunshine, Nguyễn Thái Ngọc Duy

Junio C Hamano <gitster@pobox.com> writes:

>> The above prototype change seems to have created a semantic conflict
>> with ds/commit-graph-tests (859fdc "commit-graph: define
>> GIT_TEST_COMMIT_GRAPH") because when GIT_TEST_COMMIT_GRAPH is set, we
>> call write_commit_graph_reachable() but the final parameter was
>> resolved to be "1" instead of "0".
>
> Hmph.  That's unfortunate.
>
> Perhaps one of the topics should have yielded and waited until the
> other one passes through.

Nah, I see where things went wrong.  I'll queue a single-liner
"mismerge fix" to 'next', and then correct the seed for the evil
merge kept in merge-fix/ab/commit-graph-progress, and rebuild 'pu'.
Things will straighten out by themselves after that happens.

Thanks for noticing.


* Re: [PATCH v3 1/2] commit-graph write: add progress output
  2018-09-17 15:33     ` [PATCH v3 1/2] commit-graph write: " Ævar Arnfjörð Bjarmason
@ 2018-10-10 20:37       ` SZEDER Gábor
  2018-10-10 21:56         ` Ævar Arnfjörð Bjarmason
  2018-10-12  6:09         ` Junio C Hamano
  2018-10-15 16:54       ` SZEDER Gábor
  1 sibling, 2 replies; 88+ messages in thread
From: SZEDER Gábor @ 2018-10-10 20:37 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Derrick Stolee, Jeff King, Eric Sunshine,
	Nguyễn Thái Ngọc Duy

On Mon, Sep 17, 2018 at 03:33:35PM +0000, Ævar Arnfjörð Bjarmason wrote:
>     $ git -c gc.writeCommitGraph=true gc
>     [...]
>     Annotating commits in commit graph: 1565573, done.
>     Computing commit graph generation numbers: 100% (782484/782484), done.

While poking around 'commit-graph.c' in my Bloom filter experiment, I
saw numbers similar to the above, and was confused by the much higher
than expected number of annotated commits.  It's about twice the
number of commits in the repository, or the number shown on the very
next line.

> diff --git a/commit-graph.c b/commit-graph.c
> index 8a1bec7b8a..2c5d996194 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> -static void close_reachable(struct packed_oid_list *oids)
> +static void close_reachable(struct packed_oid_list *oids, int report_progress)
>  {
>  	int i;
>  	struct commit *commit;
> +	struct progress *progress = NULL;
> +	int j = 0;
>  
> +	if (report_progress)
> +		progress = start_delayed_progress(
> +			_("Annotating commits in commit graph"), 0);
>  	for (i = 0; i < oids->nr; i++) {
> +		display_progress(progress, ++j);
>  		commit = lookup_commit(the_repository, &oids->list[i]);
>  		if (commit)
>  			commit->object.flags |= UNINTERESTING;
> @@ -604,6 +616,7 @@ static void close_reachable(struct packed_oid_list *oids)
>  	 * closure.
>  	 */
>  	for (i = 0; i < oids->nr; i++) {
> +		display_progress(progress, ++j);
>  		commit = lookup_commit(the_repository, &oids->list[i]);
>  
>  		if (commit && !parse_commit(commit))
> @@ -611,19 +624,28 @@ static void close_reachable(struct packed_oid_list *oids)
>  	}

The above loops have already counted all the commits, and, more
importantly, did all the hard work that takes time and makes the
progress indicator useful.

>  	for (i = 0; i < oids->nr; i++) {
> +		display_progress(progress, ++j);

This display_progress() call, however, doesn't seem to be necessary.
First, it counts all commits for a second time, resulting in the ~2x
difference compared to the actual number of commits, which is what
caused my confusion.  Second, all this loop does is set a flag
in commits that were already looked up and parsed in the above loops.
IOW this loop is very fast, and the progress indicator jumps from
~780k right to 1.5M, even on my tiny laptop, so it doesn't need a
progress indicator at all.
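
To make the arithmetic concrete, here is a small stand-alone model of
the three loops (a sketch, not the real code: it only mimics how the
shared counter 'j' advances, and the starting list size of 605 is a
hypothetical value picked so that the totals line up with the output
quoted above):

```c
#include <assert.h>

struct close_counts {
	unsigned long closure_size; /* final oids->nr */
	unsigned long reported;     /* last value handed to display_progress() */
};

/*
 * Model of close_reachable(): 'initial_nr' entries are known up front,
 * and the second loop discovers 'missing' more while it runs.
 */
static struct close_counts simulate_close_reachable(unsigned long initial_nr,
						    unsigned long missing)
{
	struct close_counts c;
	unsigned long nr = initial_nr, i, j = 0;

	assert(initial_nr > 0);

	/* Loop 1: flag the commits we start with. */
	for (i = 0; i < nr; i++)
		j++;

	/* Loop 2: the list grows while we walk it. */
	for (i = 0; i < nr; i++) {
		j++;
		if (missing) {
			nr++;
			missing--;
		}
	}

	/* Loop 3: clear the flag again; counts every commit once more. */
	for (i = 0; i < nr; i++)
		j++;

	c.closure_size = nr;
	c.reported = j;
	return c;
}
```

Under these assumptions, simulate_close_reachable(605, 781879) ends
with a closure of 782484 commits but a reported count of 1565573,
i.e. the second and third loops each contribute the full closure size.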

>  		commit = lookup_commit(the_repository, &oids->list[i]);
>  
>  		if (commit)
>  			commit->object.flags &= ~UNINTERESTING;
>  	}
> +	stop_progress(&progress);
>  }


* Re: [PATCH v3 1/2] commit-graph write: add progress output
  2018-10-10 20:37       ` SZEDER Gábor
@ 2018-10-10 21:56         ` Ævar Arnfjörð Bjarmason
  2018-10-10 22:19           ` SZEDER Gábor
  2018-10-12  6:09         ` Junio C Hamano
  1 sibling, 1 reply; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-10-10 21:56 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: git, Junio C Hamano, Derrick Stolee, Jeff King, Eric Sunshine,
	Nguyễn Thái Ngọc Duy


On Wed, Oct 10 2018, SZEDER Gábor wrote:

> On Mon, Sep 17, 2018 at 03:33:35PM +0000, Ævar Arnfjörð Bjarmason wrote:
>>     $ git -c gc.writeCommitGraph=true gc
>>     [...]
>>     Annotating commits in commit graph: 1565573, done.
>>     Computing commit graph generation numbers: 100% (782484/782484), done.
>
> While poking around 'commit-graph.c' in my Bloom filter experiment, I
> saw similar numbers like above, and was confused by the much higher
> than expected number of annotated commits.  It's about twice as much
> as the number of commits in the repository, or the number shown on the
> very next line.
>
>> diff --git a/commit-graph.c b/commit-graph.c
>> index 8a1bec7b8a..2c5d996194 100644
>> --- a/commit-graph.c
>> +++ b/commit-graph.c
>> -static void close_reachable(struct packed_oid_list *oids)
>> +static void close_reachable(struct packed_oid_list *oids, int report_progress)
>>  {
>>  	int i;
>>  	struct commit *commit;
>> +	struct progress *progress = NULL;
>> +	int j = 0;
>>
>> +	if (report_progress)
>> +		progress = start_delayed_progress(
>> +			_("Annotating commits in commit graph"), 0);
>>  	for (i = 0; i < oids->nr; i++) {
>> +		display_progress(progress, ++j);
>>  		commit = lookup_commit(the_repository, &oids->list[i]);
>>  		if (commit)
>>  			commit->object.flags |= UNINTERESTING;
>> @@ -604,6 +616,7 @@ static void close_reachable(struct packed_oid_list *oids)
>>  	 * closure.
>>  	 */
>>  	for (i = 0; i < oids->nr; i++) {
>> +		display_progress(progress, ++j);
>>  		commit = lookup_commit(the_repository, &oids->list[i]);
>>
>>  		if (commit && !parse_commit(commit))
>> @@ -611,19 +624,28 @@ static void close_reachable(struct packed_oid_list *oids)
>>  	}
>
> The above loops have already counted all the commits, and, more
> importantly, did all the hard work that takes time and makes the
> progress indicator useful.
>
>>  	for (i = 0; i < oids->nr; i++) {
>> +		display_progress(progress, ++j);

[...]

> This display_progress() call, however, doesn't seem to be necessary.
> First, it counts all commits for a second time, resulting in the ~2x
> difference compared to the actual number of commits, and then causing
> my confusion.  Second, all what this loop is doing is setting a flag
> in commits that were already looked up and parsed in the above loops.
> IOW this loop is very fast, and the progress indicator jumps from
> ~780k right to 1.5M, even on my tiny laptop, so it doesn't need a
> progress indicator at all.

You're right, I tried this patch on top:

    diff --git a/commit-graph.c b/commit-graph.c
    index a686758603..cccd83de72 100644
    --- a/commit-graph.c
    +++ b/commit-graph.c
    @@ -655,12 +655,16 @@ static void close_reachable(struct packed_oid_list *oids, int report_progress)
     		if (commit)
     			commit->object.flags |= UNINTERESTING;
     	}
    +	stop_progress(&progress); j = 0;

     	/*
     	 * As this loop runs, oids->nr may grow, but not more
     	 * than the number of missing commits in the reachable
     	 * closure.
     	 */
    +	if (report_progress)
    +		progress = start_delayed_progress(
    +			_("Annotating commits in commit graph 2"), 0);
     	for (i = 0; i < oids->nr; i++) {
     		display_progress(progress, ++j);
     		commit = lookup_commit(the_repository, &oids->list[i]);
    @@ -668,7 +672,11 @@ static void close_reachable(struct packed_oid_list *oids, int report_progress)
     		if (commit && !parse_commit(commit))
     			add_missing_parents(oids, commit);
     	}
    +	stop_progress(&progress); j = 0;

    +	if (report_progress)
    +		progress = start_delayed_progress(
    +			_("Annotating commits in commit graph 3"), 0);
     	for (i = 0; i < oids->nr; i++) {
     		display_progress(progress, ++j);
     		commit = lookup_commit(the_repository, &oids->list[i]);

And on a large repo with around 3 million commits the 3rd progress bar
doesn't kick in.

But if I apply this on top:

    diff --git a/progress.c b/progress.c
    index 5a99c9fbf0..89cc705bf7 100644
    --- a/progress.c
    +++ b/progress.c
    @@ -58,8 +58,8 @@ static void set_progress_signal(void)
     	sa.sa_flags = SA_RESTART;
     	sigaction(SIGALRM, &sa, NULL);

    -	v.it_interval.tv_sec = 1;
    -	v.it_interval.tv_usec = 0;
    +	v.it_interval.tv_sec = 0;
    +	v.it_interval.tv_usec = 250000;
     	v.it_value = v.it_interval;
     	setitimer(ITIMER_REAL, &v, NULL);
     }

I.e. start the timer after 1/4 of a second instead of 1 second, I get
that progress bar.
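
For reference, the interval arithmetic in that hunk can be written as a
small helper (a sketch only; progress_interval_ms() is a hypothetical
name and nothing like it exists in progress.c):

```c
#include <assert.h>
#include <sys/time.h>

/*
 * Build the itimerval that setitimer(ITIMER_REAL, ...) would take for a
 * timer firing every 'ms' milliseconds, with the first expiry after the
 * same delay (it_value == it_interval, as in the hunk above).
 */
static struct itimerval progress_interval_ms(long ms)
{
	struct itimerval v;

	assert(ms > 0);
	v.it_interval.tv_sec = ms / 1000;
	v.it_interval.tv_usec = (ms % 1000) * 1000;
	v.it_value = v.it_interval;
	return v;
}
```

progress_interval_ms(1000) gives the current 1 second setting and
progress_interval_ms(250) gives the 0 s / 250000 us pair from the
experiment.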

So I'm inclined to keep it. The repository just needs to be 4x the size
before that loop is noticeably hanging for 1 second.

That repo isn't all that big compared to what we've heard about out
there, and inner loops like this have a tendency to accumulate some more
code over time without a re-visit of why we weren't monitoring progress
there.

But maybe we can fix the message. We say "Annotating commits in commit
graph", not "Counting" or whatever. I was trying to find something that
didn't imply that we were doing this once. One can annotate a thing more
than once, but maybe there's a better way to explain this...

We had some more accurate progress reporting in close_reachable(),
discussed in
https://public-inbox.org/git/87efe5qqks.fsf@evledraar.gmail.com/
I still think the *main* use-case for these things is to just report
that we're not hanging, so maybe the proper solution is to pick up
Duy's patch to display a spinner instead of a numeric progress.

>>  		commit = lookup_commit(the_repository, &oids->list[i]);
>>
>>  		if (commit)
>>  			commit->object.flags &= ~UNINTERESTING;
>>  	}
>> +	stop_progress(&progress);
>>  }


* Re: [PATCH v3 1/2] commit-graph write: add progress output
  2018-10-10 21:56         ` Ævar Arnfjörð Bjarmason
@ 2018-10-10 22:19           ` SZEDER Gábor
  2018-10-10 22:37             ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 88+ messages in thread
From: SZEDER Gábor @ 2018-10-10 22:19 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Derrick Stolee, Jeff King, Eric Sunshine,
	Nguyễn Thái Ngọc Duy

On Wed, Oct 10, 2018 at 11:56:45PM +0200, Ævar Arnfjörð Bjarmason wrote:
> On Wed, Oct 10 2018, SZEDER Gábor wrote:

> >>  	for (i = 0; i < oids->nr; i++) {
> >> +		display_progress(progress, ++j);
> 
> [...]
> 
> > This display_progress() call, however, doesn't seem to be necessary.
> > First, it counts all commits for a second time, resulting in the ~2x
> > difference compared to the actual number of commits, and then causing
> > my confusion.  Second, all what this loop is doing is setting a flag
> > in commits that were already looked up and parsed in the above loops.
> > IOW this loop is very fast, and the progress indicator jumps from
> > ~780k right to 1.5M, even on my tiny laptop, so it doesn't need a
> > progress indicator at all.
> 
> You're right, I tried this patch on top:

[...] 

> And on a large repo with around 3 million commits the 3rd progress bar
> doesn't kick in.
> 
> But if I apply this on top:
> 
[...]
> 
> I.e. start the timer after 1/4 of a second instead of 1 second, I get
> that progress bar.
> 
> So I'm inclined to keep it. It just needs to be 4x the size before it's
> noticeably hanging for 1 second.

Just to clarify: are you worried about a 1 second hang in an approx. 12
million commit repository?  If so, then I'm unconvinced: that's not
even a blip on the radar, and the misleading numbers are far worse.

> That repo isn't all that big compared to what we've heard about out
> there, and inner loops like this have a tendency to accumulate some more
> code over time without a re-visit of why we weren't monitoring progress
> there.
> 
> But maybe we can fix the message. We say "Annotating commits in commit
> graph", not "Counting" or whatever. I was trying to find something that
> didn't imply that we were doing this once. One can annotate a thing more
> than once, but maybe there's a better way to explain this...

IMO just remove it.



* Re: [PATCH v3 1/2] commit-graph write: add progress output
  2018-10-10 22:19           ` SZEDER Gábor
@ 2018-10-10 22:37             ` Ævar Arnfjörð Bjarmason
  2018-10-11 17:52               ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-10-10 22:37 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: git, Junio C Hamano, Derrick Stolee, Jeff King, Eric Sunshine,
	Nguyễn Thái Ngọc Duy


On Wed, Oct 10 2018, SZEDER Gábor wrote:

> On Wed, Oct 10, 2018 at 11:56:45PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> On Wed, Oct 10 2018, SZEDER Gábor wrote:
>
>> >>  	for (i = 0; i < oids->nr; i++) {
>> >> +		display_progress(progress, ++j);
>>
>> [...]
>>
>> > This display_progress() call, however, doesn't seem to be necessary.
>> > First, it counts all commits for a second time, resulting in the ~2x
>> > difference compared to the actual number of commits, and then causing
>> > my confusion.  Second, all what this loop is doing is setting a flag
>> > in commits that were already looked up and parsed in the above loops.
>> > IOW this loop is very fast, and the progress indicator jumps from
>> > ~780k right to 1.5M, even on my tiny laptop, so it doesn't need a
>> > progress indicator at all.
>>
>> You're right, I tried this patch on top:
>
> [...]
>
>> And on a large repo with around 3 million commits the 3rd progress bar
>> doesn't kick in.
>>
>> But if I apply this on top:
>>
> [...]
>>
>> I.e. start the timer after 1/4 of a second instead of 1 second, I get
>> that progress bar.
>>
>> So I'm inclined to keep it. It just needs to be 4x the size before it's
>> noticeably hanging for 1 second.
>
> Just to clarify: are you worried about a 1 second hang in an approx. 12
> million commit repository?  If so, then I'm unconvinced, that's not
> even a blip on the radar, and the misleading numbers are far worse.

It's not a blip on the runtime, but the point of these progress bars in
general is so we don't have a UI where there's no visible difference
between git hanging and just doing work in some tight loop in the
background, and even 1 second when you're watching something is
noticeable if it stalls.

Also it's 1 second on a server where I had 128G of RAM. I think the cost
of even a "trivial" flag change like this would be very different if
e.g. the system was under memory pressure or was swapping.

And as noted, code like this tends to change over time; that loop might
get more expensive, so let's future-proof by having all the loops over N
call the progress code.

When I wrote this the intent was just "report progress". So that it's
counting anything is just an implementation detail of how progress.c
works now.

This was the reference to Duy's patch, i.e. instead of spewing numbers
at the user here let's just render a spinner. Then we no longer need to
make judgement calls about which loop over N is expensive right now, and
which one isn't, and if any of them will result in reporting a 2N number
while the user might be more familiar with or expecting N.
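
To show what I mean by a spinner (a minimal sketch; this is not Duy's
patch, and spinner_frame() and format_spinner() are hypothetical
helpers that do not exist in progress.c), the display would depend only
on how many times we have ticked, not on any N:

```c
#include <assert.h>
#include <stdio.h>

/* Pick one of four frames; the tick count itself is never shown. */
static char spinner_frame(unsigned long tick)
{
	static const char frames[] = { '|', '/', '-', '\\' };

	return frames[tick % (sizeof(frames) / sizeof(frames[0]))];
}

/* Format "\r<title>: <frame>" into buf; returns what snprintf() returns. */
static int format_spinner(char *buf, size_t len, const char *title,
			  unsigned long tick)
{
	assert(buf && len > 0);
	return snprintf(buf, len, "\r%s: %c", title, spinner_frame(tick));
}
```

Whether a loop runs N or 2N iterations then makes no visible
difference; the user only sees that something is still moving.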

>> That repo isn't all that big compared to what we've heard about out
>> there, and inner loops like this have a tendency to accumulate some more
>> code over time without a re-visit of why we weren't monitoring progress
>> there.
>>
>> But maybe we can fix the message. We say "Annotating commits in commit
>> graph", not "Counting" or whatever. I was trying to find something that
>> didn't imply that we were doing this once. One can annotate a thing more
>> than once, but maybe there's a better way to explain this...
>
> IMO just remove it.


* Re: [PATCH v3 1/2] commit-graph write: add progress output
  2018-10-10 22:37             ` Ævar Arnfjörð Bjarmason
@ 2018-10-11 17:52               ` Ævar Arnfjörð Bjarmason
  2018-10-15 16:05                 ` SZEDER Gábor
  0 siblings, 1 reply; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-10-11 17:52 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: git, Junio C Hamano, Derrick Stolee, Jeff King, Eric Sunshine,
	Nguyễn Thái Ngọc Duy


On Wed, Oct 10 2018, Ævar Arnfjörð Bjarmason wrote:

> On Wed, Oct 10 2018, SZEDER Gábor wrote:
>
>> On Wed, Oct 10, 2018 at 11:56:45PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>> On Wed, Oct 10 2018, SZEDER Gábor wrote:
>>
>>> >>  	for (i = 0; i < oids->nr; i++) {
>>> >> +		display_progress(progress, ++j);
>>>
>>> [...]
>>>
>>> > This display_progress() call, however, doesn't seem to be necessary.
>>> > First, it counts all commits for a second time, resulting in the ~2x
>>> > difference compared to the actual number of commits, and then causing
>>> > my confusion.  Second, all this loop is doing is setting a flag
>>> > in commits that were already looked up and parsed in the above loops.
>>> > IOW this loop is very fast, and the progress indicator jumps from
>>> > ~780k right to 1.5M, even on my tiny laptop, so it doesn't need a
>>> > progress indicator at all.
>>>
>>> You're right, I tried this patch on top:
>>
>> [...]
>>
>>> And on a large repo with around 3 million commits the 3rd progress bar
>>> doesn't kick in.
>>>
>>> But if I apply this on top:
>>>
>> [...]
>>>
>>> I.e. start the timer after 1/4 of a second instead of 1 second, I get
>>> that progress bar.
>>>
>>> So I'm inclined to keep it. It just needs to be 4x the size before it's
>>> noticeably hanging for 1 second.
>>
>> Just to clarify: are you worried about a 1 second hang in an approx. 12
>> million commit repository?  If so, then I'm unconvinced, that's not
>> even a blip on the radar, and the misleading numbers are far worse.
>
> It's not a blip on the runtime, but the point of these progress bars in
> general is so we don't have a UI where there's no UI difference between
> git hanging and just doing work in some tight loop in the background,
> and even 1 second when you're watching something is noticeable if it
> stalls.
>
> Also it's 1 second on a server where I had 128G of RAM. I think the
> runtime of even a "trivial" flag change like this would change a lot if
> e.g. the system was under memory pressure or was swapping.
>
> And as noted code like this tends to change over time, that loop might
> get more expensive, so let's future proof by having all the loops over N
> call the progress code.
>
> When I wrote this the intent was just "report progress". So that it's
> counting anything is just an implementation detail of how progress.c
> works now.
>
> This was the reference to Duy's patch, i.e. instead of spewing numbers
> at the user here let's just render a spinner. Then we no longer need to
> make judgement calls about which loop over N is expensive right now, and
> which one isn't, and if any of them will result in reporting a 2N number
> while the user might be more familiar with or expecting N.
>
>>> That repo isn't all that big compared to what we've heard about out
>>> there, and inner loops like this have a tendency to accumulate some more
>>> code over time without a re-visit of why we weren't monitoring progress
>>> there.
>>>
>>> But maybe we can fix the message. We say "Annotating commits in commit
>>> graph", not "Counting" or whatever. I was trying to find something that
>>> didn't imply that we were doing this once. One can annotate a thing more
>>> than once, but maybe there's a better way to explain this...
>>
>> IMO just remove it.

Hrm, actually, reading this again: your initial post says we end up with a
2x difference vs. the number of commits, but it's actually 3x. The loop
that has a rather trivial runtime comparatively is the 3x, but the 2x
loop takes a notable amount of time. So e.g. on git.git:

    $ git rev-list --all | wc -l; ~/g/git/git commit-graph write
    166678
    Annotating commits in commit graph: 518463, done.
    Computing commit graph generation numbers: 100% (172685/172685), done.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v3 1/2] commit-graph write: add progress output
  2018-10-10 20:37       ` SZEDER Gábor
  2018-10-10 21:56         ` Ævar Arnfjörð Bjarmason
@ 2018-10-12  6:09         ` Junio C Hamano
  2018-10-12 15:07           ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 88+ messages in thread
From: Junio C Hamano @ 2018-10-12  6:09 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Ævar Arnfjörð Bjarmason, git, Derrick Stolee,
	Jeff King, Eric Sunshine, Nguyễn Thái Ngọc Duy

SZEDER Gábor <szeder.dev@gmail.com> writes:

>>  	for (i = 0; i < oids->nr; i++) {
>> +		display_progress(progress, ++j);
>>  		commit = lookup_commit(the_repository, &oids->list[i]);
>>  
>>  		if (commit && !parse_commit(commit))
>> @@ -611,19 +624,28 @@ static void close_reachable(struct packed_oid_list *oids)
>>  	}
>
> The above loops have already counted all the commits, and, more
> importantly, did all the hard work that takes time and makes the
> progress indicator useful.
>
>>  	for (i = 0; i < oids->nr; i++) {
>> +		display_progress(progress, ++j);
>
> This display_progress() call, however, doesn't seem to be necessary.
> First, it counts all commits for a second time, resulting in the ~2x
> difference compared to the actual number of commits, and then causing
> my confusion.  Second, all this loop is doing is setting a flag
> in commits that were already looked up and parsed in the above loops.
> IOW this loop is very fast, and the progress indicator jumps from
> ~780k right to 1.5M, even on my tiny laptop, so it doesn't need a
> progress indicator at all.

Makes sense.  If this second iteration were also time consuming,
then it probably is a good idea to split these into two separate
phases?  "Counting 1...N" followed by "Inspecting 1...N" or
something like that.  Of course, if the latter does not take much
time, then doing without any progress indicator is also fine.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v3 1/2] commit-graph write: add progress output
  2018-10-12  6:09         ` Junio C Hamano
@ 2018-10-12 15:07           ` Ævar Arnfjörð Bjarmason
  2018-10-12 15:12             ` Derrick Stolee
  0 siblings, 1 reply; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-10-12 15:07 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: SZEDER Gábor, git, Derrick Stolee, Jeff King, Eric Sunshine,
	Nguyễn Thái Ngọc Duy


On Fri, Oct 12 2018, Junio C Hamano wrote:

> SZEDER Gábor <szeder.dev@gmail.com> writes:
>
>>>  	for (i = 0; i < oids->nr; i++) {
>>> +		display_progress(progress, ++j);
>>>  		commit = lookup_commit(the_repository, &oids->list[i]);
>>>
>>>  		if (commit && !parse_commit(commit))
>>> @@ -611,19 +624,28 @@ static void close_reachable(struct packed_oid_list *oids)
>>>  	}
>>
>> The above loops have already counted all the commits, and, more
>> importantly, did all the hard work that takes time and makes the
>> progress indicator useful.
>>
>>>  	for (i = 0; i < oids->nr; i++) {
>>> +		display_progress(progress, ++j);
>>
>> This display_progress() call, however, doesn't seem to be necessary.
>> First, it counts all commits for a second time, resulting in the ~2x
>> difference compared to the actual number of commits, and then causing
>> my confusion.  Second, all this loop is doing is setting a flag
>> in commits that were already looked up and parsed in the above loops.
>> IOW this loop is very fast, and the progress indicator jumps from
>> ~780k right to 1.5M, even on my tiny laptop, so it doesn't need a
>> progress indicator at all.
>
> Makes sense.  If this second iteration were also time consuming,
> then it probably is a good idea to split these into two separate
> phases?  "Counting 1...N" followed by "Inspecting 1...N" or
> something like that.  Of course, if the latter does not take much
> time, then doing without any progress indicator is also fine.

That's a good point. Derrick: If the three loops in close_reachable()
had to be split up into different progress stages and given different
names what do you think they should be? Now it's "Annotating commits in
commit graph" for all of them.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v3 1/2] commit-graph write: add progress output
  2018-10-12 15:07           ` Ævar Arnfjörð Bjarmason
@ 2018-10-12 15:12             ` Derrick Stolee
  0 siblings, 0 replies; 88+ messages in thread
From: Derrick Stolee @ 2018-10-12 15:12 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Junio C Hamano
  Cc: SZEDER Gábor, git, Jeff King, Eric Sunshine,
	Nguyễn Thái Ngọc Duy

On 10/12/2018 11:07 AM, Ævar Arnfjörð Bjarmason wrote:
> On Fri, Oct 12 2018, Junio C Hamano wrote:
>
>> Makes sense. If this second iteration were also time consuming,
>> then it probably is a good idea to split these into two separate
>> phases?  "Counting 1...N" followed by "Inspecting 1...N" or
>> something like that.  Of course, if the latter does not take much
>> time, then doing without any progress indicator is also fine.
> That's a good point. Derrick: If the three loops in close_reachable()
> had to be split up into different progress stages and given different
> names what do you think they should be? Now it's "Annotating commits in
> commit graph" for all of them.

The following is the best I can think of right now:

1. Loading known commits.
2. Expanding reachable commits.
3. Clearing commit marks.

-Stolee

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v3 1/2] commit-graph write: add progress output
  2018-10-11 17:52               ` Ævar Arnfjörð Bjarmason
@ 2018-10-15 16:05                 ` SZEDER Gábor
  0 siblings, 0 replies; 88+ messages in thread
From: SZEDER Gábor @ 2018-10-15 16:05 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Derrick Stolee, Jeff King, Eric Sunshine,
	Nguyễn Thái Ngọc Duy

On Thu, Oct 11, 2018 at 07:52:21PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Oct 10 2018, Ævar Arnfjörð Bjarmason wrote:
> 
> > On Wed, Oct 10 2018, SZEDER Gábor wrote:
> >
> >> On Wed, Oct 10, 2018 at 11:56:45PM +0200, Ævar Arnfjörð Bjarmason wrote:
> >>> On Wed, Oct 10 2018, SZEDER Gábor wrote:
> >>
> >>> >>  	for (i = 0; i < oids->nr; i++) {
> >>> >> +		display_progress(progress, ++j);
> >>>
> >>> [...]
> >>>
> >>> > This display_progress() call, however, doesn't seem to be necessary.
> >>> > First, it counts all commits for a second time, resulting in the ~2x
> >>> > difference compared to the actual number of commits, and then causing
> >>> > my confusion.  Second, all this loop is doing is setting a flag
> >>> > in commits that were already looked up and parsed in the above loops.
> >>> > IOW this loop is very fast, and the progress indicator jumps from
> >>> > ~780k right to 1.5M, even on my tiny laptop, so it doesn't need a
> >>> > progress indicator at all.

> Hrm, actually, reading this again: your initial post says we end up with a
> 2x difference vs. the number of commits, but it's actually 3x.

Well, it depends on how you create the commit-graph and on the repo as
well, I guess.  I run 'git commit-graph write --reachable' in a repo
created by 'git clone --single-branch ...', and in that case the
difference is only ~2x (the first loop in close_reachable() has as
many iterations as the number of refs).  If the repo were to contain
twice as many refs as commits, then the difference could be as high as
4x.

However, I think I might have noticed another progress counting issue
as well, will get back to it later but first I have to get my numbers
straight.

> The loop
> that has a rather trivial runtime comparatively is the 3x, but the 2x
> loop takes a notable amount of time. So e.g. on git.git:
> 
>     $ git rev-list --all | wc -l; ~/g/git/git commit-graph write
>     166678
>     Annotating commits in commit graph: 518463, done.
>     Computing commit graph generation numbers: 100% (172685/172685), done.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v3 1/2] commit-graph write: add progress output
  2018-09-17 15:33     ` [PATCH v3 1/2] commit-graph write: " Ævar Arnfjörð Bjarmason
  2018-10-10 20:37       ` SZEDER Gábor
@ 2018-10-15 16:54       ` SZEDER Gábor
  2018-11-19 16:02         ` SZEDER Gábor
  1 sibling, 1 reply; 88+ messages in thread
From: SZEDER Gábor @ 2018-10-15 16:54 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Derrick Stolee, Jeff King, Eric Sunshine,
	Nguyễn Thái Ngọc Duy

On Mon, Sep 17, 2018 at 03:33:35PM +0000, Ævar Arnfjörð Bjarmason wrote:

> @@ -560,6 +563,9 @@ static int add_packed_commits(const struct object_id *oid,
>  	off_t offset = nth_packed_object_offset(pack, pos);
>  	struct object_info oi = OBJECT_INFO_INIT;
>  
> +	if (list->progress)
> +		display_progress(list->progress, ++list->progress_done);

Note that add_packed_commits() is used as a callback function for
for_each_object_in_pack() (with '--stdin-packs') or
for_each_packed_object() (no options), i.e. this will count the number
of objects, not commits:

  $ git rev-list --all |wc -l
  768524
  $ git rev-list --objects --all |wc -l
  6130295
  # '--count --objects' together didn't work as expected.
  $ time ~/src/git/git commit-graph write
  Finding commits for commit graph: 6130295, done.
  Annotating commits in commit graph: 2305572, done.
  Computing commit graph generation numbers: 100% (768524/768524), done.

(Now I also see the 3x difference in the "Annotating commits" counter
that you mentioned.)

I see two options:

  - Provide a different title for this progress counter, e.g.
    "Scanning objects for c-g", or "Processing objects...", or
    something else that says "objects" instead of "commits".

  - Move this condition and display_progress() call to the end of the
    function, so it will only count commits, not any other objects.
    (As far as I understand both for_each_object_in_pack() and
    for_each_packed_object() iterate in pack .idx order, i.e. it's
    essentially random.  This means that commit objects should be
    distributed evenly among other kinds of objects, so we don't have
    to worry about the counter stalling for a long stretch of
    consecutive non-commit objects.  At least in theory.)




^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v3 1/2] commit-graph write: add progress output
  2018-10-15 16:54       ` SZEDER Gábor
@ 2018-11-19 16:02         ` SZEDER Gábor
  2018-11-19 20:23           ` [PATCH] commit-graph: split up close_reachable() " Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 88+ messages in thread
From: SZEDER Gábor @ 2018-11-19 16:02 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Derrick Stolee, Jeff King, Eric Sunshine,
	Nguyễn Thái Ngọc Duy

Ping?

We are at -rc0, this progress output is a new feature since v2.19.0,
and the numbers shown are still way off.


On Mon, Oct 15, 2018 at 06:54:47PM +0200, SZEDER Gábor wrote:
> On Mon, Sep 17, 2018 at 03:33:35PM +0000, Ævar Arnfjörð Bjarmason wrote:
> 
> > @@ -560,6 +563,9 @@ static int add_packed_commits(const struct object_id *oid,
> >  	off_t offset = nth_packed_object_offset(pack, pos);
> >  	struct object_info oi = OBJECT_INFO_INIT;
> >  
> > +	if (list->progress)
> > +		display_progress(list->progress, ++list->progress_done);
> 
> Note that add_packed_commits() is used as a callback function for
> for_each_object_in_pack() (with '--stdin-packs') or
> for_each_packed_object() (no options), i.e. this will count the number
> of objects, not commits:
> 
>   $ git rev-list --all |wc -l
>   768524
>   $ git rev-list --objects --all |wc -l
>   6130295
>   # '--count --objects' together didn't work as expected.
>   $ time ~/src/git/git commit-graph write
>   Finding commits for commit graph: 6130295, done.
>   Annotating commits in commit graph: 2305572, done.
>   Computing commit graph generation numbers: 100% (768524/768524), done.
> 
> (Now I also see the 3x difference in the "Annotating commits" counter
> that you mentioned.)
> 
> I see two options:
> 
>   - Provide a different title for this progress counter, e.g.
>     "Scanning objects for c-g", or "Processing objects...", or
>     something else that says "objects" instead of "commits".
> 
>   - Move this condition and display_progress() call to the end of the
>     function, so it will only count commits, not any other objects.
>     (As far as I understand both for_each_object_in_pack() and
>     for_each_packed_object() iterate in pack .idx order, i.e. it's
>     essentially random.  This means that commit objects should be
>     distributed evenly among other kinds of objects, so we don't have
>     to worry about the counter stalling for a long stretch of
>     consecutive non-commit objects.  At least in theory.)
> 
> 
> 

^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH] commit-graph: split up close_reachable() progress output
  2018-11-19 16:02         ` SZEDER Gábor
@ 2018-11-19 20:23           ` " Ævar Arnfjörð Bjarmason
  2018-11-19 20:38             ` Derrick Stolee
  2018-11-19 22:57             ` SZEDER Gábor
  0 siblings, 2 replies; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-19 20:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Eric Sunshine,
	Ævar Arnfjörð Bjarmason, Derrick Stolee

Amend the progress output added in 7b0f229222 ("commit-graph write:
add progress output", 2018-09-17) so that the total numbers it reports
aren't higher than the total number of commits anymore. See [1] for a
bug report pointing that out.

When I added this I wasn't intending to provide an accurate count, but
just have some progress output to show the user the command wasn't
hanging[2]. But since we are showing numbers, let's make them
accurate. The progress descriptions were suggested by Derrick Stolee
in [3].

As noted in [2] we are unlikely to show anything except the "Expanding
reachable..." message even on fairly large repositories such as
linux.git. On a test repository I have with north of 7 million commits
all of these are displayed. Two of them don't show up for long, but as
noted in [5], future-proofing this in case the loops become more
expensive makes sense.

1. https://public-inbox.org/git/20181010203738.GE23446@szeder.dev/
2. https://public-inbox.org/git/87pnwhea8y.fsf@evledraar.gmail.com/
3. https://public-inbox.org/git/f7a0cbee-863c-61d3-4959-5cec8b43c705@gmail.com/
4. https://public-inbox.org/git/20181015160545.GG19800@szeder.dev/
5. https://public-inbox.org/git/87murle8da.fsf@evledraar.gmail.com/

Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Helped-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---

On Mon, Nov 19 2018, SZEDER Gábor wrote:

> Ping?
>
> We are at -rc0, this progress output is a new feature since v2.19.0,
> and the numbers shown are still way off.

I was under the impression after your
https://public-inbox.org/git/20181015160545.GG19800@szeder.dev/ that
you were going to do some more digging & report back, so I put it on
my "waiting for feedback" list and then forgot about it.

But here's a patch that should address the issue you pointed out, but
I don't know if it fixes whatever you were alluding to in the linked
E-Mail above.

 commit-graph.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 40c855f185..9c0d6914be 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -641,26 +641,29 @@ static void add_missing_parents(struct packed_oid_list *oids, struct commit *com
 
 static void close_reachable(struct packed_oid_list *oids, int report_progress)
 {
-	int i;
+	int i, j;
 	struct commit *commit;
 	struct progress *progress = NULL;
-	int j = 0;
 
 	if (report_progress)
 		progress = start_delayed_progress(
-			_("Annotating commits in commit graph"), 0);
+			_("Loading known commits in commit graph"), j = 0);
 	for (i = 0; i < oids->nr; i++) {
 		display_progress(progress, ++j);
 		commit = lookup_commit(the_repository, &oids->list[i]);
 		if (commit)
 			commit->object.flags |= UNINTERESTING;
 	}
+	stop_progress(&progress);
 
 	/*
 	 * As this loop runs, oids->nr may grow, but not more
 	 * than the number of missing commits in the reachable
 	 * closure.
 	 */
+	if (report_progress)
+		progress = start_delayed_progress(
+			_("Expanding reachable commits in commit graph"), j = 0);
 	for (i = 0; i < oids->nr; i++) {
 		display_progress(progress, ++j);
 		commit = lookup_commit(the_repository, &oids->list[i]);
@@ -668,7 +671,11 @@ static void close_reachable(struct packed_oid_list *oids, int report_progress)
 		if (commit && !parse_commit(commit))
 			add_missing_parents(oids, commit);
 	}
+	stop_progress(&progress);
 
+	if (report_progress)
+		progress = start_delayed_progress(
+			_("Clearing commit marks in commit graph"), j = 0);
 	for (i = 0; i < oids->nr; i++) {
 		display_progress(progress, ++j);
 		commit = lookup_commit(the_repository, &oids->list[i]);
-- 
2.19.1.1182.g4ecb1133ce


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH] commit-graph: split up close_reachable() progress output
  2018-11-19 20:23           ` [PATCH] commit-graph: split up close_reachable() " Ævar Arnfjörð Bjarmason
@ 2018-11-19 20:38             ` Derrick Stolee
  2018-11-19 22:57             ` SZEDER Gábor
  1 sibling, 0 replies; 88+ messages in thread
From: Derrick Stolee @ 2018-11-19 20:38 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git
  Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Eric Sunshine

On 11/19/2018 3:23 PM, Ævar Arnfjörð Bjarmason wrote:
> +	if (report_progress)
> +		progress = start_delayed_progress(
> +			_("Expanding reachable commits in commit graph"), j = 0);

This should be the only one that shows up in all but the very largest of 
repositories.

LGTM.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH] commit-graph: split up close_reachable() progress output
  2018-11-19 20:23           ` [PATCH] commit-graph: split up close_reachable() " Ævar Arnfjörð Bjarmason
  2018-11-19 20:38             ` Derrick Stolee
@ 2018-11-19 22:57             ` SZEDER Gábor
  2018-11-20 15:04               ` [PATCH 0/6] commit-graph write: progress output improvements Ævar Arnfjörð Bjarmason
                                 ` (6 more replies)
  1 sibling, 7 replies; 88+ messages in thread
From: SZEDER Gábor @ 2018-11-19 22:57 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King,
	Nguyễn Thái Ngọc Duy, Eric Sunshine,
	Derrick Stolee

On Mon, Nov 19, 2018 at 08:23:00PM +0000, Ævar Arnfjörð Bjarmason wrote:
> Amend the progress output added in 7b0f229222 ("commit-graph write:
> add progress output", 2018-09-17) so that the total numbers it reports
> aren't higher than the total number of commits anymore. See [1] for a
> bug report pointing that out.

Please make the commit message more self-contained, i.e. describe the
issue this patch fixes in more detail, so readers won't have to follow
links to understand the problem.

> When I added this I wasn't intending to provide an accurate count, but
> just have some progress output to show the user the command wasn't
> hanging[2]. But since we are showing numbers, let's make them
> accurate. The progress descriptions were suggested by Derrick Stolee
> in [3].
> 
> As noted in [2] we are unlikely to show anything except the "Expanding
> reachable..." message even on fairly large repositories such as
> linux.git. On a test repository I have with north of 7 million commits
> all of these are displayed. Two of them don't show up for long, but as
> noted in [5], future-proofing this in case the loops become more
> expensive makes sense.

In my opinion this is rather one of those "we'll cross that bridge
when (or if ever) we get there" situations.

> 1. https://public-inbox.org/git/20181010203738.GE23446@szeder.dev/
> 2. https://public-inbox.org/git/87pnwhea8y.fsf@evledraar.gmail.com/
> 3. https://public-inbox.org/git/f7a0cbee-863c-61d3-4959-5cec8b43c705@gmail.com/
> 4. https://public-inbox.org/git/20181015160545.GG19800@szeder.dev/
> 5. https://public-inbox.org/git/87murle8da.fsf@evledraar.gmail.com/
> 
> Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
> Helped-by: Derrick Stolee <stolee@gmail.com>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
> 
> On Mon, Nov 19 2018, SZEDER Gábor wrote:
> 
> > Ping?
> >
> > We are at -rc0, this progress output is a new feature since v2.19.0,
> > and the numbers shown are still way off.
> 
> I was under the impression after your
> https://public-inbox.org/git/20181015160545.GG19800@szeder.dev/ that
> you were going to do some more digging & report back, so I put it on
> my "waiting for feedback" list and then forgot about it.

No, after I managed to "get my numbers straight" I sent another bug
report in

  https://public-inbox.org/git/20181015165447.GH19800@szeder.dev/

but as a reply to your original patch.  Sorry about the confusion.

> But here's a patch that should address the issue you pointed out, but
> I don't know if it fixes whatever you were alluding to in the linked
> E-Mail above.

I'm afraid this patch doesn't address that issue, as it's limited to
close_reachable(), and that issue is related to the progress output in
add_packed_commits().

>  commit-graph.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/commit-graph.c b/commit-graph.c
> index 40c855f185..9c0d6914be 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -641,26 +641,29 @@ static void add_missing_parents(struct packed_oid_list *oids, struct commit *com
>  
>  static void close_reachable(struct packed_oid_list *oids, int report_progress)
>  {
> -	int i;
> +	int i, j;
>  	struct commit *commit;
>  	struct progress *progress = NULL;
> -	int j = 0;
>  
>  	if (report_progress)
>  		progress = start_delayed_progress(
> -			_("Annotating commits in commit graph"), 0);
> +			_("Loading known commits in commit graph"), j = 0);
>  	for (i = 0; i < oids->nr; i++) {
>  		display_progress(progress, ++j);
>  		commit = lookup_commit(the_repository, &oids->list[i]);
>  		if (commit)
>  			commit->object.flags |= UNINTERESTING;
>  	}
> +	stop_progress(&progress);
>  
>  	/*
>  	 * As this loop runs, oids->nr may grow, but not more
>  	 * than the number of missing commits in the reachable
>  	 * closure.
>  	 */
> +	if (report_progress)
> +		progress = start_delayed_progress(
> +			_("Expanding reachable commits in commit graph"), j = 0);
>  	for (i = 0; i < oids->nr; i++) {
>  		display_progress(progress, ++j);
>  		commit = lookup_commit(the_repository, &oids->list[i]);
> @@ -668,7 +671,11 @@ static void close_reachable(struct packed_oid_list *oids, int report_progress)
>  		if (commit && !parse_commit(commit))
>  			add_missing_parents(oids, commit);
>  	}
> +	stop_progress(&progress);
>  
> +	if (report_progress)
> +		progress = start_delayed_progress(
> +			_("Clearing commit marks in commit graph"), j = 0);
>  	for (i = 0; i < oids->nr; i++) {
>  		display_progress(progress, ++j);
>  		commit = lookup_commit(the_repository, &oids->list[i]);
> -- 
> 2.19.1.1182.g4ecb1133ce
> 

^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH 0/6] commit-graph write: progress output improvements
  2018-11-19 22:57             ` SZEDER Gábor
@ 2018-11-20 15:04               ` Ævar Arnfjörð Bjarmason
  2018-11-20 15:04               ` [PATCH 1/6] commit-graph write: rephrase confusing progress output Ævar Arnfjörð Bjarmason
                                 ` (5 subsequent siblings)
  6 siblings, 0 replies; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-20 15:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

This replaces my "commit-graph: split up close_reachable() progress
output". We could still do something like that, but I think this makes
more sense, and also plugs some holes in the progress
output. See 6/6 for what the end-state is.

I believe this addresses SZEDER Gábor's concerns (thanks
b.t.w.!). I.e. now it should be clear to the user at each step if
we're counting objects, or just as in the case of close_reachable()
doing some X amount of work without any particular relation to the
number of objects or commits.

Ævar Arnfjörð Bjarmason (6):
  commit-graph write: rephrase confusing progress output
  commit-graph write: add more progress output
  commit-graph write: show progress for object search
  commit-graph write: add more describing progress output
  commit-graph write: remove empty line for readability
  commit-graph write: add even more progress output

 commit-graph.c | 98 ++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 79 insertions(+), 19 deletions(-)

-- 
2.20.0.rc0.387.gc7a69e6b6c


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH 1/6] commit-graph write: rephrase confusing progress output
  2018-11-19 22:57             ` SZEDER Gábor
  2018-11-20 15:04               ` [PATCH 0/6] commit-graph write: progress output improvements Ævar Arnfjörð Bjarmason
@ 2018-11-20 15:04               ` Ævar Arnfjörð Bjarmason
  2018-11-20 15:04               ` [PATCH 2/6] commit-graph write: add more " Ævar Arnfjörð Bjarmason
                                 ` (4 subsequent siblings)
  6 siblings, 0 replies; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-20 15:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

Rephrase the title shown for the progress output emitted by
close_reachable(). The message I added in 7b0f229222 ("commit-graph
write: add progress output", 2018-09-17) gave the impression that it
would count up to the number of commit objects.

But that's not what the number means. It just counts the iterations of
several for-loops that do various work before the graph is written
out. So let's just say "Annotating commit graph"; that title makes no
such promises, and we can add other loops here in the future and still
consistently show progress output.

See [1] for the initial bug report & subsequent discussion about other
approaches to solving this.

1. https://public-inbox.org/git/20181015165447.GH19800@szeder.dev/

Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit-graph.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index 40c855f185..e6d0d7722b 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -648,7 +648,7 @@ static void close_reachable(struct packed_oid_list *oids, int report_progress)
 
 	if (report_progress)
 		progress = start_delayed_progress(
-			_("Annotating commits in commit graph"), 0);
+			_("Annotating commit graph"), 0);
 	for (i = 0; i < oids->nr; i++) {
 		display_progress(progress, ++j);
 		commit = lookup_commit(the_repository, &oids->list[i]);
-- 
2.20.0.rc0.387.gc7a69e6b6c


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH 2/6] commit-graph write: add more progress output
  2018-11-19 22:57             ` SZEDER Gábor
  2018-11-20 15:04               ` [PATCH 0/6] commit-graph write: progress output improvements Ævar Arnfjörð Bjarmason
  2018-11-20 15:04               ` [PATCH 1/6] commit-graph write: rephrase confusing progress output Ævar Arnfjörð Bjarmason
@ 2018-11-20 15:04               ` " Ævar Arnfjörð Bjarmason
  2018-11-20 16:58                 ` SZEDER Gábor
  2018-11-20 15:04               ` [PATCH 3/6] commit-graph write: show progress for object search Ævar Arnfjörð Bjarmason
                                 ` (3 subsequent siblings)
  6 siblings, 1 reply; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-20 15:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

Add more progress output to the output already added in
7b0f229222 ("commit-graph write: add progress output", 2018-09-17).

As noted in that commit most of the progress output isn't displayed on
small repositories, but before this change we'd noticeably hang for
2-3 seconds at the end on medium sized repositories such as linux.git.

Now we'll instead show output like this, and have no human-observable
point at which we're not producing progress output:

    $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
    Finding commits for commit graph: 6418991, done.
    Computing commit graph generation numbers: 100% (797205/797205), done.
    Writing out commit graph chunks: 2399861, done.

This "graph chunks" number is not meant to be meaningful to the user,
but just to show that we're doing work and the command isn't
hanging.

On a much larger in-house repository I have we'll show (note how we
also say "Annotating[...]"):

    $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
    Finding commits for commit graph: 48271163, done.
    Annotating commit graph: 21424536, done.
    Computing commit graph generation numbers: 100% (7141512/7141512), done.
    Writing out commit graph chunks: 21424913, done.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit-graph.c | 47 ++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 38 insertions(+), 9 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index e6d0d7722b..afce20dd4d 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -433,7 +433,9 @@ struct tree *get_commit_tree_in_graph(struct repository *r, const struct commit
 
 static void write_graph_chunk_fanout(struct hashfile *f,
 				     struct commit **commits,
-				     int nr_commits)
+				     int nr_commits,
+				     struct progress *progress,
+				     uint64_t *progress_cnt)
 {
 	int i, count = 0;
 	struct commit **list = commits;
@@ -445,6 +447,8 @@ static void write_graph_chunk_fanout(struct hashfile *f,
 	 */
 	for (i = 0; i < 256; i++) {
 		while (count < nr_commits) {
+			if (progress)
+				display_progress(progress, ++*progress_cnt);
 			if ((*list)->object.oid.hash[0] != i)
 				break;
 			count++;
@@ -456,12 +460,17 @@ static void write_graph_chunk_fanout(struct hashfile *f,
 }
 
 static void write_graph_chunk_oids(struct hashfile *f, int hash_len,
-				   struct commit **commits, int nr_commits)
+				   struct commit **commits, int nr_commits,
+				   struct progress *progress,
+				   uint64_t *progress_cnt)
 {
 	struct commit **list = commits;
 	int count;
-	for (count = 0; count < nr_commits; count++, list++)
+	for (count = 0; count < nr_commits; count++, list++) {
+		if (progress)
+			display_progress(progress, ++*progress_cnt);
 		hashwrite(f, (*list)->object.oid.hash, (int)hash_len);
+	}
 }
 
 static const unsigned char *commit_to_sha1(size_t index, void *table)
@@ -471,7 +480,9 @@ static const unsigned char *commit_to_sha1(size_t index, void *table)
 }
 
 static void write_graph_chunk_data(struct hashfile *f, int hash_len,
-				   struct commit **commits, int nr_commits)
+				   struct commit **commits, int nr_commits,
+				   struct progress *progress,
+				   uint64_t *progress_cnt)
 {
 	struct commit **list = commits;
 	struct commit **last = commits + nr_commits;
@@ -482,6 +493,9 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
 		int edge_value;
 		uint32_t packedDate[2];
 
+		if (progress)
+			display_progress(progress, ++*progress_cnt);
+
 		parse_commit(*list);
 		hashwrite(f, get_commit_tree_oid(*list)->hash, hash_len);
 
@@ -542,7 +556,9 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
 
 static void write_graph_chunk_large_edges(struct hashfile *f,
 					  struct commit **commits,
-					  int nr_commits)
+					  int nr_commits,
+					  struct progress *progress,
+					  uint64_t *progress_cnt)
 {
 	struct commit **list = commits;
 	struct commit **last = commits + nr_commits;
@@ -566,6 +582,9 @@ static void write_graph_chunk_large_edges(struct hashfile *f,
 						  nr_commits,
 						  commit_to_sha1);
 
+			if (progress)
+				display_progress(progress, ++*progress_cnt);
+
 			if (edge_value < 0)
 				edge_value = GRAPH_PARENT_MISSING;
 			else if (!parent->next)
@@ -764,6 +783,7 @@ void write_commit_graph(const char *obj_dir,
 	int num_extra_edges;
 	struct commit_list *parent;
 	struct progress *progress = NULL;
+	uint64_t progress_cnt;
 
 	if (!commit_graph_compatible(the_repository))
 		return;
@@ -937,10 +957,19 @@ void write_commit_graph(const char *obj_dir,
 		hashwrite(f, chunk_write, 12);
 	}
 
-	write_graph_chunk_fanout(f, commits.list, commits.nr);
-	write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr);
-	write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr);
-	write_graph_chunk_large_edges(f, commits.list, commits.nr);
+	if (report_progress)
+		progress = start_delayed_progress(
+			_("Writing out commit graph chunks"),
+			progress_cnt = 0);
+	write_graph_chunk_fanout(f, commits.list, commits.nr, progress,
+				 &progress_cnt);
+	write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr,
+			       progress, &progress_cnt);
+	write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr,
+			       progress, &progress_cnt);
+	write_graph_chunk_large_edges(f, commits.list, commits.nr, progress,
+				      &progress_cnt);
+	stop_progress(&progress);
 
 	close_commit_graph(the_repository);
 	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
-- 
2.20.0.rc0.387.gc7a69e6b6c



* [PATCH 3/6] commit-graph write: show progress for object search
  2018-11-19 22:57             ` SZEDER Gábor
                                 ` (2 preceding siblings ...)
  2018-11-20 15:04               ` [PATCH 2/6] commit-graph write: add more " Ævar Arnfjörð Bjarmason
@ 2018-11-20 15:04               ` Ævar Arnfjörð Bjarmason
  2018-11-20 15:04               ` [PATCH 4/6] commit-graph write: add more describing progress output Ævar Arnfjörð Bjarmason
                                 ` (2 subsequent siblings)
  6 siblings, 0 replies; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-20 15:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

Show the percentage progress for the "Finding commits for commit
graph" phase for the common case where we're operating on all packs in
the repository, as "commit-graph write" or "gc" will do.

Before we'd emit on e.g. linux.git with "commit-graph write":

    Finding commits for commit graph: 6418991, done.
    [...]

And now:

    Finding commits for commit graph: 100% (6418991/6418991), done.
    [...]

Since the commit graph only includes those commits that are
packed (via for_each_packed_object(...)), approximate_object_count()
returns the actual number of objects we're going to process.

Still, it is possible due to a race with "gc" or another process
maintaining packs that the number of objects we're going to process is
lower than what approximate_object_count() reported. In that case we
don't want to stop the progress bar short of 100%. So let's make sure
it snaps to 100% at the end.

The inverse case is also possible and more likely. I.e. that a new
pack has been added between approximate_object_count() and
for_each_packed_object(). In that case the percentage will go beyond
100%, and we'll do nothing to snap it back to 100% at the end.
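
A minimal sketch of the snapping rule described above, written as a
standalone helper so it can be checked in isolation. The names
final_progress_count() and percent_of() are hypothetical and are not
part of git's progress API; the patch itself does this inline with
display_progress().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of git's progress API: return the count
 * to display once iteration is over, given the number of objects that
 * were actually processed and the estimate taken beforehand.
 */
static uint64_t final_progress_count(uint64_t done, uint64_t estimate)
{
	/* Fewer objects than estimated: snap up so we finish at 100%. */
	if (done < estimate)
		return estimate;
	/* More objects than estimated: leave it, the display exceeds 100%. */
	return done;
}

/* Whole-number percentage, mirroring a "NN% (done/total)" display. */
static unsigned percent_of(uint64_t done, uint64_t total)
{
	assert(total > 0);
	return (unsigned)(done * 100 / total);
}
```

With an estimate of 100, a run that saw 90 objects ends at 100%, while
a run that saw 120 objects is left showing 120%.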

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit-graph.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index afce20dd4d..4d03f8aa7f 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -784,12 +784,14 @@ void write_commit_graph(const char *obj_dir,
 	struct commit_list *parent;
 	struct progress *progress = NULL;
 	uint64_t progress_cnt;
+	unsigned long approx_nr_objects;
 
 	if (!commit_graph_compatible(the_repository))
 		return;
 
 	oids.nr = 0;
-	oids.alloc = approximate_object_count() / 32;
+	approx_nr_objects = approximate_object_count();
+	oids.alloc = approx_nr_objects / 32;
 	oids.progress = NULL;
 	oids.progress_done = 0;
 
@@ -869,8 +871,11 @@ void write_commit_graph(const char *obj_dir,
 	if (!pack_indexes && !commit_hex) {
 		if (report_progress)
 			oids.progress = start_delayed_progress(
-				_("Finding commits for commit graph"), 0);
+				_("Finding commits for commit graph"),
+				approx_nr_objects);
 		for_each_packed_object(add_packed_commits, &oids, 0);
+		if (oids.progress_done < approx_nr_objects)
+			display_progress(oids.progress, approx_nr_objects);
 		stop_progress(&oids.progress);
 	}
 
-- 
2.20.0.rc0.387.gc7a69e6b6c



* [PATCH 4/6] commit-graph write: add more describing progress output
  2018-11-19 22:57             ` SZEDER Gábor
                                 ` (3 preceding siblings ...)
  2018-11-20 15:04               ` [PATCH 3/6] commit-graph write: show progress for object search Ævar Arnfjörð Bjarmason
@ 2018-11-20 15:04               ` Ævar Arnfjörð Bjarmason
  2018-11-20 15:04               ` [PATCH 5/6] commit-graph write: remove empty line for readability Ævar Arnfjörð Bjarmason
  2018-11-20 15:04               ` [PATCH 6/6] commit-graph write: add even more progress output Ævar Arnfjörð Bjarmason
  6 siblings, 0 replies; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-20 15:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

Make the progress output shown when we're searching for commits to
include in the graph more descriptive. This amends code I added in
7b0f229222 ("commit-graph write: add progress output", 2018-09-17).

Now, on linux.git, we'll emit this sort of output in the various modes
we support:

    $ git commit-graph write
    Finding commits for commit graph among packed objects: 100% (6418991/6418991), done.
    [...]
    $ git for-each-ref --format='%(objectname)' | git commit-graph write --stdin-commits
    Finding commits for commit graph from 584 ref tips: 100% (584/584), done.
    [...]
    $ (cd .git/objects/pack/ && ls *idx) | git commit-graph write --stdin-pack
    Finding commits for commit graph in 4 packs: 6418991, done.
    [...]

The middle one of those is going to be the output users will most
commonly see, since it'll be emitted when they get the commit graph
via gc.writeCommitGraph=true.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit-graph.c | 28 +++++++++++++++++++++-------
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 4d03f8aa7f..a0aea850f1 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -785,6 +785,7 @@ void write_commit_graph(const char *obj_dir,
 	struct progress *progress = NULL;
 	uint64_t progress_cnt;
 	unsigned long approx_nr_objects;
+	struct strbuf progress_title = STRBUF_INIT;
 
 	if (!commit_graph_compatible(the_repository))
 		return;
@@ -821,8 +822,12 @@ void write_commit_graph(const char *obj_dir,
 		strbuf_addf(&packname, "%s/pack/", obj_dir);
 		dirlen = packname.len;
 		if (report_progress) {
-			oids.progress = start_delayed_progress(
-				_("Finding commits for commit graph"), 0);
+			strbuf_addf(&progress_title,
+				    Q_("Finding commits for commit graph in %d pack",
+				       "Finding commits for commit graph in %d packs",
+				       pack_indexes->nr),
+				    pack_indexes->nr);
+			oids.progress = start_delayed_progress(progress_title.buf, 0);
 			oids.progress_done = 0;
 		}
 		for (i = 0; i < pack_indexes->nr; i++) {
@@ -839,14 +844,20 @@ void write_commit_graph(const char *obj_dir,
 			free(p);
 		}
 		stop_progress(&oids.progress);
+		strbuf_reset(&progress_title);
 		strbuf_release(&packname);
 	}
 
 	if (commit_hex) {
-		if (report_progress)
-			progress = start_delayed_progress(
-				_("Finding commits for commit graph"),
-				commit_hex->nr);
+		if (report_progress) {
+			strbuf_addf(&progress_title,
+				    Q_("Finding commits for commit graph from %d ref tip",
+				       "Finding commits for commit graph from %d ref tips",
+				       commit_hex->nr),
+				    commit_hex->nr);
+			progress = start_delayed_progress(progress_title.buf,
+							  commit_hex->nr);
+		}
 		for (i = 0; i < commit_hex->nr; i++) {
 			const char *end;
 			struct object_id oid;
@@ -866,12 +877,13 @@ void write_commit_graph(const char *obj_dir,
 			}
 		}
 		stop_progress(&progress);
+		strbuf_reset(&progress_title);
 	}
 
 	if (!pack_indexes && !commit_hex) {
 		if (report_progress)
 			oids.progress = start_delayed_progress(
-				_("Finding commits for commit graph"),
+				_("Finding commits for commit graph among packed objects"),
 				approx_nr_objects);
 		for_each_packed_object(add_packed_commits, &oids, 0);
 		if (oids.progress_done < approx_nr_objects)
@@ -976,6 +988,8 @@ void write_commit_graph(const char *obj_dir,
 				      &progress_cnt);
 	stop_progress(&progress);
 
+	strbuf_release(&progress_title);
+
 	close_commit_graph(the_repository);
 	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
 	commit_lock_file(&lk);
-- 
2.20.0.rc0.387.gc7a69e6b6c



* [PATCH 5/6] commit-graph write: remove empty line for readability
  2018-11-19 22:57             ` SZEDER Gábor
                                 ` (4 preceding siblings ...)
  2018-11-20 15:04               ` [PATCH 4/6] commit-graph write: add more describing progress output Ævar Arnfjörð Bjarmason
@ 2018-11-20 15:04               ` Ævar Arnfjörð Bjarmason
  2018-11-20 15:04               ` [PATCH 6/6] commit-graph write: add even more progress output Ævar Arnfjörð Bjarmason
  6 siblings, 0 replies; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-20 15:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

Remove the empty line between a QSORT(...) and the subsequent oideq()
for-loop. This makes it clearer that the QSORT(...) is being done so
that we can run the oideq() loop on adjacent OIDs. Amends code added
in 08fd81c9b6 ("commit-graph: implement write_commit_graph()",
2018-04-02).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit-graph.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index a0aea850f1..d0961e89df 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -894,7 +894,6 @@ void write_commit_graph(const char *obj_dir,
 	close_reachable(&oids, report_progress);
 
 	QSORT(oids.list, oids.nr, commit_compare);
-
 	count_distinct = 1;
 	for (i = 1; i < oids.nr; i++) {
 		if (!oideq(&oids.list[i - 1], &oids.list[i]))
-- 
2.20.0.rc0.387.gc7a69e6b6c



* [PATCH 6/6] commit-graph write: add even more progress output
  2018-11-19 22:57             ` SZEDER Gábor
                                 ` (5 preceding siblings ...)
  2018-11-20 15:04               ` [PATCH 5/6] commit-graph write: remove empty line for readability Ævar Arnfjörð Bjarmason
@ 2018-11-20 15:04               ` Ævar Arnfjörð Bjarmason
  6 siblings, 0 replies; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-20 15:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

Add more progress output to sections of code that can collectively
take 5-10 seconds on a large enough repository. On a test repository
with 7141512 commits (see earlier patches for details) we'll now emit:

    $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
    Finding commits for commit graph among packed objects: 100% (50009986/50009986), done.
    Annotating commit graph: 21564240, done.
    Counting distinct commits in commit graph: 100% (7188080/7188080), done.
    Finding extra edges in commit graph: 100% (7188080/7188080), done.
    Computing commit graph generation numbers: 100% (7143635/7143635), done.
    Writing out commit graph chunks: 21431282, done.

Whereas on a medium-sized repository such as linux.git we'll still
emit output like:

    $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
    Finding commits for commit graph among packed objects: 100% (6365328/6365328), done.
    Annotating commit graph: 2391621, done.
    Computing commit graph generation numbers: 100% (797207/797207), done.
    Writing out commit graph chunks: 2399867, done.

The "Counting distinct commits in commit graph" phase will spend most
of its time paused at "0/*" as we QSORT(...) the list. That's not
optimal, but at least we don't seem to be stalling anymore.
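
To illustrate why the counter sits still during the sort, here is a
rough standalone sketch of the sort-then-count-adjacent pattern. It
uses plain ints and the C library's qsort() instead of object IDs and
git's QSORT() macro, and count_distinct_sorted() is a made-up name.
Unlike the real code it returns 0 for an empty list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static int int_compare(const void *a, const void *b)
{
	int x = *(const int *)a, y = *(const int *)b;
	return (x > y) - (x < y);
}

/* Sort the list, then count entries that differ from their predecessor. */
static size_t count_distinct_sorted(int *list, size_t nr)
{
	size_t i, count_distinct;

	if (!nr)
		return 0;
	/* The sort is one opaque call, so no progress can be shown here. */
	qsort(list, nr, sizeof(*list), int_compare);
	count_distinct = 1;
	for (i = 1; i < nr; i++) {
		/* A caller reporting progress would display i + 1 here. */
		if (list[i - 1] != list[i])
			count_distinct++;
	}
	return count_distinct;
}
```

Duplicates only end up adjacent once the list is sorted, which is why
the loop cannot start (and the counter cannot move) before the sort
returns.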

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit-graph.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index d0961e89df..2e2eaa24ca 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -893,12 +893,19 @@ void write_commit_graph(const char *obj_dir,
 
 	close_reachable(&oids, report_progress);
 
+	if (report_progress)
+		progress = start_delayed_progress(
+			_("Counting distinct commits in commit graph"),
+			oids.nr);
+	display_progress(progress, 0); /* TODO: Measure QSORT() progress */
 	QSORT(oids.list, oids.nr, commit_compare);
 	count_distinct = 1;
 	for (i = 1; i < oids.nr; i++) {
+		display_progress(progress, i + 1);
 		if (!oideq(&oids.list[i - 1], &oids.list[i]))
 			count_distinct++;
 	}
+	stop_progress(&progress);
 
 	if (count_distinct >= GRAPH_PARENT_MISSING)
 		die(_("the commit graph format cannot write %d commits"), count_distinct);
@@ -908,8 +915,13 @@ void write_commit_graph(const char *obj_dir,
 	ALLOC_ARRAY(commits.list, commits.alloc);
 
 	num_extra_edges = 0;
+	if (report_progress)
+		progress = start_delayed_progress(
+			_("Finding extra edges in commit graph"),
+			oids.nr);
 	for (i = 0; i < oids.nr; i++) {
 		int num_parents = 0;
+		display_progress(progress, i + 1);
 		if (i > 0 && oideq(&oids.list[i - 1], &oids.list[i]))
 			continue;
 
@@ -926,6 +938,7 @@ void write_commit_graph(const char *obj_dir,
 		commits.nr++;
 	}
 	num_chunks = num_extra_edges ? 4 : 3;
+	stop_progress(&progress);
 
 	if (commits.nr >= GRAPH_PARENT_MISSING)
 		die(_("too many commits to write graph"));
-- 
2.20.0.rc0.387.gc7a69e6b6c



* Re: [PATCH 2/6] commit-graph write: add more progress output
  2018-11-20 15:04               ` [PATCH 2/6] commit-graph write: add more " Ævar Arnfjörð Bjarmason
@ 2018-11-20 16:58                 ` SZEDER Gábor
  2018-11-20 19:50                   ` [PATCH v2 0/6] commit-graph write: progress output improvements Ævar Arnfjörð Bjarmason
                                     ` (7 more replies)
  0 siblings, 8 replies; 88+ messages in thread
From: SZEDER Gábor @ 2018-11-20 16:58 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King,
	Nguyễn Thái Ngọc Duy, Eric Sunshine,
	Derrick Stolee

On Tue, Nov 20, 2018 at 03:04:39PM +0000, Ævar Arnfjörð Bjarmason wrote:
> Add more progress output to the output already added in
> 7b0f229222 ("commit-graph write: add progress output", 2018-09-17).
> 
> As noted in that commit most of the progress output isn't displayed on
> small repositories, but before this change we'd noticeably hang for
> 2-3 seconds at the end on medium sized repositories such as linux.git.
> 
> Now we'll instead show output like this, and have no human-observable
> point at which we're not producing progress output:
> 
>     $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
>     Finding commits for commit graph: 6418991, done.
>     Computing commit graph generation numbers: 100% (797205/797205), done.
>     Writing out commit graph chunks: 2399861, done.
> 
> This "graph chunks" number is not meant to be meaningful to the user,
> but just to show that we're doing work and the command isn't
> hanging.
> 
> On a much larger in-house repository I have we'll show (note how we
> also say "Annotating[...]"):
> 
>     $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
>     Finding commits for commit graph: 48271163, done.
>     Annotating commit graph: 21424536, done.
>     Computing commit graph generation numbers: 100% (7141512/7141512), done.
>     Writing out commit graph chunks: 21424913, done.

That's a lot of chunks, but according to the specs, there are only 3
or 4 chunks in a commit-graph file.  More on this below.

> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  commit-graph.c | 47 ++++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 38 insertions(+), 9 deletions(-)
> 
> diff --git a/commit-graph.c b/commit-graph.c
> index e6d0d7722b..afce20dd4d 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -433,7 +433,9 @@ struct tree *get_commit_tree_in_graph(struct repository *r, const struct commit
>  
>  static void write_graph_chunk_fanout(struct hashfile *f,
>  				     struct commit **commits,
> -				     int nr_commits)
> +				     int nr_commits,
> +				     struct progress *progress,
> +				     uint64_t *progress_cnt)
>  {
>  	int i, count = 0;
>  	struct commit **list = commits;
> @@ -445,6 +447,8 @@ static void write_graph_chunk_fanout(struct hashfile *f,
>  	 */
>  	for (i = 0; i < 256; i++) {
>  		while (count < nr_commits) {
> +			if (progress)
> +				display_progress(progress, ++*progress_cnt);

The condition is unnecessary: display_progress() is prepared to deal
with a NULL progress pointer.  The same applies to all such calls in
this patch.
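
As a rough illustration of that pattern (not git's actual code), a
NULL-tolerant display function can look like the sketch below. Both
struct sketch_progress and sketch_display_progress() are stand-ins I
made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for git's opaque struct progress. */
struct sketch_progress {
	uint64_t last_value;
	unsigned updates;
};

/* Returns 1 if an update was recorded, 0 if progress is disabled (NULL). */
static int sketch_display_progress(struct sketch_progress *progress,
				   uint64_t n)
{
	if (!progress)
		return 0;
	progress->last_value = n;
	progress->updates++;
	return 1;
}
```

Note that in a call like display_progress(progress, ++*progress_cnt)
the argument is evaluated either way, so the counter keeps advancing
even when progress is NULL; only the display is skipped.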

>  			if ((*list)->object.oid.hash[0] != i)
>  				break;
>  			count++;
> @@ -456,12 +460,17 @@ static void write_graph_chunk_fanout(struct hashfile *f,
>  }
>  
>  static void write_graph_chunk_oids(struct hashfile *f, int hash_len,
> -				   struct commit **commits, int nr_commits)
> +				   struct commit **commits, int nr_commits,
> +				   struct progress *progress,
> +				   uint64_t *progress_cnt)
>  {
>  	struct commit **list = commits;
>  	int count;
> -	for (count = 0; count < nr_commits; count++, list++)
> +	for (count = 0; count < nr_commits; count++, list++) {
> +		if (progress)
> +			display_progress(progress, ++*progress_cnt);
>  		hashwrite(f, (*list)->object.oid.hash, (int)hash_len);
> +	}
>  }
>  
>  static const unsigned char *commit_to_sha1(size_t index, void *table)
> @@ -471,7 +480,9 @@ static const unsigned char *commit_to_sha1(size_t index, void *table)
>  }
>  
>  static void write_graph_chunk_data(struct hashfile *f, int hash_len,
> -				   struct commit **commits, int nr_commits)
> +				   struct commit **commits, int nr_commits,
> +				   struct progress *progress,
> +				   uint64_t *progress_cnt)
>  {
>  	struct commit **list = commits;
>  	struct commit **last = commits + nr_commits;
> @@ -482,6 +493,9 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
>  		int edge_value;
>  		uint32_t packedDate[2];
>  
> +		if (progress)
> +			display_progress(progress, ++*progress_cnt);
> +
>  		parse_commit(*list);
>  		hashwrite(f, get_commit_tree_oid(*list)->hash, hash_len);
>  
> @@ -542,7 +556,9 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
>  
>  static void write_graph_chunk_large_edges(struct hashfile *f,
>  					  struct commit **commits,
> -					  int nr_commits)
> +					  int nr_commits,
> +					  struct progress *progress,
> +					  uint64_t *progress_cnt)
>  {
>  	struct commit **list = commits;
>  	struct commit **last = commits + nr_commits;
> @@ -566,6 +582,9 @@ static void write_graph_chunk_large_edges(struct hashfile *f,
>  						  nr_commits,
>  						  commit_to_sha1);
>  
> +			if (progress)
> +				display_progress(progress, ++*progress_cnt);
> +
>  			if (edge_value < 0)
>  				edge_value = GRAPH_PARENT_MISSING;
>  			else if (!parent->next)
> @@ -764,6 +783,7 @@ void write_commit_graph(const char *obj_dir,
>  	int num_extra_edges;
>  	struct commit_list *parent;
>  	struct progress *progress = NULL;
> +	uint64_t progress_cnt;
>  
>  	if (!commit_graph_compatible(the_repository))
>  		return;
> @@ -937,10 +957,19 @@ void write_commit_graph(const char *obj_dir,
>  		hashwrite(f, chunk_write, 12);
>  	}
>  
> -	write_graph_chunk_fanout(f, commits.list, commits.nr);
> -	write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr);
> -	write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr);
> -	write_graph_chunk_large_edges(f, commits.list, commits.nr);
> +	if (report_progress)
> +		progress = start_delayed_progress(
> +			_("Writing out commit graph chunks"),
> +			progress_cnt = 0);

First, this is an unusual place to set a variable.

Second, as mentioned above, there are only 3 or 4 chunks, therefore I
think this should only say "Writing out commit graph".

Finally, each of the write_graph_chunk_*() functions called below
iterates over all commits, so we know and thus can show the total in
advance.

So how about something like the patch below on top?  Note that I had
to shift two display_progress() calls a couple of lines, because
otherwise the numbers didn't add up.
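
To show why the placement matters, here is a simplified standalone
model of the fanout loop. fanout_ticks() is a name made up for this
sketch, and it walks an array of first bytes (sorted ascending) rather
than real commits. Bumping the counter before the bucket test also
counts every failed test, so the result is no longer the number of
commits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified fanout walk over first bytes sorted in ascending order.
 * When count_before_check is set, the counter is bumped before the
 * bucket test; otherwise it is bumped only after the test passed.
 */
static uint64_t fanout_ticks(const unsigned char *first_bytes, int nr,
			     int count_before_check)
{
	uint64_t ticks = 0;
	int i, count = 0;

	for (i = 0; i < 256; i++) {
		while (count < nr) {
			if (count_before_check)
				ticks++;
			if (first_bytes[count] != i)
				break;
			if (!count_before_check)
				ticks++;
			count++;
		}
	}
	return ticks;
}
```

For three entries with first bytes 0, 0 and 5, counting after the test
gives exactly 3, while counting before it gives 8, since the entry
with first byte 5 is tested and rejected once for each of the buckets
0 through 4.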

Just to get you thinking and to have something to try out, but I saw a
bit of weirdness while at it, and want to look into it, but now I've
got to go...


> +	write_graph_chunk_fanout(f, commits.list, commits.nr, progress,
> +				 &progress_cnt);
> +	write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr,
> +			       progress, &progress_cnt);
> +	write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr,
> +			       progress, &progress_cnt);
> +	write_graph_chunk_large_edges(f, commits.list, commits.nr, progress,
> +				      &progress_cnt);
> +	stop_progress(&progress);
>  
>  	close_commit_graph(the_repository);
>  	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);



diff --git a/commit-graph.c b/commit-graph.c
index 2e2eaa24ca..2f3417db32 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -447,10 +447,9 @@ static void write_graph_chunk_fanout(struct hashfile *f,
 	 */
 	for (i = 0; i < 256; i++) {
 		while (count < nr_commits) {
-			if (progress)
-				display_progress(progress, ++*progress_cnt);
 			if ((*list)->object.oid.hash[0] != i)
 				break;
+			display_progress(progress, ++*progress_cnt);
 			count++;
 			list++;
 		}
@@ -467,8 +466,7 @@ static void write_graph_chunk_oids(struct hashfile *f, int hash_len,
 	struct commit **list = commits;
 	int count;
 	for (count = 0; count < nr_commits; count++, list++) {
-		if (progress)
-			display_progress(progress, ++*progress_cnt);
+		display_progress(progress, ++*progress_cnt);
 		hashwrite(f, (*list)->object.oid.hash, (int)hash_len);
 	}
 }
@@ -493,8 +491,7 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
 		int edge_value;
 		uint32_t packedDate[2];
 
-		if (progress)
-			display_progress(progress, ++*progress_cnt);
+		display_progress(progress, ++*progress_cnt);
 
 		parse_commit(*list);
 		hashwrite(f, get_commit_tree_oid(*list)->hash, hash_len);
@@ -570,6 +567,8 @@ static void write_graph_chunk_large_edges(struct hashfile *f,
 		     parent = parent->next)
 			num_parents++;
 
+		display_progress(progress, ++*progress_cnt);
+
 		if (num_parents <= 2) {
 			list++;
 			continue;
@@ -582,9 +581,6 @@ static void write_graph_chunk_large_edges(struct hashfile *f,
 						  nr_commits,
 						  commit_to_sha1);
 
-			if (progress)
-				display_progress(progress, ++*progress_cnt);
-
 			if (edge_value < 0)
 				edge_value = GRAPH_PARENT_MISSING;
 			else if (!parent->next)
@@ -986,10 +982,11 @@ void write_commit_graph(const char *obj_dir,
 		hashwrite(f, chunk_write, 12);
 	}
 
-	if (report_progress)
+	if (report_progress) {
 		progress = start_delayed_progress(
-			_("Writing out commit graph chunks"),
-			progress_cnt = 0);
+			_("Writing out commit graph"), 4 * commits.nr);
+		progress_cnt = 0;
+	}
 	write_graph_chunk_fanout(f, commits.list, commits.nr, progress,
 				 &progress_cnt);
 	write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr,


* [PATCH v2 0/6] commit-graph write: progress output improvements
  2018-11-20 16:58                 ` SZEDER Gábor
@ 2018-11-20 19:50                   ` Ævar Arnfjörð Bjarmason
  2018-11-20 19:50                   ` [PATCH v2 1/6] commit-graph write: rephrase confusing progress output Ævar Arnfjörð Bjarmason
                                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-20 19:50 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

Fixes issues SZEDER raised with v1, except displaying an accurate ETA
in write_graph_*(). As noted in 2/6 I don't think it's worth it, so I
just adjusted the message instead.

Ævar Arnfjörð Bjarmason (6):
  commit-graph write: rephrase confusing progress output
  commit-graph write: add more progress output
  commit-graph write: show progress for object search
  commit-graph write: add more describing progress output
  commit-graph write: remove empty line for readability
  commit-graph write: add even more progress output

 commit-graph.c | 92 +++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 73 insertions(+), 19 deletions(-)

Range-diff:
1:  751d3a7561 ! 1:  093c63e99f commit-graph write: add more progress output
     a => b | 0
     1 file changed, 0 insertions(+), 0 deletions(-)
    
    @@ -13,22 +13,30 @@
         point at which we're not producing progress output:
     
             $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
    -        Finding commits for commit graph: 6418991, done.
    -        Computing commit graph generation numbers: 100% (797205/797205), done.
    -        Writing out commit graph chunks: 2399861, done.
    +        Finding commits for commit graph: 6365492, done.
    +        Computing commit graph generation numbers: 100% (797222/797222), done.
    +        Writing out commit graph: 2399912, done.
     
    -    This "graph chunks" number is not meant to be meaningful to the user,
    +    This "writing out" number is not meant to be meaningful to the user,
         but just to show that we're doing work and the command isn't
         hanging.
     
    +    In the current implementation it's approximately 4x the number of
    +    commits. As noted in on-list discussion[1] we could add the loops up
    +    and show percentage progress here, but I don't think it's worth it. It
    +    would make the implementation more complex and harder to maintain for
    +    very little gain.
    +
         On a much larger in-house repository I have we'll show (note how we
         also say "Annotating[...]"):
     
             $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
    -        Finding commits for commit graph: 48271163, done.
    -        Annotating commit graph: 21424536, done.
    -        Computing commit graph generation numbers: 100% (7141512/7141512), done.
    -        Writing out commit graph chunks: 21424913, done.
    +        Finding commits for commit graph: 50026015, done.
    +        Annotating commit graph: 21567407, done.
    +        Computing commit graph generation numbers: 100% (7144680/7144680), done.
    +        Writing out commit graph: 21434417, done.
    +
    +    1. https://public-inbox.org/git/20181120165800.GB30222@szeder.dev/
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    @@ -50,8 +58,7 @@
      	 */
      	for (i = 0; i < 256; i++) {
      		while (count < nr_commits) {
    -+			if (progress)
    -+				display_progress(progress, ++*progress_cnt);
    ++			display_progress(progress, ++*progress_cnt);
      			if ((*list)->object.oid.hash[0] != i)
      				break;
      			count++;
    @@ -68,8 +75,7 @@
      	int count;
     -	for (count = 0; count < nr_commits; count++, list++)
     +	for (count = 0; count < nr_commits; count++, list++) {
    -+		if (progress)
    -+			display_progress(progress, ++*progress_cnt);
    ++		display_progress(progress, ++*progress_cnt);
      		hashwrite(f, (*list)->object.oid.hash, (int)hash_len);
     +	}
      }
    @@ -87,15 +93,13 @@
      	struct commit **list = commits;
      	struct commit **last = commits + nr_commits;
     @@
    + 		struct commit_list *parent;
      		int edge_value;
      		uint32_t packedDate[2];
    ++		display_progress(progress, ++*progress_cnt);
      
    -+		if (progress)
    -+			display_progress(progress, ++*progress_cnt);
    -+
      		parse_commit(*list);
      		hashwrite(f, get_commit_tree_oid(*list)->hash, hash_len);
    - 
     @@
      
      static void write_graph_chunk_large_edges(struct hashfile *f,
    @@ -108,20 +112,18 @@
      	struct commit **list = commits;
      	struct commit **last = commits + nr_commits;
     @@
    + 						  commits,
      						  nr_commits,
      						  commit_to_sha1);
    ++			display_progress(progress, ++*progress_cnt);
      
    -+			if (progress)
    -+				display_progress(progress, ++*progress_cnt);
    -+
      			if (edge_value < 0)
      				edge_value = GRAPH_PARENT_MISSING;
    - 			else if (!parent->next)
     @@
      	int num_extra_edges;
      	struct commit_list *parent;
      	struct progress *progress = NULL;
    -+	uint64_t progress_cnt;
    ++	uint64_t progress_cnt = 0;
      
      	if (!commit_graph_compatible(the_repository))
      		return;
    @@ -135,8 +137,8 @@
     -	write_graph_chunk_large_edges(f, commits.list, commits.nr);
     +	if (report_progress)
     +		progress = start_delayed_progress(
    -+			_("Writing out commit graph chunks"),
    -+			progress_cnt = 0);
    ++			_("Writing out commit graph"),
    ++			0);
     +	write_graph_chunk_fanout(f, commits.list, commits.nr, progress,
     +				 &progress_cnt);
     +	write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr,
2:  d750f0dd16 ! 2:  6c71de9460 commit-graph write: show progress for object search
     a => b | 0
     1 file changed, 0 insertions(+), 0 deletions(-)
    
    @@ -8,18 +8,17 @@
     
         Before we'd emit on e.g. linux.git with "commit-graph write":
     
    -        Finding commits for commit graph: 6418991, done.
    +        Finding commits for commit graph: 6365492, done.
             [...]
     
         And now:
     
    -        Finding commits for commit graph: 100% (6418991/6418991), done.
    +        Finding commits for commit graph: 100% (6365492/6365492), done.
             [...]
     
    -    Since the commit graph only includes those commits that are
    -    packed (via for_each_packed_object(...)) the
    -    approximate_object_count() returns the actual number of objects we're
    -    going to process.
    +    Since the commit graph only includes those commits that are packed
    +    (via for_each_packed_object(...)) the approximate_object_count()
    +    returns the actual number of objects we're going to process.
     
         Still, it is possible due to a race with "gc" or another process
         maintaining packs that the number of objects we're going to process is
    @@ -40,7 +39,7 @@
     @@
      	struct commit_list *parent;
      	struct progress *progress = NULL;
    - 	uint64_t progress_cnt;
    + 	uint64_t progress_cnt = 0;
     +	unsigned long approx_nr_objects;
      
      	if (!commit_graph_compatible(the_repository))
3:  a175ab49ff ! 3:  c665dbdacb commit-graph write: add more describing progress output
     a => b | 0
     1 file changed, 0 insertions(+), 0 deletions(-)
    
    @@ -10,18 +10,26 @@
         we support:
     
             $ git commit-graph write
    -        Finding commits for commit graph among packed objects: 100% (6418991/6418991), done.
    +        Finding commits for commit graph among packed objects: 100% (6365492/6365492), done.
             [...]
    +
    +        # Actually we don't emit this since this takes almost no time at
    +        # all. But if we did (s/_delayed//) we'd show:
             $ git for-each-ref --format='%(objectname)' | git commit-graph write --stdin-commits
    -        Finding commits for commit graph from 584 ref tips: 100% (584/584), done.
    +        Finding commits for commit graph from 584 refs: 100% (584/584), done.
             [...]
    +
             $ (cd .git/objects/pack/ && ls *idx) | git commit-graph write --stdin-pack
    -        Finding commits for commit graph in 4 packs: 6418991, done.
    +        Finding commits for commit graph in 3 packs: 6365492, done.
             [...]
     
    -    The middle on of those is going to be the output users will most
    -    commonly see, since it'll be emitted when they get the commit graph
    -    via gc.writeCommitGraph=true.
    +    The middle one of those is going to be the output users might see in
    +    practice, since it'll be emitted when they get the commit graph via
    +    gc.writeCommitGraph=true. But as noted above you need a really large
    +    number of refs for this message to show. It'll show up on a test
    +    repository I have with ~165k refs:
    +
    +        Finding commits for commit graph from 165203 refs: 100% (165203/165203), done.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    @@ -30,7 +38,7 @@
      +++ b/commit-graph.c
     @@
      	struct progress *progress = NULL;
    - 	uint64_t progress_cnt;
    + 	uint64_t progress_cnt = 0;
      	unsigned long approx_nr_objects;
     +	struct strbuf progress_title = STRBUF_INIT;
      
    @@ -66,8 +74,8 @@
     -				commit_hex->nr);
     +		if (report_progress) {
     +			strbuf_addf(&progress_title,
    -+				    Q_("Finding commits for commit graph from %d ref tip",
    -+				       "Finding commits for commit graph from %d ref tips",
    ++				    Q_("Finding commits for commit graph from %d ref",
    ++				       "Finding commits for commit graph from %d refs",
     +				       commit_hex->nr),
     +				    commit_hex->nr);
     +			progress = start_delayed_progress(progress_title.buf,
4:  4e11c8b2fd = 4:  f70fc5045d commit-graph write: remove empty line for readability
     a => b | 0
     1 file changed, 0 insertions(+), 0 deletions(-)
    
5:  6fbba22fac ! 5:  2e943fa925 commit-graph write: add even more progress output
     a => b | 0
     1 file changed, 0 insertions(+), 0 deletions(-)
    
    @@ -4,24 +4,26 @@
     
         Add more progress output to sections of code that can collectively
         take 5-10 seconds on a large enough repository. On a test repository
    -    with 7141512 commits (see earlier patches for details) we'll now emit:
    +    I have with ~7 million commits and ~50 million objects we'll now
    +    emit:
     
             $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
    -        Finding commits for commit graph among packed objects: 100% (50009986/50009986), done.
    -        Annotating commit graph: 21564240, done.
    -        Counting distinct commits in commit graph: 100% (7188080/7188080), done.
    -        Finding extra edges in commit graph: 100% (7188080/7188080), done.
    -        Computing commit graph generation numbers: 100% (7143635/7143635), done.
    -        Writing out commit graph chunks: 21431282, done.
    +        Finding commits for commit graph among packed objects: 100% (50026015/50026015), done.
    +        Annotating commit graph: 21567407, done.
    +        Counting distinct commits in commit graph: 100% (7189147/7189147), done.
    +        Finding extra edges in commit graph: 100% (7189147/7189147), done.
    +        Computing commit graph generation numbers: 100% (7144680/7144680), done.
    +        Writing out commit graph: 21434417, done.
     
    -    Whereas on a medium-sized repository such as linux.git we'll still
    +    Whereas on a medium-sized repository such as linux.git these new
    +    progress bars won't have time to kick in and, as before, we'll still
         emit output like:
     
             $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
    -        Finding commits for commit graph among packed objects: 100% (6365328/6365328), done.
    -        Annotating commit graph: 2391621, done.
    -        Computing commit graph generation numbers: 100% (797207/797207), done.
    -        Writing out commit graph chunks: 2399867, done.
    +        Finding commits for commit graph among packed objects: 100% (6365492/6365492), done.
    +        Annotating commit graph: 2391666, done.
    +        Computing commit graph generation numbers: 100% (797222/797222), done.
    +        Writing out commit graph: 2399912, done.
     
         The "Counting distinct commits in commit graph" phase will spend most
         of its time paused at "0/*" as we QSORT(...) the list. That's not
-- 
2.20.0.rc0.387.gc7a69e6b6c



* [PATCH v2 1/6] commit-graph write: rephrase confusing progress output
  2018-11-20 16:58                 ` SZEDER Gábor
  2018-11-20 19:50                   ` [PATCH v2 0/6] commit-graph write: progress output improvements Ævar Arnfjörð Bjarmason
@ 2018-11-20 19:50                   ` Ævar Arnfjörð Bjarmason
  2018-11-20 19:50                   ` [PATCH v2 2/6] commit-graph write: add more " Ævar Arnfjörð Bjarmason
                                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-20 19:50 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

Rephrase the title shown for the progress output emitted by
close_reachable(). The message I added in 7b0f229222 ("commit-graph
write: add progress output", 2018-09-17) gave the impression that it
would count up to the number of commit objects.

But that's not what the number means. It just counts iterations of
the several for-loops that do various work before the graph is
written out. So let's just say "Annotating commit graph". That title
makes no such promise, and we can add other loops here in the future
and still consistently show progress output.

See [1] for the initial bug report & subsequent discussion about other
approaches to solving this.

1. https://public-inbox.org/git/20181015165447.GH19800@szeder.dev/

Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit-graph.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index 40c855f185..e6d0d7722b 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -648,7 +648,7 @@ static void close_reachable(struct packed_oid_list *oids, int report_progress)
 
 	if (report_progress)
 		progress = start_delayed_progress(
-			_("Annotating commits in commit graph"), 0);
+			_("Annotating commit graph"), 0);
 	for (i = 0; i < oids->nr; i++) {
 		display_progress(progress, ++j);
 		commit = lookup_commit(the_repository, &oids->list[i]);
-- 
2.20.0.rc0.387.gc7a69e6b6c



* [PATCH v2 2/6] commit-graph write: add more progress output
  2018-11-20 16:58                 ` SZEDER Gábor
  2018-11-20 19:50                   ` [PATCH v2 0/6] commit-graph write: progress output improvements Ævar Arnfjörð Bjarmason
  2018-11-20 19:50                   ` [PATCH v2 1/6] commit-graph write: rephrase confusing progress output Ævar Arnfjörð Bjarmason
@ 2018-11-20 19:50                   ` " Ævar Arnfjörð Bjarmason
  2018-11-20 23:38                     ` SZEDER Gábor
  2018-11-20 19:50                   ` [PATCH v2 3/6] commit-graph write: show progress for object search Ævar Arnfjörð Bjarmason
                                     ` (4 subsequent siblings)
  7 siblings, 1 reply; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-20 19:50 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

Add more progress output to the output already added in
7b0f229222 ("commit-graph write: add progress output", 2018-09-17).

As noted in that commit most of the progress output isn't displayed on
small repositories, but before this change we'd noticeably hang for
2-3 seconds at the end on medium sized repositories such as linux.git.

Now we'll instead show output like this, and have no human-observable
point at which we're not producing progress output:

    $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
    Finding commits for commit graph: 6365492, done.
    Computing commit graph generation numbers: 100% (797222/797222), done.
    Writing out commit graph: 2399912, done.

This "writing out" number is not meant to be meaningful to the user,
but just to show that we're doing work and the command isn't
hanging.

In the current implementation it's approximately 4x the number of
commits. As noted in on-list discussion[1] we could add the loops up
and show percentage progress here, but I don't think it's worth it. It
would make the implementation more complex and harder to maintain for
very little gain.
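
To make the "approximately 4x" figure concrete, here is a minimal
standalone sketch (not the code in commit-graph.c) of one counter
shared by several loops over the same commit list. The names
write_chunk() and count_progress() are hypothetical and exist only for
this illustration. The sketch assumes every loop ticks exactly once
per commit, which the real chunk writers need not do:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for one chunk-writing loop: it visits every
 * commit once and bumps the shared counter on each iteration.
 */
static void write_chunk(int nr_commits, uint64_t *progress_cnt)
{
	int i;

	assert(nr_commits >= 0);
	for (i = 0; i < nr_commits; i++)
		++*progress_cnt; /* where display_progress() would be called */
}

/*
 * Run nr_chunks loops over the same commits with one shared counter and
 * return the final value, i.e. the number the progress line would end on.
 */
static uint64_t count_progress(int nr_commits, int nr_chunks)
{
	uint64_t progress_cnt = 0;
	int i;

	for (i = 0; i < nr_chunks; i++)
		write_chunk(nr_commits, &progress_cnt);
	return progress_cnt;
}
```

With 1000 commits and 4 loops, count_progress(1000, 4) returns 4000.
Showing a percentage would require knowing that total up front, which
is the "adding the loops up" mentioned above.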

On a much larger in-house repository I have we'll show (note how we
also say "Annotating[...]"):

    $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
    Finding commits for commit graph: 50026015, done.
    Annotating commit graph: 21567407, done.
    Computing commit graph generation numbers: 100% (7144680/7144680), done.
    Writing out commit graph: 21434417, done.

1. https://public-inbox.org/git/20181120165800.GB30222@szeder.dev/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit-graph.c | 41 ++++++++++++++++++++++++++++++++---------
 1 file changed, 32 insertions(+), 9 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index e6d0d7722b..6f6409b292 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -433,7 +433,9 @@ struct tree *get_commit_tree_in_graph(struct repository *r, const struct commit
 
 static void write_graph_chunk_fanout(struct hashfile *f,
 				     struct commit **commits,
-				     int nr_commits)
+				     int nr_commits,
+				     struct progress *progress,
+				     uint64_t *progress_cnt)
 {
 	int i, count = 0;
 	struct commit **list = commits;
@@ -445,6 +447,7 @@ static void write_graph_chunk_fanout(struct hashfile *f,
 	 */
 	for (i = 0; i < 256; i++) {
 		while (count < nr_commits) {
+			display_progress(progress, ++*progress_cnt);
 			if ((*list)->object.oid.hash[0] != i)
 				break;
 			count++;
@@ -456,12 +459,16 @@ static void write_graph_chunk_fanout(struct hashfile *f,
 }
 
 static void write_graph_chunk_oids(struct hashfile *f, int hash_len,
-				   struct commit **commits, int nr_commits)
+				   struct commit **commits, int nr_commits,
+				   struct progress *progress,
+				   uint64_t *progress_cnt)
 {
 	struct commit **list = commits;
 	int count;
-	for (count = 0; count < nr_commits; count++, list++)
+	for (count = 0; count < nr_commits; count++, list++) {
+		display_progress(progress, ++*progress_cnt);
 		hashwrite(f, (*list)->object.oid.hash, (int)hash_len);
+	}
 }
 
 static const unsigned char *commit_to_sha1(size_t index, void *table)
@@ -471,7 +478,9 @@ static const unsigned char *commit_to_sha1(size_t index, void *table)
 }
 
 static void write_graph_chunk_data(struct hashfile *f, int hash_len,
-				   struct commit **commits, int nr_commits)
+				   struct commit **commits, int nr_commits,
+				   struct progress *progress,
+				   uint64_t *progress_cnt)
 {
 	struct commit **list = commits;
 	struct commit **last = commits + nr_commits;
@@ -481,6 +490,7 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
 		struct commit_list *parent;
 		int edge_value;
 		uint32_t packedDate[2];
+		display_progress(progress, ++*progress_cnt);
 
 		parse_commit(*list);
 		hashwrite(f, get_commit_tree_oid(*list)->hash, hash_len);
@@ -542,7 +552,9 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
 
 static void write_graph_chunk_large_edges(struct hashfile *f,
 					  struct commit **commits,
-					  int nr_commits)
+					  int nr_commits,
+					  struct progress *progress,
+					  uint64_t *progress_cnt)
 {
 	struct commit **list = commits;
 	struct commit **last = commits + nr_commits;
@@ -565,6 +577,7 @@ static void write_graph_chunk_large_edges(struct hashfile *f,
 						  commits,
 						  nr_commits,
 						  commit_to_sha1);
+			display_progress(progress, ++*progress_cnt);
 
 			if (edge_value < 0)
 				edge_value = GRAPH_PARENT_MISSING;
@@ -764,6 +777,7 @@ void write_commit_graph(const char *obj_dir,
 	int num_extra_edges;
 	struct commit_list *parent;
 	struct progress *progress = NULL;
+	uint64_t progress_cnt = 0;
 
 	if (!commit_graph_compatible(the_repository))
 		return;
@@ -937,10 +951,19 @@ void write_commit_graph(const char *obj_dir,
 		hashwrite(f, chunk_write, 12);
 	}
 
-	write_graph_chunk_fanout(f, commits.list, commits.nr);
-	write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr);
-	write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr);
-	write_graph_chunk_large_edges(f, commits.list, commits.nr);
+	if (report_progress)
+		progress = start_delayed_progress(
+			_("Writing out commit graph"),
+			0);
+	write_graph_chunk_fanout(f, commits.list, commits.nr, progress,
+				 &progress_cnt);
+	write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr,
+			       progress, &progress_cnt);
+	write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr,
+			       progress, &progress_cnt);
+	write_graph_chunk_large_edges(f, commits.list, commits.nr, progress,
+				      &progress_cnt);
+	stop_progress(&progress);
 
 	close_commit_graph(the_repository);
 	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
-- 
2.20.0.rc0.387.gc7a69e6b6c



* [PATCH v2 3/6] commit-graph write: show progress for object search
  2018-11-20 16:58                 ` SZEDER Gábor
                                     ` (2 preceding siblings ...)
  2018-11-20 19:50                   ` [PATCH v2 2/6] commit-graph write: add more " Ævar Arnfjörð Bjarmason
@ 2018-11-20 19:50                   ` Ævar Arnfjörð Bjarmason
  2018-11-20 19:50                   ` [PATCH v2 4/6] commit-graph write: add more describing progress output Ævar Arnfjörð Bjarmason
                                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-20 19:50 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

Show the percentage progress for the "Finding commits for commit
graph" phase for the common case where we're operating on all packs in
the repository, as "commit-graph write" or "gc" will do.

Before we'd emit on e.g. linux.git with "commit-graph write":

    Finding commits for commit graph: 6365492, done.
    [...]

And now:

    Finding commits for commit graph: 100% (6365492/6365492), done.
    [...]

Since the commit graph only includes those commits that are packed
(via for_each_packed_object(...)) the approximate_object_count()
returns the actual number of objects we're going to process.

Still, it is possible due to a race with "gc" or another process
maintaining packs that the number of objects we're going to process is
lower than what approximate_object_count() reported. In that case we
don't want to stop the progress bar short of 100%. So let's make sure
it snaps to 100% at the end.

The inverse case is also possible and more likely. I.e. that a new
pack has been added between approximate_object_count() and
for_each_packed_object(). In that case the percentage will go beyond
100%, and we'll do nothing to snap it back to 100% at the end.
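
The two cases can be modelled with a small standalone sketch. This is
an illustration only: final_progress() and percent() are hypothetical
helpers, and the percentage formula is an assumption about how a
progress meter might compute it, not a copy of progress.c:

```c
#include <stdint.h>

/*
 * Hypothetical helper: the value to display once the loop is over.
 * If fewer objects were processed than estimated, snap up to the
 * estimate so the bar ends at 100%. If more were processed, leave the
 * count alone (nothing snaps it back down).
 */
static uint64_t final_progress(uint64_t done, uint64_t approx_total)
{
	return done < approx_total ? approx_total : done;
}

/* Integer percentage, guarding against a zero total. */
static unsigned int percent(uint64_t done, uint64_t total)
{
	return total ? (unsigned int)(done * 100 / total) : 0;
}
```

For an estimate of 100 objects, 90 processed objects end up displayed
as 100 (100%), while 110 processed objects stay at 110 (110%).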

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit-graph.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 6f6409b292..7c1afa4704 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -778,12 +778,14 @@ void write_commit_graph(const char *obj_dir,
 	struct commit_list *parent;
 	struct progress *progress = NULL;
 	uint64_t progress_cnt = 0;
+	unsigned long approx_nr_objects;
 
 	if (!commit_graph_compatible(the_repository))
 		return;
 
 	oids.nr = 0;
-	oids.alloc = approximate_object_count() / 32;
+	approx_nr_objects = approximate_object_count();
+	oids.alloc = approx_nr_objects / 32;
 	oids.progress = NULL;
 	oids.progress_done = 0;
 
@@ -863,8 +865,11 @@ void write_commit_graph(const char *obj_dir,
 	if (!pack_indexes && !commit_hex) {
 		if (report_progress)
 			oids.progress = start_delayed_progress(
-				_("Finding commits for commit graph"), 0);
+				_("Finding commits for commit graph"),
+				approx_nr_objects);
 		for_each_packed_object(add_packed_commits, &oids, 0);
+		if (oids.progress_done < approx_nr_objects)
+			display_progress(oids.progress, approx_nr_objects);
 		stop_progress(&oids.progress);
 	}
 
-- 
2.20.0.rc0.387.gc7a69e6b6c



* [PATCH v2 4/6] commit-graph write: add more describing progress output
  2018-11-20 16:58                 ` SZEDER Gábor
                                     ` (3 preceding siblings ...)
  2018-11-20 19:50                   ` [PATCH v2 3/6] commit-graph write: show progress for object search Ævar Arnfjörð Bjarmason
@ 2018-11-20 19:50                   ` Ævar Arnfjörð Bjarmason
  2018-11-20 19:50                   ` [PATCH v2 5/6] commit-graph write: remove empty line for readability Ævar Arnfjörð Bjarmason
                                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-20 19:50 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

Make the progress output shown when we're searching for commits to
include in the graph more descriptive. This amends code I added in
7b0f229222 ("commit-graph write: add progress output", 2018-09-17).

Now, on linux.git, we'll emit this sort of output in the various modes
we support:

    $ git commit-graph write
    Finding commits for commit graph among packed objects: 100% (6365492/6365492), done.
    [...]

    # Actually we don't emit this since this takes almost no time at
    # all. But if we did (s/_delayed//) we'd show:
    $ git for-each-ref --format='%(objectname)' | git commit-graph write --stdin-commits
    Finding commits for commit graph from 584 refs: 100% (584/584), done.
    [...]

    $ (cd .git/objects/pack/ && ls *idx) | git commit-graph write --stdin-pack
    Finding commits for commit graph in 3 packs: 6365492, done.
    [...]

The middle one of those is going to be the output users might see in
practice, since it'll be emitted when they get the commit graph via
gc.writeCommitGraph=true. But as noted above you need a really large
number of refs for this message to show. It'll show up on a test
repository I have with ~165k refs:

    Finding commits for commit graph from 165203 refs: 100% (165203/165203), done.
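
As a rough standalone illustration of the singular/plural titles, the
sketch below formats the "refs" title for a given count. ref_title()
is a hypothetical helper, and the plain "n == 1" test is an assumption
that only holds for English; the patch itself leaves that choice to
Q_() so that translations can apply their own plural rules:

```c
#include <stdio.h>

/*
 * Hypothetical helper: write the progress title for nr_refs refs into
 * buf. The n == 1 check is an English-only simplification of what a
 * plural-aware message lookup would do.
 */
static void ref_title(char *buf, size_t len, int nr_refs)
{
	if (nr_refs == 1)
		snprintf(buf, len,
			 "Finding commits for commit graph from %d ref",
			 nr_refs);
	else
		snprintf(buf, len,
			 "Finding commits for commit graph from %d refs",
			 nr_refs);
}
```

Calling ref_title(buf, sizeof(buf), 584) yields the title shown in
the linux.git example above, without the trailing progress counter.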

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit-graph.c | 28 +++++++++++++++++++++-------
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 7c1afa4704..fd1fd61750 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -779,6 +779,7 @@ void write_commit_graph(const char *obj_dir,
 	struct progress *progress = NULL;
 	uint64_t progress_cnt = 0;
 	unsigned long approx_nr_objects;
+	struct strbuf progress_title = STRBUF_INIT;
 
 	if (!commit_graph_compatible(the_repository))
 		return;
@@ -815,8 +816,12 @@ void write_commit_graph(const char *obj_dir,
 		strbuf_addf(&packname, "%s/pack/", obj_dir);
 		dirlen = packname.len;
 		if (report_progress) {
-			oids.progress = start_delayed_progress(
-				_("Finding commits for commit graph"), 0);
+			strbuf_addf(&progress_title,
+				    Q_("Finding commits for commit graph in %d pack",
+				       "Finding commits for commit graph in %d packs",
+				       pack_indexes->nr),
+				    pack_indexes->nr);
+			oids.progress = start_delayed_progress(progress_title.buf, 0);
 			oids.progress_done = 0;
 		}
 		for (i = 0; i < pack_indexes->nr; i++) {
@@ -833,14 +838,20 @@ void write_commit_graph(const char *obj_dir,
 			free(p);
 		}
 		stop_progress(&oids.progress);
+		strbuf_reset(&progress_title);
 		strbuf_release(&packname);
 	}
 
 	if (commit_hex) {
-		if (report_progress)
-			progress = start_delayed_progress(
-				_("Finding commits for commit graph"),
-				commit_hex->nr);
+		if (report_progress) {
+			strbuf_addf(&progress_title,
+				    Q_("Finding commits for commit graph from %d ref",
+				       "Finding commits for commit graph from %d refs",
+				       commit_hex->nr),
+				    commit_hex->nr);
+			progress = start_delayed_progress(progress_title.buf,
+							  commit_hex->nr);
+		}
 		for (i = 0; i < commit_hex->nr; i++) {
 			const char *end;
 			struct object_id oid;
@@ -860,12 +871,13 @@ void write_commit_graph(const char *obj_dir,
 			}
 		}
 		stop_progress(&progress);
+		strbuf_reset(&progress_title);
 	}
 
 	if (!pack_indexes && !commit_hex) {
 		if (report_progress)
 			oids.progress = start_delayed_progress(
-				_("Finding commits for commit graph"),
+				_("Finding commits for commit graph among packed objects"),
 				approx_nr_objects);
 		for_each_packed_object(add_packed_commits, &oids, 0);
 		if (oids.progress_done < approx_nr_objects)
@@ -970,6 +982,8 @@ void write_commit_graph(const char *obj_dir,
 				      &progress_cnt);
 	stop_progress(&progress);
 
+	strbuf_release(&progress_title);
+
 	close_commit_graph(the_repository);
 	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
 	commit_lock_file(&lk);
-- 
2.20.0.rc0.387.gc7a69e6b6c



* [PATCH v2 5/6] commit-graph write: remove empty line for readability
  2018-11-20 16:58                 ` SZEDER Gábor
                                     ` (4 preceding siblings ...)
  2018-11-20 19:50                   ` [PATCH v2 4/6] commit-graph write: add more describing progress output Ævar Arnfjörð Bjarmason
@ 2018-11-20 19:50                   ` Ævar Arnfjörð Bjarmason
  2018-11-20 19:50                   ` [PATCH v2 6/6] commit-graph write: add even more progress output Ævar Arnfjörð Bjarmason
  2018-11-21  1:23                   ` SZEDER Gábor
  7 siblings, 0 replies; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-20 19:50 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

Remove the empty line between a QSORT(...) and the subsequent oideq()
for-loop. This makes it clearer that the QSORT(...) is being done so
that we can run the oideq() loop on adjacent OIDs. Amends code added
in 08fd81c9b6 ("commit-graph: implement write_commit_graph()",
2018-04-02).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit-graph.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index fd1fd61750..0e98679bce 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -888,7 +888,6 @@ void write_commit_graph(const char *obj_dir,
 	close_reachable(&oids, report_progress);
 
 	QSORT(oids.list, oids.nr, commit_compare);
-
 	count_distinct = 1;
 	for (i = 1; i < oids.nr; i++) {
 		if (!oideq(&oids.list[i - 1], &oids.list[i]))
-- 
2.20.0.rc0.387.gc7a69e6b6c



* [PATCH v2 6/6] commit-graph write: add even more progress output
  2018-11-20 16:58                 ` SZEDER Gábor
                                     ` (5 preceding siblings ...)
  2018-11-20 19:50                   ` [PATCH v2 5/6] commit-graph write: remove empty line for readability Ævar Arnfjörð Bjarmason
@ 2018-11-20 19:50                   ` Ævar Arnfjörð Bjarmason
  2018-11-21  1:23                   ` SZEDER Gábor
  7 siblings, 0 replies; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-20 19:50 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

Add more progress output to sections of code that can collectively
take 5-10 seconds on a large enough repository. On a test repository
I have with ~7 million commits and ~50 million objects we'll now
emit:

    $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
    Finding commits for commit graph among packed objects: 100% (50026015/50026015), done.
    Annotating commit graph: 21567407, done.
    Counting distinct commits in commit graph: 100% (7189147/7189147), done.
    Finding extra edges in commit graph: 100% (7189147/7189147), done.
    Computing commit graph generation numbers: 100% (7144680/7144680), done.
    Writing out commit graph: 21434417, done.

Whereas on a medium-sized repository such as linux.git these new
progress bars won't have time to kick in and, as before, we'll still
emit output like:

    $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
    Finding commits for commit graph among packed objects: 100% (6365492/6365492), done.
    Annotating commit graph: 2391666, done.
    Computing commit graph generation numbers: 100% (797222/797222), done.
    Writing out commit graph: 2399912, done.

The "Counting distinct commits in commit graph" phase will spend most
of its time paused at "0/*" as we QSORT(...) the list. That's not
optimal, but at least we don't seem to be stalling anymore.
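
The sort-then-compare-neighbours pattern behind that phase can be
sketched in isolation. This is an illustration on plain ints rather
than object IDs: cmp_int() and count_distinct() are hypothetical names,
and the standard qsort() stands in for QSORT(). The sort is the part
during which no progress can be reported, while the loop after it is
where a per-item progress call would go:

```c
#include <stdlib.h>

/* Three-way comparison of two ints, safe against overflow. */
static int cmp_int(const void *a, const void *b)
{
	int x = *(const int *)a, y = *(const int *)b;

	return (x > y) - (x < y);
}

/*
 * Sort the list in place, then count distinct values by comparing each
 * element with its predecessor. Unlike the loop in the patch, this
 * sketch returns 0 for an empty list.
 */
static size_t count_distinct(int *list, size_t nr)
{
	size_t i, distinct;

	if (!nr)
		return 0;
	qsort(list, nr, sizeof(*list), cmp_int); /* no progress ticks here */
	distinct = 1;
	for (i = 1; i < nr; i++)
		if (list[i - 1] != list[i]) /* progress would tick per i */
			distinct++;
	return distinct;
}
```

For the input {3, 1, 3, 2, 1} this returns 3.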

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit-graph.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 0e98679bce..1ad9000060 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -887,12 +887,19 @@ void write_commit_graph(const char *obj_dir,
 
 	close_reachable(&oids, report_progress);
 
+	if (report_progress)
+		progress = start_delayed_progress(
+			_("Counting distinct commits in commit graph"),
+			oids.nr);
+	display_progress(progress, 0); /* TODO: Measure QSORT() progress */
 	QSORT(oids.list, oids.nr, commit_compare);
 	count_distinct = 1;
 	for (i = 1; i < oids.nr; i++) {
+		display_progress(progress, i + 1);
 		if (!oideq(&oids.list[i - 1], &oids.list[i]))
 			count_distinct++;
 	}
+	stop_progress(&progress);
 
 	if (count_distinct >= GRAPH_PARENT_MISSING)
 		die(_("the commit graph format cannot write %d commits"), count_distinct);
@@ -902,8 +909,13 @@ void write_commit_graph(const char *obj_dir,
 	ALLOC_ARRAY(commits.list, commits.alloc);
 
 	num_extra_edges = 0;
+	if (report_progress)
+		progress = start_delayed_progress(
+			_("Finding extra edges in commit graph"),
+			oids.nr);
 	for (i = 0; i < oids.nr; i++) {
 		int num_parents = 0;
+		display_progress(progress, i + 1);
 		if (i > 0 && oideq(&oids.list[i - 1], &oids.list[i]))
 			continue;
 
@@ -920,6 +932,7 @@ void write_commit_graph(const char *obj_dir,
 		commits.nr++;
 	}
 	num_chunks = num_extra_edges ? 4 : 3;
+	stop_progress(&progress);
 
 	if (commits.nr >= GRAPH_PARENT_MISSING)
 		die(_("too many commits to write graph"));
-- 
2.20.0.rc0.387.gc7a69e6b6c


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 2/6] commit-graph write: add more progress output
  2018-11-20 19:50                   ` [PATCH v2 2/6] commit-graph write: add more " Ævar Arnfjörð Bjarmason
@ 2018-11-20 23:38                     ` SZEDER Gábor
  0 siblings, 0 replies; 88+ messages in thread
From: SZEDER Gábor @ 2018-11-20 23:38 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King,
	Nguyễn Thái Ngọc Duy, Eric Sunshine,
	Derrick Stolee

On Tue, Nov 20, 2018 at 07:50:23PM +0000, Ævar Arnfjörð Bjarmason wrote:
> Add more progress output to the output already added in
> 7b0f229222 ("commit-graph write: add progress output", 2018-09-17).
> 
> As noted in that commit most of the progress output isn't displayed on
> small repositories, but before this change we'd noticeably hang for
> 2-3 seconds at the end on medium sized repositories such as linux.git.
> 
> Now we'll instead show output like this, and have no human-observable
> point at which we're not producing progress output:
> 
>     $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
>     Finding commits for commit graph: 6365492, done.
>     Computing commit graph generation numbers: 100% (797222/797222), done.
>     Writing out commit graph: 2399912, done.
> 
> This "writing out" number is not meant to be meaningful to the user,
> but just to show that we're doing work and the command isn't
> hanging.
> 
> In the current implementation it's approximately 4x the number of
> commits.

"approximately" only, because the current implementation is buggy :)
If done right it's exactly 4x the number of commits.

> As noted in on-list discussion[1] we could add the loops up
> and show percentage progress here, but I don't think it's worth it. It
> would make the implementation more complex and harder to maintain for
> very little gain.

I think that if we can cheaply and accurately figure out the total,
then we should display it, so onlooking users can guesstimate how much
work is still left to be done.
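
As a minimal sketch of how cheap that total would be, assuming every
chunk writer ticks exactly once per commit (the helper name
writeout_total() and its parameters are hypothetical, this is not the
code in commit-graph.c):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: total number of progress ticks for the
 * "Writing out" phase, assuming each chunk writer ticks exactly once
 * per commit.
 */
static uint64_t writeout_total(uint64_t nr_commits, int have_large_edges)
{
	/* fanout, OID lookup and commit data are always written */
	uint64_t passes = 3;

	/* the large edge list is only written when octopus merges exist */
	if (have_large_edges)
		passes++;

	/* guard against overflow of the multiplication below */
	assert(nr_commits <= UINT64_MAX / passes);
	return passes * nr_commits;
}
```

With the 797222 commits from the example output above and all four
chunks written, this would give a total of 3188888.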

> On a much larger in-house repository I have we'll show (note how we
> also say "Annotating[...]"):
> 
>     $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
>     Finding commits for commit graph: 50026015, done.
>     Annotating commit graph: 21567407, done.
>     Computing commit graph generation numbers: 100% (7144680/7144680), done.
>     Writing out commit graph: 21434417, done.
> 
> 1. https://public-inbox.org/git/20181120165800.GB30222@szeder.dev/
> 
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  commit-graph.c | 41 ++++++++++++++++++++++++++++++++---------
>  1 file changed, 32 insertions(+), 9 deletions(-)
> 
> diff --git a/commit-graph.c b/commit-graph.c
> index e6d0d7722b..6f6409b292 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -433,7 +433,9 @@ struct tree *get_commit_tree_in_graph(struct repository *r, const struct commit
>  
>  static void write_graph_chunk_fanout(struct hashfile *f,
>  				     struct commit **commits,
> -				     int nr_commits)
> +				     int nr_commits,
> +				     struct progress *progress,
> +				     uint64_t *progress_cnt)
>  {
>  	int i, count = 0;
>  	struct commit **list = commits;
> @@ -445,6 +447,7 @@ static void write_graph_chunk_fanout(struct hashfile *f,
>  	 */
>  	for (i = 0; i < 256; i++) {
>  		while (count < nr_commits) {
> +			display_progress(progress, ++*progress_cnt);

I think this display_progress() should be placed after the condition,
so no one has to waste brain cycles on figuring out why it always
counts 255 more than the number of commits.
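
The off-by-255 can be reproduced with a small stand-alone simulation of
the loop.  This is only a sketch: fanout_ticks() and its
'tick_before_check' flag are made up for illustration, and the input is
assumed to be the sorted first hash bytes of the commits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Walk a sorted array of first hash bytes the way the quoted fanout
 * loop does, and return how many times the progress counter is bumped.
 * 'tick_before_check' selects whether the simulated display_progress()
 * call sits before or after the 'first byte != i' test.
 */
static uint64_t fanout_ticks(const unsigned char *first_bytes, size_t nr,
			     int tick_before_check)
{
	uint64_t ticks = 0;
	size_t count = 0;
	int i;

	assert(first_bytes || !nr);

	for (i = 0; i < 256; i++) {
		while (count < nr) {
			if (tick_before_check)
				ticks++;
			if (first_bytes[count] != i)
				break;
			if (!tick_before_check)
				ticks++;
			count++;
		}
	}
	return ticks;
}
```

For a single commit whose hash starts with 0xff, ticking before the
check gives 256 ticks (one per fanout bucket), while ticking after the
check gives 1.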

>  			if ((*list)->object.oid.hash[0] != i)
>  				break;
>  			count++;
> @@ -456,12 +459,16 @@ static void write_graph_chunk_fanout(struct hashfile *f,
>  }
>  
>  static void write_graph_chunk_oids(struct hashfile *f, int hash_len,
> -				   struct commit **commits, int nr_commits)
> +				   struct commit **commits, int nr_commits,
> +				   struct progress *progress,
> +				   uint64_t *progress_cnt)
>  {
>  	struct commit **list = commits;
>  	int count;
> -	for (count = 0; count < nr_commits; count++, list++)
> +	for (count = 0; count < nr_commits; count++, list++) {
> +		display_progress(progress, ++*progress_cnt);
>  		hashwrite(f, (*list)->object.oid.hash, (int)hash_len);
> +	}
>  }
>  
>  static const unsigned char *commit_to_sha1(size_t index, void *table)
> @@ -471,7 +478,9 @@ static const unsigned char *commit_to_sha1(size_t index, void *table)
>  }
>  
>  static void write_graph_chunk_data(struct hashfile *f, int hash_len,
> -				   struct commit **commits, int nr_commits)
> +				   struct commit **commits, int nr_commits,
> +				   struct progress *progress,
> +				   uint64_t *progress_cnt)
>  {
>  	struct commit **list = commits;
>  	struct commit **last = commits + nr_commits;
> @@ -481,6 +490,7 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
>  		struct commit_list *parent;
>  		int edge_value;
>  		uint32_t packedDate[2];
> +		display_progress(progress, ++*progress_cnt);
>  
>  		parse_commit(*list);
>  		hashwrite(f, get_commit_tree_oid(*list)->hash, hash_len);
> @@ -542,7 +552,9 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
>  
>  static void write_graph_chunk_large_edges(struct hashfile *f,
>  					  struct commit **commits,
> -					  int nr_commits)
> +					  int nr_commits,
> +					  struct progress *progress,
> +					  uint64_t *progress_cnt)
>  {
>  	struct commit **list = commits;
>  	struct commit **last = commits + nr_commits;

> @@ -565,6 +577,7 @@ static void write_graph_chunk_large_edges(struct hashfile *f,

[Adding more before-context to this hunk here...]

>	while (list < last) {

This loop iterates over all commits ...

>		int num_parents = 0;
>		for (parent = (*list)->parents; num_parents < 3 && parent;
>		     parent = parent->next)
>			num_parents++;

... counts the parents of the current commit ...

>
>		if (num_parents <= 2) {
>			list++;
>			continue;

... and continues iterating unless it's an octopus merge.

>		}
>
>		/* Since num_parents > 2, this initializer is safe. */
>		for (parent = (*list)->parents->next; parent; parent = parent->next) {
>			int edge_value = sha1_pos(parent->item->object.oid.hash,
>  						  commits,
>  						  nr_commits,
>  						  commit_to_sha1);
> +			display_progress(progress, ++*progress_cnt);

So this display_progress() call is in the wrong place, because it will
only be invoked on octopus merges, which only rarely occur in
practice, thus it's entirely possible that it won't show any progress
at all while the outer while loop iterates over the whole history.

This display_progress() should be placed somewhere before that 'if
(num_parents <= 2)' condition.  And then this one, too, will count the
number of commits.
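
A small simulation shows the difference between the two placements.
It is a sketch only: large_edge_ticks() is a hypothetical helper that
takes the parent count of each commit instead of real commit objects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Given the parent count of each commit, return how many progress
 * ticks a large-edges pass produces.  With 'tick_per_commit' set the
 * tick sits before the 'num_parents <= 2' test; otherwise it sits in
 * the inner loop that only runs for octopus merges.
 */
static uint64_t large_edge_ticks(const int *num_parents, size_t nr,
				 int tick_per_commit)
{
	uint64_t ticks = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		int p;

		assert(num_parents[i] >= 0);
		if (tick_per_commit)
			ticks++;
		if (num_parents[i] <= 2)
			continue;
		/* second and later parents, as in the quoted inner loop */
		for (p = 1; p < num_parents[i]; p++)
			if (!tick_per_commit)
				ticks++;
	}
	return ticks;
}
```

For a history without octopus merges the inner-loop placement never
ticks at all, while the per-commit placement ticks once for every
commit.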

>  
>  			if (edge_value < 0)
>  				edge_value = GRAPH_PARENT_MISSING;
> @@ -764,6 +777,7 @@ void write_commit_graph(const char *obj_dir,
>  	int num_extra_edges;
>  	struct commit_list *parent;
>  	struct progress *progress = NULL;
> +	uint64_t progress_cnt = 0;
>  
>  	if (!commit_graph_compatible(the_repository))
>  		return;
> @@ -937,10 +951,19 @@ void write_commit_graph(const char *obj_dir,
>  		hashwrite(f, chunk_write, 12);
>  	}
>  
> -	write_graph_chunk_fanout(f, commits.list, commits.nr);
> -	write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr);
> -	write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr);
> -	write_graph_chunk_large_edges(f, commits.list, commits.nr);
> +	if (report_progress)
> +		progress = start_delayed_progress(
> +			_("Writing out commit graph"),
> +			0);
> +	write_graph_chunk_fanout(f, commits.list, commits.nr, progress,
> +				 &progress_cnt);
> +	write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr,
> +			       progress, &progress_cnt);
> +	write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr,
> +			       progress, &progress_cnt);
> +	write_graph_chunk_large_edges(f, commits.list, commits.nr, progress,
> +				      &progress_cnt);
> +	stop_progress(&progress);
>  
>  	close_commit_graph(the_repository);
>  	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
> -- 
> 2.20.0.rc0.387.gc7a69e6b6c
> 

^ permalink raw reply	[flat|nested] 88+ messages in thread

* 
  2018-11-20 16:58                 ` SZEDER Gábor
                                     ` (6 preceding siblings ...)
  2018-11-20 19:50                   ` [PATCH v2 6/6] commit-graph write: add even more progress output Ævar Arnfjörð Bjarmason
@ 2018-11-21  1:23                   ` SZEDER Gábor
  2018-11-21  1:25                     ` [PATCH 1/2] commit-graph: rename 'num_extra_edges' variable to 'num_large_edges' SZEDER Gábor
  2018-11-21  1:26                     ` [PATCH 2/2] commit-graph: don't call write_graph_chunk_large_edges() unnecessarily SZEDER Gábor
  7 siblings, 2 replies; 88+ messages in thread
From: SZEDER Gábor @ 2018-11-21  1:23 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: SZEDER Gábor, Ævar Arnfjörð Bjarmason, git,
	Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	Eric Sunshine

On Tue, Nov 20, 2018 at 05:58:00PM +0100, SZEDER Gábor wrote:
> I saw a
> bit of weirdness while at it, and want to look into it, but now I've
> got to go...

So here are two simple patches that address the "Huh?!" moments I had
while looking at the progress output during writing the commit graph
file.  The first is a small cleanup to avoid confusion, but see the
notes attached, while the second is a bit of an optimization.

SZEDER Gábor (2):
  commit-graph: rename 'num_extra_edges' variable to 'num_large_edges'
  commit-graph: don't call write_graph_chunk_large_edges() unnecessarily

 commit-graph.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

-- 
2.20.0.rc0.134.gf0022f8e60


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH 1/2] commit-graph: rename 'num_extra_edges' variable to 'num_large_edges'
  2018-11-21  1:23                   ` SZEDER Gábor
@ 2018-11-21  1:25                     ` SZEDER Gábor
  2018-11-21  3:29                       ` Junio C Hamano
  2018-11-21  1:26                     ` [PATCH 2/2] commit-graph: don't call write_graph_chunk_large_edges() unnecessarily SZEDER Gábor
  1 sibling, 1 reply; 88+ messages in thread
From: SZEDER Gábor @ 2018-11-21  1:25 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: SZEDER Gábor, Ævar Arnfjörð Bjarmason, git,
	Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	Eric Sunshine

The commit graph file format describes an optional 'Large Edge List'
chunk, and the function writing out this chunk is called
write_graph_chunk_large_edges().  Then there are two functions in
'commit-graph.c', namely write_graph_chunk_data() and
write_commit_graph(), which have a local variable called
'num_extra_edges'.

It can be confusing at first sight whether large edges and extra edges
refer to the same thing or not, but they do, so let's rename those
variables to 'num_large_edges'.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---

I rename these variables to 'num_large_edges', because the commit
graph file format speaks about the 'Large Edge List' chunk.

However, I do find that the term 'extra' makes much more sense and
fits the concept better (i.e. extra commit graph edges resulting from
the extra parents or octopus merges; after a s/extra/large/g the
previous phrase would make no sense), and notice that the term 'large'
doesn't come up in the file format itself (the chunk's magic is {'E',
'D', 'G', 'E'}, there is no 'L' in there), but only in the
specification text and a couple of variable and function names in the
code.

Would it make sense to do the rename in the other direction?
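
For reference, the counting done in write_commit_graph() boils down to
the sketch below.  count_extra_edges() is a hypothetical stand-alone
helper working on plain parent counts, not a function in
'commit-graph.c':

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count the entries needed in the optional edge list: a commit with
 * more than two parents contributes one entry for each parent after
 * the first.
 */
static uint32_t count_extra_edges(const int *num_parents, size_t nr)
{
	uint32_t num_extra_edges = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		assert(num_parents[i] >= 0);
		/* 0, 1 or 2 parents fit in the fixed-size commit data */
		if (num_parents[i] > 2)
			/* the first parent stays in the commit data */
			num_extra_edges += (uint32_t)(num_parents[i] - 1);
	}
	return num_extra_edges;
}
```

A result of zero means there are no octopus merges, and thus no need
for the optional chunk at all.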

 commit-graph.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 40c855f185..7b4e3a02cf 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -475,7 +475,7 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
 {
 	struct commit **list = commits;
 	struct commit **last = commits + nr_commits;
-	uint32_t num_extra_edges = 0;
+	uint32_t num_large_edges = 0;
 
 	while (list < last) {
 		struct commit_list *parent;
@@ -507,7 +507,7 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
 		if (!parent)
 			edge_value = GRAPH_PARENT_NONE;
 		else if (parent->next)
-			edge_value = GRAPH_OCTOPUS_EDGES_NEEDED | num_extra_edges;
+			edge_value = GRAPH_OCTOPUS_EDGES_NEEDED | num_large_edges;
 		else {
 			edge_value = sha1_pos(parent->item->object.oid.hash,
 					      commits,
@@ -521,7 +521,7 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
 
 		if (edge_value & GRAPH_OCTOPUS_EDGES_NEEDED) {
 			do {
-				num_extra_edges++;
+				num_large_edges++;
 				parent = parent->next;
 			} while (parent);
 		}
@@ -761,7 +761,7 @@ void write_commit_graph(const char *obj_dir,
 	uint32_t chunk_ids[5];
 	uint64_t chunk_offsets[5];
 	int num_chunks;
-	int num_extra_edges;
+	int num_large_edges;
 	struct commit_list *parent;
 	struct progress *progress = NULL;
 
@@ -871,7 +871,7 @@ void write_commit_graph(const char *obj_dir,
 	commits.alloc = count_distinct;
 	ALLOC_ARRAY(commits.list, commits.alloc);
 
-	num_extra_edges = 0;
+	num_large_edges = 0;
 	for (i = 0; i < oids.nr; i++) {
 		int num_parents = 0;
 		if (i > 0 && oideq(&oids.list[i - 1], &oids.list[i]))
@@ -885,11 +885,11 @@ void write_commit_graph(const char *obj_dir,
 			num_parents++;
 
 		if (num_parents > 2)
-			num_extra_edges += num_parents - 1;
+			num_large_edges += num_parents - 1;
 
 		commits.nr++;
 	}
-	num_chunks = num_extra_edges ? 4 : 3;
+	num_chunks = num_large_edges ? 4 : 3;
 
 	if (commits.nr >= GRAPH_PARENT_MISSING)
 		die(_("too many commits to write graph"));
@@ -916,7 +916,7 @@ void write_commit_graph(const char *obj_dir,
 	chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT;
 	chunk_ids[1] = GRAPH_CHUNKID_OIDLOOKUP;
 	chunk_ids[2] = GRAPH_CHUNKID_DATA;
-	if (num_extra_edges)
+	if (num_large_edges)
 		chunk_ids[3] = GRAPH_CHUNKID_LARGEEDGES;
 	else
 		chunk_ids[3] = 0;
@@ -926,7 +926,7 @@ void write_commit_graph(const char *obj_dir,
 	chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;
 	chunk_offsets[2] = chunk_offsets[1] + GRAPH_OID_LEN * commits.nr;
 	chunk_offsets[3] = chunk_offsets[2] + (GRAPH_OID_LEN + 16) * commits.nr;
-	chunk_offsets[4] = chunk_offsets[3] + 4 * num_extra_edges;
+	chunk_offsets[4] = chunk_offsets[3] + 4 * num_large_edges;
 
 	for (i = 0; i <= num_chunks; i++) {
 		uint32_t chunk_write[3];
-- 
2.20.0.rc0.134.gf0022f8e60


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH 2/2] commit-graph: don't call write_graph_chunk_large_edges() unnecessarily
  2018-11-21  1:23                   ` SZEDER Gábor
  2018-11-21  1:25                     ` [PATCH 1/2] commit-graph: rename 'num_extra_edges' variable to 'num_large_edges' SZEDER Gábor
@ 2018-11-21  1:26                     ` SZEDER Gábor
  2018-11-21 11:33                       ` Derrick Stolee
                                         ` (11 more replies)
  1 sibling, 12 replies; 88+ messages in thread
From: SZEDER Gábor @ 2018-11-21  1:26 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: SZEDER Gábor, Ævar Arnfjörð Bjarmason, git,
	Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	Eric Sunshine

The optional 'Large Edge List' chunk of the commit graph file stores
parent information for commits with more than two parents.  Since the
chunk is optional, write_commit_graph() looks through all commits to
find those with more than two parents, and then writes the commit
graph file header accordingly, i.e. if there are no such commits, then
there won't be a 'Large Edge List' chunk written, only the three
mandatory chunks.

However, when it comes to writing chunk data, write_commit_graph()
unconditionally invokes write_graph_chunk_large_edges(), even when it
was decided earlier that that chunk won't be written.  Strictly
speaking there is no bug here, because write_graph_chunk_large_edges()
won't write anything because it won't find any commits with more than
two parents, but then it unnecessarily and in vain looks through all
commits once again in search for such commits.

Don't call write_graph_chunk_large_edges() when that chunk won't be
written to spare an unnecessary iteration over all commits.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 commit-graph.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index 7b4e3a02cf..965eb23a7b 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -940,7 +940,8 @@ void write_commit_graph(const char *obj_dir,
 	write_graph_chunk_fanout(f, commits.list, commits.nr);
 	write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr);
 	write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr);
-	write_graph_chunk_large_edges(f, commits.list, commits.nr);
+	if (num_large_edges)
+		write_graph_chunk_large_edges(f, commits.list, commits.nr);
 
 	close_commit_graph(the_repository);
 	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
-- 
2.20.0.rc0.134.gf0022f8e60


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 1/2] commit-graph: rename 'num_extra_edges' variable to 'num_large_edges'
  2018-11-21  1:25                     ` [PATCH 1/2] commit-graph: rename 'num_extra_edges' variable to 'num_large_edges' SZEDER Gábor
@ 2018-11-21  3:29                       ` Junio C Hamano
  2018-11-21 11:32                         ` Derrick Stolee
  0 siblings, 1 reply; 88+ messages in thread
From: Junio C Hamano @ 2018-11-21  3:29 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason, git,
	Jeff King, Nguyễn Thái Ngọc Duy, Eric Sunshine

SZEDER Gábor <szeder.dev@gmail.com> writes:

> I rename these variables to 'num_large_edges', because the commit
> graph file format speaks about the 'Large Edge List' chunk.
>
> However, I do find that the term 'extra' makes much more sense and
> fits the concept better (i.e. extra commit graph edges resulting from
> the extra parents or octopus merges; after a s/extra/large/g the
> previous phrase would make no sense), and notice that the term 'large'
> doesn't come up in the file format itself (the chunk's magic is {'E',
> 'D', 'G', 'E'}, there is no 'L' in there), but only in the
> specification text and a couple of variable and function names in the
> code.
>
> Would it make sense to do the rename in the other direction?

So edges that are involved in octopus merges are counted with
num_extra_edges and written to the large edges table?

I tend to agree that "large edge" is a misnomer.  These edges that
point at third and subsequent parents are no larger than the edges
that point at the first or the second parents---they are the same
size.  What is larger than usual is the size of the list of edges
(i.e. the number of parents), because the commit has extra (compared
to the majority of commits) number of edges.  So from that point of
view, I agree with you that "extra" makes a lot more sense than
"large".

And the magic number being "EDGE" without "L" is probably a good
thing, as a graph whose commits are all without any extra edge does
not need the "EDGE" chunk, so presence of the chunk by itself is a
sign that extra things are involved.  Which means that there isn't
any need to update the magic number, if we wanted to get rid of
"large" and replace it with "extra".  The only thing needed is to
update the documentation, variable names and in-code comments.
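
To illustrate how the presence of the chunk follows from the count
alone, here is a sketch of the layout arithmetic in the quoted patch.
The struct and function names are made up, and the constants are
assumptions: OID_LEN of 20 assumes SHA-1 object IDs, FANOUT_SIZE
assumes 256 four-byte entries, and 0x7fffffff is an assumed value for
GRAPH_PARENT_MISSING.

```c
#include <assert.h>
#include <stdint.h>

#define OID_LEN 20		/* assumed: SHA-1 object IDs */
#define FANOUT_SIZE (4 * 256)	/* assumed: 256 entries of 4 bytes */

struct graph_layout {
	int num_chunks;		/* 3 mandatory chunks, plus EDGE if needed */
	uint64_t payload_size;	/* bytes from start of fanout to end of last chunk */
};

static struct graph_layout layout_for(uint32_t nr_commits,
				      uint32_t num_extra_edges)
{
	struct graph_layout l;

	/* mirrors the "too many commits" check, with an assumed limit */
	assert(nr_commits < 0x7fffffff);

	/* the EDGE chunk exists only if there is at least one extra edge */
	l.num_chunks = num_extra_edges ? 4 : 3;
	l.payload_size = FANOUT_SIZE
		+ (uint64_t)OID_LEN * nr_commits	/* OID lookup */
		+ (uint64_t)(OID_LEN + 16) * nr_commits	/* commit data */
		+ (uint64_t)4 * num_extra_edges;	/* extra edge list */
	return l;
}
```

Under these assumptions ten commits without octopus merges need three
chunks and 1584 bytes of chunk data; four extra edges add a fourth
chunk and 16 more bytes.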

And while at it, GRAPH_OCTOPUS_EDGES_NEEDED may also want to be
renamed with s/OCTOPUS/EXTRA/;

>  commit-graph.c | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index 40c855f185..7b4e3a02cf 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -475,7 +475,7 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
>  {
>  	struct commit **list = commits;
>  	struct commit **last = commits + nr_commits;
> -	uint32_t num_extra_edges = 0;
> +	uint32_t num_large_edges = 0;
>  
>  	while (list < last) {
>  		struct commit_list *parent;
> @@ -507,7 +507,7 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
>  		if (!parent)
>  			edge_value = GRAPH_PARENT_NONE;
>  		else if (parent->next)
> -			edge_value = GRAPH_OCTOPUS_EDGES_NEEDED | num_extra_edges;
> +			edge_value = GRAPH_OCTOPUS_EDGES_NEEDED | num_large_edges;
>  		else {
>  			edge_value = sha1_pos(parent->item->object.oid.hash,
>  					      commits,
> @@ -521,7 +521,7 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
>  
>  		if (edge_value & GRAPH_OCTOPUS_EDGES_NEEDED) {
>  			do {
> -				num_extra_edges++;
> +				num_large_edges++;
>  				parent = parent->next;
>  			} while (parent);
>  		}
> @@ -761,7 +761,7 @@ void write_commit_graph(const char *obj_dir,
>  	uint32_t chunk_ids[5];
>  	uint64_t chunk_offsets[5];
>  	int num_chunks;
> -	int num_extra_edges;
> +	int num_large_edges;
>  	struct commit_list *parent;
>  	struct progress *progress = NULL;
>  
> @@ -871,7 +871,7 @@ void write_commit_graph(const char *obj_dir,
>  	commits.alloc = count_distinct;
>  	ALLOC_ARRAY(commits.list, commits.alloc);
>  
> -	num_extra_edges = 0;
> +	num_large_edges = 0;
>  	for (i = 0; i < oids.nr; i++) {
>  		int num_parents = 0;
>  		if (i > 0 && oideq(&oids.list[i - 1], &oids.list[i]))
> @@ -885,11 +885,11 @@ void write_commit_graph(const char *obj_dir,
>  			num_parents++;
>  
>  		if (num_parents > 2)
> -			num_extra_edges += num_parents - 1;
> +			num_large_edges += num_parents - 1;
>  
>  		commits.nr++;
>  	}
> -	num_chunks = num_extra_edges ? 4 : 3;
> +	num_chunks = num_large_edges ? 4 : 3;
>  
>  	if (commits.nr >= GRAPH_PARENT_MISSING)
>  		die(_("too many commits to write graph"));
> @@ -916,7 +916,7 @@ void write_commit_graph(const char *obj_dir,
>  	chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT;
>  	chunk_ids[1] = GRAPH_CHUNKID_OIDLOOKUP;
>  	chunk_ids[2] = GRAPH_CHUNKID_DATA;
> -	if (num_extra_edges)
> +	if (num_large_edges)
>  		chunk_ids[3] = GRAPH_CHUNKID_LARGEEDGES;
>  	else
>  		chunk_ids[3] = 0;
> @@ -926,7 +926,7 @@ void write_commit_graph(const char *obj_dir,
>  	chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;
>  	chunk_offsets[2] = chunk_offsets[1] + GRAPH_OID_LEN * commits.nr;
>  	chunk_offsets[3] = chunk_offsets[2] + (GRAPH_OID_LEN + 16) * commits.nr;
> -	chunk_offsets[4] = chunk_offsets[3] + 4 * num_extra_edges;
> +	chunk_offsets[4] = chunk_offsets[3] + 4 * num_large_edges;
>  
>  	for (i = 0; i <= num_chunks; i++) {
>  		uint32_t chunk_write[3];

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 1/2] commit-graph: rename 'num_extra_edges' variable to 'num_large_edges'
  2018-11-21  3:29                       ` Junio C Hamano
@ 2018-11-21 11:32                         ` Derrick Stolee
  0 siblings, 0 replies; 88+ messages in thread
From: Derrick Stolee @ 2018-11-21 11:32 UTC (permalink / raw)
  To: Junio C Hamano, SZEDER Gábor
  Cc: Ævar Arnfjörð Bjarmason, git, Jeff King,
	Nguyễn Thái Ngọc Duy, Eric Sunshine

On 11/20/2018 10:29 PM, Junio C Hamano wrote:
> SZEDER Gábor <szeder.dev@gmail.com> writes:
>
>> I rename these variables to 'num_large_edges', because the commit
>> graph file format speaks about the 'Large Edge List' chunk.
>>
>> However, I do find that the term 'extra' makes much more sense
>>
>> Would it make sense to do the rename in the other direction?
> I tend to agree that "large edge" is a misnomer.

I agree with you both. "Extra" is better.

Thanks,

-Stolee


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 2/2] commit-graph: don't call write_graph_chunk_large_edges() unnecessarily
  2018-11-21  1:26                     ` [PATCH 2/2] commit-graph: don't call write_graph_chunk_large_edges() unnecessarily SZEDER Gábor
@ 2018-11-21 11:33                       ` Derrick Stolee
  2018-11-22 13:28                       ` [PATCH v3 00/10] commit-graph write: progress output improvements Ævar Arnfjörð Bjarmason
                                         ` (10 subsequent siblings)
  11 siblings, 0 replies; 88+ messages in thread
From: Derrick Stolee @ 2018-11-21 11:33 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Ævar Arnfjörð Bjarmason, git, Junio C Hamano,
	Jeff King, Nguyễn Thái Ngọc Duy, Eric Sunshine

On 11/20/2018 8:26 PM, SZEDER Gábor wrote:
>   	write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr);
> -	write_graph_chunk_large_edges(f, commits.list, commits.nr);
> +	if (num_large_edges)
> +		write_graph_chunk_large_edges(f, commits.list, commits.nr);

This is clearly correct, and the tests in t5318-commit-graph.sh would 
catch a dropped (or additional) large/extra edge chunk.

Thanks,

-Stolee


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH v3 00/10] commit-graph write: progress output improvements
  2018-11-21  1:26                     ` [PATCH 2/2] commit-graph: don't call write_graph_chunk_large_edges() unnecessarily SZEDER Gábor
  2018-11-21 11:33                       ` Derrick Stolee
@ 2018-11-22 13:28                       ` Ævar Arnfjörð Bjarmason
  2018-11-22 15:39                         ` Ævar Arnfjörð Bjarmason
                                           ` (11 more replies)
  2018-11-22 13:28                       ` [PATCH v3 01/10] commit-graph: rename 'num_extra_edges' variable to 'num_large_edges' Ævar Arnfjörð Bjarmason
                                         ` (9 subsequent siblings)
  11 siblings, 12 replies; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-22 13:28 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

This incorporates SZEDER's recent two-part series, rebases mine on
top, and fixes a few things while I'm at it. Now there's no progress
output where we don't show a completion percentage.

SZEDER Gábor (2):
  commit-graph: rename 'num_extra_edges' variable to 'num_large_edges'
  commit-graph: don't call write_graph_chunk_large_edges() unnecessarily

Ævar Arnfjörð Bjarmason (8):
  commit-graph write: rephrase confusing progress output
  commit-graph write: add "Writing out" progress output
  commit-graph write: more descriptive "writing out" output
  commit-graph write: show progress for object search
  commit-graph write: add more descriptive progress output
  commit-graph write: remove empty line for readability
  commit-graph write: add itermediate progress
  commit-graph write: emit a percentage for all progress

 commit-graph.c | 130 ++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 102 insertions(+), 28 deletions(-)

Range-diff:

By the way, is there any way to....

 [.. snipped lots of irrelevant commits...]
 -:  ---------- > 14:  07d06c50c0 commit-graph: rename 'num_extra_edges' variable to 'num_large_edges'
 -:  ---------- > 15:  904dda1e7a commit-graph: don't call write_graph_chunk_large_edges() unnecessarily

Pass the equivalent of "git range-diff origin/master topic-2 topic-3"
to git-format-patch?

 1:  9f7fb459bd = 16:  1126c7e29d commit-graph write: rephrase confusing progress output
     a => b | 0
     1 file changed, 0 insertions(+), 0 deletions(-)
    
 2:  093c63e99f ! 17:  2b52ad2284 commit-graph write: add more progress output
     a => b | 0
     1 file changed, 0 insertions(+), 0 deletions(-)
    
    @@ -1,9 +1,10 @@
     Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    -    commit-graph write: add more progress output
    +    commit-graph write: add "Writing out" progress output
     
    -    Add more progress output to the output already added in
    -    7b0f229222 ("commit-graph write: add progress output", 2018-09-17).
    +    Add progress output to be shown when we're writing out the
    +    commit-graph, this adds to the output already added in 7b0f229222
    +    ("commit-graph write: add progress output", 2018-09-17).
     
         As noted in that commit most of the progress output isn't displayed on
         small repositories, but before this change we'd noticeably hang for
    @@ -13,30 +14,13 @@
         point at which we're not producing progress output:
     
             $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
    -        Finding commits for commit graph: 6365492, done.
    +        Finding commits for commit graph: 6365442, done.
             Computing commit graph generation numbers: 100% (797222/797222), done.
    -        Writing out commit graph: 2399912, done.
    +        Writing out commit graph: 100% (3986110/3986110), done.
     
    -    This "writing out" number is not meant to be meaningful to the user,
    -    but just to show that we're doing work and the command isn't
    -    hanging.
    -
    -    In the current implementation it's approximately 4x the number of
    -    commits. As noted in on-list discussion[1] we could add the loops up
    -    and show percentage progress here, but I don't think it's worth it. It
    -    would make the implementation more complex and harder to maintain for
    -    very little gain.
    -
    -    On a much larger in-house repository I have we'll show (note how we
    -    also say "Annotating[...]"):
    -
    -        $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
    -        Finding commits for commit graph: 50026015, done.
    -        Annotating commit graph: 21567407, done.
    -        Computing commit graph generation numbers: 100% (7144680/7144680), done.
    -        Writing out commit graph: 21434417, done.
    -
    -    1. https://public-inbox.org/git/20181120165800.GB30222@szeder.dev/
    +    This "Writing out" number is 4x or 5x the number of commits, depending
    +    on the graph we're processing. A later change will make this explicit
    +    to the user.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    @@ -55,13 +39,13 @@
      	int i, count = 0;
      	struct commit **list = commits;
     @@
    - 	 */
    - 	for (i = 0; i < 256; i++) {
      		while (count < nr_commits) {
    -+			display_progress(progress, ++*progress_cnt);
      			if ((*list)->object.oid.hash[0] != i)
      				break;
    ++			display_progress(progress, ++*progress_cnt);
      			count++;
    + 			list++;
    + 		}
     @@
      }
      
    @@ -112,15 +96,17 @@
      	struct commit **list = commits;
      	struct commit **last = commits + nr_commits;
     @@
    - 						  commits,
    - 						  nr_commits,
    - 						  commit_to_sha1);
    -+			display_progress(progress, ++*progress_cnt);
      
    - 			if (edge_value < 0)
    - 				edge_value = GRAPH_PARENT_MISSING;
    + 	while (list < last) {
    + 		int num_parents = 0;
    ++
    ++		display_progress(progress, ++*progress_cnt);
    ++
    + 		for (parent = (*list)->parents; num_parents < 3 && parent;
    + 		     parent = parent->next)
    + 			num_parents++;
     @@
    - 	int num_extra_edges;
    + 	int num_large_edges;
      	struct commit_list *parent;
      	struct progress *progress = NULL;
     +	uint64_t progress_cnt = 0;
    @@ -134,19 +120,25 @@
     -	write_graph_chunk_fanout(f, commits.list, commits.nr);
     -	write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr);
     -	write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr);
    --	write_graph_chunk_large_edges(f, commits.list, commits.nr);
    -+	if (report_progress)
    ++	if (report_progress) {
    ++		/*
    ++		 * Each of the write_graph_chunk_*() functions just
    ++		 * below loops over our N commits. This number must be
    ++		 * kept in sync with the number of passes we're doing.
    ++		 */
    ++		int graph_passes = 4;
    ++		if (num_large_edges)
    ++			graph_passes++;
     +		progress = start_delayed_progress(
     +			_("Writing out commit graph"),
    -+			0);
    -+	write_graph_chunk_fanout(f, commits.list, commits.nr, progress,
    -+				 &progress_cnt);
    -+	write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr,
    -+			       progress, &progress_cnt);
    -+	write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr,
    -+			       progress, &progress_cnt);
    -+	write_graph_chunk_large_edges(f, commits.list, commits.nr, progress,
    -+				      &progress_cnt);
    ++			graph_passes * commits.nr);
    ++	}
    ++	write_graph_chunk_fanout(f, commits.list, commits.nr, progress, &progress_cnt);
    ++	write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr, progress, &progress_cnt);
    ++	write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr, progress, &progress_cnt);
    + 	if (num_large_edges)
    +-		write_graph_chunk_large_edges(f, commits.list, commits.nr);
    ++		write_graph_chunk_large_edges(f, commits.list, commits.nr, progress, &progress_cnt);
     +	stop_progress(&progress);
      
      	close_commit_graph(the_repository);
 -:  ---------- > 18:  b1773677b1 commit-graph write: more descriptive "writing out" output
 3:  6c71de9460 ! 19:  3138b00a2c commit-graph write: show progress for object search
     a => b | 0
     1 file changed, 0 insertions(+), 0 deletions(-)
    
    @@ -37,9 +37,9 @@
      --- a/commit-graph.c
      +++ b/commit-graph.c
     @@
    - 	struct commit_list *parent;
      	struct progress *progress = NULL;
      	uint64_t progress_cnt = 0;
    + 	struct strbuf progress_title = STRBUF_INIT;
     +	unsigned long approx_nr_objects;
      
      	if (!commit_graph_compatible(the_repository))
 4:  c665dbdacb ! 20:  f41e3b3eb3 commit-graph write: add more describing progress output
     a => b | 0
     1 file changed, 0 insertions(+), 0 deletions(-)
    
    @@ -1,6 +1,6 @@
     Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    -    commit-graph write: add more describing progress output
    +    commit-graph write: add more descriptive progress output
     
         Make the progress output shown when we're searching for commits to
         include in the graph more descriptive. This amends code I added in
    @@ -36,14 +36,6 @@
      diff --git a/commit-graph.c b/commit-graph.c
      --- a/commit-graph.c
      +++ b/commit-graph.c
    -@@
    - 	struct progress *progress = NULL;
    - 	uint64_t progress_cnt = 0;
    - 	unsigned long approx_nr_objects;
    -+	struct strbuf progress_title = STRBUF_INIT;
    - 
    - 	if (!commit_graph_compatible(the_repository))
    - 		return;
     @@
      		strbuf_addf(&packname, "%s/pack/", obj_dir);
      		dirlen = packname.len;
    @@ -99,12 +91,3 @@
      				approx_nr_objects);
      		for_each_packed_object(add_packed_commits, &oids, 0);
      		if (oids.progress_done < approx_nr_objects)
    -@@
    - 				      &progress_cnt);
    - 	stop_progress(&progress);
    - 
    -+	strbuf_release(&progress_title);
    -+
    - 	close_commit_graph(the_repository);
    - 	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
    - 	commit_lock_file(&lk);
 5:  f70fc5045d = 21:  74037032d3 commit-graph write: remove empty line for readability
     a => b | 0
     1 file changed, 0 insertions(+), 0 deletions(-)
    
 6:  2e943fa925 ! 22:  502da68d14 commit-graph write: add even more progress output
     a => b | 0
     1 file changed, 0 insertions(+), 0 deletions(-)
    
    @@ -1,11 +1,13 @@
     Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    -    commit-graph write: add even more progress output
    +    commit-graph write: add itermediate progress
     
    -    Add more progress output to sections of code that can collectively
    -    take 5-10 seconds on a large enough repository. On a test repository
    -    with I have with ~7 million commits and ~50 million objects we'll now
    -    emit:
    +    Add progress output to sections of code between "Annotating[...]" and
    +    "Computing[...]generation numbers". This can collectively take 5-10
    +    seconds on a large enough repository.
    +
    +    On a test repository with I have with ~7 million commits and ~50
    +    million objects we'll now emit:
     
             $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
             Finding commits for commit graph among packed objects: 100% (50026015/50026015), done.
    @@ -57,7 +59,7 @@
     @@
      	ALLOC_ARRAY(commits.list, commits.alloc);
      
    - 	num_extra_edges = 0;
    + 	num_large_edges = 0;
     +	if (report_progress)
     +		progress = start_delayed_progress(
     +			_("Finding extra edges in commit graph"),
    @@ -71,7 +73,7 @@
     @@
      		commits.nr++;
      	}
    - 	num_chunks = num_extra_edges ? 4 : 3;
    + 	num_chunks = num_large_edges ? 4 : 3;
     +	stop_progress(&progress);
      
      	if (commits.nr >= GRAPH_PARENT_MISSING)
 -:  ---------- > 23:  dfaf840983 commit-graph write: emit a percentage for all progress
-- 
2.20.0.rc0.387.gc7a69e6b6c


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH v3 01/10] commit-graph: rename 'num_extra_edges' variable to 'num_large_edges'
  2018-11-21  1:26                     ` [PATCH 2/2] commit-graph: don't call write_graph_chunk_large_edges() unnecessarily SZEDER Gábor
  2018-11-21 11:33                       ` Derrick Stolee
  2018-11-22 13:28                       ` [PATCH v3 00/10] commit-graph write: progress output improvements Ævar Arnfjörð Bjarmason
@ 2018-11-22 13:28                       ` Ævar Arnfjörð Bjarmason
  2018-11-22 13:28                       ` [PATCH v3 02/10] commit-graph: don't call write_graph_chunk_large_edges() unnecessarily Ævar Arnfjörð Bjarmason
                                         ` (8 subsequent siblings)
  11 siblings, 0 replies; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-22 13:28 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

From: SZEDER Gábor <szeder.dev@gmail.com>

The commit graph file format describes an optional 'Large Edge List'
chunk, and the function writing out this chunk is called
write_graph_chunk_large_edges().  Then there are two functions in
'commit-graph.c', namely write_graph_chunk_data() and
write_commit_graph(), which have a local variable called
'num_extra_edges'.

It can be confusing at first sight whether large edges and extra edges
refer to the same thing or not, but they do, so let's rename those
variables to 'num_large_edges'.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit-graph.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 40c855f185..7b4e3a02cf 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -475,7 +475,7 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
 {
 	struct commit **list = commits;
 	struct commit **last = commits + nr_commits;
-	uint32_t num_extra_edges = 0;
+	uint32_t num_large_edges = 0;
 
 	while (list < last) {
 		struct commit_list *parent;
@@ -507,7 +507,7 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
 		if (!parent)
 			edge_value = GRAPH_PARENT_NONE;
 		else if (parent->next)
-			edge_value = GRAPH_OCTOPUS_EDGES_NEEDED | num_extra_edges;
+			edge_value = GRAPH_OCTOPUS_EDGES_NEEDED | num_large_edges;
 		else {
 			edge_value = sha1_pos(parent->item->object.oid.hash,
 					      commits,
@@ -521,7 +521,7 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
 
 		if (edge_value & GRAPH_OCTOPUS_EDGES_NEEDED) {
 			do {
-				num_extra_edges++;
+				num_large_edges++;
 				parent = parent->next;
 			} while (parent);
 		}
@@ -761,7 +761,7 @@ void write_commit_graph(const char *obj_dir,
 	uint32_t chunk_ids[5];
 	uint64_t chunk_offsets[5];
 	int num_chunks;
-	int num_extra_edges;
+	int num_large_edges;
 	struct commit_list *parent;
 	struct progress *progress = NULL;
 
@@ -871,7 +871,7 @@ void write_commit_graph(const char *obj_dir,
 	commits.alloc = count_distinct;
 	ALLOC_ARRAY(commits.list, commits.alloc);
 
-	num_extra_edges = 0;
+	num_large_edges = 0;
 	for (i = 0; i < oids.nr; i++) {
 		int num_parents = 0;
 		if (i > 0 && oideq(&oids.list[i - 1], &oids.list[i]))
@@ -885,11 +885,11 @@ void write_commit_graph(const char *obj_dir,
 			num_parents++;
 
 		if (num_parents > 2)
-			num_extra_edges += num_parents - 1;
+			num_large_edges += num_parents - 1;
 
 		commits.nr++;
 	}
-	num_chunks = num_extra_edges ? 4 : 3;
+	num_chunks = num_large_edges ? 4 : 3;
 
 	if (commits.nr >= GRAPH_PARENT_MISSING)
 		die(_("too many commits to write graph"));
@@ -916,7 +916,7 @@ void write_commit_graph(const char *obj_dir,
 	chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT;
 	chunk_ids[1] = GRAPH_CHUNKID_OIDLOOKUP;
 	chunk_ids[2] = GRAPH_CHUNKID_DATA;
-	if (num_extra_edges)
+	if (num_large_edges)
 		chunk_ids[3] = GRAPH_CHUNKID_LARGEEDGES;
 	else
 		chunk_ids[3] = 0;
@@ -926,7 +926,7 @@ void write_commit_graph(const char *obj_dir,
 	chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;
 	chunk_offsets[2] = chunk_offsets[1] + GRAPH_OID_LEN * commits.nr;
 	chunk_offsets[3] = chunk_offsets[2] + (GRAPH_OID_LEN + 16) * commits.nr;
-	chunk_offsets[4] = chunk_offsets[3] + 4 * num_extra_edges;
+	chunk_offsets[4] = chunk_offsets[3] + 4 * num_large_edges;
 
 	for (i = 0; i <= num_chunks; i++) {
 		uint32_t chunk_write[3];
-- 
2.20.0.rc0.387.gc7a69e6b6c


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH v3 02/10] commit-graph: don't call write_graph_chunk_large_edges() unnecessarily
  2018-11-21  1:26                     ` [PATCH 2/2] commit-graph: don't call write_graph_chunk_large_edges() unnecessarily SZEDER Gábor
                                         ` (2 preceding siblings ...)
  2018-11-22 13:28                       ` [PATCH v3 01/10] commit-graph: rename 'num_extra_edges' variable to 'num_large_edges' Ævar Arnfjörð Bjarmason
@ 2018-11-22 13:28                       ` Ævar Arnfjörð Bjarmason
  2018-11-22 13:28                       ` [PATCH v3 03/10] commit-graph write: rephrase confusing progress output Ævar Arnfjörð Bjarmason
                                         ` (7 subsequent siblings)
  11 siblings, 0 replies; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-22 13:28 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

From: SZEDER Gábor <szeder.dev@gmail.com>

The optional 'Large Edge List' chunk of the commit graph file stores
parent information for commits with more than two parents.  Since the
chunk is optional, write_commit_graph() looks through all commits to
find those with more than two parents, and then writes the commit
graph file header accordingly, i.e. if there are no such commits, then
there won't be a 'Large Edge List' chunk written, only the three
mandatory chunks.

However, when it comes to writing chunk data, write_commit_graph()
unconditionally invokes write_graph_chunk_large_edges(), even when it
was decided earlier that that chunk won't be written.  Strictly
speaking there is no bug here, because write_graph_chunk_large_edges()
won't write anything because it won't find any commits with more than
two parents, but then it unnecessarily and in vain looks through all
commits once again in search for such commits.

Don't call write_graph_chunk_large_edges() when that chunk won't be
written to spare an unnecessary iteration over all commits.
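
As a rough illustration of the decision above, here is a minimal,
self-contained sketch. It is not the code from commit-graph.c: the
demo_* names and the array of per-commit parent counts are assumptions
made for the example. It mirrors the counting rule visible in
write_commit_graph(), where a commit with more than two parents
contributes "num_parents - 1" entries, and shows that a zero count
means the extra pass can be skipped.

```c
/*
 * Hypothetical sketch: count the entries that would go into the
 * optional 'Large Edge List' chunk, given only the number of
 * parents of each commit.
 */
static int demo_count_large_edges(const int *parent_counts, int nr_commits)
{
	int i, num_large_edges = 0;

	for (i = 0; i < nr_commits; i++) {
		/* Only commits with more than two parents need extra entries. */
		if (parent_counts[i] > 2)
			num_large_edges += parent_counts[i] - 1;
	}
	return num_large_edges;
}

/* Nonzero if a pass writing the large edge chunk is needed at all. */
static int demo_needs_large_edges_pass(const int *parent_counts, int nr_commits)
{
	return demo_count_large_edges(parent_counts, nr_commits) != 0;
}
```

With parent counts {1, 2, 3, 0} only the three-parent commit
contributes, so the count is 2 and the pass is needed. With {1, 2} the
count is 0 and the pass can be skipped.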

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit-graph.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index 7b4e3a02cf..965eb23a7b 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -940,7 +940,8 @@ void write_commit_graph(const char *obj_dir,
 	write_graph_chunk_fanout(f, commits.list, commits.nr);
 	write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr);
 	write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr);
-	write_graph_chunk_large_edges(f, commits.list, commits.nr);
+	if (num_large_edges)
+		write_graph_chunk_large_edges(f, commits.list, commits.nr);
 
 	close_commit_graph(the_repository);
 	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
-- 
2.20.0.rc0.387.gc7a69e6b6c


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH v3 03/10] commit-graph write: rephrase confusing progress output
  2018-11-21  1:26                     ` [PATCH 2/2] commit-graph: don't call write_graph_chunk_large_edges() unnecessarily SZEDER Gábor
                                         ` (3 preceding siblings ...)
  2018-11-22 13:28                       ` [PATCH v3 02/10] commit-graph: don't call write_graph_chunk_large_edges() unnecessarily Ævar Arnfjörð Bjarmason
@ 2018-11-22 13:28                       ` Ævar Arnfjörð Bjarmason
  2018-11-22 13:28                       ` [PATCH v3 04/10] commit-graph write: add "Writing out" " Ævar Arnfjörð Bjarmason
                                         ` (6 subsequent siblings)
  11 siblings, 0 replies; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-22 13:28 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

Rephrase the title shown for the progress output emitted by
close_reachable(). The message I added in 7b0f229222 ("commit-graph
write: add progress output", 2018-09-17) gave the impression that it
would count up to the number of commit objects.

But that's not what the number means. It just represents the work
we're doing in several for-loops before the graph is written out. So
let's just say "Annotating commit graph". That title makes no such
promises, and we can add other loops here in the future and still
consistently show progress output.

See [1] for the initial bug report & subsequent discussion about other
approaches to solving this.

1. https://public-inbox.org/git/20181015165447.GH19800@szeder.dev/

Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit-graph.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index 965eb23a7b..d11370a2b3 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -648,7 +648,7 @@ static void close_reachable(struct packed_oid_list *oids, int report_progress)
 
 	if (report_progress)
 		progress = start_delayed_progress(
-			_("Annotating commits in commit graph"), 0);
+			_("Annotating commit graph"), 0);
 	for (i = 0; i < oids->nr; i++) {
 		display_progress(progress, ++j);
 		commit = lookup_commit(the_repository, &oids->list[i]);
-- 
2.20.0.rc0.387.gc7a69e6b6c


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH v3 04/10] commit-graph write: add "Writing out" progress output
  2018-11-21  1:26                     ` [PATCH 2/2] commit-graph: don't call write_graph_chunk_large_edges() unnecessarily SZEDER Gábor
                                         ` (4 preceding siblings ...)
  2018-11-22 13:28                       ` [PATCH v3 03/10] commit-graph write: rephrase confusing progress output Ævar Arnfjörð Bjarmason
@ 2018-11-22 13:28                       ` " Ævar Arnfjörð Bjarmason
  2018-11-22 13:28                       ` [PATCH v3 05/10] commit-graph write: more descriptive "writing out" output Ævar Arnfjörð Bjarmason
                                         ` (5 subsequent siblings)
  11 siblings, 0 replies; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-22 13:28 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

Add progress output to be shown when we're writing out the
commit-graph. This adds to the output already added in 7b0f229222
("commit-graph write: add progress output", 2018-09-17).

As noted in that commit, most of the progress output isn't displayed on
small repositories, but before this change we'd noticeably hang for
2-3 seconds at the end on medium-sized repositories such as linux.git.

Now we'll instead show output like this, and have no human-observable
point at which we're not producing progress output:

    $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
    Finding commits for commit graph: 6365442, done.
    Computing commit graph generation numbers: 100% (797222/797222), done.
    Writing out commit graph: 100% (3986110/3986110), done.

This "Writing out" number is 4x or 5x the number of commits, depending
on the graph we're processing. A later change will make this explicit
to the user.
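
To make the arithmetic concrete, here is a minimal sketch of how one
counter shared across several passes ends up at "passes * commits". It
is not git's progress API: struct demo_progress and the demo_*
functions are hypothetical stand-ins, and each pass is assumed to
visit every commit exactly once.

```c
#include <stdint.h>

/* Hypothetical stand-in for a progress meter: remembers the last value shown. */
struct demo_progress {
	uint64_t total;
	uint64_t last;
};

static void demo_display_progress(struct demo_progress *p, uint64_t n)
{
	if (p)
		p->last = n;
}

/* One pass over all commits, bumping the counter shared by all passes. */
static void demo_pass(int nr_commits, struct demo_progress *p, uint64_t *cnt)
{
	int i;

	for (i = 0; i < nr_commits; i++)
		demo_display_progress(p, ++*cnt);
}

/* Run 'passes' passes and return the final value of the shared counter. */
static uint64_t demo_write_all(int nr_commits, int passes,
			       struct demo_progress *p)
{
	uint64_t cnt = 0;
	int i;

	if (p)
		p->total = (uint64_t)passes * nr_commits;
	for (i = 0; i < passes; i++)
		demo_pass(nr_commits, p, &cnt);
	return cnt;
}
```

For the linux.git figures quoted above, 797222 commits and a total of
5 give 5 * 797222 = 3986110, which is the denominator in the sample
output.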

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit-graph.c | 48 +++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 39 insertions(+), 9 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index d11370a2b3..e32a5cc1bc 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -433,7 +433,9 @@ struct tree *get_commit_tree_in_graph(struct repository *r, const struct commit
 
 static void write_graph_chunk_fanout(struct hashfile *f,
 				     struct commit **commits,
-				     int nr_commits)
+				     int nr_commits,
+				     struct progress *progress,
+				     uint64_t *progress_cnt)
 {
 	int i, count = 0;
 	struct commit **list = commits;
@@ -447,6 +449,7 @@ static void write_graph_chunk_fanout(struct hashfile *f,
 		while (count < nr_commits) {
 			if ((*list)->object.oid.hash[0] != i)
 				break;
+			display_progress(progress, ++*progress_cnt);
 			count++;
 			list++;
 		}
@@ -456,12 +459,16 @@ static void write_graph_chunk_fanout(struct hashfile *f,
 }
 
 static void write_graph_chunk_oids(struct hashfile *f, int hash_len,
-				   struct commit **commits, int nr_commits)
+				   struct commit **commits, int nr_commits,
+				   struct progress *progress,
+				   uint64_t *progress_cnt)
 {
 	struct commit **list = commits;
 	int count;
-	for (count = 0; count < nr_commits; count++, list++)
+	for (count = 0; count < nr_commits; count++, list++) {
+		display_progress(progress, ++*progress_cnt);
 		hashwrite(f, (*list)->object.oid.hash, (int)hash_len);
+	}
 }
 
 static const unsigned char *commit_to_sha1(size_t index, void *table)
@@ -471,7 +478,9 @@ static const unsigned char *commit_to_sha1(size_t index, void *table)
 }
 
 static void write_graph_chunk_data(struct hashfile *f, int hash_len,
-				   struct commit **commits, int nr_commits)
+				   struct commit **commits, int nr_commits,
+				   struct progress *progress,
+				   uint64_t *progress_cnt)
 {
 	struct commit **list = commits;
 	struct commit **last = commits + nr_commits;
@@ -481,6 +490,7 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
 		struct commit_list *parent;
 		int edge_value;
 		uint32_t packedDate[2];
+		display_progress(progress, ++*progress_cnt);
 
 		parse_commit(*list);
 		hashwrite(f, get_commit_tree_oid(*list)->hash, hash_len);
@@ -542,7 +552,9 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
 
 static void write_graph_chunk_large_edges(struct hashfile *f,
 					  struct commit **commits,
-					  int nr_commits)
+					  int nr_commits,
+					  struct progress *progress,
+					  uint64_t *progress_cnt)
 {
 	struct commit **list = commits;
 	struct commit **last = commits + nr_commits;
@@ -550,6 +562,9 @@ static void write_graph_chunk_large_edges(struct hashfile *f,
 
 	while (list < last) {
 		int num_parents = 0;
+
+		display_progress(progress, ++*progress_cnt);
+
 		for (parent = (*list)->parents; num_parents < 3 && parent;
 		     parent = parent->next)
 			num_parents++;
@@ -764,6 +779,7 @@ void write_commit_graph(const char *obj_dir,
 	int num_large_edges;
 	struct commit_list *parent;
 	struct progress *progress = NULL;
+	uint64_t progress_cnt = 0;
 
 	if (!commit_graph_compatible(the_repository))
 		return;
@@ -937,11 +953,25 @@ void write_commit_graph(const char *obj_dir,
 		hashwrite(f, chunk_write, 12);
 	}
 
-	write_graph_chunk_fanout(f, commits.list, commits.nr);
-	write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr);
-	write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr);
+	if (report_progress) {
+		/*
+		 * Each of the write_graph_chunk_*() functions just
+		 * below loops over our N commits. This number must be
+		 * kept in sync with the number of passes we're doing.
+		 */
+		int graph_passes = 4;
+		if (num_large_edges)
+			graph_passes++;
+		progress = start_delayed_progress(
+			_("Writing out commit graph"),
+			graph_passes * commits.nr);
+	}
+	write_graph_chunk_fanout(f, commits.list, commits.nr, progress, &progress_cnt);
+	write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr, progress, &progress_cnt);
+	write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr, progress, &progress_cnt);
 	if (num_large_edges)
-		write_graph_chunk_large_edges(f, commits.list, commits.nr);
+		write_graph_chunk_large_edges(f, commits.list, commits.nr, progress, &progress_cnt);
+	stop_progress(&progress);
 
 	close_commit_graph(the_repository);
 	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
-- 
2.20.0.rc0.387.gc7a69e6b6c


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH v3 05/10] commit-graph write: more descriptive "writing out" output
  2018-11-21  1:26                     ` [PATCH 2/2] commit-graph: don't call write_graph_chunk_large_edges() unnecessarily SZEDER Gábor
                                         ` (5 preceding siblings ...)
  2018-11-22 13:28                       ` [PATCH v3 04/10] commit-graph write: add "Writing out" " Ævar Arnfjörð Bjarmason
@ 2018-11-22 13:28                       ` Ævar Arnfjörð Bjarmason
  2018-11-22 13:28                       ` [PATCH v3 06/10] commit-graph write: show progress for object search Ævar Arnfjörð Bjarmason
                                         ` (4 subsequent siblings)
  11 siblings, 0 replies; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-22 13:28 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

Make the "Writing out" part of the progress output more
descriptive. Depending on the shape of the graph we either make 4 or 5
passes over it.

Let's present this information to the user in case they're wondering
what this number, which is much larger than their number of commits,
has to do with writing out the commit graph. Now e.g. on linux.git we
emit:

    $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
    Finding commits for commit graph: 6365442, done.
    Computing commit graph generation numbers: 100% (797222/797222), done.
    Writing out commit graph in 5 passes: 100% (3986110/3986110), done.

A note on i18n: Why are we using the Q_() function and passing a
number & English text for a singular which'll never be used? Because
the plural rules of translated languages may not match those of
English, and to use the plural function we need to use this format.
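
A minimal sketch of that calling convention follows. demo_Q() is a
hypothetical stand-in that hard-codes the English rule; git's real
Q_() defers to the translation catalog instead, which is the reason
both forms have to be passed even though English would only ever pick
the plural here.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for Q_(): English-only, picks the singular
 * form only when n == 1. A translated catalog may apply other rules.
 */
static const char *demo_Q(const char *singular, const char *plural,
			  unsigned long n)
{
	return n == 1 ? singular : plural;
}

/* Format the progress title for a given number of passes. */
static int demo_format_title(char *buf, size_t len, int graph_passes)
{
	return snprintf(buf, len,
			demo_Q("Writing out commit graph in %d pass",
			       "Writing out commit graph in %d passes",
			       graph_passes),
			graph_passes);
}
```

The count is passed twice on purpose: once to select the form and once
as the argument for "%d".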

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit-graph.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index e32a5cc1bc..8e5970f0b9 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -780,6 +780,7 @@ void write_commit_graph(const char *obj_dir,
 	struct commit_list *parent;
 	struct progress *progress = NULL;
 	uint64_t progress_cnt = 0;
+	struct strbuf progress_title = STRBUF_INIT;
 
 	if (!commit_graph_compatible(the_repository))
 		return;
@@ -962,8 +963,13 @@ void write_commit_graph(const char *obj_dir,
 		int graph_passes = 4;
 		if (num_large_edges)
 			graph_passes++;
+		strbuf_addf(&progress_title,
+			    Q_("Writing out commit graph in %d pass",
+			       "Writing out commit graph in %d passes",
+			       graph_passes),
+			    graph_passes);
 		progress = start_delayed_progress(
-			_("Writing out commit graph"),
+			progress_title.buf,
 			graph_passes * commits.nr);
 	}
 	write_graph_chunk_fanout(f, commits.list, commits.nr, progress, &progress_cnt);
@@ -973,6 +979,8 @@ void write_commit_graph(const char *obj_dir,
 		write_graph_chunk_large_edges(f, commits.list, commits.nr, progress, &progress_cnt);
 	stop_progress(&progress);
 
+	strbuf_release(&progress_title);
+
 	close_commit_graph(the_repository);
 	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
 	commit_lock_file(&lk);
-- 
2.20.0.rc0.387.gc7a69e6b6c


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH v3 06/10] commit-graph write: show progress for object search
  2018-11-21  1:26                     ` [PATCH 2/2] commit-graph: don't call write_graph_chunk_large_edges() unnecessarily SZEDER Gábor
                                         ` (6 preceding siblings ...)
  2018-11-22 13:28                       ` [PATCH v3 05/10] commit-graph write: more descriptive "writing out" output Ævar Arnfjörð Bjarmason
@ 2018-11-22 13:28                       ` Ævar Arnfjörð Bjarmason
  2018-11-22 13:28                       ` [PATCH v3 07/10] commit-graph write: add more descriptive progress output Ævar Arnfjörð Bjarmason
                                         ` (3 subsequent siblings)
  11 siblings, 0 replies; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-22 13:28 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

Show the percentage progress for the "Finding commits for commit
graph" phase for the common case where we're operating on all packs in
the repository, as "commit-graph write" or "gc" will do.

Before we'd emit on e.g. linux.git with "commit-graph write":

    Finding commits for commit graph: 6365492, done.
    [...]

And now:

    Finding commits for commit graph: 100% (6365492/6365492), done.
    [...]

Since the commit graph only includes those commits that are packed
(via for_each_packed_object(...)), approximate_object_count()
returns the actual number of objects we're going to process.

Still, it is possible due to a race with "gc" or another process
maintaining packs that the number of objects we're going to process is
lower than what approximate_object_count() reported. In that case we
don't want to stop the progress bar short of 100%. So let's make sure
it snaps to 100% at the end.

The inverse case is also possible and more likely. I.e. that a new
pack has been added between approximate_object_count() and
for_each_packed_object(). In that case the percentage will go beyond
100%, and we'll do nothing to snap it back to 100% at the end.
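
The two cases can be summarized with a small sketch. The function name
demo_final_progress() is an assumption made for illustration; the
comparison itself mirrors the "oids.progress_done < approx_nr_objects"
check in the diff below.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: the value the progress meter ends on, given
 * how many objects were actually seen and the estimate taken up front.
 */
static uint64_t demo_final_progress(uint64_t done, uint64_t approx_total)
{
	if (done < approx_total)
		return approx_total;	/* undershoot: snap to 100% */
	return done;			/* exact or overshoot: left as-is */
}
```

So an estimate of 100 with 90 objects seen ends on 100, while 120
objects seen stays at 120 and is shown as more than 100%.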

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit-graph.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 8e5970f0b9..d6166beb19 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -781,12 +781,14 @@ void write_commit_graph(const char *obj_dir,
 	struct progress *progress = NULL;
 	uint64_t progress_cnt = 0;
 	struct strbuf progress_title = STRBUF_INIT;
+	unsigned long approx_nr_objects;
 
 	if (!commit_graph_compatible(the_repository))
 		return;
 
 	oids.nr = 0;
-	oids.alloc = approximate_object_count() / 32;
+	approx_nr_objects = approximate_object_count();
+	oids.alloc = approx_nr_objects / 32;
 	oids.progress = NULL;
 	oids.progress_done = 0;
 
@@ -866,8 +868,11 @@ void write_commit_graph(const char *obj_dir,
 	if (!pack_indexes && !commit_hex) {
 		if (report_progress)
 			oids.progress = start_delayed_progress(
-				_("Finding commits for commit graph"), 0);
+				_("Finding commits for commit graph"),
+				approx_nr_objects);
 		for_each_packed_object(add_packed_commits, &oids, 0);
+		if (oids.progress_done < approx_nr_objects)
+			display_progress(oids.progress, approx_nr_objects);
 		stop_progress(&oids.progress);
 	}
 
-- 
2.20.0.rc0.387.gc7a69e6b6c


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH v3 07/10] commit-graph write: add more descriptive progress output
  2018-11-21  1:26                     ` [PATCH 2/2] commit-graph: don't call write_graph_chunk_large_edges() unnecessarily SZEDER Gábor
                                         ` (7 preceding siblings ...)
  2018-11-22 13:28                       ` [PATCH v3 06/10] commit-graph write: show progress for object search Ævar Arnfjörð Bjarmason
@ 2018-11-22 13:28                       ` Ævar Arnfjörð Bjarmason
  2018-11-22 13:28                       ` [PATCH v3 08/10] commit-graph write: remove empty line for readability Ævar Arnfjörð Bjarmason
                                         ` (2 subsequent siblings)
  11 siblings, 0 replies; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-22 13:28 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

Make the progress output shown when we're searching for commits to
include in the graph more descriptive. This amends code I added in
7b0f229222 ("commit-graph write: add progress output", 2018-09-17).

Now, on linux.git, we'll emit this sort of output in the various modes
we support:

    $ git commit-graph write
    Finding commits for commit graph among packed objects: 100% (6365492/6365492), done.
    [...]

    # Actually we don't emit this since this takes almost no time at
    # all. But if we did (s/_delayed//) we'd show:
    $ git for-each-ref --format='%(objectname)' | git commit-graph write --stdin-commits
    Finding commits for commit graph from 584 refs: 100% (584/584), done.
    [...]

    $ (cd .git/objects/pack/ && ls *idx) | git commit-graph write --stdin-pack
    Finding commits for commit graph in 3 packs: 6365492, done.
    [...]

The middle one of those is going to be the output users might see in
practice, since it'll be emitted when they get the commit graph via
gc.writeCommitGraph=true. But as noted above, you need a really large
number of refs for this message to show. It'll show up on a test
repository I have with ~165k refs:

    Finding commits for commit graph from 165203 refs: 100% (165203/165203), done.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit-graph.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index d6166beb19..cb1aebeb79 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -818,8 +818,12 @@ void write_commit_graph(const char *obj_dir,
 		strbuf_addf(&packname, "%s/pack/", obj_dir);
 		dirlen = packname.len;
 		if (report_progress) {
-			oids.progress = start_delayed_progress(
-				_("Finding commits for commit graph"), 0);
+			strbuf_addf(&progress_title,
+				    Q_("Finding commits for commit graph in %d pack",
+				       "Finding commits for commit graph in %d packs",
+				       pack_indexes->nr),
+				    pack_indexes->nr);
+			oids.progress = start_delayed_progress(progress_title.buf, 0);
 			oids.progress_done = 0;
 		}
 		for (i = 0; i < pack_indexes->nr; i++) {
@@ -836,14 +840,20 @@ void write_commit_graph(const char *obj_dir,
 			free(p);
 		}
 		stop_progress(&oids.progress);
+		strbuf_reset(&progress_title);
 		strbuf_release(&packname);
 	}
 
 	if (commit_hex) {
-		if (report_progress)
-			progress = start_delayed_progress(
-				_("Finding commits for commit graph"),
-				commit_hex->nr);
+		if (report_progress) {
+			strbuf_addf(&progress_title,
+				    Q_("Finding commits for commit graph from %d ref",
+				       "Finding commits for commit graph from %d refs",
+				       commit_hex->nr),
+				    commit_hex->nr);
+			progress = start_delayed_progress(progress_title.buf,
+							  commit_hex->nr);
+		}
 		for (i = 0; i < commit_hex->nr; i++) {
 			const char *end;
 			struct object_id oid;
@@ -863,12 +873,13 @@ void write_commit_graph(const char *obj_dir,
 			}
 		}
 		stop_progress(&progress);
+		strbuf_reset(&progress_title);
 	}
 
 	if (!pack_indexes && !commit_hex) {
 		if (report_progress)
 			oids.progress = start_delayed_progress(
-				_("Finding commits for commit graph"),
+				_("Finding commits for commit graph among packed objects"),
 				approx_nr_objects);
 		for_each_packed_object(add_packed_commits, &oids, 0);
 		if (oids.progress_done < approx_nr_objects)
-- 
2.20.0.rc0.387.gc7a69e6b6c



* [PATCH v3 08/10] commit-graph write: remove empty line for readability
  2018-11-21  1:26                     ` [PATCH 2/2] commit-graph: don't call write_graph_chunk_large_edges() unnecessarily SZEDER Gábor
                                         ` (8 preceding siblings ...)
  2018-11-22 13:28                       ` [PATCH v3 07/10] commit-graph write: add more descriptive progress output Ævar Arnfjörð Bjarmason
@ 2018-11-22 13:28                       ` Ævar Arnfjörð Bjarmason
  2018-11-22 13:28                       ` [PATCH v3 09/10] commit-graph write: add itermediate progress Ævar Arnfjörð Bjarmason
  2018-11-22 13:28                       ` [PATCH v3 10/10] commit-graph write: emit a percentage for all progress Ævar Arnfjörð Bjarmason
  11 siblings, 0 replies; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-22 13:28 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

Remove the empty line between a QSORT(...) and the subsequent oideq()
for-loop. This makes it clearer that the QSORT(...) is being done so
that we can run the oideq() loop on adjacent OIDs. Amends code added
in 08fd81c9b6 ("commit-graph: implement write_commit_graph()",
2018-04-02).
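
To illustrate why the sort and the loop belong together, here is a
minimal standalone sketch of the sort-then-compare-neighbours pattern.
It is not the code from commit-graph.c: struct id, ID_LEN,
id_compare() and count_distinct_ids() are hypothetical stand-ins for
the OID list, commit_compare() and the oideq() loop, and plain qsort()
stands in for QSORT(...).

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define ID_LEN 4 /* hypothetical; real object IDs are longer */

struct id {
	unsigned char hash[ID_LEN];
};

/* Order two IDs bytewise so that equal IDs end up adjacent. */
static int id_compare(const void *a, const void *b)
{
	const struct id *ia = a;
	const struct id *ib = b;
	return memcmp(ia->hash, ib->hash, ID_LEN);
}

/*
 * Sort the list in place, then count distinct IDs by comparing each
 * entry with its predecessor. Returns 0 for an empty list.
 */
static size_t count_distinct_ids(struct id *list, size_t nr)
{
	size_t i, count;

	if (!nr)
		return 0;
	qsort(list, nr, sizeof(*list), id_compare);
	count = 1;
	for (i = 1; i < nr; i++) {
		if (memcmp(list[i - 1].hash, list[i].hash, ID_LEN))
			count++;
	}
	return count;
}
```

Once the list is sorted, duplicates sit next to each other, so a
single pass over neighbouring entries is enough. The empty-list
special case is a choice made for this sketch only.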

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit-graph.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index cb1aebeb79..21751231e0 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -890,7 +890,6 @@ void write_commit_graph(const char *obj_dir,
 	close_reachable(&oids, report_progress);
 
 	QSORT(oids.list, oids.nr, commit_compare);
-
 	count_distinct = 1;
 	for (i = 1; i < oids.nr; i++) {
 		if (!oideq(&oids.list[i - 1], &oids.list[i]))
-- 
2.20.0.rc0.387.gc7a69e6b6c



* [PATCH v3 09/10] commit-graph write: add itermediate progress
  2018-11-21  1:26                     ` [PATCH 2/2] commit-graph: don't call write_graph_chunk_large_edges() unnecessarily SZEDER Gábor
                                         ` (9 preceding siblings ...)
  2018-11-22 13:28                       ` [PATCH v3 08/10] commit-graph write: remove empty line for readability Ævar Arnfjörð Bjarmason
@ 2018-11-22 13:28                       ` Ævar Arnfjörð Bjarmason
  2018-11-22 13:28                       ` [PATCH v3 10/10] commit-graph write: emit a percentage for all progress Ævar Arnfjörð Bjarmason
  11 siblings, 0 replies; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-22 13:28 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

Add progress output to sections of code between "Annotating[...]" and
"Computing[...]generation numbers". This can collectively take 5-10
seconds on a large enough repository.

On a test repository I have with ~7 million commits and ~50
million objects we'll now emit:

    $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
    Finding commits for commit graph among packed objects: 100% (50026015/50026015), done.
    Annotating commit graph: 21567407, done.
    Counting distinct commits in commit graph: 100% (7189147/7189147), done.
    Finding extra edges in commit graph: 100% (7189147/7189147), done.
    Computing commit graph generation numbers: 100% (7144680/7144680), done.
    Writing out commit graph: 21434417, done.

Whereas on a medium-sized repository such as linux.git these new
progress bars won't have time to kick in, and as before we'll still
emit output like:

    $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
    Finding commits for commit graph among packed objects: 100% (6365492/6365492), done.
    Annotating commit graph: 2391666, done.
    Computing commit graph generation numbers: 100% (797222/797222), done.
    Writing out commit graph: 2399912, done.

The "Counting distinct commits in commit graph" phase will spend most
of its time paused at "0/*" as we QSORT(...) the list. That's not
optimal, but at least we don't seem to be stalling anymore.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit-graph.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 21751231e0..a6e6eeb56b 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -889,12 +889,19 @@ void write_commit_graph(const char *obj_dir,
 
 	close_reachable(&oids, report_progress);
 
+	if (report_progress)
+		progress = start_delayed_progress(
+			_("Counting distinct commits in commit graph"),
+			oids.nr);
+	display_progress(progress, 0); /* TODO: Measure QSORT() progress */
 	QSORT(oids.list, oids.nr, commit_compare);
 	count_distinct = 1;
 	for (i = 1; i < oids.nr; i++) {
+		display_progress(progress, i + 1);
 		if (!oideq(&oids.list[i - 1], &oids.list[i]))
 			count_distinct++;
 	}
+	stop_progress(&progress);
 
 	if (count_distinct >= GRAPH_PARENT_MISSING)
 		die(_("the commit graph format cannot write %d commits"), count_distinct);
@@ -904,8 +911,13 @@ void write_commit_graph(const char *obj_dir,
 	ALLOC_ARRAY(commits.list, commits.alloc);
 
 	num_large_edges = 0;
+	if (report_progress)
+		progress = start_delayed_progress(
+			_("Finding extra edges in commit graph"),
+			oids.nr);
 	for (i = 0; i < oids.nr; i++) {
 		int num_parents = 0;
+		display_progress(progress, i + 1);
 		if (i > 0 && oideq(&oids.list[i - 1], &oids.list[i]))
 			continue;
 
@@ -922,6 +934,7 @@ void write_commit_graph(const char *obj_dir,
 		commits.nr++;
 	}
 	num_chunks = num_large_edges ? 4 : 3;
+	stop_progress(&progress);
 
 	if (commits.nr >= GRAPH_PARENT_MISSING)
 		die(_("too many commits to write graph"));
-- 
2.20.0.rc0.387.gc7a69e6b6c



* [PATCH v3 10/10] commit-graph write: emit a percentage for all progress
  2018-11-21  1:26                     ` [PATCH 2/2] commit-graph: don't call write_graph_chunk_large_edges() unnecessarily SZEDER Gábor
                                         ` (10 preceding siblings ...)
  2018-11-22 13:28                       ` [PATCH v3 09/10] commit-graph write: add itermediate progress Ævar Arnfjörð Bjarmason
@ 2018-11-22 13:28                       ` Ævar Arnfjörð Bjarmason
  11 siblings, 0 replies; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-22 13:28 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

Change the "Annotating commit graph" progress output to show a
completion percentage. I added this in 7b0f229222 ("commit-graph
write: add progress output", 2018-09-17) and evidently didn't notice
how easy it was to add a completion percentage.

Now for the very large test repository mentioned in previous commits
we'll emit (shows all progress output):

    Finding commits for commit graph among packed objects: 100% (48333911/48333911), done.
    Annotating commit graph: 100% (21435984/21435984), done.
    Counting distinct commits in commit graph: 100% (7145328/7145328), done.
    Finding extra edges in commit graph: 100% (7145328/7145328), done.
    Computing commit graph generation numbers: 100% (7145328/7145328), done.
    Writing out commit graph in 5 passes: 100% (35726640/35726640), done.

And for linux.git:

    Finding commits for commit graph among packed objects: 100% (6365442/6365442), done.
    Annotating commit graph: 100% (2391666/2391666), done.
    Computing commit graph generation numbers: 100% (797222/797222), done.
    Writing out commit graph in 5 passes: 100% (3986110/3986110), done.
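
The "Annotating" totals above are consistent with the pass count
multiplied by the number of OIDs, e.g. 3 * 797222 = 2391666 for
linux.git. A minimal sketch of that arithmetic follows, assuming two
hypothetical helpers, progress_total() and progress_percent(), which
are not part of git's progress API:

```c
#include <stdint.h>

/* Total units of work when every item is visited once per pass. */
static uint64_t progress_total(uint64_t nr_items, unsigned int passes)
{
	return nr_items * passes;
}

/*
 * Integer completion percentage, rounded down. A total of 0 means
 * "unknown", in which case no percentage can be computed and this
 * sketch returns 0.
 */
static unsigned int progress_percent(uint64_t done, uint64_t total)
{
	if (!total)
		return 0;
	return (unsigned int)(done * 100 / total);
}
```

With a known total the display can show "N% (done/total)" rather than
a bare counter, which is the difference between the old and new
"Annotating commit graph" lines.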

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit-graph.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index a6e6eeb56b..c893466042 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -660,10 +660,17 @@ static void close_reachable(struct packed_oid_list *oids, int report_progress)
 	struct commit *commit;
 	struct progress *progress = NULL;
 	int j = 0;
+	/*
+	 * We loop over the OIDs N times to close the graph
+	 * below. This number must be kept in sync with the number of
+	 * passes.
+	 */
+	const int oid_passes = 3;
 
 	if (report_progress)
 		progress = start_delayed_progress(
-			_("Annotating commit graph"), 0);
+			_("Annotating commit graph"),
+			oid_passes * oids->nr);
 	for (i = 0; i < oids->nr; i++) {
 		display_progress(progress, ++j);
 		commit = lookup_commit(the_repository, &oids->list[i]);
-- 
2.20.0.rc0.387.gc7a69e6b6c



* [PATCH v3 00/10] commit-graph write: progress output improvements
  2018-11-22 13:28                       ` [PATCH v3 00/10] commit-graph write: progress output improvements Ævar Arnfjörð Bjarmason
@ 2018-11-22 15:39                         ` Ævar Arnfjörð Bjarmason
  2018-11-22 15:39                         ` [PATCH v4 01/10] commit-graph: rename 'num_extra_edges' variable to 'num_large_edges' Ævar Arnfjörð Bjarmason
                                           ` (10 subsequent siblings)
  11 siblings, 0 replies; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-22 15:39 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

The "Writing out" progress output was off-by-one because I'd screwed
up a merge conflict. Fix that, and update the various progress output.

On my test setup the "Annotating commit graph" progress sometimes
shows up on linux.git and sometimes doesn't, since it's right on the
edge of taking 1 second. So always show it in the commit message
examples; that's less confusing for the reader.

SZEDER Gábor (2):
  commit-graph: rename 'num_extra_edges' variable to 'num_large_edges'
  commit-graph: don't call write_graph_chunk_large_edges() unnecessarily

Ævar Arnfjörð Bjarmason (8):
  commit-graph write: rephrase confusing progress output
  commit-graph write: add "Writing out" progress output
  commit-graph write: more descriptive "writing out" output
  commit-graph write: show progress for object search
  commit-graph write: add more descriptive progress output
  commit-graph write: remove empty line for readability
  commit-graph write: add itermediate progress
  commit-graph write: emit a percentage for all progress

 commit-graph.c | 130 ++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 102 insertions(+), 28 deletions(-)

Range-diff:
1:  2b52ad2284 ! 1:  9c17f56ed3 commit-graph write: add "Writing out" progress output
     a => b | 0
     1 file changed, 0 insertions(+), 0 deletions(-)
    
    @@ -15,10 +15,11 @@
     
             $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
             Finding commits for commit graph: 6365442, done.
    +        Annotating commit graph: 2391666, done.
             Computing commit graph generation numbers: 100% (797222/797222), done.
    -        Writing out commit graph: 100% (3986110/3986110), done.
    +        Writing out commit graph: 100% (3188888/3188888), done.
     
    -    This "Writing out" number is 4x or 5x the number of commits, depending
    +    This "Writing out" number is 3x or 4x the number of commits, depending
         on the graph we're processing. A later change will make this explicit
         to the user.
     
    @@ -126,7 +127,7 @@
     +		 * below loops over our N commits. This number must be
     +		 * kept in sync with the number of passes we're doing.
     +		 */
    -+		int graph_passes = 4;
    ++		int graph_passes = 3;
     +		if (num_large_edges)
     +			graph_passes++;
     +		progress = start_delayed_progress(
2:  b1773677b1 ! 2:  79b0a467d9 commit-graph write: more descriptive "writing out" output
     a => b | 0
     1 file changed, 0 insertions(+), 0 deletions(-)
    
    @@ -3,7 +3,7 @@
         commit-graph write: more descriptive "writing out" output
     
         Make the "Writing out" part of the progress output more
    -    descriptive. Depending on the shape of the graph we either make 4 or 5
    +    descriptive. Depending on the shape of the graph we either make 3 or 4
         passes over it.
     
         Let's present this information to the user in case they're wondering
    @@ -13,8 +13,9 @@
     
             $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
             Finding commits for commit graph: 6365442, done.
    +        Annotating commit graph: 2391666, done.
             Computing commit graph generation numbers: 100% (797222/797222), done.
    -        Writing out commit graph in 5 passes: 100% (3986110/3986110), done.
    +        Writing out commit graph in 4 passes: 100% (3188888/3188888), done.
     
         A note on i18n: Why are we using the Q_() function and passing a
         number & English text for a singular which'll never be used? Because
    @@ -35,7 +36,7 @@
      	if (!commit_graph_compatible(the_repository))
      		return;
     @@
    - 		int graph_passes = 4;
    + 		int graph_passes = 3;
      		if (num_large_edges)
      			graph_passes++;
     +		strbuf_addf(&progress_title,
3:  3138b00a2c ! 3:  b32be83b38 commit-graph write: show progress for object search
     a => b | 0
     1 file changed, 0 insertions(+), 0 deletions(-)
    
    @@ -8,12 +8,12 @@
     
         Before we'd emit on e.g. linux.git with "commit-graph write":
     
    -        Finding commits for commit graph: 6365492, done.
    +        Finding commits for commit graph: 6365442, done.
             [...]
     
         And now:
     
    -        Finding commits for commit graph: 100% (6365492/6365492), done.
    +        Finding commits for commit graph: 100% (6365442/6365442), done.
             [...]
     
         Since the commit graph only includes those commits that are packed
4:  f41e3b3eb3 ! 4:  54276723c0 commit-graph write: add more descriptive progress output
     a => b | 0
     1 file changed, 0 insertions(+), 0 deletions(-)
    
    @@ -10,7 +10,7 @@
         we support:
     
             $ git commit-graph write
    -        Finding commits for commit graph among packed objects: 100% (6365492/6365492), done.
    +        Finding commits for commit graph among packed objects: 100% (6365442/6365442), done.
             [...]
     
             # Actually we don't emit this since this takes almost no time at
    @@ -20,7 +20,7 @@
             [...]
     
             $ (cd .git/objects/pack/ && ls *idx) | git commit-graph write --stdin-pack
    -        Finding commits for commit graph in 3 packs: 6365492, done.
    +        Finding commits for commit graph in 2 packs: 6365442, done.
             [...]
     
         The middle on of those is going to be the output users might see in
5:  74037032d3 = 5:  0e847366e1 commit-graph write: remove empty line for readability
     a => b | 0
     1 file changed, 0 insertions(+), 0 deletions(-)
    
6:  502da68d14 ! 6:  c388aff73e commit-graph write: add itermediate progress
     a => b | 0
     1 file changed, 0 insertions(+), 0 deletions(-)
    
    @@ -10,22 +10,22 @@
         million objects we'll now emit:
     
             $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
    -        Finding commits for commit graph among packed objects: 100% (50026015/50026015), done.
    -        Annotating commit graph: 21567407, done.
    -        Counting distinct commits in commit graph: 100% (7189147/7189147), done.
    -        Finding extra edges in commit graph: 100% (7189147/7189147), done.
    -        Computing commit graph generation numbers: 100% (7144680/7144680), done.
    -        Writing out commit graph: 21434417, done.
    +        Finding commits for commit graph among packed objects: 100% (48333911/48333911), done.
    +        Annotating commit graph: 21435984, done.
    +        Counting distinct commits in commit graph: 100% (7145328/7145328), done.
    +        Finding extra edges in commit graph: 100% (7145328/7145328), done.
    +        Computing commit graph generation numbers: 100% (7145328/7145328), done.
    +        Writing out commit graph in 4 passes: 100% (28581312/28581312), done.
     
         Whereas on a medium-sized repository such as linux.git these new
         progress bars won't have time to kick in and as before and we'll still
         emit output like:
     
             $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
    -        Finding commits for commit graph among packed objects: 100% (6365492/6365492), done.
    +        Finding commits for commit graph among packed objects: 100% (6365442/6365442), done.
             Annotating commit graph: 2391666, done.
             Computing commit graph generation numbers: 100% (797222/797222), done.
    -        Writing out commit graph: 2399912, done.
    +        Writing out commit graph in 4 passes: 100% (3188888/3188888), done.
     
         The "Counting distinct commits in commit graph" phase will spend most
         of its time paused at "0/*" as we QSORT(...) the list. That's not
7:  dfaf840983 ! 7:  fd692499e0 commit-graph write: emit a percentage for all progress
     a => b | 0
     1 file changed, 0 insertions(+), 0 deletions(-)
    
    @@ -7,22 +7,13 @@
         write: add progress output", 2018-09-17) and evidently didn't notice
         how easy it was to add a completion percentage.
     
    -    Now for the very large test repository mentioned in previous commits
    -    we'll emit (shows all progress output):
    -
    -        Finding commits for commit graph among packed objects: 100% (48333911/48333911), done.
    -        Annotating commit graph: 100% (21435984/21435984), done.
    -        Counting distinct commits in commit graph: 100% (7145328/7145328), done.
    -        Finding extra edges in commit graph: 100% (7145328/7145328), done.
    -        Computing commit graph generation numbers: 100% (7145328/7145328), done.
    -        Writing out commit graph in 5 passes: 100% (35726640/35726640), done.
    -
    -    And for linux.git:
    +    Now for e.g. linux.git we'll emit:
     
    +        ~/g/git/git --exec-path=$HOME/g/git commit-graph write
             Finding commits for commit graph among packed objects: 100% (6365442/6365442), done.
             Annotating commit graph: 100% (2391666/2391666), done.
             Computing commit graph generation numbers: 100% (797222/797222), done.
    -        Writing out commit graph in 5 passes: 100% (3986110/3986110), done.
    +        Writing out commit graph in 4 passes: 100% (3188888/3188888), done.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
-- 
2.20.0.rc0.387.gc7a69e6b6c



* [PATCH v4 01/10] commit-graph: rename 'num_extra_edges' variable to 'num_large_edges'
  2018-11-22 13:28                       ` [PATCH v3 00/10] commit-graph write: progress output improvements Ævar Arnfjörð Bjarmason
  2018-11-22 15:39                         ` Ævar Arnfjörð Bjarmason
@ 2018-11-22 15:39                         ` Ævar Arnfjörð Bjarmason
  2018-11-22 15:39                         ` [PATCH v4 02/10] commit-graph: don't call write_graph_chunk_large_edges() unnecessarily Ævar Arnfjörð Bjarmason
                                           ` (9 subsequent siblings)
  11 siblings, 0 replies; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-22 15:39 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

From: SZEDER Gábor <szeder.dev@gmail.com>

The commit graph file format describes an optional 'Large Edge List'
chunk, and the function writing out this chunk is called
write_graph_chunk_large_edges().  Then there are two functions in
'commit-graph.c', namely write_graph_chunk_data() and
write_commit_graph(), which have a local variable called
'num_extra_edges'.

It can be confusing at first sight whether large edges and extra edges
refer to the same thing or not, but they do, so let's rename those
variables to 'num_large_edges'.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit-graph.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 40c855f185..7b4e3a02cf 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -475,7 +475,7 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
 {
 	struct commit **list = commits;
 	struct commit **last = commits + nr_commits;
-	uint32_t num_extra_edges = 0;
+	uint32_t num_large_edges = 0;
 
 	while (list < last) {
 		struct commit_list *parent;
@@ -507,7 +507,7 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
 		if (!parent)
 			edge_value = GRAPH_PARENT_NONE;
 		else if (parent->next)
-			edge_value = GRAPH_OCTOPUS_EDGES_NEEDED | num_extra_edges;
+			edge_value = GRAPH_OCTOPUS_EDGES_NEEDED | num_large_edges;
 		else {
 			edge_value = sha1_pos(parent->item->object.oid.hash,
 					      commits,
@@ -521,7 +521,7 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
 
 		if (edge_value & GRAPH_OCTOPUS_EDGES_NEEDED) {
 			do {
-				num_extra_edges++;
+				num_large_edges++;
 				parent = parent->next;
 			} while (parent);
 		}
@@ -761,7 +761,7 @@ void write_commit_graph(const char *obj_dir,
 	uint32_t chunk_ids[5];
 	uint64_t chunk_offsets[5];
 	int num_chunks;
-	int num_extra_edges;
+	int num_large_edges;
 	struct commit_list *parent;
 	struct progress *progress = NULL;
 
@@ -871,7 +871,7 @@ void write_commit_graph(const char *obj_dir,
 	commits.alloc = count_distinct;
 	ALLOC_ARRAY(commits.list, commits.alloc);
 
-	num_extra_edges = 0;
+	num_large_edges = 0;
 	for (i = 0; i < oids.nr; i++) {
 		int num_parents = 0;
 		if (i > 0 && oideq(&oids.list[i - 1], &oids.list[i]))
@@ -885,11 +885,11 @@ void write_commit_graph(const char *obj_dir,
 			num_parents++;
 
 		if (num_parents > 2)
-			num_extra_edges += num_parents - 1;
+			num_large_edges += num_parents - 1;
 
 		commits.nr++;
 	}
-	num_chunks = num_extra_edges ? 4 : 3;
+	num_chunks = num_large_edges ? 4 : 3;
 
 	if (commits.nr >= GRAPH_PARENT_MISSING)
 		die(_("too many commits to write graph"));
@@ -916,7 +916,7 @@ void write_commit_graph(const char *obj_dir,
 	chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT;
 	chunk_ids[1] = GRAPH_CHUNKID_OIDLOOKUP;
 	chunk_ids[2] = GRAPH_CHUNKID_DATA;
-	if (num_extra_edges)
+	if (num_large_edges)
 		chunk_ids[3] = GRAPH_CHUNKID_LARGEEDGES;
 	else
 		chunk_ids[3] = 0;
@@ -926,7 +926,7 @@ void write_commit_graph(const char *obj_dir,
 	chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;
 	chunk_offsets[2] = chunk_offsets[1] + GRAPH_OID_LEN * commits.nr;
 	chunk_offsets[3] = chunk_offsets[2] + (GRAPH_OID_LEN + 16) * commits.nr;
-	chunk_offsets[4] = chunk_offsets[3] + 4 * num_extra_edges;
+	chunk_offsets[4] = chunk_offsets[3] + 4 * num_large_edges;
 
 	for (i = 0; i <= num_chunks; i++) {
 		uint32_t chunk_write[3];
-- 
2.20.0.rc0.387.gc7a69e6b6c



* [PATCH v4 02/10] commit-graph: don't call write_graph_chunk_large_edges() unnecessarily
  2018-11-22 13:28                       ` [PATCH v3 00/10] commit-graph write: progress output improvements Ævar Arnfjörð Bjarmason
  2018-11-22 15:39                         ` Ævar Arnfjörð Bjarmason
  2018-11-22 15:39                         ` [PATCH v4 01/10] commit-graph: rename 'num_extra_edges' variable to 'num_large_edges' Ævar Arnfjörð Bjarmason
@ 2018-11-22 15:39                         ` Ævar Arnfjörð Bjarmason
  2018-11-22 15:39                         ` [PATCH v4 03/10] commit-graph write: rephrase confusing progress output Ævar Arnfjörð Bjarmason
                                           ` (8 subsequent siblings)
  11 siblings, 0 replies; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-22 15:39 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

From: SZEDER Gábor <szeder.dev@gmail.com>

The optional 'Large Edge List' chunk of the commit graph file stores
parent information for commits with more than two parents.  Since the
chunk is optional, write_commit_graph() looks through all commits to
find those with more than two parents, and then writes the commit
graph file header accordingly, i.e. if there are no such commits, then
there won't be a 'Large Edge List' chunk written, only the three
mandatory chunks.

However, when it comes to writing chunk data, write_commit_graph()
unconditionally invokes write_graph_chunk_large_edges(), even when it
was decided earlier that that chunk won't be written.  Strictly
speaking there is no bug here, because write_graph_chunk_large_edges()
won't write anything because it won't find any commits with more than
two parents, but then it unnecessarily and in vain looks through all
commits once again in search of such commits.

Don't call write_graph_chunk_large_edges() when that chunk won't be
written to spare an unnecessary iteration over all commits.
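
As a rough illustration of the bookkeeping involved, here is a
standalone sketch of how the large-edge count can decide both the
chunk count and whether the extra pass is needed at all. The helpers
count_large_edges() and chunk_count() are hypothetical and take a
plain array of per-commit parent counts instead of walking real
commit objects:

```c
#include <stddef.h>

/*
 * Sum up the edges that would go into the optional chunk: a commit
 * with more than two parents contributes (parents - 1) entries.
 */
static unsigned int count_large_edges(const int *num_parents, size_t nr)
{
	unsigned int num_large_edges = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (num_parents[i] > 2)
			num_large_edges += (unsigned int)(num_parents[i] - 1);
	}
	return num_large_edges;
}

/* Three mandatory chunks, plus the optional one if it has content. */
static int chunk_count(unsigned int num_large_edges)
{
	return num_large_edges ? 4 : 3;
}
```

If the count is zero, the header already advertises only three
chunks, so the writer for the optional chunk has nothing to emit and
the call can be skipped.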

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit-graph.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index 7b4e3a02cf..965eb23a7b 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -940,7 +940,8 @@ void write_commit_graph(const char *obj_dir,
 	write_graph_chunk_fanout(f, commits.list, commits.nr);
 	write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr);
 	write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr);
-	write_graph_chunk_large_edges(f, commits.list, commits.nr);
+	if (num_large_edges)
+		write_graph_chunk_large_edges(f, commits.list, commits.nr);
 
 	close_commit_graph(the_repository);
 	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
-- 
2.20.0.rc0.387.gc7a69e6b6c



* [PATCH v4 03/10] commit-graph write: rephrase confusing progress output
  2018-11-22 13:28                       ` [PATCH v3 00/10] commit-graph write: progress output improvements Ævar Arnfjörð Bjarmason
                                           ` (2 preceding siblings ...)
  2018-11-22 15:39                         ` [PATCH v4 02/10] commit-graph: don't call write_graph_chunk_large_edges() unnecessarily Ævar Arnfjörð Bjarmason
@ 2018-11-22 15:39                         ` Ævar Arnfjörð Bjarmason
  2018-11-22 15:39                         ` [PATCH v4 04/10] commit-graph write: add "Writing out" " Ævar Arnfjörð Bjarmason
                                           ` (7 subsequent siblings)
  11 siblings, 0 replies; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-22 15:39 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

Rephrase the title shown for the progress output emitted by
close_reachable(). The message I added in 7b0f229222 ("commit-graph
write: add progress output", 2018-09-17) gave the impression that it
would count up to the number of commit objects.

But that's not what the number means. It just counts the iterations
of several for-loops that do various work before the graph is written
out. So let's just say "Annotating commit graph"; that title makes no
such promises, and we can add other loops here in the future and
still consistently show progress output.

See [1] for the initial bug report & subsequent discussion about other
approaches to solving this.

1. https://public-inbox.org/git/20181015165447.GH19800@szeder.dev/

Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit-graph.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index 965eb23a7b..d11370a2b3 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -648,7 +648,7 @@ static void close_reachable(struct packed_oid_list *oids, int report_progress)
 
 	if (report_progress)
 		progress = start_delayed_progress(
-			_("Annotating commits in commit graph"), 0);
+			_("Annotating commit graph"), 0);
 	for (i = 0; i < oids->nr; i++) {
 		display_progress(progress, ++j);
 		commit = lookup_commit(the_repository, &oids->list[i]);
-- 
2.20.0.rc0.387.gc7a69e6b6c



* [PATCH v4 04/10] commit-graph write: add "Writing out" progress output
  2018-11-22 13:28                       ` [PATCH v3 00/10] commit-graph write: progress output improvements Ævar Arnfjörð Bjarmason
                                           ` (3 preceding siblings ...)
  2018-11-22 15:39                         ` [PATCH v4 03/10] commit-graph write: rephrase confusing progress output Ævar Arnfjörð Bjarmason
@ 2018-11-22 15:39                         ` " Ævar Arnfjörð Bjarmason
  2018-11-22 15:39                         ` [PATCH v4 05/10] commit-graph write: more descriptive "writing out" output Ævar Arnfjörð Bjarmason
                                           ` (6 subsequent siblings)
  11 siblings, 0 replies; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-22 15:39 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

Add progress output to be shown when we're writing out the
commit-graph. This adds to the output already added in 7b0f229222
("commit-graph write: add progress output", 2018-09-17).

As noted in that commit most of the progress output isn't displayed on
small repositories, but before this change we'd noticeably hang for
2-3 seconds at the end on medium sized repositories such as linux.git.

Now we'll instead show output like this, and have no human-observable
point at which we're not producing progress output:

    $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
    Finding commits for commit graph: 6365442, done.
    Annotating commit graph: 2391666, done.
    Computing commit graph generation numbers: 100% (797222/797222), done.
    Writing out commit graph: 100% (3188888/3188888), done.

This "Writing out" number is 3x or 4x the number of commits, depending
on the graph we're processing. A later change will make this explicit
to the user.
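
To make the 3x/4x relationship concrete, here is a small standalone
sketch of several passes sharing one progress counter through a
pointer. The functions write_pass() and run_passes() are hypothetical
stand-ins for the chunk writers and their caller; the sketch only
counts, where the real code would call display_progress():

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Stand-in for one chunk writer: visit every item once and bump the
 * shared counter for each one.
 */
static void write_pass(size_t nr_items, uint64_t *progress_cnt)
{
	size_t i;

	for (i = 0; i < nr_items; i++)
		++*progress_cnt;
}

/*
 * Run the three mandatory passes, plus a fourth when there are large
 * edges to write. Returns the final value of the shared counter.
 */
static uint64_t run_passes(size_t nr_items, int has_large_edges)
{
	uint64_t progress_cnt = 0;

	write_pass(nr_items, &progress_cnt);
	write_pass(nr_items, &progress_cnt);
	write_pass(nr_items, &progress_cnt);
	if (has_large_edges)
		write_pass(nr_items, &progress_cnt);
	return progress_cnt;
}
```

Because every pass advances the same counter, the final count is the
number of passes times the number of commits, e.g. 4 * 797222 =
3188888 in the linux.git example above.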

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit-graph.c | 48 +++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 39 insertions(+), 9 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index d11370a2b3..dc57b8fedc 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -433,7 +433,9 @@ struct tree *get_commit_tree_in_graph(struct repository *r, const struct commit
 
 static void write_graph_chunk_fanout(struct hashfile *f,
 				     struct commit **commits,
-				     int nr_commits)
+				     int nr_commits,
+				     struct progress *progress,
+				     uint64_t *progress_cnt)
 {
 	int i, count = 0;
 	struct commit **list = commits;
@@ -447,6 +449,7 @@ static void write_graph_chunk_fanout(struct hashfile *f,
 		while (count < nr_commits) {
 			if ((*list)->object.oid.hash[0] != i)
 				break;
+			display_progress(progress, ++*progress_cnt);
 			count++;
 			list++;
 		}
@@ -456,12 +459,16 @@ static void write_graph_chunk_fanout(struct hashfile *f,
 }
 
 static void write_graph_chunk_oids(struct hashfile *f, int hash_len,
-				   struct commit **commits, int nr_commits)
+				   struct commit **commits, int nr_commits,
+				   struct progress *progress,
+				   uint64_t *progress_cnt)
 {
 	struct commit **list = commits;
 	int count;
-	for (count = 0; count < nr_commits; count++, list++)
+	for (count = 0; count < nr_commits; count++, list++) {
+		display_progress(progress, ++*progress_cnt);
 		hashwrite(f, (*list)->object.oid.hash, (int)hash_len);
+	}
 }
 
 static const unsigned char *commit_to_sha1(size_t index, void *table)
@@ -471,7 +478,9 @@ static const unsigned char *commit_to_sha1(size_t index, void *table)
 }
 
 static void write_graph_chunk_data(struct hashfile *f, int hash_len,
-				   struct commit **commits, int nr_commits)
+				   struct commit **commits, int nr_commits,
+				   struct progress *progress,
+				   uint64_t *progress_cnt)
 {
 	struct commit **list = commits;
 	struct commit **last = commits + nr_commits;
@@ -481,6 +490,7 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
 		struct commit_list *parent;
 		int edge_value;
 		uint32_t packedDate[2];
+		display_progress(progress, ++*progress_cnt);
 
 		parse_commit(*list);
 		hashwrite(f, get_commit_tree_oid(*list)->hash, hash_len);
@@ -542,7 +552,9 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
 
 static void write_graph_chunk_large_edges(struct hashfile *f,
 					  struct commit **commits,
-					  int nr_commits)
+					  int nr_commits,
+					  struct progress *progress,
+					  uint64_t *progress_cnt)
 {
 	struct commit **list = commits;
 	struct commit **last = commits + nr_commits;
@@ -550,6 +562,9 @@ static void write_graph_chunk_large_edges(struct hashfile *f,
 
 	while (list < last) {
 		int num_parents = 0;
+
+		display_progress(progress, ++*progress_cnt);
+
 		for (parent = (*list)->parents; num_parents < 3 && parent;
 		     parent = parent->next)
 			num_parents++;
@@ -764,6 +779,7 @@ void write_commit_graph(const char *obj_dir,
 	int num_large_edges;
 	struct commit_list *parent;
 	struct progress *progress = NULL;
+	uint64_t progress_cnt = 0;
 
 	if (!commit_graph_compatible(the_repository))
 		return;
@@ -937,11 +953,25 @@ void write_commit_graph(const char *obj_dir,
 		hashwrite(f, chunk_write, 12);
 	}
 
-	write_graph_chunk_fanout(f, commits.list, commits.nr);
-	write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr);
-	write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr);
+	if (report_progress) {
+		/*
+		 * Each of the write_graph_chunk_*() functions just
+		 * below loops over our N commits. This number must be
+		 * kept in sync with the number of passes we're doing.
+		 */
+		int graph_passes = 3;
+		if (num_large_edges)
+			graph_passes++;
+		progress = start_delayed_progress(
+			_("Writing out commit graph"),
+			graph_passes * commits.nr);
+	}
+	write_graph_chunk_fanout(f, commits.list, commits.nr, progress, &progress_cnt);
+	write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr, progress, &progress_cnt);
+	write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr, progress, &progress_cnt);
 	if (num_large_edges)
-		write_graph_chunk_large_edges(f, commits.list, commits.nr);
+		write_graph_chunk_large_edges(f, commits.list, commits.nr, progress, &progress_cnt);
+	stop_progress(&progress);
 
 	close_commit_graph(the_repository);
 	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
-- 
2.20.0.rc0.387.gc7a69e6b6c



* [PATCH v4 05/10] commit-graph write: more descriptive "writing out" output
  2018-11-22 13:28                       ` [PATCH v3 00/10] commit-graph write: progress output improvements Ævar Arnfjörð Bjarmason
                                           ` (4 preceding siblings ...)
  2018-11-22 15:39                         ` [PATCH v4 04/10] commit-graph write: add "Writing out" " Ævar Arnfjörð Bjarmason
@ 2018-11-22 15:39                         ` Ævar Arnfjörð Bjarmason
  2018-11-22 15:39                         ` [PATCH v4 06/10] commit-graph write: show progress for object search Ævar Arnfjörð Bjarmason
                                           ` (5 subsequent siblings)
  11 siblings, 0 replies; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-22 15:39 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

Make the "Writing out" part of the progress output more
descriptive. Depending on the shape of the graph we either make 3 or 4
passes over it.

Let's present this information to the user in case they're wondering
what this number, which is much larger than their number of commits,
has to do with writing out the commit graph. Now e.g. on linux.git we
emit:

    $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
    Finding commits for commit graph: 6365442, done.
    Annotating commit graph: 2391666, done.
    Computing commit graph generation numbers: 100% (797222/797222), done.
    Writing out commit graph in 4 passes: 100% (3188888/3188888), done.

A note on i18n: Why are we using the Q_() function and passing a
number & English text for a singular which'll never be used? Because
the plural rules of translated languages may not match those of
English, and to use the plural function we need to use this format.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit-graph.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index dc57b8fedc..3de65bc2e9 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -780,6 +780,7 @@ void write_commit_graph(const char *obj_dir,
 	struct commit_list *parent;
 	struct progress *progress = NULL;
 	uint64_t progress_cnt = 0;
+	struct strbuf progress_title = STRBUF_INIT;
 
 	if (!commit_graph_compatible(the_repository))
 		return;
@@ -962,8 +963,13 @@ void write_commit_graph(const char *obj_dir,
 		int graph_passes = 3;
 		if (num_large_edges)
 			graph_passes++;
+		strbuf_addf(&progress_title,
+			    Q_("Writing out commit graph in %d pass",
+			       "Writing out commit graph in %d passes",
+			       graph_passes),
+			    graph_passes);
 		progress = start_delayed_progress(
-			_("Writing out commit graph"),
+			progress_title.buf,
 			graph_passes * commits.nr);
 	}
 	write_graph_chunk_fanout(f, commits.list, commits.nr, progress, &progress_cnt);
@@ -973,6 +979,8 @@ void write_commit_graph(const char *obj_dir,
 		write_graph_chunk_large_edges(f, commits.list, commits.nr, progress, &progress_cnt);
 	stop_progress(&progress);
 
+	strbuf_release(&progress_title);
+
 	close_commit_graph(the_repository);
 	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
 	commit_lock_file(&lk);
-- 
2.20.0.rc0.387.gc7a69e6b6c



* [PATCH v4 06/10] commit-graph write: show progress for object search
  2018-11-22 13:28                       ` [PATCH v3 00/10] commit-graph write: progress output improvements Ævar Arnfjörð Bjarmason
                                           ` (5 preceding siblings ...)
  2018-11-22 15:39                         ` [PATCH v4 05/10] commit-graph write: more descriptive "writing out" output Ævar Arnfjörð Bjarmason
@ 2018-11-22 15:39                         ` Ævar Arnfjörð Bjarmason
  2018-11-22 15:39                         ` [PATCH v4 07/10] commit-graph write: add more descriptive progress output Ævar Arnfjörð Bjarmason
                                           ` (4 subsequent siblings)
  11 siblings, 0 replies; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-22 15:39 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

Show the percentage progress for the "Finding commits for commit
graph" phase for the common case where we're operating on all packs in
the repository, as "commit-graph write" or "gc" will do.

Before we'd emit on e.g. linux.git with "commit-graph write":

    Finding commits for commit graph: 6365442, done.
    [...]

And now:

    Finding commits for commit graph: 100% (6365442/6365442), done.
    [...]

Since the commit graph only includes those commits that are packed
(via for_each_packed_object(...)), approximate_object_count()
returns the actual number of objects we're going to process.

Still, it is possible due to a race with "gc" or another process
maintaining packs that the number of objects we're going to process is
lower than what approximate_object_count() reported. In that case we
don't want to stop the progress bar short of 100%. So let's make sure
it snaps to 100% at the end.

The inverse case is also possible and more likely. I.e. that a new
pack has been added between approximate_object_count() and
for_each_packed_object(). In that case the percentage will go beyond
100%, and we'll do nothing to snap it back to 100% at the end.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit-graph.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 3de65bc2e9..42d8365f0d 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -781,12 +781,14 @@ void write_commit_graph(const char *obj_dir,
 	struct progress *progress = NULL;
 	uint64_t progress_cnt = 0;
 	struct strbuf progress_title = STRBUF_INIT;
+	unsigned long approx_nr_objects;
 
 	if (!commit_graph_compatible(the_repository))
 		return;
 
 	oids.nr = 0;
-	oids.alloc = approximate_object_count() / 32;
+	approx_nr_objects = approximate_object_count();
+	oids.alloc = approx_nr_objects / 32;
 	oids.progress = NULL;
 	oids.progress_done = 0;
 
@@ -866,8 +868,11 @@ void write_commit_graph(const char *obj_dir,
 	if (!pack_indexes && !commit_hex) {
 		if (report_progress)
 			oids.progress = start_delayed_progress(
-				_("Finding commits for commit graph"), 0);
+				_("Finding commits for commit graph"),
+				approx_nr_objects);
 		for_each_packed_object(add_packed_commits, &oids, 0);
+		if (oids.progress_done < approx_nr_objects)
+			display_progress(oids.progress, approx_nr_objects);
 		stop_progress(&oids.progress);
 	}
 
-- 
2.20.0.rc0.387.gc7a69e6b6c



* [PATCH v4 07/10] commit-graph write: add more descriptive progress output
  2018-11-22 13:28                       ` [PATCH v3 00/10] commit-graph write: progress output improvements Ævar Arnfjörð Bjarmason
                                           ` (6 preceding siblings ...)
  2018-11-22 15:39                         ` [PATCH v4 06/10] commit-graph write: show progress for object search Ævar Arnfjörð Bjarmason
@ 2018-11-22 15:39                         ` Ævar Arnfjörð Bjarmason
  2018-11-22 15:39                         ` [PATCH v4 08/10] commit-graph write: remove empty line for readability Ævar Arnfjörð Bjarmason
                                           ` (3 subsequent siblings)
  11 siblings, 0 replies; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-22 15:39 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

Make the progress output shown when we're searching for commits to
include in the graph more descriptive. This amends code I added in
7b0f229222 ("commit-graph write: add progress output", 2018-09-17).

Now, on linux.git, we'll emit this sort of output in the various modes
we support:

    $ git commit-graph write
    Finding commits for commit graph among packed objects: 100% (6365442/6365442), done.
    [...]

    # Actually we don't emit this since this takes almost no time at
    # all. But if we did (s/_delayed//) we'd show:
    $ git for-each-ref --format='%(objectname)' | git commit-graph write --stdin-commits
    Finding commits for commit graph from 584 refs: 100% (584/584), done.
    [...]

    $ (cd .git/objects/pack/ && ls *idx) | git commit-graph write --stdin-pack
    Finding commits for commit graph in 2 packs: 6365442, done.
    [...]

The middle one of those is going to be the output users might see in
practice, since it'll be emitted when they get the commit graph via
gc.writeCommitGraph=true. But as noted above you need a really large
number of refs for this message to show. It'll show up on a test
repository I have with ~165k refs:

    Finding commits for commit graph from 165203 refs: 100% (165203/165203), done.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit-graph.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 42d8365f0d..43b15785f6 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -818,8 +818,12 @@ void write_commit_graph(const char *obj_dir,
 		strbuf_addf(&packname, "%s/pack/", obj_dir);
 		dirlen = packname.len;
 		if (report_progress) {
-			oids.progress = start_delayed_progress(
-				_("Finding commits for commit graph"), 0);
+			strbuf_addf(&progress_title,
+				    Q_("Finding commits for commit graph in %d pack",
+				       "Finding commits for commit graph in %d packs",
+				       pack_indexes->nr),
+				    pack_indexes->nr);
+			oids.progress = start_delayed_progress(progress_title.buf, 0);
 			oids.progress_done = 0;
 		}
 		for (i = 0; i < pack_indexes->nr; i++) {
@@ -836,14 +840,20 @@ void write_commit_graph(const char *obj_dir,
 			free(p);
 		}
 		stop_progress(&oids.progress);
+		strbuf_reset(&progress_title);
 		strbuf_release(&packname);
 	}
 
 	if (commit_hex) {
-		if (report_progress)
-			progress = start_delayed_progress(
-				_("Finding commits for commit graph"),
-				commit_hex->nr);
+		if (report_progress) {
+			strbuf_addf(&progress_title,
+				    Q_("Finding commits for commit graph from %d ref",
+				       "Finding commits for commit graph from %d refs",
+				       commit_hex->nr),
+				    commit_hex->nr);
+			progress = start_delayed_progress(progress_title.buf,
+							  commit_hex->nr);
+		}
 		for (i = 0; i < commit_hex->nr; i++) {
 			const char *end;
 			struct object_id oid;
@@ -863,12 +873,13 @@ void write_commit_graph(const char *obj_dir,
 			}
 		}
 		stop_progress(&progress);
+		strbuf_reset(&progress_title);
 	}
 
 	if (!pack_indexes && !commit_hex) {
 		if (report_progress)
 			oids.progress = start_delayed_progress(
-				_("Finding commits for commit graph"),
+				_("Finding commits for commit graph among packed objects"),
 				approx_nr_objects);
 		for_each_packed_object(add_packed_commits, &oids, 0);
 		if (oids.progress_done < approx_nr_objects)
-- 
2.20.0.rc0.387.gc7a69e6b6c



* [PATCH v4 08/10] commit-graph write: remove empty line for readability
  2018-11-22 13:28                       ` [PATCH v3 00/10] commit-graph write: progress output improvements Ævar Arnfjörð Bjarmason
                                           ` (7 preceding siblings ...)
  2018-11-22 15:39                         ` [PATCH v4 07/10] commit-graph write: add more descriptive progress output Ævar Arnfjörð Bjarmason
@ 2018-11-22 15:39                         ` Ævar Arnfjörð Bjarmason
  2018-11-22 15:39                         ` [PATCH v4 09/10] commit-graph write: add itermediate progress Ævar Arnfjörð Bjarmason
                                           ` (2 subsequent siblings)
  11 siblings, 0 replies; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-22 15:39 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

Remove the empty line between a QSORT(...) and the subsequent oideq()
for-loop. This makes it clearer that the QSORT(...) is being done so
that we can run the oideq() loop on adjacent OIDs. Amends code added
in 08fd81c9b6 ("commit-graph: implement write_commit_graph()",
2018-04-02).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit-graph.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index 43b15785f6..199155bd68 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -890,7 +890,6 @@ void write_commit_graph(const char *obj_dir,
 	close_reachable(&oids, report_progress);
 
 	QSORT(oids.list, oids.nr, commit_compare);
-
 	count_distinct = 1;
 	for (i = 1; i < oids.nr; i++) {
 		if (!oideq(&oids.list[i - 1], &oids.list[i]))
-- 
2.20.0.rc0.387.gc7a69e6b6c



* [PATCH v4 09/10] commit-graph write: add itermediate progress
  2018-11-22 13:28                       ` [PATCH v3 00/10] commit-graph write: progress output improvements Ævar Arnfjörð Bjarmason
                                           ` (8 preceding siblings ...)
  2018-11-22 15:39                         ` [PATCH v4 08/10] commit-graph write: remove empty line for readability Ævar Arnfjörð Bjarmason
@ 2018-11-22 15:39                         ` Ævar Arnfjörð Bjarmason
  2018-11-22 15:39                         ` [PATCH v4 10/10] commit-graph write: emit a percentage for all progress Ævar Arnfjörð Bjarmason
  2018-11-22 18:59                         ` [PATCH v3 00/10] commit-graph write: progress output improvements Eric Sunshine
  11 siblings, 0 replies; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-22 15:39 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

Add progress output to sections of code between "Annotating[...]" and
"Computing[...]generation numbers". This can collectively take 5-10
seconds on a large enough repository.

On a test repository I have with ~7 million commits and ~50
million objects we'll now emit:

    $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
    Finding commits for commit graph among packed objects: 100% (48333911/48333911), done.
    Annotating commit graph: 21435984, done.
    Counting distinct commits in commit graph: 100% (7145328/7145328), done.
    Finding extra edges in commit graph: 100% (7145328/7145328), done.
    Computing commit graph generation numbers: 100% (7145328/7145328), done.
    Writing out commit graph in 4 passes: 100% (28581312/28581312), done.

Whereas on a medium-sized repository such as linux.git these new
progress bars won't have time to kick in, and as before we'll still
emit output like:

    $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
    Finding commits for commit graph among packed objects: 100% (6365442/6365442), done.
    Annotating commit graph: 2391666, done.
    Computing commit graph generation numbers: 100% (797222/797222), done.
    Writing out commit graph in 4 passes: 100% (3188888/3188888), done.

The "Counting distinct commits in commit graph" phase will spend most
of its time paused at "0/*" as we QSORT(...) the list. That's not
optimal, but at least we don't seem to be stalling anymore.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit-graph.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 199155bd68..80f201adf4 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -889,12 +889,19 @@ void write_commit_graph(const char *obj_dir,
 
 	close_reachable(&oids, report_progress);
 
+	if (report_progress)
+		progress = start_delayed_progress(
+			_("Counting distinct commits in commit graph"),
+			oids.nr);
+	display_progress(progress, 0); /* TODO: Measure QSORT() progress */
 	QSORT(oids.list, oids.nr, commit_compare);
 	count_distinct = 1;
 	for (i = 1; i < oids.nr; i++) {
+		display_progress(progress, i + 1);
 		if (!oideq(&oids.list[i - 1], &oids.list[i]))
 			count_distinct++;
 	}
+	stop_progress(&progress);
 
 	if (count_distinct >= GRAPH_PARENT_MISSING)
 		die(_("the commit graph format cannot write %d commits"), count_distinct);
@@ -904,8 +911,13 @@ void write_commit_graph(const char *obj_dir,
 	ALLOC_ARRAY(commits.list, commits.alloc);
 
 	num_large_edges = 0;
+	if (report_progress)
+		progress = start_delayed_progress(
+			_("Finding extra edges in commit graph"),
+			oids.nr);
 	for (i = 0; i < oids.nr; i++) {
 		int num_parents = 0;
+		display_progress(progress, i + 1);
 		if (i > 0 && oideq(&oids.list[i - 1], &oids.list[i]))
 			continue;
 
@@ -922,6 +934,7 @@ void write_commit_graph(const char *obj_dir,
 		commits.nr++;
 	}
 	num_chunks = num_large_edges ? 4 : 3;
+	stop_progress(&progress);
 
 	if (commits.nr >= GRAPH_PARENT_MISSING)
 		die(_("too many commits to write graph"));
-- 
2.20.0.rc0.387.gc7a69e6b6c


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH v4 10/10] commit-graph write: emit a percentage for all progress
  2018-11-22 13:28                       ` [PATCH v3 00/10] commit-graph write: progress output improvements Ævar Arnfjörð Bjarmason
                                           ` (9 preceding siblings ...)
  2018-11-22 15:39                         ` [PATCH v4 09/10] commit-graph write: add itermediate progress Ævar Arnfjörð Bjarmason
@ 2018-11-22 15:39                         ` Ævar Arnfjörð Bjarmason
  2018-11-22 18:59                         ` [PATCH v3 00/10] commit-graph write: progress output improvements Eric Sunshine
  11 siblings, 0 replies; 88+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-22 15:39 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor, Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

Change the "Annotating commit graph" progress output to show a
completion percentage. I added this in 7b0f229222 ("commit-graph
write: add progress output", 2018-09-17) and evidently didn't notice
how easy it was to add a completion percentage.

Now for e.g. linux.git we'll emit:

    $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
    Finding commits for commit graph among packed objects: 100% (6365442/6365442), done.
    Annotating commit graph: 100% (2391666/2391666), done.
    Computing commit graph generation numbers: 100% (797222/797222), done.
    Writing out commit graph in 4 passes: 100% (3188888/3188888), done.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit-graph.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index 80f201adf4..6c6edc679b 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -660,10 +660,17 @@ static void close_reachable(struct packed_oid_list *oids, int report_progress)
 	struct commit *commit;
 	struct progress *progress = NULL;
 	int j = 0;
+	/*
+	 * We loop over the OIDs N times to close the graph
+	 * below. This number must be kept in sync with the number of
+	 * passes.
+	 */
+	const int oid_passes = 3;
 
 	if (report_progress)
 		progress = start_delayed_progress(
-			_("Annotating commit graph"), 0);
+			_("Annotating commit graph"),
+			oid_passes * oids->nr);
 	for (i = 0; i < oids->nr; i++) {
 		display_progress(progress, ++j);
 		commit = lookup_commit(the_repository, &oids->list[i]);
-- 
2.20.0.rc0.387.gc7a69e6b6c



* Re: [PATCH v3 00/10] commit-graph write: progress output improvements
  2018-11-22 13:28                       ` [PATCH v3 00/10] commit-graph write: progress output improvements Ævar Arnfjörð Bjarmason
                                           ` (10 preceding siblings ...)
  2018-11-22 15:39                         ` [PATCH v4 10/10] commit-graph write: emit a percentage for all progress Ævar Arnfjörð Bjarmason
@ 2018-11-22 18:59                         ` Eric Sunshine
  11 siblings, 0 replies; 88+ messages in thread
From: Eric Sunshine @ 2018-11-22 18:59 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git List, Junio C Hamano, Jeff King,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor,
	Derrick Stolee

On Thu, Nov 22, 2018 at 8:28 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> Range-diff:
> By the way, is there any way to....
> Pass the equivalent of "git range-diff origin/master topic-2 topic-3"
> to git-format-patch?

The git-range-diff documentation says that the three-argument form:

    git range-diff <base> <rev1> <rev2>

is equivalent to passing two ranges:

    git range-diff <base>..<rev1> <base>..<rev2>

git-format-patch synopsis shows:

    git format-patch --range-diff=<previous> <rev-range>

where <rev-range> is the range of commits to format, and <previous>
can be a range specifying the previous version, so:

    git format-patch --range-diff=<base>..<rev1> <base>..<rev2>

should do what you ask.

However, since the two versions in your example both derive from
origin/master, you should be able to get by with the simpler:

    git format-patch --range-diff=<rev1> <base>..<rev2>

which, if you were running git-range-diff manually, would be the equivalent of:

    git range-diff <rev1>...<rev2>

for which the range-diff machinery figures out the common base
(origin/master) automatically.



Thread overview: 88+ messages (download: mbox.gz / follow: Atom feed)
2018-09-04 20:27 [PATCH 0/2] commit-graph: add progress output Ævar Arnfjörð Bjarmason
2018-09-04 20:27 ` [PATCH 1/2] commit-graph write: " Ævar Arnfjörð Bjarmason
2018-09-04 21:16   ` Eric Sunshine
2018-09-04 22:07   ` Junio C Hamano
2018-09-05 11:58     ` Derrick Stolee
2018-09-05 12:07       ` Ævar Arnfjörð Bjarmason
2018-09-05 21:46       ` Junio C Hamano
2018-09-05 22:12         ` Derrick Stolee
2018-09-07 15:11       ` Ævar Arnfjörð Bjarmason
2018-09-07 15:23         ` Ævar Arnfjörð Bjarmason
2018-09-07 17:15           ` Jeff King
2018-09-07 17:25             ` Derrick Stolee
2018-09-05 12:06   ` Derrick Stolee
2018-09-07 12:40   ` Ævar Arnfjörð Bjarmason
2018-09-07 13:12     ` Derrick Stolee
2018-09-04 20:27 ` [PATCH 2/2] commit-graph verify: " Ævar Arnfjörð Bjarmason
2018-09-04 22:10   ` Junio C Hamano
2018-09-05 12:07 ` [PATCH 0/2] commit-graph: " Derrick Stolee
2018-09-07 18:29 ` [PATCH v2 " Ævar Arnfjörð Bjarmason
2018-09-11 20:26   ` Junio C Hamano
2018-09-07 18:29 ` [PATCH v2 1/2] commit-graph write: " Ævar Arnfjörð Bjarmason
2018-09-21 20:01   ` Derrick Stolee
2018-09-21 21:43     ` Junio C Hamano
2018-09-21 21:57       ` Junio C Hamano
2018-09-07 18:29 ` [PATCH v2 2/2] commit-graph verify: " Ævar Arnfjörð Bjarmason
2018-09-16  6:55   ` Duy Nguyen
2018-09-17 15:33     ` [PATCH v3 0/2] commit-graph: " Ævar Arnfjörð Bjarmason
2018-09-17 15:33     ` [PATCH v3 1/2] commit-graph write: " Ævar Arnfjörð Bjarmason
2018-10-10 20:37       ` SZEDER Gábor
2018-10-10 21:56         ` Ævar Arnfjörð Bjarmason
2018-10-10 22:19           ` SZEDER Gábor
2018-10-10 22:37             ` Ævar Arnfjörð Bjarmason
2018-10-11 17:52               ` Ævar Arnfjörð Bjarmason
2018-10-15 16:05                 ` SZEDER Gábor
2018-10-12  6:09         ` Junio C Hamano
2018-10-12 15:07           ` Ævar Arnfjörð Bjarmason
2018-10-12 15:12             ` Derrick Stolee
2018-10-15 16:54       ` SZEDER Gábor
2018-11-19 16:02         ` SZEDER Gábor
2018-11-19 20:23           ` [PATCH] commit-graph: split up close_reachable() " Ævar Arnfjörð Bjarmason
2018-11-19 20:38             ` Derrick Stolee
2018-11-19 22:57             ` SZEDER Gábor
2018-11-20 15:04               ` [PATCH 0/6] commit-graph write: progress output improvements Ævar Arnfjörð Bjarmason
2018-11-20 15:04               ` [PATCH 1/6] commit-graph write: rephrase confusing progress output Ævar Arnfjörð Bjarmason
2018-11-20 15:04               ` [PATCH 2/6] commit-graph write: add more " Ævar Arnfjörð Bjarmason
2018-11-20 16:58                 ` SZEDER Gábor
2018-11-20 19:50                   ` [PATCH v2 0/6] commit-graph write: progress output improvements Ævar Arnfjörð Bjarmason
2018-11-20 19:50                   ` [PATCH v2 1/6] commit-graph write: rephrase confusing progress output Ævar Arnfjörð Bjarmason
2018-11-20 19:50                   ` [PATCH v2 2/6] commit-graph write: add more " Ævar Arnfjörð Bjarmason
2018-11-20 23:38                     ` SZEDER Gábor
2018-11-20 19:50                   ` [PATCH v2 3/6] commit-graph write: show progress for object search Ævar Arnfjörð Bjarmason
2018-11-20 19:50                   ` [PATCH v2 4/6] commit-graph write: add more describing progress output Ævar Arnfjörð Bjarmason
2018-11-20 19:50                   ` [PATCH v2 5/6] commit-graph write: remove empty line for readability Ævar Arnfjörð Bjarmason
2018-11-20 19:50                   ` [PATCH v2 6/6] commit-graph write: add even more progress output Ævar Arnfjörð Bjarmason
2018-11-21  1:23                   ` SZEDER Gábor
2018-11-21  1:25                     ` [PATCH 1/2] commit-graph: rename 'num_extra_edges' variable to 'num_large_edges' SZEDER Gábor
2018-11-21  3:29                       ` Junio C Hamano
2018-11-21 11:32                         ` Derrick Stolee
2018-11-21  1:26                     ` [PATCH 2/2] commit-graph: don't call write_graph_chunk_large_edges() unnecessarily SZEDER Gábor
2018-11-21 11:33                       ` Derrick Stolee
2018-11-22 13:28                       ` [PATCH v3 00/10] commit-graph write: progress output improvements Ævar Arnfjörð Bjarmason
2018-11-22 15:39                         ` Ævar Arnfjörð Bjarmason
2018-11-22 15:39                         ` [PATCH v4 01/10] commit-graph: rename 'num_extra_edges' variable to 'num_large_edges' Ævar Arnfjörð Bjarmason
2018-11-22 15:39                         ` [PATCH v4 02/10] commit-graph: don't call write_graph_chunk_large_edges() unnecessarily Ævar Arnfjörð Bjarmason
2018-11-22 15:39                         ` [PATCH v4 03/10] commit-graph write: rephrase confusing progress output Ævar Arnfjörð Bjarmason
2018-11-22 15:39                         ` [PATCH v4 04/10] commit-graph write: add "Writing out" " Ævar Arnfjörð Bjarmason
2018-11-22 15:39                         ` [PATCH v4 05/10] commit-graph write: more descriptive "writing out" output Ævar Arnfjörð Bjarmason
2018-11-22 15:39                         ` [PATCH v4 06/10] commit-graph write: show progress for object search Ævar Arnfjörð Bjarmason
2018-11-22 15:39                         ` [PATCH v4 07/10] commit-graph write: add more descriptive progress output Ævar Arnfjörð Bjarmason
2018-11-22 15:39                         ` [PATCH v4 08/10] commit-graph write: remove empty line for readability Ævar Arnfjörð Bjarmason
2018-11-22 15:39                         ` [PATCH v4 09/10] commit-graph write: add itermediate progress Ævar Arnfjörð Bjarmason
2018-11-22 15:39                         ` [PATCH v4 10/10] commit-graph write: emit a percentage for all progress Ævar Arnfjörð Bjarmason
2018-11-22 18:59                         ` [PATCH v3 00/10] commit-graph write: progress output improvements Eric Sunshine
2018-11-22 13:28                       ` [PATCH v3 01/10] commit-graph: rename 'num_extra_edges' variable to 'num_large_edges' Ævar Arnfjörð Bjarmason
2018-11-22 13:28                       ` [PATCH v3 02/10] commit-graph: don't call write_graph_chunk_large_edges() unnecessarily Ævar Arnfjörð Bjarmason
2018-11-22 13:28                       ` [PATCH v3 03/10] commit-graph write: rephrase confusing progress output Ævar Arnfjörð Bjarmason
2018-11-22 13:28                       ` [PATCH v3 04/10] commit-graph write: add "Writing out" " Ævar Arnfjörð Bjarmason
2018-11-22 13:28                       ` [PATCH v3 05/10] commit-graph write: more descriptive "writing out" output Ævar Arnfjörð Bjarmason
2018-11-22 13:28                       ` [PATCH v3 06/10] commit-graph write: show progress for object search Ævar Arnfjörð Bjarmason
2018-11-22 13:28                       ` [PATCH v3 07/10] commit-graph write: add more descriptive progress output Ævar Arnfjörð Bjarmason
2018-11-22 13:28                       ` [PATCH v3 08/10] commit-graph write: remove empty line for readability Ævar Arnfjörð Bjarmason
2018-11-22 13:28                       ` [PATCH v3 09/10] commit-graph write: add itermediate progress Ævar Arnfjörð Bjarmason
2018-11-22 13:28                       ` [PATCH v3 10/10] commit-graph write: emit a percentage for all progress Ævar Arnfjörð Bjarmason
2018-11-20 15:04               ` [PATCH 3/6] commit-graph write: show progress for object search Ævar Arnfjörð Bjarmason
2018-11-20 15:04               ` [PATCH 4/6] commit-graph write: add more describing progress output Ævar Arnfjörð Bjarmason
2018-11-20 15:04               ` [PATCH 5/6] commit-graph write: remove empty line for readability Ævar Arnfjörð Bjarmason
2018-11-20 15:04               ` [PATCH 6/6] commit-graph write: add even more progress output Ævar Arnfjörð Bjarmason
2018-09-17 15:33     ` [PATCH v3 2/2] commit-graph verify: add " Ævar Arnfjörð Bjarmason
