git@vger.kernel.org mailing list mirror (one of many)
 help / Atom feed
* [PATCH 0/2] commit-graph: add progress output
@ 2018-09-04 20:27 Ævar Arnfjörð Bjarmason
  2018-09-04 20:27 ` [PATCH 1/2] commit-graph write: " Ævar Arnfjörð Bjarmason
                   ` (5 more replies)
  0 siblings, 6 replies; 39+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-04 20:27 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Ævar Arnfjörð Bjarmason

This series adds progress output to the commit-graph command, so that
when it's called by "git gc" or "git fsck" we can see what's going on
with it.

Ævar Arnfjörð Bjarmason (2):
  commit-graph write: add progress output
  commit-graph verify: add progress output

 commit-graph.c | 44 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 43 insertions(+), 1 deletion(-)

-- 
2.19.0.rc1.350.ge57e33dbd1


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 1/2] commit-graph write: add progress output
  2018-09-04 20:27 [PATCH 0/2] commit-graph: add progress output Ævar Arnfjörð Bjarmason
@ 2018-09-04 20:27 ` " Ævar Arnfjörð Bjarmason
  2018-09-04 21:16   ` Eric Sunshine
                     ` (3 more replies)
  2018-09-04 20:27 ` [PATCH 2/2] commit-graph verify: " Ævar Arnfjörð Bjarmason
                   ` (4 subsequent siblings)
  5 siblings, 4 replies; 39+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-04 20:27 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Ævar Arnfjörð Bjarmason

Before this change the "commit-graph write" command didn't report any
progress. On my machine this command takes more than 10 seconds to
write the graph for linux.git, and around 1m30s on the
2015-04-03-1M-git.git[1] test repository, which is a test case for
larger monorepos.

Furthermore, since the gc.writeCommitGraph setting was added in
d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27),
there was no indication at all from a "git gc" run that anything was
different. This why one of the progress bars being added here uses
start_progress() instead of start_delayed_progress(), so that it's
guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git
repository:

    $ git -c gc.writeCommitGraph=true gc
    Enumerating objects: 2821, done.
    [...]
    Total 2821 (delta 1670), reused 2821 (delta 1670)
    Computing commit graph generation numbers: 100% (867/867), done.

On larger repositories, such as linux.git the delayed progress bar(s)
will kick in, and we'll show what's going on instead of, as was
previously happening, printing nothing while we write the graph:

    $ git -c gc.writeCommitGraph=true gc
    Annotating commits in commit graph: 1565573, done.
    Computing commit graph generation numbers: 100% (782484/782484), done.

Note that here we don't show "Finding commits for commit graph", this
is because under "git gc" we seed the search with the commit
references in the repository, and that set is too small to show any
progress, but would e.g. on a smaller repo such as git.git with
--stdin-commits:

    $ git rev-list --all | git -c gc.writeCommitGraph=true write --stdin-commits
    Finding commits for commit graph: 100% (162576/162576), done.
    Computing commit graph generation numbers: 100% (162576/162576), done.

With --stdin-packs we don't show any estimation of how much is left to
do. This is because we might be processing more than one pack. We
could be less lazy here and show progress, either detect by detecting
that we're only processing one pack, or by first looping over the
packs to discover how many commits they have. I don't see the point in
doing that work. So instead we get (on 2015-04-03-1M-git.git):

    $ echo pack-<HASH>.idx | git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs
    Finding commits for commit graph: 13064614, done.
    Annotating commits in commit graph: 3001341, done.
    Computing commit graph generation numbers: 100% (1000447/1000447), done.

1. https://github.com/avar/2015-04-03-1M-git

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit-graph.c | 38 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 37 insertions(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index 8a1bec7b8a..74889dc90a 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -13,6 +13,7 @@
 #include "commit-graph.h"
 #include "object-store.h"
 #include "alloc.h"
+#include "progress.h"
 
 #define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */
 #define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
@@ -548,6 +549,8 @@ struct packed_oid_list {
 	struct object_id *list;
 	int nr;
 	int alloc;
+	struct progress *progress;
+	int progress_done;
 };
 
 static int add_packed_commits(const struct object_id *oid,
@@ -560,6 +563,9 @@ static int add_packed_commits(const struct object_id *oid,
 	off_t offset = nth_packed_object_offset(pack, pos);
 	struct object_info oi = OBJECT_INFO_INIT;
 
+	if (list->progress)
+		display_progress(list->progress, ++list->progress_done);
+
 	oi.typep = &type;
 	if (packed_object_info(the_repository, pack, offset, &oi) < 0)
 		die(_("unable to get type of object %s"), oid_to_hex(oid));
@@ -591,8 +597,13 @@ static void close_reachable(struct packed_oid_list *oids)
 {
 	int i;
 	struct commit *commit;
+	struct progress *progress = NULL;
+	int j = 0;
 
+	progress = start_delayed_progress(
+		_("Annotating commits in commit graph"), 0);
 	for (i = 0; i < oids->nr; i++) {
+		display_progress(progress, ++j);
 		commit = lookup_commit(the_repository, &oids->list[i]);
 		if (commit)
 			commit->object.flags |= UNINTERESTING;
@@ -604,6 +615,7 @@ static void close_reachable(struct packed_oid_list *oids)
 	 * closure.
 	 */
 	for (i = 0; i < oids->nr; i++) {
+		display_progress(progress, ++j);
 		commit = lookup_commit(the_repository, &oids->list[i]);
 
 		if (commit && !parse_commit(commit))
@@ -611,19 +623,25 @@ static void close_reachable(struct packed_oid_list *oids)
 	}
 
 	for (i = 0; i < oids->nr; i++) {
+		display_progress(progress, ++j);
 		commit = lookup_commit(the_repository, &oids->list[i]);
 
 		if (commit)
 			commit->object.flags &= ~UNINTERESTING;
 	}
+	stop_progress(&progress);
 }
 
 static void compute_generation_numbers(struct packed_commit_list* commits)
 {
 	int i;
 	struct commit_list *list = NULL;
+	struct progress *progress = NULL;
 
+	progress = start_progress(
+		_("Computing commit graph generation numbers"), commits->nr);
 	for (i = 0; i < commits->nr; i++) {
+		display_progress(progress, i);
 		if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY &&
 		    commits->list[i]->generation != GENERATION_NUMBER_ZERO)
 			continue;
@@ -655,6 +673,8 @@ static void compute_generation_numbers(struct packed_commit_list* commits)
 			}
 		}
 	}
+	display_progress(progress, i);
+	stop_progress(&progress);
 }
 
 static int add_ref_to_list(const char *refname,
@@ -692,9 +712,12 @@ void write_commit_graph(const char *obj_dir,
 	int num_chunks;
 	int num_extra_edges;
 	struct commit_list *parent;
+	struct progress *progress = NULL;
 
 	oids.nr = 0;
 	oids.alloc = approximate_object_count() / 4;
+	oids.progress = NULL;
+	oids.progress_done = 0;
 
 	if (append) {
 		prepare_commit_graph_one(the_repository, obj_dir);
@@ -721,6 +744,9 @@ void write_commit_graph(const char *obj_dir,
 		int dirlen;
 		strbuf_addf(&packname, "%s/pack/", obj_dir);
 		dirlen = packname.len;
+		oids.progress = start_delayed_progress(
+			_("Finding commits for commit graph"), 0);
+		oids.progress_done = 0;
 		for (i = 0; i < pack_indexes->nr; i++) {
 			struct packed_git *p;
 			strbuf_setlen(&packname, dirlen);
@@ -733,15 +759,19 @@ void write_commit_graph(const char *obj_dir,
 			for_each_object_in_pack(p, add_packed_commits, &oids, 0);
 			close_pack(p);
 		}
+		stop_progress(&oids.progress);
 		strbuf_release(&packname);
 	}
 
 	if (commit_hex) {
+		progress = start_delayed_progress(
+			_("Finding commits for commit graph"), commit_hex->nr);
 		for (i = 0; i < commit_hex->nr; i++) {
 			const char *end;
 			struct object_id oid;
 			struct commit *result;
 
+			display_progress(progress, i);
 			if (commit_hex->items[i].string &&
 			    parse_oid_hex(commit_hex->items[i].string, &oid, &end))
 				continue;
@@ -754,10 +784,16 @@ void write_commit_graph(const char *obj_dir,
 				oids.nr++;
 			}
 		}
+		display_progress(progress, i);
+		stop_progress(&progress);
 	}
 
-	if (!pack_indexes && !commit_hex)
+	if (!pack_indexes && !commit_hex) {
+		oids.progress = start_delayed_progress(
+			_("Finding commits for commit graph"), 0);
 		for_each_packed_object(add_packed_commits, &oids, 0);
+		stop_progress(&oids.progress);
+	}
 
 	close_reachable(&oids);
 
-- 
2.19.0.rc1.350.ge57e33dbd1


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 2/2] commit-graph verify: add progress output
  2018-09-04 20:27 [PATCH 0/2] commit-graph: add progress output Ævar Arnfjörð Bjarmason
  2018-09-04 20:27 ` [PATCH 1/2] commit-graph write: " Ævar Arnfjörð Bjarmason
@ 2018-09-04 20:27 ` " Ævar Arnfjörð Bjarmason
  2018-09-04 22:10   ` Junio C Hamano
  2018-09-05 12:07 ` [PATCH 0/2] commit-graph: " Derrick Stolee
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 39+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-04 20:27 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Ævar Arnfjörð Bjarmason

For the reasons explained in the "commit-graph write: add progress
output" commit leading up to this one, emit progress on "commit-graph
verify". Since e0fd51e1d7 ("fsck: verify commit-graph", 2018-06-27)
"git fsck" has called this command if core.commitGraph=true, but
there's been no progress output to indicate that anything was
different. Now there is (on my tiny dotfiles.git repository):

    $ git -c core.commitGraph=true -C ~/ fsck
    Checking object directories: 100% (256/256), done.
    Checking objects: 100% (2821/2821), done.
    dangling blob 5b8bbdb9b788ed90459f505b0934619c17cc605b
    Verifying commits in commit graph: 100% (867/867), done.

And on a larger repository, such as the 2015-04-03-1M-git.git test
repository:

    $ time git -c core.commitGraph=true -C ~/g/2015-04-03-1M-git/ commit-graph verify
    Verifying commits in commit graph: 100% (1000447/1000447), done.
    real    0m7.813s
    [...]

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit-graph.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 74889dc90a..1a02fe019a 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -914,6 +914,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 	int generation_zero = 0;
 	struct hashfile *f;
 	int devnull;
+	struct progress *progress = NULL;
 
 	if (!g) {
 		graph_report("no commit-graph file loaded");
@@ -981,11 +982,14 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 	if (verify_commit_graph_error & ~VERIFY_COMMIT_GRAPH_ERROR_HASH)
 		return verify_commit_graph_error;
 
+	progress = start_progress("Verifying commits in commit graph",
+				  g->num_commits);
 	for (i = 0; i < g->num_commits; i++) {
 		struct commit *graph_commit, *odb_commit;
 		struct commit_list *graph_parents, *odb_parents;
 		uint32_t max_generation = 0;
 
+		display_progress(progress, i);
 		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
 
 		graph_commit = lookup_commit(r, &cur_oid);
@@ -1062,6 +1066,8 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 				     graph_commit->date,
 				     odb_commit->date);
 	}
+	display_progress(progress, i);
+	stop_progress(&progress);
 
 	return verify_commit_graph_error;
 }
-- 
2.19.0.rc1.350.ge57e33dbd1


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] commit-graph write: add progress output
  2018-09-04 20:27 ` [PATCH 1/2] commit-graph write: " Ævar Arnfjörð Bjarmason
@ 2018-09-04 21:16   ` Eric Sunshine
  2018-09-04 22:07   ` Junio C Hamano
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 39+ messages in thread
From: Eric Sunshine @ 2018-09-04 21:16 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git List, Junio C Hamano, Derrick Stolee

On Tue, Sep 4, 2018 at 4:27 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> With --stdin-packs we don't show any estimation of how much is left to
> do. This is because we might be processing more than one pack. We
> could be less lazy here and show progress, either detect by detecting

s/detect//

> that we're only processing one pack, or by first looping over the
> packs to discover how many commits they have. I don't see the point in
> doing that work. So instead we get (on 2015-04-03-1M-git.git):
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] commit-graph write: add progress output
  2018-09-04 20:27 ` [PATCH 1/2] commit-graph write: " Ævar Arnfjörð Bjarmason
  2018-09-04 21:16   ` Eric Sunshine
@ 2018-09-04 22:07   ` Junio C Hamano
  2018-09-05 11:58     ` Derrick Stolee
  2018-09-05 12:06   ` Derrick Stolee
  2018-09-07 12:40   ` Ævar Arnfjörð Bjarmason
  3 siblings, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2018-09-04 22:07 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Derrick Stolee

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Before this change the "commit-graph write" command didn't report any

Please describe the pre-patch state in present tense without "Before
this change".

> progress. On my machine this command takes more than 10 seconds to
> write the graph for linux.git, and around 1m30s on the
> 2015-04-03-1M-git.git[1] test repository, which is a test case for
> larger monorepos.
>
> Furthermore, since the gc.writeCommitGraph setting was added in
> d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27),
> there was no indication at all from a "git gc" run that anything was
> different. This why one of the progress bars being added here uses

"This is why", I guess.

> start_progress() instead of start_delayed_progress(), so that it's
> guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git
> repository:
>
>     $ git -c gc.writeCommitGraph=true gc
>     Enumerating objects: 2821, done.
>     [...]
>     Total 2821 (delta 1670), reused 2821 (delta 1670)
>     Computing commit graph generation numbers: 100% (867/867), done.
>
> On larger repositories, such as linux.git the delayed progress bar(s)

"such as linux.git, the delayed ..."

> With --stdin-packs we don't show any estimation of how much is left to
> do. This is because we might be processing more than one pack. We
> could be less lazy here and show progress, either detect by detecting
> that we're only processing one pack, or by first looping over the
> packs to discover how many commits they have. I don't see the point in

I do not know if there is no point, but if we were to do it, I think
slurping the list of packs and computing the number of objects is
not all that bad.

>  static void compute_generation_numbers(struct packed_commit_list* commits)
>  {
>  	int i;
>  	struct commit_list *list = NULL;
> +	struct progress *progress = NULL;
>  
> +	progress = start_progress(
> +		_("Computing commit graph generation numbers"), commits->nr);
>  	for (i = 0; i < commits->nr; i++) {
> +		display_progress(progress, i);
>  		if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY &&
>  		    commits->list[i]->generation != GENERATION_NUMBER_ZERO)
>  			continue;

I am wondering if the progress call should be moved after this
conditional continue; would we want to count the entry whose
generation is already known here?  Of course, as we give commits->nr
as the 100% ceiling, we cannot avoid doing so, but it somehow smells
wrong.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/2] commit-graph verify: add progress output
  2018-09-04 20:27 ` [PATCH 2/2] commit-graph verify: " Ævar Arnfjörð Bjarmason
@ 2018-09-04 22:10   ` Junio C Hamano
  0 siblings, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2018-09-04 22:10 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Derrick Stolee

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> For the reasons explained in the "commit-graph write: add progress
> output" commit leading up to this one, emit progress on "commit-graph
> verify". Since e0fd51e1d7 ("fsck: verify commit-graph", 2018-06-27)
> "git fsck" has called this command if core.commitGraph=true, but
> there's been no progress output to indicate that anything was
> different. Now there is (on my tiny dotfiles.git repository):
>
>     $ git -c core.commitGraph=true -C ~/ fsck
>     Checking object directories: 100% (256/256), done.
>     Checking objects: 100% (2821/2821), done.
>     dangling blob 5b8bbdb9b788ed90459f505b0934619c17cc605b
>     Verifying commits in commit graph: 100% (867/867), done.
>
> And on a larger repository, such as the 2015-04-03-1M-git.git test
> repository:
>
>     $ time git -c core.commitGraph=true -C ~/g/2015-04-03-1M-git/ commit-graph verify
>     Verifying commits in commit graph: 100% (1000447/1000447), done.
>     real    0m7.813s
>     [...]
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  commit-graph.c | 6 ++++++
>  1 file changed, 6 insertions(+)

Yup.  The verification side knows the total number of things, so it
is much easier to give the percentage progress with a very simple
addition like this, which is very nice.



>
> diff --git a/commit-graph.c b/commit-graph.c
> index 74889dc90a..1a02fe019a 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -914,6 +914,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
>  	int generation_zero = 0;
>  	struct hashfile *f;
>  	int devnull;
> +	struct progress *progress = NULL;
>  
>  	if (!g) {
>  		graph_report("no commit-graph file loaded");
> @@ -981,11 +982,14 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
>  	if (verify_commit_graph_error & ~VERIFY_COMMIT_GRAPH_ERROR_HASH)
>  		return verify_commit_graph_error;
>  
> +	progress = start_progress("Verifying commits in commit graph",
> +				  g->num_commits);
>  	for (i = 0; i < g->num_commits; i++) {
>  		struct commit *graph_commit, *odb_commit;
>  		struct commit_list *graph_parents, *odb_parents;
>  		uint32_t max_generation = 0;
>  
> +		display_progress(progress, i);
>  		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
>  
>  		graph_commit = lookup_commit(r, &cur_oid);
> @@ -1062,6 +1066,8 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
>  				     graph_commit->date,
>  				     odb_commit->date);
>  	}
> +	display_progress(progress, i);
> +	stop_progress(&progress);
>  
>  	return verify_commit_graph_error;
>  }

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] commit-graph write: add progress output
  2018-09-04 22:07   ` Junio C Hamano
@ 2018-09-05 11:58     ` Derrick Stolee
  2018-09-05 12:07       ` Ævar Arnfjörð Bjarmason
                         ` (2 more replies)
  0 siblings, 3 replies; 39+ messages in thread
From: Derrick Stolee @ 2018-09-05 11:58 UTC (permalink / raw)
  To: Junio C Hamano, Ævar Arnfjörð Bjarmason; +Cc: git

On 9/4/2018 6:07 PM, Junio C Hamano wrote:
> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> With --stdin-packs we don't show any estimation of how much is left to
>> do. This is because we might be processing more than one pack. We
>> could be less lazy here and show progress, either detect by detecting
>> that we're only processing one pack, or by first looping over the
>> packs to discover how many commits they have. I don't see the point in
> I do not know if there is no point, but if we were to do it, I think
> slurping the list of packs and computing the number of objects is
> not all that bad.

If you want to do that, I have nothing against it. However, I don't 
expect users to use that option directly. That option is used by VFS for 
Git to compute the commit-graph in the background after receiving a pack 
of commits and trees, but not by 'git gc' which I expect is how most 
users will compute commit-graphs.

>>   static void compute_generation_numbers(struct packed_commit_list* commits)
>>   {
>>   	int i;
>>   	struct commit_list *list = NULL;
>> +	struct progress *progress = NULL;
>>   
>> +	progress = start_progress(
>> +		_("Computing commit graph generation numbers"), commits->nr);
>>   	for (i = 0; i < commits->nr; i++) {
>> +		display_progress(progress, i);
>>   		if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY &&
>>   		    commits->list[i]->generation != GENERATION_NUMBER_ZERO)
>>   			continue;
> I am wondering if the progress call should be moved after this
> conditional continue; would we want to count the entry whose
> generation is already known here?  Of course, as we give commits->nr
> as the 100% ceiling, we cannot avoid doing so, but it somehow smells
> wrong.

If we wanted to be completely right, we would count the commits in the 
list that do not have a generation number and report that as the 100% 
ceiling.

Something like the diff below would work. I tested it in Linux by first 
deleting my commit-graph and running the following:

stolee@stolee-linux:~/linux$ rm .git/objects/info/commit-graph
stolee@stolee-linux:~/linux$ git rev-parse v4.6 | ~/git/git commit-graph 
write --stdin-commits
Annotating commits in commit graph: 1180333, done.
Computing commit graph generation numbers: 100% (590166/590166), done.
stolee@stolee-linux:~/linux$ ~/git/git commit-graph write --reachable
Annotating commits in commit graph: 1564087, done.
Computing commit graph generation numbers: 100% (191590/191590), done.

-->8--

From: Derrick Stolee <dstolee@microsoft.com>
Date: Wed, 5 Sep 2018 11:55:42 +0000
Subject: [PATCH] fixup! commit-graph write: add progress output

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
  commit-graph.c | 15 +++++++++++----
  1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 1a02fe019a..b933bc9f00 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -634,14 +634,20 @@ static void close_reachable(struct packed_oid_list 
*oids)

  static void compute_generation_numbers(struct packed_commit_list* commits)
  {
-       int i;
+       int i, count_uncomputed = 0;
         struct commit_list *list = NULL;
         struct progress *progress = NULL;

+       for (i = 0; i < commits->nr; i++)
+               if (commits->list[i]->generation == 
GENERATION_NUMBER_INFINITY ||
+                   commits->list[i]->generation == GENERATION_NUMBER_ZERO)
+                       count_uncomputed++;
+
         progress = start_progress(
-               _("Computing commit graph generation numbers"), 
commits->nr);
+               _("Computing commit graph generation numbers"), 
count_uncomputed);
+       count_uncomputed = 0;
+
         for (i = 0; i < commits->nr; i++) {
-               display_progress(progress, i);
                 if (commits->list[i]->generation != 
GENERATION_NUMBER_INFINITY &&
                     commits->list[i]->generation != GENERATION_NUMBER_ZERO)
                         continue;
@@ -670,10 +676,11 @@ static void compute_generation_numbers(struct 
packed_commit_list* commits)

                                 if (current->generation > 
GENERATION_NUMBER_MAX)
                                         current->generation = 
GENERATION_NUMBER_MAX;
+
+                               display_progress(progress, 
++count_uncomputed);
                         }
                 }
         }
-       display_progress(progress, i);
         stop_progress(&progress);
  }

--
2.19.0.rc2


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] commit-graph write: add progress output
  2018-09-04 20:27 ` [PATCH 1/2] commit-graph write: " Ævar Arnfjörð Bjarmason
  2018-09-04 21:16   ` Eric Sunshine
  2018-09-04 22:07   ` Junio C Hamano
@ 2018-09-05 12:06   ` Derrick Stolee
  2018-09-07 12:40   ` Ævar Arnfjörð Bjarmason
  3 siblings, 0 replies; 39+ messages in thread
From: Derrick Stolee @ 2018-09-05 12:06 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git; +Cc: Junio C Hamano

On 9/4/2018 4:27 PM, Ævar Arnfjörð Bjarmason wrote:
> @@ -591,8 +597,13 @@ static void close_reachable(struct packed_oid_list *oids)
>   {
>   	int i;
>   	struct commit *commit;
> +	struct progress *progress = NULL;
> +	int j = 0;

The change below over-counts the number of commits we are processing (by 
at least double, possibly triple).


> +	progress = start_delayed_progress(
> +		_("Annotating commits in commit graph"), 0);
>   	for (i = 0; i < oids->nr; i++) {
> +		display_progress(progress, ++j);
>   		commit = lookup_commit(the_repository, &oids->list[i]);
>   		if (commit)
>   			commit->object.flags |= UNINTERESTING;
This count is the number of oids given to the method. For 'git 
commit-graph write --reachable', this will be the number of refs.
> @@ -604,6 +615,7 @@ static void close_reachable(struct packed_oid_list *oids)
>   	 * closure.
>   	 */
>   	for (i = 0; i < oids->nr; i++) {
> +		display_progress(progress, ++j);
>   		commit = lookup_commit(the_repository, &oids->list[i]);
>   
>   		if (commit && !parse_commit(commit))

This is the important count, since we will be parsing commits and adding 
their parents to the list. The bulk of the work happens here.

> @@ -611,19 +623,25 @@ static void close_reachable(struct packed_oid_list *oids)
>   	}
>   
>   	for (i = 0; i < oids->nr; i++) {
> +		display_progress(progress, ++j);
>   		commit = lookup_commit(the_repository, &oids->list[i]);
This iterates through the commits a second time and removes the 
UNINTERESTING flag.
>   
>   		if (commit)
>   			commit->object.flags &= ~UNINTERESTING;
>   	}
> +	stop_progress(&progress);
>   }

I think it is good to have the progress start before the first loop and 
end after the third loop, but the middle loop has the important count.

I tried deleting the first and third display_progress() methods and 
re-ran the process on the Linux repo and did not notice a delay at the 
0% and 100% progress spots. The count matches the number of commits.

Thanks,

-Stolee


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/2] commit-graph: add progress output
  2018-09-04 20:27 [PATCH 0/2] commit-graph: add progress output Ævar Arnfjörð Bjarmason
  2018-09-04 20:27 ` [PATCH 1/2] commit-graph write: " Ævar Arnfjörð Bjarmason
  2018-09-04 20:27 ` [PATCH 2/2] commit-graph verify: " Ævar Arnfjörð Bjarmason
@ 2018-09-05 12:07 ` " Derrick Stolee
  2018-09-07 18:29 ` [PATCH v2 " Ævar Arnfjörð Bjarmason
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 39+ messages in thread
From: Derrick Stolee @ 2018-09-05 12:07 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git; +Cc: Junio C Hamano

On 9/4/2018 4:27 PM, Ævar Arnfjörð Bjarmason wrote:
> This series adds progress output to the commit-graph command, so that
> when it's called by "git gc" or "git fsck" we can see what's going on
> with it.
>
> Ævar Arnfjörð Bjarmason (2):
>    commit-graph write: add progress output
>    commit-graph verify: add progress output
>
>   commit-graph.c | 44 +++++++++++++++++++++++++++++++++++++++++++-
>   1 file changed, 43 insertions(+), 1 deletion(-)

Thanks for writing this, Ævar. I appreciate that you took the time to 
fill in an important UX gap.

I had a couple of comments, but generally this is a good change.

-Stolee


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] commit-graph write: add progress output
  2018-09-05 11:58     ` Derrick Stolee
@ 2018-09-05 12:07       ` Ævar Arnfjörð Bjarmason
  2018-09-05 21:46       ` Junio C Hamano
  2018-09-07 15:11       ` Ævar Arnfjörð Bjarmason
  2 siblings, 0 replies; 39+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-05 12:07 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Junio C Hamano, git


On Wed, Sep 05 2018, Derrick Stolee wrote:

> On 9/4/2018 6:07 PM, Junio C Hamano wrote:
>> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>>
>>> With --stdin-packs we don't show any estimation of how much is left to
>>> do. This is because we might be processing more than one pack. We
>>> could be less lazy here and show progress, either detect by detecting
>>> that we're only processing one pack, or by first looping over the
>>> packs to discover how many commits they have. I don't see the point in
>> I do not know if there is no point, but if we were to do it, I think
>> slurping the list of packs and computing the number of objects is
>> not all that bad.
>
> If you want to do that, I have nothing against it. However, I don't
> expect users to use that option directly. That option is used by VFS
> for Git to compute the commit-graph in the background after receiving
> a pack of commits and trees, but not by 'git gc' which I expect is how
> most users will compute commit-graphs.

Yeah, I suspected only one guy at Microsoft would potentially benefit
from this, but added it just so we'd have progress regardless of entry
point :)

>>>   static void compute_generation_numbers(struct packed_commit_list* commits)
>>>   {
>>>   	int i;
>>>   	struct commit_list *list = NULL;
>>> +	struct progress *progress = NULL;
>>>   +	progress = start_progress(
>>> +		_("Computing commit graph generation numbers"), commits->nr);
>>>   	for (i = 0; i < commits->nr; i++) {
>>> +		display_progress(progress, i);
>>>   		if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY &&
>>>   		    commits->list[i]->generation != GENERATION_NUMBER_ZERO)
>>>   			continue;
>> I am wondering if the progress call should be moved after this
>> conditional continue; would we want to count the entry whose
>> generation is already known here?  Of course, as we give commits->nr
>> as the 100% ceiling, we cannot avoid doing so, but it somehow smells
>> wrong.
>
> If we wanted to be completely right, we would count the commits in the
> list that do not have a generation number and report that as the 100%
> ceiling.
>
> Something like the diff below would work. I tested it in Linux by
> first deleting my commit-graph and running the following:
>
> stolee@stolee-linux:~/linux$ rm .git/objects/info/commit-graph
> stolee@stolee-linux:~/linux$ git rev-parse v4.6 | ~/git/git
> commit-graph write --stdin-commits
> Annotating commits in commit graph: 1180333, done.
> Computing commit graph generation numbers: 100% (590166/590166), done.
> stolee@stolee-linux:~/linux$ ~/git/git commit-graph write --reachable
> Annotating commits in commit graph: 1564087, done.
> Computing commit graph generation numbers: 100% (191590/191590), done.
>
> -->8--
>
> From: Derrick Stolee <dstolee@microsoft.com>
> Date: Wed, 5 Sep 2018 11:55:42 +0000
> Subject: [PATCH] fixup! commit-graph write: add progress output
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
> commit-graph.c | 15 +++++++++++----
> 1 file changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index 1a02fe019a..b933bc9f00 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -634,14 +634,20 @@ static void close_reachable(struct
> packed_oid_list *oids)
>
> static void compute_generation_numbers(struct packed_commit_list* commits)
> {
> - int i;
> + int i, count_uncomputed = 0;
>  struct commit_list *list = NULL;
>  struct progress *progress = NULL;
>
> + for (i = 0; i < commits->nr; i++)
> + if (commits->list[i]->generation ==
> GENERATION_NUMBER_INFINITY ||
> + commits->list[i]->generation == GENERATION_NUMBER_ZERO)
> + count_uncomputed++;
> +
>  progress = start_progress(
> - _("Computing commit graph generation numbers"),
> commits->nr);
> + _("Computing commit graph generation numbers"),
> count_uncomputed);
> + count_uncomputed = 0;
> +
>  for (i = 0; i < commits->nr; i++) {
> - display_progress(progress, i);
>  if (commits->list[i]->generation !=
> GENERATION_NUMBER_INFINITY &&
>  commits->list[i]->generation != GENERATION_NUMBER_ZERO)
>  continue;
> @@ -670,10 +676,11 @@ static void compute_generation_numbers(struct
> packed_commit_list* commits)
>
>  if (current->generation >
> GENERATION_NUMBER_MAX)
>  current->generation =
> GENERATION_NUMBER_MAX;
> +
> + display_progress(progress,
> ++count_uncomputed);
>  }
>  }
>  }
> - display_progress(progress, i);
>  stop_progress(&progress);
> }

Thanks! That looks good, and you obviously know this code a lot
better. I'll squash this into v2 pending further feedback I'll need to
address.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] commit-graph write: add progress output
  2018-09-05 11:58     ` Derrick Stolee
  2018-09-05 12:07       ` Ævar Arnfjörð Bjarmason
@ 2018-09-05 21:46       ` Junio C Hamano
  2018-09-05 22:12         ` Derrick Stolee
  2018-09-07 15:11       ` Ævar Arnfjörð Bjarmason
  2 siblings, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2018-09-05 21:46 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Ævar Arnfjörð Bjarmason, git

Derrick Stolee <stolee@gmail.com> writes:

>>>   	for (i = 0; i < commits->nr; i++) {
>>> +		display_progress(progress, i);
>>>   		if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY &&
>>>   		    commits->list[i]->generation != GENERATION_NUMBER_ZERO)
>>>   			continue;
>> I am wondering if the progress call should be moved after this
>> conditional continue; would we want to count the entry whose
>> generation is already known here?  Of course, as we give commits->nr
>> as the 100% ceiling, we cannot avoid doing so, but it somehow smells
>> wrong.
>
> If we wanted to be completely right, we would count the commits in the
> list that do not have a generation number and report that as the 100%
> ceiling.

Yeah, but I realize that the definition of "right" really depends on
what we consider a task being accomplished is in this loop.  If we
define the task to "we have some number of commits that lack
generation numbers and our task is to assign numbers to them.", then
yes counting the ones without generation number and culling the ones
that already have generation number is outside the work and we need
another loop to count them.  But the position the posted patch takes
is also a valid one: we have some commits and we are making sure
each and every one of them has assigned a generation number.

So I do not think it is necessary to introduce another loop just for
counting.

Thanks.




^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] commit-graph write: add progress output
  2018-09-05 21:46       ` Junio C Hamano
@ 2018-09-05 22:12         ` Derrick Stolee
  0 siblings, 0 replies; 39+ messages in thread
From: Derrick Stolee @ 2018-09-05 22:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ævar Arnfjörð Bjarmason, git

On 9/5/2018 5:46 PM, Junio C Hamano wrote:
> Derrick Stolee <stolee@gmail.com> writes:
>
>>>>    	for (i = 0; i < commits->nr; i++) {
>>>> +		display_progress(progress, i);
>>>>    		if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY &&
>>>>    		    commits->list[i]->generation != GENERATION_NUMBER_ZERO)
>>>>    			continue;
>>> I am wondering if the progress call should be moved after this
>>> conditional continue; would we want to count the entry whose
>>> generation is already known here?  Of course, as we give commits->nr
>>> as the 100% ceiling, we cannot avoid doing so, but it somehow smells
>>> wrong.
>> If we wanted to be completely right, we would count the commits in the
>> list that do not have a generation number and report that as the 100%
>> ceiling.
> Yeah, but I realize that the definition of "right" really depends on
> what we consider a task being accomplished is in this loop.  If we
> define the task to "we have some number of commits that lack
> generation numbers and our task is to assign numbers to them.", then
> yes counting the ones without generation number and culling the ones
> that already have generation number is outside the work and we need
> another loop to count them.  But the position the posted patch takes
> is also a valid one: we have some commits and we are making sure
> each and every one of them has assigned a generation number.
>
> So I do not think it is necessary to introduce another loop just for
> counting.
>
> Thanks.
Makes sense to me. Thanks!

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] commit-graph write: add progress output
  2018-09-04 20:27 ` [PATCH 1/2] commit-graph write: " Ævar Arnfjörð Bjarmason
                     ` (2 preceding siblings ...)
  2018-09-05 12:06   ` Derrick Stolee
@ 2018-09-07 12:40   ` Ævar Arnfjörð Bjarmason
  2018-09-07 13:12     ` Derrick Stolee
  3 siblings, 1 reply; 39+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-07 12:40 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Derrick Stolee


On Tue, Sep 04 2018, Ævar Arnfjörð Bjarmason wrote:

> Before this change the "commit-graph write" command didn't report any
> progress. On my machine this command takes more than 10 seconds to
> write the graph for linux.git, and around 1m30s on the
> 2015-04-03-1M-git.git[1] test repository, which is a test case for
> larger monorepos.

There's a fun issue with this code that I'll fix, but thought was
informative to send a mail about.

Because the graph verification happens in the main "git gc" process, as
opposed to everything else via external commands, so all this progress
output gets written to .git/gc.log.

Then next time we do a "gc --auto" we vomit out a couple of KB of
progress bar output at the user, since spot that the gc.log isn't empty.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] commit-graph write: add progress output
  2018-09-07 12:40   ` Ævar Arnfjörð Bjarmason
@ 2018-09-07 13:12     ` Derrick Stolee
  0 siblings, 0 replies; 39+ messages in thread
From: Derrick Stolee @ 2018-09-07 13:12 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git; +Cc: Junio C Hamano

On 9/7/2018 8:40 AM, Ævar Arnfjörð Bjarmason wrote:
> On Tue, Sep 04 2018, Ævar Arnfjörð Bjarmason wrote:
>
>> Before this change the "commit-graph write" command didn't report any
>> progress. On my machine this command takes more than 10 seconds to
>> write the graph for linux.git, and around 1m30s on the
>> 2015-04-03-1M-git.git[1] test repository, which is a test case for
>> larger monorepos.
> There's a fun issue with this code that I'll fix, but thought was
> informative to send a mail about.
>
> Because the graph verification happens in the main "git gc" process, as
> opposed to everything else via external commands, so all this progress
> output gets written to .git/gc.log.
>
> Then next time we do a "gc --auto" we vomit out a couple of KB of
> progress bar output at the user, since spot that the gc.log isn't empty.
Good catch! (I do want to clarify that the graph _writing_ happens 
during 'git gc' since 'git commit-graph verify' is a different thing.)

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] commit-graph write: add progress output
  2018-09-05 11:58     ` Derrick Stolee
  2018-09-05 12:07       ` Ævar Arnfjörð Bjarmason
  2018-09-05 21:46       ` Junio C Hamano
@ 2018-09-07 15:11       ` Ævar Arnfjörð Bjarmason
  2018-09-07 15:23         ` Ævar Arnfjörð Bjarmason
  2 siblings, 1 reply; 39+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-07 15:11 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Junio C Hamano, git


On Wed, Sep 05 2018, Derrick Stolee wrote:

> On 9/4/2018 6:07 PM, Junio C Hamano wrote:
>> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>>
>>> With --stdin-packs we don't show any estimation of how much is left to
>>> do. This is because we might be processing more than one pack. We
>>> could be less lazy here and show progress, either detect by detecting
>>> that we're only processing one pack, or by first looping over the
>>> packs to discover how many commits they have. I don't see the point in
>> I do not know if there is no point, but if we were to do it, I think
>> slurping the list of packs and computing the number of objects is
>> not all that bad.
>
> If you want to do that, I have nothing against it. However, I don't
> expect users to use that option directly. That option is used by VFS
> for Git to compute the commit-graph in the background after receiving
> a pack of commits and trees, but not by 'git gc' which I expect is how
> most users will compute commit-graphs.
>
>>>   static void compute_generation_numbers(struct packed_commit_list* commits)
>>>   {
>>>   	int i;
>>>   	struct commit_list *list = NULL;
>>> +	struct progress *progress = NULL;
>>>   +	progress = start_progress(
>>> +		_("Computing commit graph generation numbers"), commits->nr);
>>>   	for (i = 0; i < commits->nr; i++) {
>>> +		display_progress(progress, i);
>>>   		if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY &&
>>>   		    commits->list[i]->generation != GENERATION_NUMBER_ZERO)
>>>   			continue;
>> I am wondering if the progress call should be moved after this
>> conditional continue; would we want to count the entry whose
>> generation is already known here?  Of course, as we give commits->nr
>> as the 100% ceiling, we cannot avoid doing so, but it somehow smells
>> wrong.
>
> If we wanted to be completely right, we would count the commits in the
> list that do not have a generation number and report that as the 100%
> ceiling.
>
> Something like the diff below would work. I tested it in Linux by
> first deleting my commit-graph and running the following:
>
> stolee@stolee-linux:~/linux$ rm .git/objects/info/commit-graph
> stolee@stolee-linux:~/linux$ git rev-parse v4.6 | ~/git/git
> commit-graph write --stdin-commits
> Annotating commits in commit graph: 1180333, done.
> Computing commit graph generation numbers: 100% (590166/590166), done.
> stolee@stolee-linux:~/linux$ ~/git/git commit-graph write --reachable
> Annotating commits in commit graph: 1564087, done.
> Computing commit graph generation numbers: 100% (191590/191590), done.
>
> -->8--
>
> From: Derrick Stolee <dstolee@microsoft.com>
> Date: Wed, 5 Sep 2018 11:55:42 +0000
> Subject: [PATCH] fixup! commit-graph write: add progress output
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
> commit-graph.c | 15 +++++++++++----
> 1 file changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index 1a02fe019a..b933bc9f00 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -634,14 +634,20 @@ static void close_reachable(struct
> packed_oid_list *oids)
>
> static void compute_generation_numbers(struct packed_commit_list* commits)
> {
> - int i;
> + int i, count_uncomputed = 0;
>  struct commit_list *list = NULL;
>  struct progress *progress = NULL;
>
> + for (i = 0; i < commits->nr; i++)
> + if (commits->list[i]->generation ==
> GENERATION_NUMBER_INFINITY ||
> + commits->list[i]->generation == GENERATION_NUMBER_ZERO)
> + count_uncomputed++;
> +
>  progress = start_progress(
> - _("Computing commit graph generation numbers"),
> commits->nr);
> + _("Computing commit graph generation numbers"),
> count_uncomputed);
> + count_uncomputed = 0;
> +
>  for (i = 0; i < commits->nr; i++) {
> - display_progress(progress, i);
>  if (commits->list[i]->generation !=
> GENERATION_NUMBER_INFINITY &&
>  commits->list[i]->generation != GENERATION_NUMBER_ZERO)
>  continue;
> @@ -670,10 +676,11 @@ static void compute_generation_numbers(struct
> packed_commit_list* commits)
>
>  if (current->generation >
> GENERATION_NUMBER_MAX)
>  current->generation =
> GENERATION_NUMBER_MAX;
> +
> + display_progress(progress,
> ++count_uncomputed);
>  }
>  }
>  }
> - display_progress(progress, i);
>  stop_progress(&progress);
> }

One of the things I was trying to do with this series was to make sure
that whenever we run "git gc" there's always some indication that if you
set gc.writeCommitGraph=true that it's actualy doing work.

This modifies that, which I think is actually fine, just something I
wanted to note. I.e. if you run "git commit-graph write" twice in a row,
the second time will have no output.

Unless that is, your repo is big enough that some of the delayed timers
kick in. So e.g. on git.git we get no output the second time around, but
do get output the first time around, and on linux.git we always get
output.

But in the common case people aren't running this in a loop, and it's
useful to see how many new things are being added to the graph, so I
think this is better. Just wanted to note the behavior difference (and
will change the commit message).

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] commit-graph write: add progress output
  2018-09-07 15:11       ` Ævar Arnfjörð Bjarmason
@ 2018-09-07 15:23         ` Ævar Arnfjörð Bjarmason
  2018-09-07 17:15           ` Jeff King
  0 siblings, 1 reply; 39+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-07 15:23 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Junio C Hamano, git


On Fri, Sep 07 2018, Ævar Arnfjörð Bjarmason wrote:

> On Wed, Sep 05 2018, Derrick Stolee wrote:
>
>> On 9/4/2018 6:07 PM, Junio C Hamano wrote:
>>> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>>>
>>>> With --stdin-packs we don't show any estimation of how much is left to
>>>> do. This is because we might be processing more than one pack. We
>>>> could be less lazy here and show progress, either detect by detecting
>>>> that we're only processing one pack, or by first looping over the
>>>> packs to discover how many commits they have. I don't see the point in
>>> I do not know if there is no point, but if we were to do it, I think
>>> slurping the list of packs and computing the number of objects is
>>> not all that bad.
>>
>> If you want to do that, I have nothing against it. However, I don't
>> expect users to use that option directly. That option is used by VFS
>> for Git to compute the commit-graph in the background after receiving
>> a pack of commits and trees, but not by 'git gc' which I expect is how
>> most users will compute commit-graphs.
>>
>>>>   static void compute_generation_numbers(struct packed_commit_list* commits)
>>>>   {
>>>>   	int i;
>>>>   	struct commit_list *list = NULL;
>>>> +	struct progress *progress = NULL;
>>>>   +	progress = start_progress(
>>>> +		_("Computing commit graph generation numbers"), commits->nr);
>>>>   	for (i = 0; i < commits->nr; i++) {
>>>> +		display_progress(progress, i);
>>>>   		if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY &&
>>>>   		    commits->list[i]->generation != GENERATION_NUMBER_ZERO)
>>>>   			continue;
>>> I am wondering if the progress call should be moved after this
>>> conditional continue; would we want to count the entry whose
>>> generation is already known here?  Of course, as we give commits->nr
>>> as the 100% ceiling, we cannot avoid doing so, but it somehow smells
>>> wrong.
>>
>> If we wanted to be completely right, we would count the commits in the
>> list that do not have a generation number and report that as the 100%
>> ceiling.
>>
>> Something like the diff below would work. I tested it in Linux by
>> first deleting my commit-graph and running the following:
>>
>> stolee@stolee-linux:~/linux$ rm .git/objects/info/commit-graph
>> stolee@stolee-linux:~/linux$ git rev-parse v4.6 | ~/git/git
>> commit-graph write --stdin-commits
>> Annotating commits in commit graph: 1180333, done.
>> Computing commit graph generation numbers: 100% (590166/590166), done.
>> stolee@stolee-linux:~/linux$ ~/git/git commit-graph write --reachable
>> Annotating commits in commit graph: 1564087, done.
>> Computing commit graph generation numbers: 100% (191590/191590), done.
>>
>> -->8--
>>
>> From: Derrick Stolee <dstolee@microsoft.com>
>> Date: Wed, 5 Sep 2018 11:55:42 +0000
>> Subject: [PATCH] fixup! commit-graph write: add progress output
>>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>> commit-graph.c | 15 +++++++++++----
>> 1 file changed, 11 insertions(+), 4 deletions(-)
>>
>> diff --git a/commit-graph.c b/commit-graph.c
>> index 1a02fe019a..b933bc9f00 100644
>> --- a/commit-graph.c
>> +++ b/commit-graph.c
>> @@ -634,14 +634,20 @@ static void close_reachable(struct
>> packed_oid_list *oids)
>>
>> static void compute_generation_numbers(struct packed_commit_list* commits)
>> {
>> - int i;
>> + int i, count_uncomputed = 0;
>>  struct commit_list *list = NULL;
>>  struct progress *progress = NULL;
>>
>> + for (i = 0; i < commits->nr; i++)
>> + if (commits->list[i]->generation ==
>> GENERATION_NUMBER_INFINITY ||
>> + commits->list[i]->generation == GENERATION_NUMBER_ZERO)
>> + count_uncomputed++;
>> +
>>  progress = start_progress(
>> - _("Computing commit graph generation numbers"),
>> commits->nr);
>> + _("Computing commit graph generation numbers"),
>> count_uncomputed);
>> + count_uncomputed = 0;
>> +
>>  for (i = 0; i < commits->nr; i++) {
>> - display_progress(progress, i);
>>  if (commits->list[i]->generation !=
>> GENERATION_NUMBER_INFINITY &&
>>  commits->list[i]->generation != GENERATION_NUMBER_ZERO)
>>  continue;
>> @@ -670,10 +676,11 @@ static void compute_generation_numbers(struct
>> packed_commit_list* commits)
>>
>>  if (current->generation >
>> GENERATION_NUMBER_MAX)
>>  current->generation =
>> GENERATION_NUMBER_MAX;
>> +
>> + display_progress(progress,
>> ++count_uncomputed);
>>  }
>>  }
>>  }
>> - display_progress(progress, i);
>>  stop_progress(&progress);
>> }
>
> One of the things I was trying to do with this series was to make sure
> that whenever we run "git gc" there's always some indication that if you
> set gc.writeCommitGraph=true that it's actualy doing work.
>
> This modifies that, which I think is actually fine, just something I
> wanted to note. I.e. if you run "git commit-graph write" twice in a row,
> the second time will have no output.
>
> Unless that is, your repo is big enough that some of the delayed timers
> kick in. So e.g. on git.git we get no output the second time around, but
> do get output the first time around, and on linux.git we always get
> output.
>
> But in the common case people aren't running this in a loop, and it's
> useful to see how many new things are being added to the graph, so I
> think this is better. Just wanted to note the behavior difference (and
> will change the commit message).

Hrm, no. I spoke too soon because I was conflating "commit-graph write"
v.s. "gc". For "gc" we're now with this change just e.g. spending 6
seconds on 2015-04-03-1M-git displaying nothing, because we're looping
through the commits and finding that we have no new work.

So I'm on the fence about this, but leaning towards just taking my
initial approch. I.e. it sucks if you're e.g. testing different "git gc"
options that we're churning in the background doing nothing, just
because we're trying to report how many *new* things we added to the
graph.

After all, the main point IMNSHO is not to show some diagnostic output
of exactly how much work we're doing, that I have 200 new commits with
generation numbers or whatever is just useless trivia, but rather to not
leave the user thinking the command is hanging.

So I think I'll just do what I was doing to begin with and change the
message to "Refreshing commit graph generation numbers" or something to
indicate that it's a find/verify/compute operation, not just a compute
operation.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] commit-graph write: add progress output
  2018-09-07 15:23         ` Ævar Arnfjörð Bjarmason
@ 2018-09-07 17:15           ` Jeff King
  2018-09-07 17:25             ` Derrick Stolee
  0 siblings, 1 reply; 39+ messages in thread
From: Jeff King @ 2018-09-07 17:15 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Derrick Stolee, Junio C Hamano, git

On Fri, Sep 07, 2018 at 05:23:31PM +0200, Ævar Arnfjörð Bjarmason wrote:

> Hrm, no. I spoke too soon because I was conflating "commit-graph write"
> v.s. "gc". For "gc" we're now with this change just e.g. spending 6
> seconds on 2015-04-03-1M-git displaying nothing, because we're looping
> through the commits and finding that we have no new work.
> 
> So I'm on the fence about this, but leaning towards just taking my
> initial approch. I.e. it sucks if you're e.g. testing different "git gc"
> options that we're churning in the background doing nothing, just
> because we're trying to report how many *new* things we added to the
> graph.
> 
> After all, the main point IMNSHO is not to show some diagnostic output
> of exactly how much work we're doing, that I have 200 new commits with
> generation numbers or whatever is just useless trivia, but rather to not
> leave the user thinking the command is hanging.

I think there's some precedent for your view of things, too. For
example, "writing objects" counts _all_ of the objects, even though many
of them are just copying bytes straight from disk, and some are actually
generating a delta and/or zlib-deflating content.

So it's not the most precise measurement we could give, but it shows
there's activity, and the "average" movement over many objects tends to
be reasonably smooth.

> So I think I'll just do what I was doing to begin with and change the
> message to "Refreshing commit graph generation numbers" or something to
> indicate that it's a find/verify/compute operation, not just a compute
> operation.

So basically yes, I agree with this. :)

-Peff

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] commit-graph write: add progress output
  2018-09-07 17:15           ` Jeff King
@ 2018-09-07 17:25             ` Derrick Stolee
  0 siblings, 0 replies; 39+ messages in thread
From: Derrick Stolee @ 2018-09-07 17:25 UTC (permalink / raw)
  To: Jeff King, Ævar Arnfjörð Bjarmason; +Cc: Junio C Hamano, git

On 9/7/2018 1:15 PM, Jeff King wrote:
> On Fri, Sep 07, 2018 at 05:23:31PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> Hrm, no. I spoke too soon because I was conflating "commit-graph write"
>> v.s. "gc". For "gc" we're now with this change just e.g. spending 6
>> seconds on 2015-04-03-1M-git displaying nothing, because we're looping
>> through the commits and finding that we have no new work.
>>
>> So I'm on the fence about this, but leaning towards just taking my
>> initial approch. I.e. it sucks if you're e.g. testing different "git gc"
>> options that we're churning in the background doing nothing, just
>> because we're trying to report how many *new* things we added to the
>> graph.
>>
>> After all, the main point IMNSHO is not to show some diagnostic output
>> of exactly how much work we're doing, that I have 200 new commits with
>> generation numbers or whatever is just useless trivia, but rather to not
>> leave the user thinking the command is hanging.
> I think there's some precedent for your view of things, too. For
> example, "writing objects" counts _all_ of the objects, even though many
> of them are just copying bytes straight from disk, and some are actually
> generating a delta and/or zlib-deflating content.
>
> So it's not the most precise measurement we could give, but it shows
> there's activity, and the "average" movement over many objects tends to
> be reasonably smooth.
>
>> So I think I'll just do what I was doing to begin with and change the
>> message to "Refreshing commit graph generation numbers" or something to
>> indicate that it's a find/verify/compute operation, not just a compute
>> operation.
> So basically yes, I agree with this. :)

Same here. Thanks for the discussion.

-Stolee


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v2 0/2] commit-graph: add progress output
  2018-09-04 20:27 [PATCH 0/2] commit-graph: add progress output Ævar Arnfjörð Bjarmason
                   ` (2 preceding siblings ...)
  2018-09-05 12:07 ` [PATCH 0/2] commit-graph: " Derrick Stolee
@ 2018-09-07 18:29 ` " Ævar Arnfjörð Bjarmason
  2018-09-11 20:26   ` Junio C Hamano
  2018-09-07 18:29 ` [PATCH v2 1/2] commit-graph write: " Ævar Arnfjörð Bjarmason
  2018-09-07 18:29 ` [PATCH v2 2/2] commit-graph verify: " Ævar Arnfjörð Bjarmason
  5 siblings, 1 reply; 39+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-07 18:29 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Jeff King, Eric Sunshine,
	Nguyễn Thái Ngọc Duy,
	Ævar Arnfjörð Bjarmason

Based on feedback on v1, and the "this is yelling at my users through
gc.log" bug I found. Range-diff with v1:

1:  e0a09ad641 ! 1:  b2dcfa0f55 commit-graph write: add progress output
    @@ -5,8 +5,8 @@
         Before this change the "commit-graph write" command didn't report any
         progress. On my machine this command takes more than 10 seconds to
         write the graph for linux.git, and around 1m30s on the
    -    2015-04-03-1M-git.git[1] test repository, which is a test case for
    -    larger monorepos.
    +    2015-04-03-1M-git.git[1] test repository (a test case for a large
    +    monorepository).
     
         Furthermore, since the gc.writeCommitGraph setting was added in
         d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27),
    @@ -19,7 +19,6 @@
             $ git -c gc.writeCommitGraph=true gc
             Enumerating objects: 2821, done.
             [...]
    -        Total 2821 (delta 1670), reused 2821 (delta 1670)
             Computing commit graph generation numbers: 100% (867/867), done.
     
         On larger repositories, such as linux.git the delayed progress bar(s)
    @@ -27,6 +26,7 @@
         previously happening, printing nothing while we write the graph:
     
             $ git -c gc.writeCommitGraph=true gc
    +        [...]
             Annotating commits in commit graph: 1565573, done.
             Computing commit graph generation numbers: 100% (782484/782484), done.
     
    @@ -42,20 +42,90 @@
     
         With --stdin-packs we don't show any estimation of how much is left to
         do. This is because we might be processing more than one pack. We
    -    could be less lazy here and show progress, either detect by detecting
    -    that we're only processing one pack, or by first looping over the
    -    packs to discover how many commits they have. I don't see the point in
    -    doing that work. So instead we get (on 2015-04-03-1M-git.git):
    +    could be less lazy here and show progress, either by detecting that
    +    we're only processing one pack, or by first looping over the packs to
    +    discover how many commits they have. I don't see the point in doing
    +    that work. So instead we get (on 2015-04-03-1M-git.git):
     
             $ echo pack-<HASH>.idx | git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs
             Finding commits for commit graph: 13064614, done.
             Annotating commits in commit graph: 3001341, done.
             Computing commit graph generation numbers: 100% (1000447/1000447), done.
     
    +    No GC mode uses --stdin-packs. It's what they use at Microsoft to
    +    manually compute the generation numbers for their collection of large
    +    packs which are never coalesced.
    +
    +    The reason we need a "report_progress" variable passed down from "git
    +    gc" is so that we don't report this output when we're running in the
    +    process "git gc --auto" detaches from the terminal.
    +
    +    Since we write the commit graph from the "git gc" process itself (as
    +    opposed to what we do with say the "git repack" phase), we'd end up
    +    writing the output to .git/gc.log and reporting it to the user next
    +    time as part of the "The last gc run reported the following[...]"
    +    error, see 329e6e8794 ("gc: save log from daemonized gc --auto and
    +    print it next time", 2015-09-19).
    +
    +    So we must keep track of whether or not we're running in that
    +    demonized mode, and if so print no progress.
    +
    +    See [2] and subsequent replies for a discussion of an approach not
    +    taken in compute_generation_numbers(). I.e. we're saying "Computing
    +    commit graph generation numbers", even though on an established
    +    history we're mostly skipping over all the work we did in the
    +    past. This is similar to the white lie we tell in the "Writing
    +    objects" phase (not all are objects being written).
    +
    +    Always showing progress is considered more important than
    +    accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang
    +    for 6 seconds with no output on the second "git gc" if no changes were
    +    made to any objects in the interim if we'd take the approach in [2].
    +
         1. https://github.com/avar/2015-04-03-1M-git
     
    +    2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com>
    +       (https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/)
    +
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    + diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
    + --- a/builtin/commit-graph.c
    + +++ b/builtin/commit-graph.c
    +@@
    + 		opts.obj_dir = get_object_directory();
    + 
    + 	if (opts.reachable) {
    +-		write_commit_graph_reachable(opts.obj_dir, opts.append);
    ++		write_commit_graph_reachable(opts.obj_dir, opts.append, 1);
    + 		return 0;
    + 	}
    + 
    +@@
    + 	write_commit_graph(opts.obj_dir,
    + 			   pack_indexes,
    + 			   commit_hex,
    +-			   opts.append);
    ++			   opts.append,
    ++			   1);
    + 
    + 	string_list_clear(&lines, 0);
    + 	return 0;
    +
    + diff --git a/builtin/gc.c b/builtin/gc.c
    + --- a/builtin/gc.c
    + +++ b/builtin/gc.c
    +@@
    + 		clean_pack_garbage();
    + 
    + 	if (gc_write_commit_graph)
    +-		write_commit_graph_reachable(get_object_directory(), 0);
    ++		write_commit_graph_reachable(get_object_directory(), 0,
    ++					     !daemonized);
    + 
    + 	if (auto_gc && too_many_loose_objects())
    + 		warning(_("There are too many unreachable loose objects; "
    +
      diff --git a/commit-graph.c b/commit-graph.c
      --- a/commit-graph.c
      +++ b/commit-graph.c
    @@ -87,14 +157,20 @@
      	if (packed_object_info(the_repository, pack, offset, &oi) < 0)
      		die(_("unable to get type of object %s"), oid_to_hex(oid));
     @@
    + 	}
    + }
    + 
    +-static void close_reachable(struct packed_oid_list *oids)
    ++static void close_reachable(struct packed_oid_list *oids, int report_progress)
      {
      	int i;
      	struct commit *commit;
     +	struct progress *progress = NULL;
     +	int j = 0;
      
    -+	progress = start_delayed_progress(
    -+		_("Annotating commits in commit graph"), 0);
    ++	if (report_progress)
    ++		progress = start_delayed_progress(
    ++			_("Annotating commits in commit graph"), 0);
      	for (i = 0; i < oids->nr; i++) {
     +		display_progress(progress, ++j);
      		commit = lookup_commit(the_repository, &oids->list[i]);
    @@ -121,16 +197,20 @@
     +	stop_progress(&progress);
      }
      
    - static void compute_generation_numbers(struct packed_commit_list* commits)
    +-static void compute_generation_numbers(struct packed_commit_list* commits)
    ++static void compute_generation_numbers(struct packed_commit_list* commits, 
    ++				       int report_progress)
      {
      	int i;
      	struct commit_list *list = NULL;
     +	struct progress *progress = NULL;
      
    -+	progress = start_progress(
    -+		_("Computing commit graph generation numbers"), commits->nr);
    ++	if (report_progress)
    ++		progress = start_progress(
    ++			_("Computing commit graph generation numbers"),
    ++			commits->nr);
      	for (i = 0; i < commits->nr; i++) {
    -+		display_progress(progress, i);
    ++		display_progress(progress, i + 1);
      		if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY &&
      		    commits->list[i]->generation != GENERATION_NUMBER_ZERO)
      			continue;
    @@ -138,11 +218,34 @@
      			}
      		}
      	}
    -+	display_progress(progress, i);
     +	stop_progress(&progress);
      }
      
      static int add_ref_to_list(const char *refname,
    +@@
    + 	return 0;
    + }
    + 
    +-void write_commit_graph_reachable(const char *obj_dir, int append)
    ++void write_commit_graph_reachable(const char *obj_dir, int append,
    ++				  int report_progress)
    + {
    + 	struct string_list list;
    + 
    + 	string_list_init(&list, 1);
    + 	for_each_ref(add_ref_to_list, &list);
    +-	write_commit_graph(obj_dir, NULL, &list, append);
    ++	write_commit_graph(obj_dir, NULL, &list, append, report_progress);
    + }
    + 
    + void write_commit_graph(const char *obj_dir,
    + 			struct string_list *pack_indexes,
    + 			struct string_list *commit_hex,
    +-			int append)
    ++			int append, int report_progress)
    + {
    + 	struct packed_oid_list oids;
    + 	struct packed_commit_list commits;
     @@
      	int num_chunks;
      	int num_extra_edges;
    @@ -160,9 +263,11 @@
      		int dirlen;
      		strbuf_addf(&packname, "%s/pack/", obj_dir);
      		dirlen = packname.len;
    -+		oids.progress = start_delayed_progress(
    -+			_("Finding commits for commit graph"), 0);
    -+		oids.progress_done = 0;
    ++		if (report_progress) {
    ++			oids.progress = start_delayed_progress(
    ++				_("Finding commits for commit graph"), 0);
    ++			oids.progress_done = 0;
    ++		}
      		for (i = 0; i < pack_indexes->nr; i++) {
      			struct packed_git *p;
      			strbuf_setlen(&packname, dirlen);
    @@ -175,14 +280,16 @@
      	}
      
      	if (commit_hex) {
    -+		progress = start_delayed_progress(
    -+			_("Finding commits for commit graph"), commit_hex->nr);
    ++		if (report_progress)
    ++			progress = start_delayed_progress(
    ++				_("Finding commits for commit graph"),
    ++				commit_hex->nr);
      		for (i = 0; i < commit_hex->nr; i++) {
      			const char *end;
      			struct object_id oid;
      			struct commit *result;
      
    -+			display_progress(progress, i);
    ++			display_progress(progress, i + 1);
      			if (commit_hex->items[i].string &&
      			    parse_oid_hex(commit_hex->items[i].string, &oid, &end))
      				continue;
    @@ -190,17 +297,48 @@
      				oids.nr++;
      			}
      		}
    -+		display_progress(progress, i);
     +		stop_progress(&progress);
      	}
      
     -	if (!pack_indexes && !commit_hex)
     +	if (!pack_indexes && !commit_hex) {
    -+		oids.progress = start_delayed_progress(
    -+			_("Finding commits for commit graph"), 0);
    ++		if (report_progress)
    ++			oids.progress = start_delayed_progress(
    ++				_("Finding commits for commit graph"), 0);
      		for_each_packed_object(add_packed_commits, &oids, 0);
     +		stop_progress(&oids.progress);
     +	}
      
    - 	close_reachable(&oids);
    +-	close_reachable(&oids);
    ++	close_reachable(&oids, report_progress);
    + 
    + 	QSORT(oids.list, oids.nr, commit_compare);
    + 
    +@@
    + 	if (commits.nr >= GRAPH_PARENT_MISSING)
    + 		die(_("too many commits to write graph"));
    + 
    +-	compute_generation_numbers(&commits);
    ++	compute_generation_numbers(&commits, report_progress);
    + 
    + 	graph_name = get_commit_graph_filename(obj_dir);
    + 	if (safe_create_leading_directories(graph_name))
    +
    + diff --git a/commit-graph.h b/commit-graph.h
    + --- a/commit-graph.h
    + +++ b/commit-graph.h
    +@@
    + 
    + struct commit_graph *load_commit_graph_one(const char *graph_file);
    + 
    +-void write_commit_graph_reachable(const char *obj_dir, int append);
    ++void write_commit_graph_reachable(const char *obj_dir, int append,
    ++				  int report_progress);
    + void write_commit_graph(const char *obj_dir,
    + 			struct string_list *pack_indexes,
    + 			struct string_list *commit_hex,
    +-			int append);
    ++			int append, int report_progress);
    + 
    + int verify_commit_graph(struct repository *r, struct commit_graph *g);
      
2:  a364297d15 ! 2:  775237cffb commit-graph verify: add progress output
    @@ -23,6 +23,10 @@
             real    0m7.813s
             [...]
     
    +    Since the "commit-graph verify" subcommand is never called from "git
    +    gc", we don't have to worry about passing some some "report_progress"
    +    progress variable around for this codepath.
    +
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      diff --git a/commit-graph.c b/commit-graph.c
    @@ -47,7 +51,7 @@
      		struct commit_list *graph_parents, *odb_parents;
      		uint32_t max_generation = 0;
      
    -+		display_progress(progress, i);
    ++		display_progress(progress, i + 1);
      		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
      
      		graph_commit = lookup_commit(r, &cur_oid);
    @@ -55,7 +59,6 @@
      				     graph_commit->date,
      				     odb_commit->date);
      	}
    -+	display_progress(progress, i);
     +	stop_progress(&progress);
      
      	return verify_commit_graph_error;

Ævar Arnfjörð Bjarmason (2):
  commit-graph write: add progress output
  commit-graph verify: add progress output

 builtin/commit-graph.c |  5 ++--
 builtin/gc.c           |  3 +-
 commit-graph.c         | 65 ++++++++++++++++++++++++++++++++++++------
 commit-graph.h         |  5 ++--
 4 files changed, 65 insertions(+), 13 deletions(-)

-- 
2.19.0.rc1.350.ge57e33dbd1


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v2 1/2] commit-graph write: add progress output
  2018-09-04 20:27 [PATCH 0/2] commit-graph: add progress output Ævar Arnfjörð Bjarmason
                   ` (3 preceding siblings ...)
  2018-09-07 18:29 ` [PATCH v2 " Ævar Arnfjörð Bjarmason
@ 2018-09-07 18:29 ` " Ævar Arnfjörð Bjarmason
  2018-09-21 20:01   ` Derrick Stolee
  2018-09-07 18:29 ` [PATCH v2 2/2] commit-graph verify: " Ævar Arnfjörð Bjarmason
  5 siblings, 1 reply; 39+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-07 18:29 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Jeff King, Eric Sunshine,
	Nguyễn Thái Ngọc Duy,
	Ævar Arnfjörð Bjarmason

Before this change the "commit-graph write" command didn't report any
progress. On my machine this command takes more than 10 seconds to
write the graph for linux.git, and around 1m30s on the
2015-04-03-1M-git.git[1] test repository (a test case for a large
monorepository).

Furthermore, since the gc.writeCommitGraph setting was added in
d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27),
there was no indication at all from a "git gc" run that anything was
different. This why one of the progress bars being added here uses
start_progress() instead of start_delayed_progress(), so that it's
guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git
repository:

    $ git -c gc.writeCommitGraph=true gc
    Enumerating objects: 2821, done.
    [...]
    Computing commit graph generation numbers: 100% (867/867), done.

On larger repositories, such as linux.git the delayed progress bar(s)
will kick in, and we'll show what's going on instead of, as was
previously happening, printing nothing while we write the graph:

    $ git -c gc.writeCommitGraph=true gc
    [...]
    Annotating commits in commit graph: 1565573, done.
    Computing commit graph generation numbers: 100% (782484/782484), done.

Note that here we don't show "Finding commits for commit graph", this
is because under "git gc" we seed the search with the commit
references in the repository, and that set is too small to show any
progress, but would e.g. on a smaller repo such as git.git with
--stdin-commits:

    $ git rev-list --all | git -c gc.writeCommitGraph=true write --stdin-commits
    Finding commits for commit graph: 100% (162576/162576), done.
    Computing commit graph generation numbers: 100% (162576/162576), done.

With --stdin-packs we don't show any estimation of how much is left to
do. This is because we might be processing more than one pack. We
could be less lazy here and show progress, either by detecting that
we're only processing one pack, or by first looping over the packs to
discover how many commits they have. I don't see the point in doing
that work. So instead we get (on 2015-04-03-1M-git.git):

    $ echo pack-<HASH>.idx | git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs
    Finding commits for commit graph: 13064614, done.
    Annotating commits in commit graph: 3001341, done.
    Computing commit graph generation numbers: 100% (1000447/1000447), done.

No GC mode uses --stdin-packs. It's what they use at Microsoft to
manually compute the generation numbers for their collection of large
packs which are never coalesced.

The reason we need a "report_progress" variable passed down from "git
gc" is so that we don't report this output when we're running in the
process "git gc --auto" detaches from the terminal.

Since we write the commit graph from the "git gc" process itself (as
opposed to what we do with say the "git repack" phase), we'd end up
writing the output to .git/gc.log and reporting it to the user next
time as part of the "The last gc run reported the following[...]"
error, see 329e6e8794 ("gc: save log from daemonized gc --auto and
print it next time", 2015-09-19).

So we must keep track of whether or not we're running in that
demonized mode, and if so print no progress.

See [2] and subsequent replies for a discussion of an approach not
taken in compute_generation_numbers(). I.e. we're saying "Computing
commit graph generation numbers", even though on an established
history we're mostly skipping over all the work we did in the
past. This is similar to the white lie we tell in the "Writing
objects" phase (not all are objects being written).

Always showing progress is considered more important than
accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang
for 6 seconds with no output on the second "git gc" if no changes were
made to any objects in the interim if we'd take the approach in [2].

1. https://github.com/avar/2015-04-03-1M-git

2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com>
   (https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/commit-graph.c |  5 ++--
 builtin/gc.c           |  3 ++-
 commit-graph.c         | 60 ++++++++++++++++++++++++++++++++++++------
 commit-graph.h         |  5 ++--
 4 files changed, 60 insertions(+), 13 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 0bf0c48657..bc0fa9ba52 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -151,7 +151,7 @@ static int graph_write(int argc, const char **argv)
 		opts.obj_dir = get_object_directory();
 
 	if (opts.reachable) {
-		write_commit_graph_reachable(opts.obj_dir, opts.append);
+		write_commit_graph_reachable(opts.obj_dir, opts.append, 1);
 		return 0;
 	}
 
@@ -171,7 +171,8 @@ static int graph_write(int argc, const char **argv)
 	write_commit_graph(opts.obj_dir,
 			   pack_indexes,
 			   commit_hex,
-			   opts.append);
+			   opts.append,
+			   1);
 
 	string_list_clear(&lines, 0);
 	return 0;
diff --git a/builtin/gc.c b/builtin/gc.c
index 57069442b0..06ddd3aea5 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -646,7 +646,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 		clean_pack_garbage();
 
 	if (gc_write_commit_graph)
-		write_commit_graph_reachable(get_object_directory(), 0);
+		write_commit_graph_reachable(get_object_directory(), 0,
+					     !daemonized);
 
 	if (auto_gc && too_many_loose_objects())
 		warning(_("There are too many unreachable loose objects; "
diff --git a/commit-graph.c b/commit-graph.c
index 8a1bec7b8a..2c5d996194 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -13,6 +13,7 @@
 #include "commit-graph.h"
 #include "object-store.h"
 #include "alloc.h"
+#include "progress.h"
 
 #define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */
 #define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
@@ -548,6 +549,8 @@ struct packed_oid_list {
 	struct object_id *list;
 	int nr;
 	int alloc;
+	struct progress *progress;
+	int progress_done;
 };
 
 static int add_packed_commits(const struct object_id *oid,
@@ -560,6 +563,9 @@ static int add_packed_commits(const struct object_id *oid,
 	off_t offset = nth_packed_object_offset(pack, pos);
 	struct object_info oi = OBJECT_INFO_INIT;
 
+	if (list->progress)
+		display_progress(list->progress, ++list->progress_done);
+
 	oi.typep = &type;
 	if (packed_object_info(the_repository, pack, offset, &oi) < 0)
 		die(_("unable to get type of object %s"), oid_to_hex(oid));
@@ -587,12 +593,18 @@ static void add_missing_parents(struct packed_oid_list *oids, struct commit *com
 	}
 }
 
-static void close_reachable(struct packed_oid_list *oids)
+static void close_reachable(struct packed_oid_list *oids, int report_progress)
 {
 	int i;
 	struct commit *commit;
+	struct progress *progress = NULL;
+	int j = 0;
 
+	if (report_progress)
+		progress = start_delayed_progress(
+			_("Annotating commits in commit graph"), 0);
 	for (i = 0; i < oids->nr; i++) {
+		display_progress(progress, ++j);
 		commit = lookup_commit(the_repository, &oids->list[i]);
 		if (commit)
 			commit->object.flags |= UNINTERESTING;
@@ -604,6 +616,7 @@ static void close_reachable(struct packed_oid_list *oids)
 	 * closure.
 	 */
 	for (i = 0; i < oids->nr; i++) {
+		display_progress(progress, ++j);
 		commit = lookup_commit(the_repository, &oids->list[i]);
 
 		if (commit && !parse_commit(commit))
@@ -611,19 +624,28 @@ static void close_reachable(struct packed_oid_list *oids)
 	}
 
 	for (i = 0; i < oids->nr; i++) {
+		display_progress(progress, ++j);
 		commit = lookup_commit(the_repository, &oids->list[i]);
 
 		if (commit)
 			commit->object.flags &= ~UNINTERESTING;
 	}
+	stop_progress(&progress);
 }
 
-static void compute_generation_numbers(struct packed_commit_list* commits)
+static void compute_generation_numbers(struct packed_commit_list* commits, 
+				       int report_progress)
 {
 	int i;
 	struct commit_list *list = NULL;
+	struct progress *progress = NULL;
 
+	if (report_progress)
+		progress = start_progress(
+			_("Computing commit graph generation numbers"),
+			commits->nr);
 	for (i = 0; i < commits->nr; i++) {
+		display_progress(progress, i + 1);
 		if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY &&
 		    commits->list[i]->generation != GENERATION_NUMBER_ZERO)
 			continue;
@@ -655,6 +677,7 @@ static void compute_generation_numbers(struct packed_commit_list* commits)
 			}
 		}
 	}
+	stop_progress(&progress);
 }
 
 static int add_ref_to_list(const char *refname,
@@ -667,19 +690,20 @@ static int add_ref_to_list(const char *refname,
 	return 0;
 }
 
-void write_commit_graph_reachable(const char *obj_dir, int append)
+void write_commit_graph_reachable(const char *obj_dir, int append,
+				  int report_progress)
 {
 	struct string_list list;
 
 	string_list_init(&list, 1);
 	for_each_ref(add_ref_to_list, &list);
-	write_commit_graph(obj_dir, NULL, &list, append);
+	write_commit_graph(obj_dir, NULL, &list, append, report_progress);
 }
 
 void write_commit_graph(const char *obj_dir,
 			struct string_list *pack_indexes,
 			struct string_list *commit_hex,
-			int append)
+			int append, int report_progress)
 {
 	struct packed_oid_list oids;
 	struct packed_commit_list commits;
@@ -692,9 +716,12 @@ void write_commit_graph(const char *obj_dir,
 	int num_chunks;
 	int num_extra_edges;
 	struct commit_list *parent;
+	struct progress *progress = NULL;
 
 	oids.nr = 0;
 	oids.alloc = approximate_object_count() / 4;
+	oids.progress = NULL;
+	oids.progress_done = 0;
 
 	if (append) {
 		prepare_commit_graph_one(the_repository, obj_dir);
@@ -721,6 +748,11 @@ void write_commit_graph(const char *obj_dir,
 		int dirlen;
 		strbuf_addf(&packname, "%s/pack/", obj_dir);
 		dirlen = packname.len;
+		if (report_progress) {
+			oids.progress = start_delayed_progress(
+				_("Finding commits for commit graph"), 0);
+			oids.progress_done = 0;
+		}
 		for (i = 0; i < pack_indexes->nr; i++) {
 			struct packed_git *p;
 			strbuf_setlen(&packname, dirlen);
@@ -733,15 +765,21 @@ void write_commit_graph(const char *obj_dir,
 			for_each_object_in_pack(p, add_packed_commits, &oids, 0);
 			close_pack(p);
 		}
+		stop_progress(&oids.progress);
 		strbuf_release(&packname);
 	}
 
 	if (commit_hex) {
+		if (report_progress)
+			progress = start_delayed_progress(
+				_("Finding commits for commit graph"),
+				commit_hex->nr);
 		for (i = 0; i < commit_hex->nr; i++) {
 			const char *end;
 			struct object_id oid;
 			struct commit *result;
 
+			display_progress(progress, i + 1);
 			if (commit_hex->items[i].string &&
 			    parse_oid_hex(commit_hex->items[i].string, &oid, &end))
 				continue;
@@ -754,12 +792,18 @@ void write_commit_graph(const char *obj_dir,
 				oids.nr++;
 			}
 		}
+		stop_progress(&progress);
 	}
 
-	if (!pack_indexes && !commit_hex)
+	if (!pack_indexes && !commit_hex) {
+		if (report_progress)
+			oids.progress = start_delayed_progress(
+				_("Finding commits for commit graph"), 0);
 		for_each_packed_object(add_packed_commits, &oids, 0);
+		stop_progress(&oids.progress);
+	}
 
-	close_reachable(&oids);
+	close_reachable(&oids, report_progress);
 
 	QSORT(oids.list, oids.nr, commit_compare);
 
@@ -799,7 +843,7 @@ void write_commit_graph(const char *obj_dir,
 	if (commits.nr >= GRAPH_PARENT_MISSING)
 		die(_("too many commits to write graph"));
 
-	compute_generation_numbers(&commits);
+	compute_generation_numbers(&commits, report_progress);
 
 	graph_name = get_commit_graph_filename(obj_dir);
 	if (safe_create_leading_directories(graph_name))
diff --git a/commit-graph.h b/commit-graph.h
index eea62f8c0e..f50712a973 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -52,11 +52,12 @@ struct commit_graph {
 
 struct commit_graph *load_commit_graph_one(const char *graph_file);
 
-void write_commit_graph_reachable(const char *obj_dir, int append);
+void write_commit_graph_reachable(const char *obj_dir, int append,
+				  int report_progress);
 void write_commit_graph(const char *obj_dir,
 			struct string_list *pack_indexes,
 			struct string_list *commit_hex,
-			int append);
+			int append, int report_progress);
 
 int verify_commit_graph(struct repository *r, struct commit_graph *g);
 
-- 
2.19.0.rc1.350.ge57e33dbd1


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v2 2/2] commit-graph verify: add progress output
  2018-09-04 20:27 [PATCH 0/2] commit-graph: add progress output Ævar Arnfjörð Bjarmason
                   ` (4 preceding siblings ...)
  2018-09-07 18:29 ` [PATCH v2 1/2] commit-graph write: " Ævar Arnfjörð Bjarmason
@ 2018-09-07 18:29 ` " Ævar Arnfjörð Bjarmason
  2018-09-16  6:55   ` Duy Nguyen
  5 siblings, 1 reply; 39+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-07 18:29 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Jeff King, Eric Sunshine,
	Nguyễn Thái Ngọc Duy,
	Ævar Arnfjörð Bjarmason

For the reasons explained in the "commit-graph write: add progress
output" commit leading up to this one, emit progress on "commit-graph
verify". Since e0fd51e1d7 ("fsck: verify commit-graph", 2018-06-27)
"git fsck" has called this command if core.commitGraph=true, but
there's been no progress output to indicate that anything was
different. Now there is (on my tiny dotfiles.git repository):

    $ git -c core.commitGraph=true -C ~/ fsck
    Checking object directories: 100% (256/256), done.
    Checking objects: 100% (2821/2821), done.
    dangling blob 5b8bbdb9b788ed90459f505b0934619c17cc605b
    Verifying commits in commit graph: 100% (867/867), done.

And on a larger repository, such as the 2015-04-03-1M-git.git test
repository:

    $ time git -c core.commitGraph=true -C ~/g/2015-04-03-1M-git/ commit-graph verify
    Verifying commits in commit graph: 100% (1000447/1000447), done.
    real    0m7.813s
    [...]

Since the "commit-graph verify" subcommand is never called from "git
gc", we don't have to worry about passing some some "report_progress"
progress variable around for this codepath.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit-graph.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 2c5d996194..0bfb8c180e 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -922,6 +922,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 	int generation_zero = 0;
 	struct hashfile *f;
 	int devnull;
+	struct progress *progress = NULL;
 
 	if (!g) {
 		graph_report("no commit-graph file loaded");
@@ -989,11 +990,14 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 	if (verify_commit_graph_error & ~VERIFY_COMMIT_GRAPH_ERROR_HASH)
 		return verify_commit_graph_error;
 
+	progress = start_progress("Verifying commits in commit graph",
+				  g->num_commits);
 	for (i = 0; i < g->num_commits; i++) {
 		struct commit *graph_commit, *odb_commit;
 		struct commit_list *graph_parents, *odb_parents;
 		uint32_t max_generation = 0;
 
+		display_progress(progress, i + 1);
 		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
 
 		graph_commit = lookup_commit(r, &cur_oid);
@@ -1070,6 +1074,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 				     graph_commit->date,
 				     odb_commit->date);
 	}
+	stop_progress(&progress);
 
 	return verify_commit_graph_error;
 }
-- 
2.19.0.rc1.350.ge57e33dbd1


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 0/2] commit-graph: add progress output
  2018-09-07 18:29 ` [PATCH v2 " Ævar Arnfjörð Bjarmason
@ 2018-09-11 20:26   ` Junio C Hamano
  0 siblings, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2018-09-11 20:26 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Derrick Stolee, Jeff King, Eric Sunshine,
	Nguyễn Thái Ngọc Duy

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Based on feedback on v1, and the "this is yelling at my users through
> gc.log" bug I found.

I notice that between 'master' and 'pu' there already is one new
callsite of the write_commit_graph_reachable() function; because I
suspect that we will discover more places that give us better
trade-off to have a call to the function, I would not be surprised
if this will conflict with more in-flight topics soonish.

Let me try to queue it and see what happens.

Thanks.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 2/2] commit-graph verify: add progress output
  2018-09-07 18:29 ` [PATCH v2 2/2] commit-graph verify: " Ævar Arnfjörð Bjarmason
@ 2018-09-16  6:55   ` Duy Nguyen
  2018-09-17 15:33     ` [PATCH v3 0/2] commit-graph: " Ævar Arnfjörð Bjarmason
                       ` (2 more replies)
  0 siblings, 3 replies; 39+ messages in thread
From: Duy Nguyen @ 2018-09-16  6:55 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Jeff King,
	Eric Sunshine

On Fri, Sep 7, 2018 at 8:30 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> @@ -989,11 +990,14 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
>         if (verify_commit_graph_error & ~VERIFY_COMMIT_GRAPH_ERROR_HASH)
>                 return verify_commit_graph_error;
>
> +       progress = start_progress("Verifying commits in commit graph",

_()

> +                                 g->num_commits);
-- 
Duy

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v3 0/2] commit-graph: add progress output
  2018-09-16  6:55   ` Duy Nguyen
@ 2018-09-17 15:33     ` " Ævar Arnfjörð Bjarmason
  2018-09-17 15:33     ` [PATCH v3 1/2] commit-graph write: " Ævar Arnfjörð Bjarmason
  2018-09-17 15:33     ` [PATCH v3 2/2] commit-graph verify: " Ævar Arnfjörð Bjarmason
  2 siblings, 0 replies; 39+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-17 15:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Jeff King, Eric Sunshine,
	Nguyễn Thái Ngọc Duy,
	Ævar Arnfjörð Bjarmason

Micro change since v2: Missing _() in 2/2 pointed out by Duy.

Ævar Arnfjörð Bjarmason (2):
  commit-graph write: add progress output
  commit-graph verify: add progress output

 builtin/commit-graph.c |  5 ++--
 builtin/gc.c           |  3 +-
 commit-graph.c         | 65 ++++++++++++++++++++++++++++++++++++------
 commit-graph.h         |  5 ++--
 4 files changed, 65 insertions(+), 13 deletions(-)

-- 
2.19.0.rc2.392.g5ba43deb5a


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v3 1/2] commit-graph write: add progress output
  2018-09-16  6:55   ` Duy Nguyen
  2018-09-17 15:33     ` [PATCH v3 0/2] commit-graph: " Ævar Arnfjörð Bjarmason
@ 2018-09-17 15:33     ` " Ævar Arnfjörð Bjarmason
  2018-10-10 20:37       ` SZEDER Gábor
  2018-10-15 16:54       ` SZEDER Gábor
  2018-09-17 15:33     ` [PATCH v3 2/2] commit-graph verify: " Ævar Arnfjörð Bjarmason
  2 siblings, 2 replies; 39+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-17 15:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Jeff King, Eric Sunshine,
	Nguyễn Thái Ngọc Duy,
	Ævar Arnfjörð Bjarmason

Before this change the "commit-graph write" command didn't report any
progress. On my machine this command takes more than 10 seconds to
write the graph for linux.git, and around 1m30s on the
2015-04-03-1M-git.git[1] test repository (a test case for a large
monorepository).

Furthermore, since the gc.writeCommitGraph setting was added in
d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27),
there was no indication at all from a "git gc" run that anything was
different. This why one of the progress bars being added here uses
start_progress() instead of start_delayed_progress(), so that it's
guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git
repository:

    $ git -c gc.writeCommitGraph=true gc
    Enumerating objects: 2821, done.
    [...]
    Computing commit graph generation numbers: 100% (867/867), done.

On larger repositories, such as linux.git the delayed progress bar(s)
will kick in, and we'll show what's going on instead of, as was
previously happening, printing nothing while we write the graph:

    $ git -c gc.writeCommitGraph=true gc
    [...]
    Annotating commits in commit graph: 1565573, done.
    Computing commit graph generation numbers: 100% (782484/782484), done.

Note that here we don't show "Finding commits for commit graph", this
is because under "git gc" we seed the search with the commit
references in the repository, and that set is too small to show any
progress, but would e.g. on a smaller repo such as git.git with
--stdin-commits:

    $ git rev-list --all | git -c gc.writeCommitGraph=true write --stdin-commits
    Finding commits for commit graph: 100% (162576/162576), done.
    Computing commit graph generation numbers: 100% (162576/162576), done.

With --stdin-packs we don't show any estimation of how much is left to
do. This is because we might be processing more than one pack. We
could be less lazy here and show progress, either by detecting that
we're only processing one pack, or by first looping over the packs to
discover how many commits they have. I don't see the point in doing
that work. So instead we get (on 2015-04-03-1M-git.git):

    $ echo pack-<HASH>.idx | git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs
    Finding commits for commit graph: 13064614, done.
    Annotating commits in commit graph: 3001341, done.
    Computing commit graph generation numbers: 100% (1000447/1000447), done.

No GC mode uses --stdin-packs. It's what they use at Microsoft to
manually compute the generation numbers for their collection of large
packs which are never coalesced.

The reason we need a "report_progress" variable passed down from "git
gc" is so that we don't report this output when we're running in the
process "git gc --auto" detaches from the terminal.

Since we write the commit graph from the "git gc" process itself (as
opposed to what we do with say the "git repack" phase), we'd end up
writing the output to .git/gc.log and reporting it to the user next
time as part of the "The last gc run reported the following[...]"
error, see 329e6e8794 ("gc: save log from daemonized gc --auto and
print it next time", 2015-09-19).

So we must keep track of whether or not we're running in that
demonized mode, and if so print no progress.

See [2] and subsequent replies for a discussion of an approach not
taken in compute_generation_numbers(). I.e. we're saying "Computing
commit graph generation numbers", even though on an established
history we're mostly skipping over all the work we did in the
past. This is similar to the white lie we tell in the "Writing
objects" phase (not all are objects being written).

Always showing progress is considered more important than
accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang
for 6 seconds with no output on the second "git gc" if no changes were
made to any objects in the interim if we'd take the approach in [2].

1. https://github.com/avar/2015-04-03-1M-git

2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com>
   (https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/commit-graph.c |  5 ++--
 builtin/gc.c           |  3 ++-
 commit-graph.c         | 60 ++++++++++++++++++++++++++++++++++++------
 commit-graph.h         |  5 ++--
 4 files changed, 60 insertions(+), 13 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 0bf0c48657..bc0fa9ba52 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -151,7 +151,7 @@ static int graph_write(int argc, const char **argv)
 		opts.obj_dir = get_object_directory();
 
 	if (opts.reachable) {
-		write_commit_graph_reachable(opts.obj_dir, opts.append);
+		write_commit_graph_reachable(opts.obj_dir, opts.append, 1);
 		return 0;
 	}
 
@@ -171,7 +171,8 @@ static int graph_write(int argc, const char **argv)
 	write_commit_graph(opts.obj_dir,
 			   pack_indexes,
 			   commit_hex,
-			   opts.append);
+			   opts.append,
+			   1);
 
 	string_list_clear(&lines, 0);
 	return 0;
diff --git a/builtin/gc.c b/builtin/gc.c
index 57069442b0..06ddd3aea5 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -646,7 +646,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 		clean_pack_garbage();
 
 	if (gc_write_commit_graph)
-		write_commit_graph_reachable(get_object_directory(), 0);
+		write_commit_graph_reachable(get_object_directory(), 0,
+					     !daemonized);
 
 	if (auto_gc && too_many_loose_objects())
 		warning(_("There are too many unreachable loose objects; "
diff --git a/commit-graph.c b/commit-graph.c
index 8a1bec7b8a..2c5d996194 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -13,6 +13,7 @@
 #include "commit-graph.h"
 #include "object-store.h"
 #include "alloc.h"
+#include "progress.h"
 
 #define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */
 #define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
@@ -548,6 +549,8 @@ struct packed_oid_list {
 	struct object_id *list;
 	int nr;
 	int alloc;
+	struct progress *progress;
+	int progress_done;
 };
 
 static int add_packed_commits(const struct object_id *oid,
@@ -560,6 +563,9 @@ static int add_packed_commits(const struct object_id *oid,
 	off_t offset = nth_packed_object_offset(pack, pos);
 	struct object_info oi = OBJECT_INFO_INIT;
 
+	if (list->progress)
+		display_progress(list->progress, ++list->progress_done);
+
 	oi.typep = &type;
 	if (packed_object_info(the_repository, pack, offset, &oi) < 0)
 		die(_("unable to get type of object %s"), oid_to_hex(oid));
@@ -587,12 +593,18 @@ static void add_missing_parents(struct packed_oid_list *oids, struct commit *com
 	}
 }
 
-static void close_reachable(struct packed_oid_list *oids)
+static void close_reachable(struct packed_oid_list *oids, int report_progress)
 {
 	int i;
 	struct commit *commit;
+	struct progress *progress = NULL;
+	int j = 0;
 
+	if (report_progress)
+		progress = start_delayed_progress(
+			_("Annotating commits in commit graph"), 0);
 	for (i = 0; i < oids->nr; i++) {
+		display_progress(progress, ++j);
 		commit = lookup_commit(the_repository, &oids->list[i]);
 		if (commit)
 			commit->object.flags |= UNINTERESTING;
@@ -604,6 +616,7 @@ static void close_reachable(struct packed_oid_list *oids)
 	 * closure.
 	 */
 	for (i = 0; i < oids->nr; i++) {
+		display_progress(progress, ++j);
 		commit = lookup_commit(the_repository, &oids->list[i]);
 
 		if (commit && !parse_commit(commit))
@@ -611,19 +624,28 @@ static void close_reachable(struct packed_oid_list *oids)
 	}
 
 	for (i = 0; i < oids->nr; i++) {
+		display_progress(progress, ++j);
 		commit = lookup_commit(the_repository, &oids->list[i]);
 
 		if (commit)
 			commit->object.flags &= ~UNINTERESTING;
 	}
+	stop_progress(&progress);
 }
 
-static void compute_generation_numbers(struct packed_commit_list* commits)
+static void compute_generation_numbers(struct packed_commit_list* commits, 
+				       int report_progress)
 {
 	int i;
 	struct commit_list *list = NULL;
+	struct progress *progress = NULL;
 
+	if (report_progress)
+		progress = start_progress(
+			_("Computing commit graph generation numbers"),
+			commits->nr);
 	for (i = 0; i < commits->nr; i++) {
+		display_progress(progress, i + 1);
 		if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY &&
 		    commits->list[i]->generation != GENERATION_NUMBER_ZERO)
 			continue;
@@ -655,6 +677,7 @@ static void compute_generation_numbers(struct packed_commit_list* commits)
 			}
 		}
 	}
+	stop_progress(&progress);
 }
 
 static int add_ref_to_list(const char *refname,
@@ -667,19 +690,20 @@ static int add_ref_to_list(const char *refname,
 	return 0;
 }
 
-void write_commit_graph_reachable(const char *obj_dir, int append)
+void write_commit_graph_reachable(const char *obj_dir, int append,
+				  int report_progress)
 {
 	struct string_list list;
 
 	string_list_init(&list, 1);
 	for_each_ref(add_ref_to_list, &list);
-	write_commit_graph(obj_dir, NULL, &list, append);
+	write_commit_graph(obj_dir, NULL, &list, append, report_progress);
 }
 
 void write_commit_graph(const char *obj_dir,
 			struct string_list *pack_indexes,
 			struct string_list *commit_hex,
-			int append)
+			int append, int report_progress)
 {
 	struct packed_oid_list oids;
 	struct packed_commit_list commits;
@@ -692,9 +716,12 @@ void write_commit_graph(const char *obj_dir,
 	int num_chunks;
 	int num_extra_edges;
 	struct commit_list *parent;
+	struct progress *progress = NULL;
 
 	oids.nr = 0;
 	oids.alloc = approximate_object_count() / 4;
+	oids.progress = NULL;
+	oids.progress_done = 0;
 
 	if (append) {
 		prepare_commit_graph_one(the_repository, obj_dir);
@@ -721,6 +748,11 @@ void write_commit_graph(const char *obj_dir,
 		int dirlen;
 		strbuf_addf(&packname, "%s/pack/", obj_dir);
 		dirlen = packname.len;
+		if (report_progress) {
+			oids.progress = start_delayed_progress(
+				_("Finding commits for commit graph"), 0);
+			oids.progress_done = 0;
+		}
 		for (i = 0; i < pack_indexes->nr; i++) {
 			struct packed_git *p;
 			strbuf_setlen(&packname, dirlen);
@@ -733,15 +765,21 @@ void write_commit_graph(const char *obj_dir,
 			for_each_object_in_pack(p, add_packed_commits, &oids, 0);
 			close_pack(p);
 		}
+		stop_progress(&oids.progress);
 		strbuf_release(&packname);
 	}
 
 	if (commit_hex) {
+		if (report_progress)
+			progress = start_delayed_progress(
+				_("Finding commits for commit graph"),
+				commit_hex->nr);
 		for (i = 0; i < commit_hex->nr; i++) {
 			const char *end;
 			struct object_id oid;
 			struct commit *result;
 
+			display_progress(progress, i + 1);
 			if (commit_hex->items[i].string &&
 			    parse_oid_hex(commit_hex->items[i].string, &oid, &end))
 				continue;
@@ -754,12 +792,18 @@ void write_commit_graph(const char *obj_dir,
 				oids.nr++;
 			}
 		}
+		stop_progress(&progress);
 	}
 
-	if (!pack_indexes && !commit_hex)
+	if (!pack_indexes && !commit_hex) {
+		if (report_progress)
+			oids.progress = start_delayed_progress(
+				_("Finding commits for commit graph"), 0);
 		for_each_packed_object(add_packed_commits, &oids, 0);
+		stop_progress(&oids.progress);
+	}
 
-	close_reachable(&oids);
+	close_reachable(&oids, report_progress);
 
 	QSORT(oids.list, oids.nr, commit_compare);
 
@@ -799,7 +843,7 @@ void write_commit_graph(const char *obj_dir,
 	if (commits.nr >= GRAPH_PARENT_MISSING)
 		die(_("too many commits to write graph"));
 
-	compute_generation_numbers(&commits);
+	compute_generation_numbers(&commits, report_progress);
 
 	graph_name = get_commit_graph_filename(obj_dir);
 	if (safe_create_leading_directories(graph_name))
diff --git a/commit-graph.h b/commit-graph.h
index eea62f8c0e..f50712a973 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -52,11 +52,12 @@ struct commit_graph {
 
 struct commit_graph *load_commit_graph_one(const char *graph_file);
 
-void write_commit_graph_reachable(const char *obj_dir, int append);
+void write_commit_graph_reachable(const char *obj_dir, int append,
+				  int report_progress);
 void write_commit_graph(const char *obj_dir,
 			struct string_list *pack_indexes,
 			struct string_list *commit_hex,
-			int append);
+			int append, int report_progress);
 
 int verify_commit_graph(struct repository *r, struct commit_graph *g);
 
-- 
2.19.0.rc2.392.g5ba43deb5a


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v3 2/2] commit-graph verify: add progress output
  2018-09-16  6:55   ` Duy Nguyen
  2018-09-17 15:33     ` [PATCH v3 0/2] commit-graph: " Ævar Arnfjörð Bjarmason
  2018-09-17 15:33     ` [PATCH v3 1/2] commit-graph write: " Ævar Arnfjörð Bjarmason
@ 2018-09-17 15:33     ` " Ævar Arnfjörð Bjarmason
  2 siblings, 0 replies; 39+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-17 15:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Jeff King, Eric Sunshine,
	Nguyễn Thái Ngọc Duy,
	Ævar Arnfjörð Bjarmason

For the reasons explained in the "commit-graph write: add progress
output" commit leading up to this one, emit progress on "commit-graph
verify". Since e0fd51e1d7 ("fsck: verify commit-graph", 2018-06-27)
"git fsck" has called this command if core.commitGraph=true, but
there's been no progress output to indicate that anything was
different. Now there is (on my tiny dotfiles.git repository):

    $ git -c core.commitGraph=true -C ~/ fsck
    Checking object directories: 100% (256/256), done.
    Checking objects: 100% (2821/2821), done.
    dangling blob 5b8bbdb9b788ed90459f505b0934619c17cc605b
    Verifying commits in commit graph: 100% (867/867), done.

And on a larger repository, such as the 2015-04-03-1M-git.git test
repository:

    $ time git -c core.commitGraph=true -C ~/g/2015-04-03-1M-git/ commit-graph verify
    Verifying commits in commit graph: 100% (1000447/1000447), done.
    real    0m7.813s
    [...]

Since the "commit-graph verify" subcommand is never called from "git
gc", we don't have to worry about passing some some "report_progress"
progress variable around for this codepath.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 commit-graph.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 2c5d996194..e6e4c03986 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -922,6 +922,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 	int generation_zero = 0;
 	struct hashfile *f;
 	int devnull;
+	struct progress *progress = NULL;
 
 	if (!g) {
 		graph_report("no commit-graph file loaded");
@@ -989,11 +990,14 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 	if (verify_commit_graph_error & ~VERIFY_COMMIT_GRAPH_ERROR_HASH)
 		return verify_commit_graph_error;
 
+	progress = start_progress(_("Verifying commits in commit graph"),
+				  g->num_commits);
 	for (i = 0; i < g->num_commits; i++) {
 		struct commit *graph_commit, *odb_commit;
 		struct commit_list *graph_parents, *odb_parents;
 		uint32_t max_generation = 0;
 
+		display_progress(progress, i + 1);
 		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
 
 		graph_commit = lookup_commit(r, &cur_oid);
@@ -1070,6 +1074,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 				     graph_commit->date,
 				     odb_commit->date);
 	}
+	stop_progress(&progress);
 
 	return verify_commit_graph_error;
 }
-- 
2.19.0.rc2.392.g5ba43deb5a


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 1/2] commit-graph write: add progress output
  2018-09-07 18:29 ` [PATCH v2 1/2] commit-graph write: " Ævar Arnfjörð Bjarmason
@ 2018-09-21 20:01   ` Derrick Stolee
  2018-09-21 21:43     ` Junio C Hamano
  0 siblings, 1 reply; 39+ messages in thread
From: Derrick Stolee @ 2018-09-21 20:01 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git
  Cc: Junio C Hamano, Jeff King, Eric Sunshine,
	Nguyễn Thái Ngọc Duy

On 9/7/2018 2:29 PM, Ævar Arnfjörð Bjarmason wrote:
> -void write_commit_graph_reachable(const char *obj_dir, int append);
> +void write_commit_graph_reachable(const char *obj_dir, int append,
> +				  int report_progress);
>   void write_commit_graph(const char *obj_dir,
>   			struct string_list *pack_indexes,
>   			struct string_list *commit_hex,
> -			int append);
> +			int append, int report_progress);
>   
>   int verify_commit_graph(struct repository *r, struct commit_graph *g);
>   

Junio,

The above prototype change seems to have created a semantic conflict 
with ds/commit-graph-tests (859fdc "commit-graph: define 
GIT_TEST_COMMIT_GRAPH") because when GIT_TEST_COMMIT_GRAPH is set, we 
call write_commit_graph_reachable() but the final parameter was resolved 
to be "1" instead of "0".

This causes t3420-rebase-autostash.sh to fail, as that test watches the 
full output of the rebase command, including commit runs. The following 
patch fixes the problem, but could probably be squashed into a merge or 
other commit.

Thanks,

-Stolee

-->8--

From: Derrick Stolee <dstolee@microsoft.com>
Date: Fri, 21 Sep 2018 19:57:36 +0000
Subject: [PATCH] commit: quietly write commit-graph in tests

The GIT_TEST_COMMIT_GRAPH environment variable causes git-commit to
write a commit-graph file on every execution. Recently, we added
progress output when writing the commit-graph. This conflicts with
some expected output during some tests, so avoid writing progress
if writing a commit-graph this way.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
  builtin/commit.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 2a49ab4917..764664d832 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1660,7 +1660,7 @@ int cmd_commit(int argc, const char **argv, const 
char *prefix)
                       "not exceeded, and then \"git reset HEAD\" to 
recover."));

         if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0))
- write_commit_graph_reachable(get_object_directory(), 0, 1);
+ write_commit_graph_reachable(get_object_directory(), 0, 0);

         rerere(0);
         run_command_v_opt(argv_gc_auto, RUN_GIT_CMD);
--
2.19.0


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 1/2] commit-graph write: add progress output
  2018-09-21 20:01   ` Derrick Stolee
@ 2018-09-21 21:43     ` Junio C Hamano
  2018-09-21 21:57       ` Junio C Hamano
  0 siblings, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2018-09-21 21:43 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Ævar Arnfjörð Bjarmason, git, Jeff King,
	Eric Sunshine, Nguyễn Thái Ngọc Duy

Derrick Stolee <stolee@gmail.com> writes:

> On 9/7/2018 2:29 PM, Ævar Arnfjörð Bjarmason wrote:
>> -void write_commit_graph_reachable(const char *obj_dir, int append);
>> +void write_commit_graph_reachable(const char *obj_dir, int append,
>> +				  int report_progress);
>>   void write_commit_graph(const char *obj_dir,
>>   			struct string_list *pack_indexes,
>>   			struct string_list *commit_hex,
>> -			int append);
>> +			int append, int report_progress);
>>     int verify_commit_graph(struct repository *r, struct
>> commit_graph *g);
>>   
>
> Junio,
>
> The above prototype change seems to have created a semantic conflict
> with ds/commit-graph-tests (859fdc "commit-graph: define
> GIT_TEST_COMMIT_GRAPH") because when GIT_TEST_COMMIT_GRAPH is set, we
> call write_commit_graph_reachable() but the final parameter was
> resolved to be "1" instead of "0".

Hmph.  That's unfortunate.

Perhaps one of the topics should have yielded and waited until the
other one passes through.

As 859fdc0c ("commit-graph: define GIT_TEST_COMMIT_GRAPH",
2018-08-29) already is in 'master', the other "progress" topic
probably should be corrected to match.  The easiest and cleanest
would be to eject the ab/commit-graph-progress topic out of 'next'
and have it rerolled on top of 'master', as we are going to rewind
the tip of 'next' anyway.

While we are at it, I suspect that a saner evolution of the API into
the function would not append more parameters to the call, but would
make the "do we append?" bit into a flag word "unsigned flags" with
two bits, and such a clean-up can be done as a preliminary change.

> This causes t3420-rebase-autostash.sh to fail, as that test watches
> the full output of the rebase command, including commit runs. The
> following patch fixes the problem, but could probably be squashed into
> a merge or other commit.
>
> Thanks,
>
> -Stolee
>
> -->8--
>
> From: Derrick Stolee <dstolee@microsoft.com>
> Date: Fri, 21 Sep 2018 19:57:36 +0000
> Subject: [PATCH] commit: quietly write commit-graph in tests
>
> The GIT_TEST_COMMIT_GRAPH environment variable causes git-commit to
> write a commit-graph file on every execution. Recently, we added
> progress output when writing the commit-graph. This conflicts with
> some expected output during some tests, so avoid writing progress
> if writing a commit-graph this way.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  builtin/commit.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/builtin/commit.c b/builtin/commit.c
> index 2a49ab4917..764664d832 100644
> --- a/builtin/commit.c
> +++ b/builtin/commit.c
> @@ -1660,7 +1660,7 @@ int cmd_commit(int argc, const char **argv,
> const char *prefix)
>                       "not exceeded, and then \"git reset HEAD\" to
> recover."));
>
>         if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0))
> - write_commit_graph_reachable(get_object_directory(), 0, 1);
> + write_commit_graph_reachable(get_object_directory(), 0, 0);
>
>         rerere(0);
>         run_command_v_opt(argv_gc_auto, RUN_GIT_CMD);
> --
> 2.19.0

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 1/2] commit-graph write: add progress output
  2018-09-21 21:43     ` Junio C Hamano
@ 2018-09-21 21:57       ` Junio C Hamano
  0 siblings, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2018-09-21 21:57 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Ævar Arnfjörð Bjarmason, git, Jeff King,
	Eric Sunshine, Nguyễn Thái Ngọc Duy

Junio C Hamano <gitster@pobox.com> writes:

>> The above prototype change seems to have created a semantic conflict
>> with ds/commit-graph-tests (859fdc "commit-graph: define
>> GIT_TEST_COMMIT_GRAPH") because when GIT_TEST_COMMIT_GRAPH is set, we
>> call write_commit_graph_reachable() but the final parameter was
>> resolved to be "1" instead of "0".
>
> Hmph.  That's unfortunate.
>
> Perhaps one of the topics should have yielded and waited until the
> other one passes through.

Nah, I see where things went wrong.  I'll queue a single-liner
"mismerge fix" to 'next', and then correct the seed for the evil
merge kept in merge-fix/ab/commit-graph-progress, and rebuild 'pu'.
Things will straighten out by themselves after that happens.

Thanks for noticing.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 1/2] commit-graph write: add progress output
  2018-09-17 15:33     ` [PATCH v3 1/2] commit-graph write: " Ævar Arnfjörð Bjarmason
@ 2018-10-10 20:37       ` SZEDER Gábor
  2018-10-10 21:56         ` Ævar Arnfjörð Bjarmason
  2018-10-12  6:09         ` Junio C Hamano
  2018-10-15 16:54       ` SZEDER Gábor
  1 sibling, 2 replies; 39+ messages in thread
From: SZEDER Gábor @ 2018-10-10 20:37 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Derrick Stolee, Jeff King, Eric Sunshine,
	Nguyễn Thái Ngọc Duy

On Mon, Sep 17, 2018 at 03:33:35PM +0000, Ævar Arnfjörð Bjarmason wrote:
>     $ git -c gc.writeCommitGraph=true gc
>     [...]
>     Annotating commits in commit graph: 1565573, done.
>     Computing commit graph generation numbers: 100% (782484/782484), done.

While poking around 'commit-graph.c' in my Bloom filter experiment, I
saw similar numbers like above, and was confused by the much higher
than expected number of annotated commits.  It's about twice as much
as the number of commits in the repository, or the number shown on the
very next line.

> diff --git a/commit-graph.c b/commit-graph.c
> index 8a1bec7b8a..2c5d996194 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> -static void close_reachable(struct packed_oid_list *oids)
> +static void close_reachable(struct packed_oid_list *oids, int report_progress)
>  {
>  	int i;
>  	struct commit *commit;
> +	struct progress *progress = NULL;
> +	int j = 0;
>  
> +	if (report_progress)
> +		progress = start_delayed_progress(
> +			_("Annotating commits in commit graph"), 0);
>  	for (i = 0; i < oids->nr; i++) {
> +		display_progress(progress, ++j);
>  		commit = lookup_commit(the_repository, &oids->list[i]);
>  		if (commit)
>  			commit->object.flags |= UNINTERESTING;
> @@ -604,6 +616,7 @@ static void close_reachable(struct packed_oid_list *oids)
>  	 * closure.
>  	 */
>  	for (i = 0; i < oids->nr; i++) {
> +		display_progress(progress, ++j);
>  		commit = lookup_commit(the_repository, &oids->list[i]);
>  
>  		if (commit && !parse_commit(commit))
> @@ -611,19 +624,28 @@ static void close_reachable(struct packed_oid_list *oids)
>  	}

The above loops have already counted all the commits, and, more
importantly, did all the hard work that takes time and makes the
progress indicator useful.

>  	for (i = 0; i < oids->nr; i++) {
> +		display_progress(progress, ++j);

This display_progress() call, however, doesn't seem to be necessary.
First, it counts all commits for a second time, resulting in the ~2x
difference compared to the actual number of commits, and then causing
my confusion.  Second, all what this loop is doing is setting a flag
in commits that were already looked up and parsed in the above loops.
IOW this loop is very fast, and the progress indicator jumps from
~780k right to 1.5M, even on my tiny laptop, so it doesn't need a
progress indicator at all.

>  		commit = lookup_commit(the_repository, &oids->list[i]);
>  
>  		if (commit)
>  			commit->object.flags &= ~UNINTERESTING;
>  	}
> +	stop_progress(&progress);
>  }

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 1/2] commit-graph write: add progress output
  2018-10-10 20:37       ` SZEDER Gábor
@ 2018-10-10 21:56         ` Ævar Arnfjörð Bjarmason
  2018-10-10 22:19           ` SZEDER Gábor
  2018-10-12  6:09         ` Junio C Hamano
  1 sibling, 1 reply; 39+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-10-10 21:56 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: git, Junio C Hamano, Derrick Stolee, Jeff King, Eric Sunshine,
	Nguyễn Thái Ngọc Duy


On Wed, Oct 10 2018, SZEDER Gábor wrote:

> On Mon, Sep 17, 2018 at 03:33:35PM +0000, Ævar Arnfjörð Bjarmason wrote:
>>     $ git -c gc.writeCommitGraph=true gc
>>     [...]
>>     Annotating commits in commit graph: 1565573, done.
>>     Computing commit graph generation numbers: 100% (782484/782484), done.
>
> While poking around 'commit-graph.c' in my Bloom filter experiment, I
> saw similar numbers like above, and was confused by the much higher
> than expected number of annotated commits.  It's about twice as much
> as the number of commits in the repository, or the number shown on the
> very next line.
>
>> diff --git a/commit-graph.c b/commit-graph.c
>> index 8a1bec7b8a..2c5d996194 100644
>> --- a/commit-graph.c
>> +++ b/commit-graph.c
>> -static void close_reachable(struct packed_oid_list *oids)
>> +static void close_reachable(struct packed_oid_list *oids, int report_progress)
>>  {
>>  	int i;
>>  	struct commit *commit;
>> +	struct progress *progress = NULL;
>> +	int j = 0;
>>
>> +	if (report_progress)
>> +		progress = start_delayed_progress(
>> +			_("Annotating commits in commit graph"), 0);
>>  	for (i = 0; i < oids->nr; i++) {
>> +		display_progress(progress, ++j);
>>  		commit = lookup_commit(the_repository, &oids->list[i]);
>>  		if (commit)
>>  			commit->object.flags |= UNINTERESTING;
>> @@ -604,6 +616,7 @@ static void close_reachable(struct packed_oid_list *oids)
>>  	 * closure.
>>  	 */
>>  	for (i = 0; i < oids->nr; i++) {
>> +		display_progress(progress, ++j);
>>  		commit = lookup_commit(the_repository, &oids->list[i]);
>>
>>  		if (commit && !parse_commit(commit))
>> @@ -611,19 +624,28 @@ static void close_reachable(struct packed_oid_list *oids)
>>  	}
>
> The above loops have already counted all the commits, and, more
> importantly, did all the hard work that takes time and makes the
> progress indicator useful.
>
>>  	for (i = 0; i < oids->nr; i++) {
>> +		display_progress(progress, ++j);

[...]

> This display_progress() call, however, doesn't seem to be necessary.
> First, it counts all commits for a second time, resulting in the ~2x
> difference compared to the actual number of commits, and then causing
> my confusion.  Second, all what this loop is doing is setting a flag
> in commits that were already looked up and parsed in the above loops.
> IOW this loop is very fast, and the progress indicator jumps from
> ~780k right to 1.5M, even on my tiny laptop, so it doesn't need a
> progress indicator at all.

You're right, I tried this patch on top:

    diff --git a/commit-graph.c b/commit-graph.c
    index a686758603..cccd83de72 100644
    --- a/commit-graph.c
    +++ b/commit-graph.c
    @@ -655,12 +655,16 @@ static void close_reachable(struct packed_oid_list *oids, int report_progress)
     		if (commit)
     			commit->object.flags |= UNINTERESTING;
     	}
    +	stop_progress(&progress); j = 0;

     	/*
     	 * As this loop runs, oids->nr may grow, but not more
     	 * than the number of missing commits in the reachable
     	 * closure.
     	 */
    +	if (report_progress)
    +		progress = start_delayed_progress(
    +			_("Annotating commits in commit graph 2"), 0);
     	for (i = 0; i < oids->nr; i++) {
     		display_progress(progress, ++j);
     		commit = lookup_commit(the_repository, &oids->list[i]);
    @@ -668,7 +672,11 @@ static void close_reachable(struct packed_oid_list *oids, int report_progress)
     		if (commit && !parse_commit(commit))
     			add_missing_parents(oids, commit);
     	}
    +	stop_progress(&progress); j = 0;

    +	if (report_progress)
    +		progress = start_delayed_progress(
    +			_("Annotating commits in commit graph 3"), 0);
     	for (i = 0; i < oids->nr; i++) {
     		display_progress(progress, ++j);
     		commit = lookup_commit(the_repository, &oids->list[i]);

And on a large repo with around 3 million commits the 3rd progress bar
doesn't kick in.

But if I apply this on top:

    diff --git a/progress.c b/progress.c
    index 5a99c9fbf0..89cc705bf7 100644
    --- a/progress.c
    +++ b/progress.c
    @@ -58,8 +58,8 @@ static void set_progress_signal(void)
     	sa.sa_flags = SA_RESTART;
     	sigaction(SIGALRM, &sa, NULL);

    -	v.it_interval.tv_sec = 1;
    -	v.it_interval.tv_usec = 0;
    +	v.it_interval.tv_sec = 0;
    +	v.it_interval.tv_usec = 250000;
     	v.it_value = v.it_interval;
     	setitimer(ITIMER_REAL, &v, NULL);
     }

I.e. start the timer after 1/4 of a second instead of 1 second, I get
that progress bar.

So I'm inclined to keep it. It just needs to be 4x the size before it's
noticeably hanging for 1 second.

That repo isn't all that big compared to what we've heard about out
there, and inner loops like this have a tendency to accumulate some more
code over time without a re-visit of why we weren't monitoring progress
there.

But maybe we can fix the message. We say "Annotating commits in commit
grap", not "Counting" or whatever. I was trying to find something that
didn't imply that we were doing this once. One can annotate a thing more
than once, but maybe ther's a better way to explain this...

We had some more accurate progress reporting in close_reachable(),
discussed in
https://public-inbox.org/git/87efe5qqks.fsf@evledraar.gmail.com/ I still
think the *main* use-case for these things is to just report that we're
not hanging, so maybe the proper solution is to pick up Duy's patch to
display a spinner insted of a numeric progress.

>>  		commit = lookup_commit(the_repository, &oids->list[i]);
>>
>>  		if (commit)
>>  			commit->object.flags &= ~UNINTERESTING;
>>  	}
>> +	stop_progress(&progress);
>>  }

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 1/2] commit-graph write: add progress output
  2018-10-10 21:56         ` Ævar Arnfjörð Bjarmason
@ 2018-10-10 22:19           ` SZEDER Gábor
  2018-10-10 22:37             ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 39+ messages in thread
From: SZEDER Gábor @ 2018-10-10 22:19 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Derrick Stolee, Jeff King, Eric Sunshine,
	Nguyễn Thái Ngọc Duy

On Wed, Oct 10, 2018 at 11:56:45PM +0200, Ævar Arnfjörð Bjarmason wrote:
> On Wed, Oct 10 2018, SZEDER Gábor wrote:

> >>  	for (i = 0; i < oids->nr; i++) {
> >> +		display_progress(progress, ++j);
> 
> [...]
> 
> > This display_progress() call, however, doesn't seem to be necessary.
> > First, it counts all commits for a second time, resulting in the ~2x
> > difference compared to the actual number of commits, and then causing
> > my confusion.  Second, all what this loop is doing is setting a flag
> > in commits that were already looked up and parsed in the above loops.
> > IOW this loop is very fast, and the progress indicator jumps from
> > ~780k right to 1.5M, even on my tiny laptop, so it doesn't need a
> > progress indicator at all.
> 
> You're right, I tried this patch on top:

[...] 

> And on a large repo with around 3 million commits the 3rd progress bar
> doesn't kick in.
> 
> But if I apply this on top:
> 
[...]
> 
> I.e. start the timer after 1/4 of a second instead of 1 second, I get
> that progress bar.
> 
> So I'm inclined to keep it. It just needs to be 4x the size before it's
> noticeably hanging for 1 second.

Just to clarify: are you worried about a 1 second hang in an approx. 12
million commit repository?  If so, then I'm unconvinced, that's not
even a blip on the radar, and the misleading numbers are far worse.

> That repo isn't all that big compared to what we've heard about out
> there, and inner loops like this have a tendency to accumulate some more
> code over time without a re-visit of why we weren't monitoring progress
> there.
> 
> But maybe we can fix the message. We say "Annotating commits in commit
> grap", not "Counting" or whatever. I was trying to find something that
> didn't imply that we were doing this once. One can annotate a thing more
> than once, but maybe ther's a better way to explain this...

IMO just remove it.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 1/2] commit-graph write: add progress output
  2018-10-10 22:19           ` SZEDER Gábor
@ 2018-10-10 22:37             ` Ævar Arnfjörð Bjarmason
  2018-10-11 17:52               ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 39+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-10-10 22:37 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: git, Junio C Hamano, Derrick Stolee, Jeff King, Eric Sunshine,
	Nguyễn Thái Ngọc Duy


On Wed, Oct 10 2018, SZEDER Gábor wrote:

> On Wed, Oct 10, 2018 at 11:56:45PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> On Wed, Oct 10 2018, SZEDER Gábor wrote:
>
>> >>  	for (i = 0; i < oids->nr; i++) {
>> >> +		display_progress(progress, ++j);
>>
>> [...]
>>
>> > This display_progress() call, however, doesn't seem to be necessary.
>> > First, it counts all commits for a second time, resulting in the ~2x
>> > difference compared to the actual number of commits, and then causing
>> > my confusion.  Second, all what this loop is doing is setting a flag
>> > in commits that were already looked up and parsed in the above loops.
>> > IOW this loop is very fast, and the progress indicator jumps from
>> > ~780k right to 1.5M, even on my tiny laptop, so it doesn't need a
>> > progress indicator at all.
>>
>> You're right, I tried this patch on top:
>
> [...]
>
>> And on a large repo with around 3 million commits the 3rd progress bar
>> doesn't kick in.
>>
>> But if I apply this on top:
>>
> [...]
>>
>> I.e. start the timer after 1/4 of a second instead of 1 second, I get
>> that progress bar.
>>
>> So I'm inclined to keep it. It just needs to be 4x the size before it's
>> noticeably hanging for 1 second.
>
> Just to clarify: are you worried about a 1 second hang in an approx. 12
> million commit repository?  If so, then I'm unconvinced, that's not
> even a blip on the radar, and the misleading numbers are far worse.

It's not a blip on the runtime, but the point of these progress bars in
general is so we don't have a UI where there's no UI differnce between
git hanging and just doing work in some tight loop in the background,
and even 1 second when you're watching something is noticeable if it
stalls.

Also it's 1 second on a server where I had 128G of RAM. I think even a
"trivial" flag change like this would very much change if e.g. the
system was under memory pressure or was swapping.

And as noted code like this tends to change over time, that loop might
get more expensive, so let's future proof by having all the loops over N
call the progress code.

When I wrote this the intent was just "report progress". So that it's
counting anything is just an implementation detail of how progress.c
works now.

This was the reference to Duy's patch, i.e. instead of spewing numbers
at the user here let's just render a spinner. Then we no longer need to
make judgement calls about which loop over N is expensive right now, and
which one isn't, and if any of them will result in reporting a 2N number
while the user might be more familiar with or expecting N.

>> That repo isn't all that big compared to what we've heard about out
>> there, and inner loops like this have a tendency to accumulate some more
>> code over time without a re-visit of why we weren't monitoring progress
>> there.
>>
>> But maybe we can fix the message. We say "Annotating commits in commit
>> grap", not "Counting" or whatever. I was trying to find something that
>> didn't imply that we were doing this once. One can annotate a thing more
>> than once, but maybe ther's a better way to explain this...
>
> IMO just remove it.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 1/2] commit-graph write: add progress output
  2018-10-10 22:37             ` Ævar Arnfjörð Bjarmason
@ 2018-10-11 17:52               ` Ævar Arnfjörð Bjarmason
  2018-10-15 16:05                 ` SZEDER Gábor
  0 siblings, 1 reply; 39+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-10-11 17:52 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: git, Junio C Hamano, Derrick Stolee, Jeff King, Eric Sunshine,
	Nguyễn Thái Ngọc Duy


On Wed, Oct 10 2018, Ævar Arnfjörð Bjarmason wrote:

> On Wed, Oct 10 2018, SZEDER Gábor wrote:
>
>> On Wed, Oct 10, 2018 at 11:56:45PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>> On Wed, Oct 10 2018, SZEDER Gábor wrote:
>>
>>> >>  	for (i = 0; i < oids->nr; i++) {
>>> >> +		display_progress(progress, ++j);
>>>
>>> [...]
>>>
>>> > This display_progress() call, however, doesn't seem to be necessary.
>>> > First, it counts all commits for a second time, resulting in the ~2x
>>> > difference compared to the actual number of commits, and then causing
>>> > my confusion.  Second, all what this loop is doing is setting a flag
>>> > in commits that were already looked up and parsed in the above loops.
>>> > IOW this loop is very fast, and the progress indicator jumps from
>>> > ~780k right to 1.5M, even on my tiny laptop, so it doesn't need a
>>> > progress indicator at all.
>>>
>>> You're right, I tried this patch on top:
>>
>> [...]
>>
>>> And on a large repo with around 3 million commits the 3rd progress bar
>>> doesn't kick in.
>>>
>>> But if I apply this on top:
>>>
>> [...]
>>>
>>> I.e. start the timer after 1/4 of a second instead of 1 second, I get
>>> that progress bar.
>>>
>>> So I'm inclined to keep it. It just needs to be 4x the size before it's
>>> noticeably hanging for 1 second.
>>
>> Just to clarify: are you worried about a 1 second hang in an approx. 12
>> million commit repository?  If so, then I'm unconvinced, that's not
>> even a blip on the radar, and the misleading numbers are far worse.
>
> It's not a blip on the runtime, but the point of these progress bars in
> general is so we don't have a UI where there's no UI differnce between
> git hanging and just doing work in some tight loop in the background,
> and even 1 second when you're watching something is noticeable if it
> stalls.
>
> Also it's 1 second on a server where I had 128G of RAM. I think even a
> "trivial" flag change like this would very much change if e.g. the
> system was under memory pressure or was swapping.
>
> And as noted code like this tends to change over time, that loop might
> get more expensive, so let's future proof by having all the loops over N
> call the progress code.
>
> When I wrote this the intent was just "report progress". So that it's
> counting anything is just an implementation detail of how progress.c
> works now.
>
> This was the reference to Duy's patch, i.e. instead of spewing numbers
> at the user here let's just render a spinner. Then we no longer need to
> make judgement calls about which loop over N is expensive right now, and
> which one isn't, and if any of them will result in reporting a 2N number
> while the user might be more familiar with or expecting N.
>
>>> That repo isn't all that big compared to what we've heard about out
>>> there, and inner loops like this have a tendency to accumulate some more
>>> code over time without a re-visit of why we weren't monitoring progress
>>> there.
>>>
>>> But maybe we can fix the message. We say "Annotating commits in commit
>>> grap", not "Counting" or whatever. I was trying to find something that
>>> didn't imply that we were doing this once. One can annotate a thing more
>>> than once, but maybe ther's a better way to explain this...
>>
>> IMO just remove it.

Hrm, actually reading this again your initial post says we end up with a
2x difference v.s. the number of commits, but it's actually 3x. The loop
that has a rather trivial runtime comparatively is the 3x, but the 2x
loop takes a notable amount of time. So e.g. on git.git:

    $ git rev-list --all | wc -l; ~/g/git/git commit-graph write
    166678
    Annotating commits in commit graph: 518463, done.
    Computing commit graph generation numbers: 100% (172685/172685), done.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 1/2] commit-graph write: add progress output
  2018-10-10 20:37       ` SZEDER Gábor
  2018-10-10 21:56         ` Ævar Arnfjörð Bjarmason
@ 2018-10-12  6:09         ` Junio C Hamano
  2018-10-12 15:07           ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2018-10-12  6:09 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Ævar Arnfjörð Bjarmason, git, Derrick Stolee,
	Jeff King, Eric Sunshine, Nguyễn Thái Ngọc Duy

SZEDER Gábor <szeder.dev@gmail.com> writes:

>>  	for (i = 0; i < oids->nr; i++) {
>> +		display_progress(progress, ++j);
>>  		commit = lookup_commit(the_repository, &oids->list[i]);
>>  
>>  		if (commit && !parse_commit(commit))
>> @@ -611,19 +624,28 @@ static void close_reachable(struct packed_oid_list *oids)
>>  	}
>
> The above loops have already counted all the commits, and, more
> importantly, did all the hard work that takes time and makes the
> progress indicator useful.
>
>>  	for (i = 0; i < oids->nr; i++) {
>> +		display_progress(progress, ++j);
>
> This display_progress() call, however, doesn't seem to be necessary.
> First, it counts all commits for a second time, resulting in the ~2x
> difference compared to the actual number of commits, and then causing
> my confusion.  Second, all what this loop is doing is setting a flag
> in commits that were already looked up and parsed in the above loops.
> IOW this loop is very fast, and the progress indicator jumps from
> ~780k right to 1.5M, even on my tiny laptop, so it doesn't need a
> progress indicator at all.

Makes sense.  If this second iteration were also time consuming,
then it probably is a good idea to split these into two separate
phases?  "Counting 1...N" followed by "Inspecting 1...N" or
something like that.  Of course, if the latter does not take much
time, then doing without any progress indicator is also fine.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 1/2] commit-graph write: add progress output
  2018-10-12  6:09         ` Junio C Hamano
@ 2018-10-12 15:07           ` Ævar Arnfjörð Bjarmason
  2018-10-12 15:12             ` Derrick Stolee
  0 siblings, 1 reply; 39+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-10-12 15:07 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: SZEDER Gábor, git, Derrick Stolee, Jeff King, Eric Sunshine,
	Nguyễn Thái Ngọc Duy


On Fri, Oct 12 2018, Junio C Hamano wrote:

> SZEDER Gábor <szeder.dev@gmail.com> writes:
>
>>>  	for (i = 0; i < oids->nr; i++) {
>>> +		display_progress(progress, ++j);
>>>  		commit = lookup_commit(the_repository, &oids->list[i]);
>>>
>>>  		if (commit && !parse_commit(commit))
>>> @@ -611,19 +624,28 @@ static void close_reachable(struct packed_oid_list *oids)
>>>  	}
>>
>> The above loops have already counted all the commits, and, more
>> importantly, did all the hard work that takes time and makes the
>> progress indicator useful.
>>
>>>  	for (i = 0; i < oids->nr; i++) {
>>> +		display_progress(progress, ++j);
>>
>> This display_progress() call, however, doesn't seem to be necessary.
>> First, it counts all commits for a second time, resulting in the ~2x
>> difference compared to the actual number of commits, and then causing
>> my confusion.  Second, all what this loop is doing is setting a flag
>> in commits that were already looked up and parsed in the above loops.
>> IOW this loop is very fast, and the progress indicator jumps from
>> ~780k right to 1.5M, even on my tiny laptop, so it doesn't need a
>> progress indicator at all.
>
> Makes sense.  If this second iteration were also time consuming,
> then it probably is a good idea to split these into two separate
> phases?  "Counting 1...N" followed by "Inspecting 1...N" or
> something like that.  Of course, if the latter does not take much
> time, then doing without any progress indicator is also fine.

That's a good point. Derrick: If the three loops in close_reachable()
had to be split up into different progress stages and given different
names what do you think they should be? Now it's "Annotating commits in
commit graph" for all of them.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 1/2] commit-graph write: add progress output
  2018-10-12 15:07           ` Ævar Arnfjörð Bjarmason
@ 2018-10-12 15:12             ` Derrick Stolee
  0 siblings, 0 replies; 39+ messages in thread
From: Derrick Stolee @ 2018-10-12 15:12 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Junio C Hamano
  Cc: SZEDER Gábor, git, Jeff King, Eric Sunshine,
	Nguyễn Thái Ngọc Duy

On 10/12/2018 11:07 AM, Ævar Arnfjörð Bjarmason wrote:
> On Fri, Oct 12 2018, Junio C Hamano wrote:
>
>> Makes sense. If this second iteration were also time consuming,
>> then it probably is a good idea to split these into two separate
>> phases?  "Counting 1...N" followed by "Inspecting 1...N" or
>> something like that.  Of course, if the latter does not take much
>> time, then doing without any progress indicator is also fine.
> That's a good point. Derrick: If the three loops in close_reachable()
> had to be split up into different progress stages and given different
> names what do you think they should be? Now it's "Annotating commits in
> commit graph" for all of them.

The following is the best I can think of right now:

1. Loading known commits.
2. Expanding reachable commits.
3. Clearing commit marks.

-Stolee

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 1/2] commit-graph write: add progress output
  2018-10-11 17:52               ` Ævar Arnfjörð Bjarmason
@ 2018-10-15 16:05                 ` SZEDER Gábor
  0 siblings, 0 replies; 39+ messages in thread
From: SZEDER Gábor @ 2018-10-15 16:05 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Derrick Stolee, Jeff King, Eric Sunshine,
	Nguyễn Thái Ngọc Duy

On Thu, Oct 11, 2018 at 07:52:21PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Oct 10 2018, Ævar Arnfjörð Bjarmason wrote:
> 
> > On Wed, Oct 10 2018, SZEDER Gábor wrote:
> >
> >> On Wed, Oct 10, 2018 at 11:56:45PM +0200, Ævar Arnfjörð Bjarmason wrote:
> >>> On Wed, Oct 10 2018, SZEDER Gábor wrote:
> >>
> >>> >>  	for (i = 0; i < oids->nr; i++) {
> >>> >> +		display_progress(progress, ++j);
> >>>
> >>> [...]
> >>>
> >>> > This display_progress() call, however, doesn't seem to be necessary.
> >>> > First, it counts all commits for a second time, resulting in the ~2x
> >>> > difference compared to the actual number of commits, and then causing
> >>> > my confusion.  Second, all what this loop is doing is setting a flag
> >>> > in commits that were already looked up and parsed in the above loops.
> >>> > IOW this loop is very fast, and the progress indicator jumps from
> >>> > ~780k right to 1.5M, even on my tiny laptop, so it doesn't need a
> >>> > progress indicator at all.

> Hrm, actually reading this again your initial post says we end up with a
> 2x difference v.s. the number of commits, but it's actually 3x.

Well, it depends on how you create the commit-graph and on the repo as
well, I guess.  I run 'git commit-graph write --reachable' in a repo
created by 'git clone --single-branch ...', and in that case the
difference is only ~2x (the first loop in close_reachable() has as
many iterations as the number of refs).  If the repo were to contain
twice as many refs as commits, then the difference could be as high as
4x.

However, I think I might have noticed an other progress counting issue
as well, will get back to it later but first I have to get my numbers
straight.

> The loop
> that has a rather trivial runtime comparatively is the 3x, but the 2x
> loop takes a notable amount of time. So e.g. on git.git:
> 
>     $ git rev-list --all | wc -l; ~/g/git/git commit-graph write
>     166678
>     Annotating commits in commit graph: 518463, done.
>     Computing commit graph generation numbers: 100% (172685/172685), done.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 1/2] commit-graph write: add progress output
  2018-09-17 15:33     ` [PATCH v3 1/2] commit-graph write: " Ævar Arnfjörð Bjarmason
  2018-10-10 20:37       ` SZEDER Gábor
@ 2018-10-15 16:54       ` SZEDER Gábor
  1 sibling, 0 replies; 39+ messages in thread
From: SZEDER Gábor @ 2018-10-15 16:54 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Derrick Stolee, Jeff King, Eric Sunshine,
	Nguyễn Thái Ngọc Duy

On Mon, Sep 17, 2018 at 03:33:35PM +0000, Ævar Arnfjörð Bjarmason wrote:

> @@ -560,6 +563,9 @@ static int add_packed_commits(const struct object_id *oid,
>  	off_t offset = nth_packed_object_offset(pack, pos);
>  	struct object_info oi = OBJECT_INFO_INIT;
>  
> +	if (list->progress)
> +		display_progress(list->progress, ++list->progress_done);

Note that add_packed_commits() is used as a callback function for
for_each_object_in_pack() (with '--stdin-packs') or
for_each_packed_object() (no options), i.e. this will count the number
of objects, not commits:

  $ git rev-list --all |wc -l
  768524
  $ git rev-list --objects --all |wc -l
  6130295
  # '--count --objects' together didn't work as expected.
  $ time ~/src/git/git commit-graph write
  Finding commits for commit graph: 6130295, done.
  Annotating commits in commit graph: 2305572, done.
  Computing commit graph generation numbers: 100% (768524/768524), done.

(Now I also see the 3x difference in the "Annotating commits" counter
that you mentioned.)

I see two options:

  - Provide a different title for this progress counter, e.g.
    "Scanning objects for c-g", or "Processing objects...", or
    something else that says "objects" instead of "commits".

  - Move this condition and display_progress() call to the end of the
    function, so it will only count commits, not any other objects.
    (As far as I understand both for_each_object_in_pack() and
    for_each_packed_object() iterate in pack .idx order, i.e. it's
    essentially random.  This means that commit objects should be
    distributed evenly among other kinds of objects, so we don't have
    to worry about the counter stalling for a long stretch of
    consecutive non-commit objects.  At least in theory.)




^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, back to index

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-04 20:27 [PATCH 0/2] commit-graph: add progress output Ævar Arnfjörð Bjarmason
2018-09-04 20:27 ` [PATCH 1/2] commit-graph write: " Ævar Arnfjörð Bjarmason
2018-09-04 21:16   ` Eric Sunshine
2018-09-04 22:07   ` Junio C Hamano
2018-09-05 11:58     ` Derrick Stolee
2018-09-05 12:07       ` Ævar Arnfjörð Bjarmason
2018-09-05 21:46       ` Junio C Hamano
2018-09-05 22:12         ` Derrick Stolee
2018-09-07 15:11       ` Ævar Arnfjörð Bjarmason
2018-09-07 15:23         ` Ævar Arnfjörð Bjarmason
2018-09-07 17:15           ` Jeff King
2018-09-07 17:25             ` Derrick Stolee
2018-09-05 12:06   ` Derrick Stolee
2018-09-07 12:40   ` Ævar Arnfjörð Bjarmason
2018-09-07 13:12     ` Derrick Stolee
2018-09-04 20:27 ` [PATCH 2/2] commit-graph verify: " Ævar Arnfjörð Bjarmason
2018-09-04 22:10   ` Junio C Hamano
2018-09-05 12:07 ` [PATCH 0/2] commit-graph: " Derrick Stolee
2018-09-07 18:29 ` [PATCH v2 " Ævar Arnfjörð Bjarmason
2018-09-11 20:26   ` Junio C Hamano
2018-09-07 18:29 ` [PATCH v2 1/2] commit-graph write: " Ævar Arnfjörð Bjarmason
2018-09-21 20:01   ` Derrick Stolee
2018-09-21 21:43     ` Junio C Hamano
2018-09-21 21:57       ` Junio C Hamano
2018-09-07 18:29 ` [PATCH v2 2/2] commit-graph verify: " Ævar Arnfjörð Bjarmason
2018-09-16  6:55   ` Duy Nguyen
2018-09-17 15:33     ` [PATCH v3 0/2] commit-graph: " Ævar Arnfjörð Bjarmason
2018-09-17 15:33     ` [PATCH v3 1/2] commit-graph write: " Ævar Arnfjörð Bjarmason
2018-10-10 20:37       ` SZEDER Gábor
2018-10-10 21:56         ` Ævar Arnfjörð Bjarmason
2018-10-10 22:19           ` SZEDER Gábor
2018-10-10 22:37             ` Ævar Arnfjörð Bjarmason
2018-10-11 17:52               ` Ævar Arnfjörð Bjarmason
2018-10-15 16:05                 ` SZEDER Gábor
2018-10-12  6:09         ` Junio C Hamano
2018-10-12 15:07           ` Ævar Arnfjörð Bjarmason
2018-10-12 15:12             ` Derrick Stolee
2018-10-15 16:54       ` SZEDER Gábor
2018-09-17 15:33     ` [PATCH v3 2/2] commit-graph verify: " Ævar Arnfjörð Bjarmason

git@vger.kernel.org mailing list mirror (one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/
       or Tor2web: https://www.tor2web.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox