git@vger.kernel.org mailing list mirror (one of many)
 help / Atom feed
* [PATCH] revision: introduce prepare_revision_walk_extended()
@ 2017-12-16 12:12 René Scharfe
  2017-12-17 10:20 ` Martin Ågren
  2017-12-18 15:10 ` Jeff King
  0 siblings, 2 replies; 10+ messages in thread
From: René Scharfe @ 2017-12-16 12:12 UTC (permalink / raw)
  To: Git List; +Cc: Jeff King, Martin Ågren, Christian Couder, Junio C Hamano

prepare_revision_walk() allows callers to take ownership of the array of
pending objects by setting the rev_info flag "leak_pending" and copying
the object_array "pending".  They use it to clear commit marks after
setup is done.  This interface is brittle enough that it requires
extensive comments.

Provide an easier way by adding a function that can hand over the array
to a caller-supplied output parameter and converting all users of the
flag "leak_pending" to call prepare_revision_walk_extended() instead.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
---
 Documentation/technical/api-revision-walking.txt |  6 ++++++
 bisect.c                                         | 17 +++++------------
 builtin/checkout.c                               |  9 +--------
 bundle.c                                         |  9 +--------
 revision.c                                       | 10 +++++++++-
 revision.h                                       | 14 ++------------
 6 files changed, 24 insertions(+), 41 deletions(-)

diff --git a/Documentation/technical/api-revision-walking.txt b/Documentation/technical/api-revision-walking.txt
index 55b878ade8..9dc573d2ec 100644
--- a/Documentation/technical/api-revision-walking.txt
+++ b/Documentation/technical/api-revision-walking.txt
@@ -50,6 +50,12 @@ function.
 	returns any error (non-zero return code) and if it does not, you can
 	start using get_revision() to do the iteration.
 
+`prepare_revision_walk_extended`::
+
+	Like prepare_revision_walk(), but allows callers to take ownership
+	of the array of pending objects by passing an object_array pointer
+	as the second parameter; passing NULL clears the array.
+
 `get_revision`::
 
 	Takes a pointer to a `rev_info` structure and iterates over it,
diff --git a/bisect.c b/bisect.c
index 0fca17c02b..a2af405d28 100644
--- a/bisect.c
+++ b/bisect.c
@@ -641,9 +641,10 @@ static void bisect_rev_setup(struct rev_info *revs, const char *prefix,
 	/* XXX leak rev_argv, as "revs" may still be pointing to it */
 }
 
-static void bisect_common(struct rev_info *revs)
+static void bisect_common(struct rev_info *revs,
+			  struct object_array *old_pending_ptr)
 {
-	if (prepare_revision_walk(revs))
+	if (prepare_revision_walk_extended(revs, old_pending_ptr))
 		die("revision walk setup failed");
 	if (revs->tree_objects)
 		mark_edges_uninteresting(revs, NULL);
@@ -825,15 +826,7 @@ static int check_ancestors(const char *prefix)
 	bisect_rev_setup(&revs, prefix, "^%s", "%s", 0);
 
 	/* Save pending objects, so they can be cleaned up later. */
-	pending_copy = revs.pending;
-	revs.leak_pending = 1;
-
-	/*
-	 * bisect_common calls prepare_revision_walk right away, which
-	 * (together with .leak_pending = 1) makes us the sole owner of
-	 * the list of pending objects.
-	 */
-	bisect_common(&revs);
+	bisect_common(&revs, &pending_copy);
 	res = (revs.commits != NULL);
 
 	/* Clean up objects used, as they will be reused. */
@@ -964,7 +957,7 @@ int bisect_next_all(const char *prefix, int no_checkout)
 	bisect_rev_setup(&revs, prefix, "%s", "^%s", 1);
 	revs.limited = 1;
 
-	bisect_common(&revs);
+	bisect_common(&revs, NULL);
 
 	find_bisection(&revs.commits, &reaches, &all, !!skipped_revs.nr);
 	revs.commits = managed_skipped(revs.commits, &tried);
diff --git a/builtin/checkout.c b/builtin/checkout.c
index e1e157d205..1f04f5d5e5 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -796,14 +796,7 @@ static void orphaned_commit_warning(struct commit *old, struct commit *new)
 	add_pending_oid(&revs, "HEAD", &new->object.oid, UNINTERESTING);
 
 	/* Save pending objects, so they can be cleaned up later. */
-	refs = revs.pending;
-	revs.leak_pending = 1;
-
-	/*
-	 * prepare_revision_walk (together with .leak_pending = 1) makes us
-	 * the sole owner of the list of pending objects.
-	 */
-	if (prepare_revision_walk(&revs))
+	if (prepare_revision_walk_extended(&revs, &refs))
 		die(_("internal error in revision walk"));
 	if (!(old->object.flags & UNINTERESTING))
 		suggest_reattach(old, &revs);
diff --git a/bundle.c b/bundle.c
index 93290962c9..6af6e38c40 100644
--- a/bundle.c
+++ b/bundle.c
@@ -158,14 +158,7 @@ int verify_bundle(struct bundle_header *header, int verbose)
 	setup_revisions(2, argv, &revs, NULL);
 
 	/* Save pending objects, so they can be cleaned up later. */
-	refs = revs.pending;
-	revs.leak_pending = 1;
-
-	/*
-	 * prepare_revision_walk (together with .leak_pending = 1) makes us
-	 * the sole owner of the list of pending objects.
-	 */
-	if (prepare_revision_walk(&revs))
+	if (prepare_revision_walk_extended(&revs, &refs))
 		die(_("revision walk setup failed"));
 
 	i = req_nr;
diff --git a/revision.c b/revision.c
index f6a3da5cd9..3a231451a4 100644
--- a/revision.c
+++ b/revision.c
@@ -2841,6 +2841,12 @@ void reset_revision_walk(void)
 }
 
 int prepare_revision_walk(struct rev_info *revs)
+{
+	return prepare_revision_walk_extended(revs, NULL);
+}
+
+int prepare_revision_walk_extended(struct rev_info *revs,
+				   struct object_array *old_pending_ptr)
 {
 	int i;
 	struct object_array old_pending;
@@ -2860,7 +2866,9 @@ int prepare_revision_walk(struct rev_info *revs)
 			}
 		}
 	}
-	if (!revs->leak_pending)
+	if (old_pending_ptr)
+		*old_pending_ptr = old_pending;
+	else
 		object_array_clear(&old_pending);
 
 	/* Signal whether we need per-parent treesame decoration */
diff --git a/revision.h b/revision.h
index 54761200ad..5d4b475334 100644
--- a/revision.h
+++ b/revision.h
@@ -150,18 +150,6 @@ struct rev_info {
 			date_mode_explicit:1,
 			preserve_subject:1;
 	unsigned int	disable_stdin:1;
-	/*
-	 * Set `leak_pending` to prevent `prepare_revision_walk()` from clearing
-	 * the array of pending objects (`pending`). It will still forget about
-	 * the array and its entries, so they really are leaked. This can be
-	 * useful if the `struct object_array` `pending` is copied before
-	 * calling `prepare_revision_walk()`. By setting `leak_pending`, you
-	 * effectively claim ownership of the old array, so you should most
-	 * likely call `object_array_clear(&pending_copy)` once you are done.
-	 * Observe that this is about ownership of the array and its entries,
-	 * not the commits referenced by those entries.
-	 */
-	unsigned int	leak_pending:1;
 	/* --show-linear-break */
 	unsigned int	track_linear:1,
 			track_first_time:1,
@@ -270,6 +258,8 @@ extern int handle_revision_arg(const char *arg, struct rev_info *revs,
 
 extern void reset_revision_walk(void);
 extern int prepare_revision_walk(struct rev_info *revs);
+extern int prepare_revision_walk_extended(struct rev_info *revs,
+					  struct object_array *old_pending_ptr);
 extern struct commit *get_revision(struct rev_info *revs);
 extern char *get_revision_mark(const struct rev_info *revs,
 			       const struct commit *commit);
-- 
2.15.1

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] revision: introduce prepare_revision_walk_extended()
  2017-12-16 12:12 [PATCH] revision: introduce prepare_revision_walk_extended() René Scharfe
@ 2017-12-17 10:20 ` Martin Ågren
  2017-12-18 15:10 ` Jeff King
  1 sibling, 0 replies; 10+ messages in thread
From: Martin Ågren @ 2017-12-17 10:20 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git List, Jeff King, Christian Couder, Junio C Hamano

On 16 December 2017 at 13:12, René Scharfe <l.s.r@web.de> wrote:
> prepare_revision_walk() allows callers to take ownership of the array of
> pending objects by setting the rev_info flag "leak_pending" and copying
> the object_array "pending".  They use it to clear commit marks after
> setup is done.  This interface is brittle enough that it requires
> extensive comments.
>
> Provide an easier way by adding a function that can hand over the array
> to a caller-supplied output parameter and converting all users of the
> flag "leak_pending" to call prepare_revision_walk_extended() instead.
>
> Signed-off-by: Rene Scharfe <l.s.r@web.de>
> ---
>  Documentation/technical/api-revision-walking.txt |  6 ++++++
>  bisect.c                                         | 17 +++++------------
>  builtin/checkout.c                               |  9 +--------
>  bundle.c                                         |  9 +--------
>  revision.c                                       | 10 +++++++++-
>  revision.h                                       | 14 ++------------
>  6 files changed, 24 insertions(+), 41 deletions(-)
>
> diff --git a/Documentation/technical/api-revision-walking.txt b/Documentation/technical/api-revision-walking.txt
> index 55b878ade8..9dc573d2ec 100644
> --- a/Documentation/technical/api-revision-walking.txt
> +++ b/Documentation/technical/api-revision-walking.txt
> @@ -50,6 +50,12 @@ function.
>         returns any error (non-zero return code) and if it does not, you can
>         start using get_revision() to do the iteration.
>
> +`prepare_revision_walk_extended`::
> +
> +       Like prepare_revision_walk(), but allows callers to take ownership
> +       of the array of pending objects by passing an object_array pointer
> +       as the second parameter; passing NULL clears the array.

This might make someone wonder what the difference between passing NULL
and using `prepare_revision_walk()` is. Perhaps: "passing NULL clears
the array, just as prepare_revision_walk() would." Possibly only matters
once we gain more parameters, and maybe not even then...

The name of your new function ("..._extended") doesn't describe the
nature of the extended behavior and made me wonder if it was too
generic. But that genericness might be a good thing. When/If we need to
tweak the behavior along some other axis, we can add a third parameter
to ..._extended and pass NULL/0 as appropriate. The simple cases will
stay simple and we won't gain lots of functions with minor differences.

> diff --git a/builtin/checkout.c b/builtin/checkout.c
> index e1e157d205..1f04f5d5e5 100644
> --- a/builtin/checkout.c
> +++ b/builtin/checkout.c
> @@ -796,14 +796,7 @@ static void orphaned_commit_warning(struct commit *old, struct commit *new)
>         add_pending_oid(&revs, "HEAD", &new->object.oid, UNINTERESTING);
>
>         /* Save pending objects, so they can be cleaned up later. */
> -       refs = revs.pending;
> -       revs.leak_pending = 1;
> -
> -       /*
> -        * prepare_revision_walk (together with .leak_pending = 1) makes us
> -        * the sole owner of the list of pending objects.
> -        */
> -       if (prepare_revision_walk(&revs))
> +       if (prepare_revision_walk_extended(&revs, &refs))
>                 die(_("internal error in revision walk"));
>         if (!(old->object.flags & UNINTERESTING))
>                 suggest_reattach(old, &revs);
> diff --git a/bundle.c b/bundle.c
> index 93290962c9..6af6e38c40 100644
> --- a/bundle.c
> +++ b/bundle.c
> @@ -158,14 +158,7 @@ int verify_bundle(struct bundle_header *header, int verbose)
>         setup_revisions(2, argv, &revs, NULL);
>
>         /* Save pending objects, so they can be cleaned up later. */
> -       refs = revs.pending;
> -       revs.leak_pending = 1;
> -
> -       /*
> -        * prepare_revision_walk (together with .leak_pending = 1) makes us
> -        * the sole owner of the list of pending objects.
> -        */
> -       if (prepare_revision_walk(&revs))
> +       if (prepare_revision_walk_extended(&revs, &refs))
>                 die(_("revision walk setup failed"));
>
>         i = req_nr;

This copy-paste coding that you get rid of here can be attributed to me.
I obviously like your cleaned-up version much better.

> diff --git a/revision.h b/revision.h
> index 54761200ad..5d4b475334 100644
> --- a/revision.h
> +++ b/revision.h
> @@ -150,18 +150,6 @@ struct rev_info {
>                         date_mode_explicit:1,
>                         preserve_subject:1;
>         unsigned int    disable_stdin:1;
> -       /*
> -        * Set `leak_pending` to prevent `prepare_revision_walk()` from clearing
> -        * the array of pending objects (`pending`). It will still forget about
> -        * the array and its entries, so they really are leaked. This can be
> -        * useful if the `struct object_array` `pending` is copied before
> -        * calling `prepare_revision_walk()`. By setting `leak_pending`, you
> -        * effectively claim ownership of the old array, so you should most
> -        * likely call `object_array_clear(&pending_copy)` once you are done.
> -        * Observe that this is about ownership of the array and its entries,
> -        * not the commits referenced by those entries.
> -        */
> -       unsigned int    leak_pending:1;
>         /* --show-linear-break */
>         unsigned int    track_linear:1,
>                         track_first_time:1,

The commit message doesn't mention that you drop `leak_pending`, but
maybe that's obvious enough since you convert all users.

Thanks for tidying up,
Martin

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] revision: introduce prepare_revision_walk_extended()
  2017-12-16 12:12 [PATCH] revision: introduce prepare_revision_walk_extended() René Scharfe
  2017-12-17 10:20 ` Martin Ågren
@ 2017-12-18 15:10 ` Jeff King
  2017-12-18 19:18   ` René Scharfe
  1 sibling, 1 reply; 10+ messages in thread
From: Jeff King @ 2017-12-18 15:10 UTC (permalink / raw)
  To: René Scharfe
  Cc: Git List, Martin Ågren, Christian Couder, Junio C Hamano

On Sat, Dec 16, 2017 at 01:12:16PM +0100, René Scharfe wrote:

> prepare_revision_walk() allows callers to take ownership of the array of
> pending objects by setting the rev_info flag "leak_pending" and copying
> the object_array "pending".  They use it to clear commit marks after
> setup is done.  This interface is brittle enough that it requires
> extensive comments.
> 
> Provide an easier way by adding a function that can hand over the array
> to a caller-supplied output parameter and converting all users of the
> flag "leak_pending" to call prepare_revision_walk_extended() instead.

I think this is _better_, but it's still kind of a funny interface.

The root of the matter is that the revision-walking code doesn't clean
up after itself. In every case, the caller is just saving these to clean
up commit marks, isn't it?

Could we instead have an interface like:

  revs.clear_commit_marks = 1;
  prepare_revision_walk(&revs);
  ...
  finish_revision_walk(&revs);

where that final function would do any cleanup, including clearing the
commit marks. I suspect there are other small bits that get leaked
because there's not really any "destructor" for a revision walk.

It's not as flexible as this whole "make a copy of the pending tips"
thing, but it keeps all of the details abstracted away from the callers.

Alternatively:

> +`prepare_revision_walk_extended`::
> +
> +	Like prepare_revision_walk(), but allows callers to take ownership
> +	of the array of pending objects by passing an object_array pointer
> +	as the second parameter; passing NULL clears the array.

What if we just got rid of this function and had callers do:

  object_array_copy(&old_pending, &revs);
  prepare_revision_walk(&revs);
  ...
  clear_commit_marks_for_object_array(&old_pending);

That sidesteps all of the memory ownership issues by just creating a
copy. That's less efficient, but I'd be surprised if it matters in
practice (we tend to do one or two revisions per process, there don't
tend to be a lot of pending tips, and we're really just talking about
copying some pointers here).

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] revision: introduce prepare_revision_walk_extended()
  2017-12-18 15:10 ` Jeff King
@ 2017-12-18 19:18   ` René Scharfe
  2017-12-19 11:49     ` Jeff King
  0 siblings, 1 reply; 10+ messages in thread
From: René Scharfe @ 2017-12-18 19:18 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List, Martin Ågren, Christian Couder, Junio C Hamano

Am 18.12.2017 um 16:10 schrieb Jeff King:
> On Sat, Dec 16, 2017 at 01:12:16PM +0100, René Scharfe wrote:
> 
>> prepare_revision_walk() allows callers to take ownership of the array of
>> pending objects by setting the rev_info flag "leak_pending" and copying
>> the object_array "pending".  They use it to clear commit marks after
>> setup is done.  This interface is brittle enough that it requires
>> extensive comments.
>>
>> Provide an easier way by adding a function that can hand over the array
>> to a caller-supplied output parameter and converting all users of the
>> flag "leak_pending" to call prepare_revision_walk_extended() instead.
> 
> I think this is _better_, but it's still kind of a funny interface.
> 
> The root of the matter is that the revision-walking code doesn't clean
> up after itself. In every case, the caller is just saving these to clean
> up commit marks, isn't it?

bundle also checks if the pending objects exists.

> Could we instead have an interface like:
> 
>    revs.clear_commit_marks = 1;
>    prepare_revision_walk(&revs);
>    ...
>    finish_revision_walk(&revs);
> 
> where that final function would do any cleanup, including clearing the
> commit marks. I suspect there are other small bits that get leaked
> because there's not really any "destructor" for a revision walk.
> 
> It's not as flexible as this whole "make a copy of the pending tips"
> thing, but it keeps all of the details abstracted away from the callers.
> 
> Alternatively:
> 
>> +`prepare_revision_walk_extended`::
>> +
>> +	Like prepare_revision_walk(), but allows callers to take ownership
>> +	of the array of pending objects by passing an object_array pointer
>> +	as the second parameter; passing NULL clears the array.
> 
> What if we just got rid of this function and had callers do:
> 
>    object_array_copy(&old_pending, &revs);
>    prepare_revision_walk(&revs);
>    ...
>    clear_commit_marks_for_object_array(&old_pending);
> 
> That sidesteps all of the memory ownership issues by just creating a
> copy. That's less efficient, but I'd be surprised if it matters in
> practice (we tend to do one or two revisions per process, there don't
> tend to be a lot of pending tips, and we're really just talking about
> copying some pointers here).

This was done before I added the leak_pending flag.

t5502 and t5571 have test cases with ca. 1000 pending objects, t5551
and t5541 with ca. 2000, p5310 more than 8000.  That's just a few KB.

I don't know if there can be real-world use cases with millions of
entries (when it would start to hurt).

Why does prepare_revision_walk() clear the list of pending objects at
all?  Assuming the list is append-only then perhaps remembering the
last handled index would suffice.

René

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] revision: introduce prepare_revision_walk_extended()
  2017-12-18 19:18   ` René Scharfe
@ 2017-12-19 11:49     ` Jeff King
  2017-12-19 18:33       ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff King @ 2017-12-19 11:49 UTC (permalink / raw)
  To: René Scharfe
  Cc: Git List, Martin Ågren, Christian Couder, Junio C Hamano

On Mon, Dec 18, 2017 at 08:18:19PM +0100, René Scharfe wrote:

> > The root of the matter is that the revision-walking code doesn't clean
> > up after itself. In every case, the caller is just saving these to clean
> > up commit marks, isn't it?
> 
> bundle also checks if the pending objects exists.

Thanks, I missed that one. So just adding a feature to clean up commit
marks wouldn't be sufficient to cover that case.

> > That sidesteps all of the memory ownership issues by just creating a
> > copy. That's less efficient, but I'd be surprised if it matters in
> > practice (we tend to do one or two revisions per process, there don't
> > tend to be a lot of pending tips, and we're really just talking about
> > copying some pointers here).
> [...]
> I don't know if there can be real-world use cases with millions of
> entries (when it would start to hurt).

I've seen repos which have tens of thousands of tags. Something like
"rev-list --all" would have tens of thousands of pending objects.
I think in practice it's limited to the number of objects (though in
practice more like the number of commits).

I'd note also that for most uses we don't need a full object_array. You
really just need a pointer to the "struct object" to wipe its flags.

So there we might waste 8 bytes per object in the worst case. But bear
in mind that the process is wasting a lot more than that per "struct
commit" that we're holding. And versus the existing scheme, it's only
for the moment until prepare_revision_walk() frees the old pending list.

> Why does prepare_revision_walk() clear the list of pending objects at
> all?  Assuming the list is append-only then perhaps remembering the
> last handled index would suffice.

I assume it was mostly to clean up after itself, since there's no
explicit "I'm done with the traversal" function. But as I said earlier,
I'd be surprised of a revision walk doesn't leave some allocated cruft
in rev_info these days (e.g., pathspec cruft). In practice it doesn't
matter much because we don't do arbitrary numbers of traversals in
single process.

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] revision: introduce prepare_revision_walk_extended()
  2017-12-19 11:49     ` Jeff King
@ 2017-12-19 18:33       ` Junio C Hamano
  2017-12-20 13:08         ` Jeff King
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2017-12-19 18:33 UTC (permalink / raw)
  To: Jeff King
  Cc: René Scharfe, Git List, Martin Ågren, Christian Couder

Jeff King <peff@peff.net> writes:

> On Mon, Dec 18, 2017 at 08:18:19PM +0100, René Scharfe wrote:
>
>> > The root of the matter is that the revision-walking code doesn't clean
>> > up after itself. In every case, the caller is just saving these to clean
>> > up commit marks, isn't it?
>> 
>> bundle also checks if the pending objects exists.
>
> Thanks, I missed that one. So just adding a feature to clean up commit
> marks wouldn't be sufficient to cover that case.

OK.

>> > That sidesteps all of the memory ownership issues by just creating a
>> > copy. That's less efficient, but I'd be surprised if it matters in
>> > practice (we tend to do one or two revisions per process, there don't
>> > tend to be a lot of pending tips, and we're really just talking about
>> > copying some pointers here).
>> [...]
>> I don't know if there can be real-world use cases with millions of
>> entries (when it would start to hurt).
>
> I've seen repos which have tens of thousands of tags. Something like
> "rev-list --all" would have tens of thousands of pending objects.
> I think in practice it's limited to the number of objects (though in
> practice more like the number of commits).
> ...

OK again; that is an argument against "let's copy the array".

>> Why does prepare_revision_walk() clear the list of pending objects at
>> all?  Assuming the list is append-only then perhaps remembering the
>> last handled index would suffice.

For that matter, why does it copy, instead of using revs->pending
directly?  I do not think I can answer this, as I think the design
decisions led to this code predates me.

In any case, it seems that the discussion has veered into an
interesting tangent but at this point it seems to me that it is not
likely to produce an immediate improvement over the posted patch.

Should we take the patch posted as-is and move forward?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] revision: introduce prepare_revision_walk_extended()
  2017-12-19 18:33       ` Junio C Hamano
@ 2017-12-20 13:08         ` Jeff King
  2017-12-21 18:41           ` René Scharfe
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff King @ 2017-12-20 13:08 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: René Scharfe, Git List, Martin Ågren, Christian Couder

On Tue, Dec 19, 2017 at 10:33:55AM -0800, Junio C Hamano wrote:

> >> Why does prepare_revision_walk() clear the list of pending objects at
> >> all?  Assuming the list is append-only then perhaps remembering the
> >> last handled index would suffice.
> 
> For that matter, why does it copy, instead of using revs->pending
> directly?  I do not think I can answer this, as I think the design
> decisions led to this code predates me.

Me too, then. :) I think part of that copy is that we're moving the
items over to revs->commits, and dropping any non-commit objects.

> In any case, it seems that the discussion has veered into an
> interesting tangent but at this point it seems to me that it is not
> likely to produce an immediate improvement over the posted patch.

Fair enough.

> Should we take the patch posted as-is and move forward?

I suppose so.  I don't really have anything against the patch. My main
complaint was just that I don't think it's actually cleaning up the
gross parts of the interface. It's just substituting one gross thing (a
funny struct flag) for another (a special version of the prepare
function that takes a funny out-parameter).

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] revision: introduce prepare_revision_walk_extended()
  2017-12-20 13:08         ` Jeff King
@ 2017-12-21 18:41           ` René Scharfe
  2017-12-24 14:22             ` Jeff King
  0 siblings, 1 reply; 10+ messages in thread
From: René Scharfe @ 2017-12-21 18:41 UTC (permalink / raw)
  To: Jeff King, Junio C Hamano; +Cc: Git List, Martin Ågren, Christian Couder

Am 20.12.2017 um 14:08 schrieb Jeff King:
> On Tue, Dec 19, 2017 at 10:33:55AM -0800, Junio C Hamano wrote:
> 
>> Should we take the patch posted as-is and move forward?
> 
> I suppose so.  I don't really have anything against the patch. My main
> complaint was just that I don't think it's actually cleaning up the
> gross parts of the interface. It's just substituting one gross thing (a
> funny struct flag) for another (a special version of the prepare
> function that takes a funny out-parameter).

If this is a fair description (and I have to admit that it is) then I
don't understand why you aren't against the patch.  Let's drop it.

Can we do better?  How about something this?  It reverts bundle to
duplicate the object_array, but creates just a commit_list in the other
two cases.  As a side-effect this allows clearing flags for all entries
with a single traversal.

---
 bisect.c           | 18 +++++++-----------
 builtin/checkout.c | 13 +++----------
 bundle.c           | 12 +++++-------
 commit.c           | 18 ++++++++++++++++--
 commit.h           |  3 +++
 revision.c         |  3 +--
 revision.h         | 12 ------------
 7 files changed, 35 insertions(+), 44 deletions(-)

diff --git a/bisect.c b/bisect.c
index 0fca17c02b..c966220738 100644
--- a/bisect.c
+++ b/bisect.c
@@ -819,27 +819,23 @@ static void check_merge_bases(int no_checkout)
 static int check_ancestors(const char *prefix)
 {
 	struct rev_info revs;
-	struct object_array pending_copy;
+	struct commit_list *to_clear = NULL;
 	int res;
 
 	bisect_rev_setup(&revs, prefix, "^%s", "%s", 0);
 
-	/* Save pending objects, so they can be cleaned up later. */
-	pending_copy = revs.pending;
-	revs.leak_pending = 1;
-
 	/*
-	 * bisect_common calls prepare_revision_walk right away, which
-	 * (together with .leak_pending = 1) makes us the sole owner of
-	 * the list of pending objects.
+	 * bisect_common() calls prepare_revision_walk() right away,
+	 * which (among other things) clears revs.pending.  Save the
+	 * pending commits so that we can up their marks later.
 	 */
+	commit_list_insert_from_object_array(&revs.pending, &to_clear);
+
 	bisect_common(&revs);
 	res = (revs.commits != NULL);
 
 	/* Clean up objects used, as they will be reused. */
-	clear_commit_marks_for_object_array(&pending_copy, ALL_REV_FLAGS);
-
-	object_array_clear(&pending_copy);
+	clear_and_free_commit_list(to_clear, ALL_REV_FLAGS);
 
 	return res;
 }
diff --git a/builtin/checkout.c b/builtin/checkout.c
index f9f3797e11..0a694ae14a 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -791,7 +791,7 @@ static void orphaned_commit_warning(struct commit *old, struct commit *new)
 {
 	struct rev_info revs;
 	struct object *object = &old->object;
-	struct object_array refs;
+	struct commit_list *to_clear = NULL;
 
 	init_revisions(&revs, NULL);
 	setup_revisions(0, NULL, &revs, NULL);
@@ -803,13 +803,8 @@ static void orphaned_commit_warning(struct commit *old, struct commit *new)
 	add_pending_oid(&revs, "HEAD", &new->object.oid, UNINTERESTING);
 
 	/* Save pending objects, so they can be cleaned up later. */
-	refs = revs.pending;
-	revs.leak_pending = 1;
+	commit_list_insert_from_object_array(&revs.pending, &to_clear);
 
-	/*
-	 * prepare_revision_walk (together with .leak_pending = 1) makes us
-	 * the sole owner of the list of pending objects.
-	 */
 	if (prepare_revision_walk(&revs))
 		die(_("internal error in revision walk"));
 	if (!(old->object.flags & UNINTERESTING))
@@ -818,9 +813,7 @@ static void orphaned_commit_warning(struct commit *old, struct commit *new)
 		describe_detached_head(_("Previous HEAD position was"), old);
 
 	/* Clean up objects used, as they will be reused. */
-	clear_commit_marks_for_object_array(&refs, ALL_REV_FLAGS);
-
-	object_array_clear(&refs);
+	clear_and_free_commit_list(to_clear, ALL_REV_FLAGS);
 }
 
 static int switch_branches(const struct checkout_opts *opts,
diff --git a/bundle.c b/bundle.c
index 93290962c9..6916296834 100644
--- a/bundle.c
+++ b/bundle.c
@@ -134,7 +134,7 @@ int verify_bundle(struct bundle_header *header, int verbose)
 	struct ref_list *p = &header->prerequisites;
 	struct rev_info revs;
 	const char *argv[] = {NULL, "--all", NULL};
-	struct object_array refs;
+	struct object_array refs = OBJECT_ARRAY_INIT;
 	struct commit *commit;
 	int i, ret = 0, req_nr;
 	const char *message = _("Repository lacks these prerequisite commits:");
@@ -158,13 +158,11 @@ int verify_bundle(struct bundle_header *header, int verbose)
 	setup_revisions(2, argv, &revs, NULL);
 
 	/* Save pending objects, so they can be cleaned up later. */
-	refs = revs.pending;
-	revs.leak_pending = 1;
+	for (i = 0; i < revs.pending.nr; i++) {
+		struct object_array_entry *e = revs.pending.objects + i;
+		add_object_array(e->item, e->name, &refs);
+	}
 
-	/*
-	 * prepare_revision_walk (together with .leak_pending = 1) makes us
-	 * the sole owner of the list of pending objects.
-	 */
 	if (prepare_revision_walk(&revs))
 		die(_("revision walk setup failed"));
 
diff --git a/commit.c b/commit.c
index cab8d4455b..3b96277879 100644
--- a/commit.c
+++ b/commit.c
@@ -550,6 +550,11 @@ void clear_commit_marks_many(int nr, struct commit **commit, unsigned int mark)
 		commit_list_insert(*commit, &list);
 		commit++;
 	}
+	clear_and_free_commit_list(list, mark);
+}
+
+void clear_and_free_commit_list(struct commit_list *list, unsigned int mark)
+{
 	while (list)
 		clear_commit_marks_1(&list, pop_commit(&list), mark);
 }
@@ -559,7 +564,8 @@ void clear_commit_marks(struct commit *commit, unsigned int mark)
 	clear_commit_marks_many(1, &commit, mark);
 }
 
-void clear_commit_marks_for_object_array(struct object_array *a, unsigned mark)
+void commit_list_insert_from_object_array(const struct object_array *a,
+					  struct commit_list **list)
 {
 	struct object *object;
 	struct commit *commit;
@@ -569,10 +575,18 @@ void clear_commit_marks_for_object_array(struct object_array *a, unsigned mark)
 		object = a->objects[i].item;
 		commit = lookup_commit_reference_gently(&object->oid, 1);
 		if (commit)
-			clear_commit_marks(commit, mark);
+			commit_list_insert(commit, list);
 	}
 }
 
+void clear_commit_marks_for_object_array(struct object_array *a, unsigned mark)
+{
+	struct commit_list *list = NULL;
+
+	commit_list_insert_from_object_array(a, &list);
+	clear_and_free_commit_list(list, mark);
+}
+
 struct commit *pop_commit(struct commit_list **stack)
 {
 	struct commit_list *top = *stack;
diff --git a/commit.h b/commit.h
index 99a3fea68d..128a52f264 100644
--- a/commit.h
+++ b/commit.h
@@ -115,6 +115,8 @@ unsigned commit_list_count(const struct commit_list *l);
 struct commit_list *commit_list_insert_by_date(struct commit *item,
 				    struct commit_list **list);
 void commit_list_sort_by_date(struct commit_list **list);
+void commit_list_insert_from_object_array(const struct object_array *a,
+					  struct commit_list **list);
 
 /* Shallow copy of the input list */
 struct commit_list *copy_commit_list(struct commit_list *list);
@@ -220,6 +222,7 @@ struct commit *pop_commit(struct commit_list **stack);
 void clear_commit_marks(struct commit *commit, unsigned int mark);
 void clear_commit_marks_many(int nr, struct commit **commit, unsigned int mark);
 void clear_commit_marks_for_object_array(struct object_array *a, unsigned mark);
+void clear_and_free_commit_list(struct commit_list *list, unsigned int mark);
 
 
 enum rev_sort_order {
diff --git a/revision.c b/revision.c
index f6a3da5cd9..7239315de9 100644
--- a/revision.c
+++ b/revision.c
@@ -2860,8 +2860,7 @@ int prepare_revision_walk(struct rev_info *revs)
 			}
 		}
 	}
-	if (!revs->leak_pending)
-		object_array_clear(&old_pending);
+	object_array_clear(&old_pending);
 
 	/* Signal whether we need per-parent treesame decoration */
 	if (revs->simplify_merges ||
diff --git a/revision.h b/revision.h
index 54761200ad..27be70e75c 100644
--- a/revision.h
+++ b/revision.h
@@ -150,18 +150,6 @@ struct rev_info {
 			date_mode_explicit:1,
 			preserve_subject:1;
 	unsigned int	disable_stdin:1;
-	/*
-	 * Set `leak_pending` to prevent `prepare_revision_walk()` from clearing
-	 * the array of pending objects (`pending`). It will still forget about
-	 * the array and its entries, so they really are leaked. This can be
-	 * useful if the `struct object_array` `pending` is copied before
-	 * calling `prepare_revision_walk()`. By setting `leak_pending`, you
-	 * effectively claim ownership of the old array, so you should most
-	 * likely call `object_array_clear(&pending_copy)` once you are done.
-	 * Observe that this is about ownership of the array and its entries,
-	 * not the commits referenced by those entries.
-	 */
-	unsigned int	leak_pending:1;
 	/* --show-linear-break */
 	unsigned int	track_linear:1,
 			track_first_time:1,


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] revision: introduce prepare_revision_walk_extended()
  2017-12-21 18:41           ` René Scharfe
@ 2017-12-24 14:22             ` Jeff King
  2017-12-25 17:36               ` René Scharfe
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff King @ 2017-12-24 14:22 UTC (permalink / raw)
  To: René Scharfe
  Cc: Junio C Hamano, Git List, Martin Ågren, Christian Couder

On Thu, Dec 21, 2017 at 07:41:44PM +0100, René Scharfe wrote:

> Am 20.12.2017 um 14:08 schrieb Jeff King:
> > On Tue, Dec 19, 2017 at 10:33:55AM -0800, Junio C Hamano wrote:
> > 
> >> Should we take the patch posted as-is and move forward?
> > 
> > I suppose so.  I don't really have anything against the patch. My main
> > complaint was just that I don't think it's actually cleaning up the
> > gross parts of the interface. It's just substituting one gross thing (a
> > funny struct flag) for another (a special version of the prepare
> > function that takes a funny out-parameter).
> 
> If this is a fair description (and I have to admit that it is) then I
> don't understand why you aren't against the patch.  Let's drop it.

Heh, I almost wrote more on this. My thinking was two-fold:

  - while I think the fundamental gross-ness is still there after your
    patch, it does smooth some of the rough edges. So it's a slight
    improvement over the status quo.

  - I read an article a while back (which unfortunately I can't find
    now) about the idea of "default to yes" in open source. I.e., the
    idea that if somebody bothered to cook up a patch and there is no
    good reason to reject it, you should take it.

    Certainly there are cases where that can go too far: obvious ones
    like bad ideas where it is not really "default" anymore, but also
    subtle ones where the changes are code churn whose fallouts will
    be dealt with by others. But this patch is a good example. I'm not
    convinced it makes things better, but I don't think it makes things
    worse. If you, who looked a lot closer at the problem than I did,
    still think it's a good idea after our discussion, it's probably
    worth applying.

So my comments weren't really "this is a bad idea" as much as "is there
a better idea". We didn't come up with one after some discussion, and
I'm willing to let it go there for now.

But if you want to keep thinking on it, I'm game. ;)

> Can we do better?  How about something this?  It reverts bundle to
> duplicate the object_array, but creates just a commit_list in the other
> two cases.  As a side-effect this allows clearing flags for all entries
> with a single traversal.

I think this is basically the "copy it before-hand" thing from earlier
in the thread. Switching to just keeping commits seems like a nice
change. It's an easy (if minor) optimization, and it makes clear exactly
which part of the data we need to keep around.

The single-traversal thing I suspect doesn't matter much in practice. In
both cases if we would visit commit X twice, we'd immediately see on the
second visit that it has already been cleared and not do anymore work.

  Side note: Another question is whether it would simply be faster to
  clear the flags for _all_ objects that we've touched in the current
  process (we have clear_object_flags() for this already). Then we know
  that we touch each one once, and we as a bonus we don't even have to
  keep the previous tips. The downsides are:

    - if another traversal in the process looked at many objects, but
      our current traversal looked at few, then we would examine more
      objects than we need to (albeit with lower cost per object)

    - it's possible there's another traversal in the same process whose
      flags we would want to have saved. I suspect such a setup is
      broken already, though, unless there's a guarantee that the two
      traversals don't overlap.

So anyway, I think this is a reasonable approach, unless we're really
worried about the extra O(# of pending) allocation.

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] revision: introduce prepare_revision_walk_extended()
  2017-12-24 14:22             ` Jeff King
@ 2017-12-25 17:36               ` René Scharfe
  0 siblings, 0 replies; 10+ messages in thread
From: René Scharfe @ 2017-12-25 17:36 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Git List, Martin Ågren, Christian Couder

Am 24.12.2017 um 15:22 schrieb Jeff King:
> The single-traversal thing I suspect doesn't matter much in practice. In
> both cases if we would visit commit X twice, we'd immediately see on the
> second visit that it has already been cleared and not do anymore work.

Good point.  That makes clear_commit_marks_many() less useful than
advertised in e895cb5135, though.

>    Side note: Another question is whether it would simply be faster to
>    clear the flags for _all_ objects that we've touched in the current
>    process (we have clear_object_flags() for this already). Then we know
>    that we touch each one once, and we as a bonus we don't even have to
>    keep the previous tips. The downsides are:
> 
>      - if another traversal in the process looked at many objects, but
>        our current traversal looked at few, then we would examine more
>        objects than we need to (albeit with lower cost per object)
> 
>      - it's possible there's another traversal in the same process whose
>        flags we would want to have saved. I suspect such a setup is
>        broken already, though, unless there's a guarantee that the two
>        traversals don't overlap.

I thought about that nuclear option as well.  It might be a good idea
for code in cmd_* and similar leaf functions for cleaning up between
unrelated stages (e.g. between parts that had been separate external
git command calls before).  They probably only load potentially
interesting objects into memory and don't need to bother much about
interactions with other functions.

But clear_object_flags() makes me nervous because it clears the flags
of all kinds of objects, not just for commits, and I can't easily
convince myself that this is safe.  Adding a version that checks the
object type would be an easy way out.

René

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, back to index

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-16 12:12 [PATCH] revision: introduce prepare_revision_walk_extended() René Scharfe
2017-12-17 10:20 ` Martin Ågren
2017-12-18 15:10 ` Jeff King
2017-12-18 19:18   ` René Scharfe
2017-12-19 11:49     ` Jeff King
2017-12-19 18:33       ` Junio C Hamano
2017-12-20 13:08         ` Jeff King
2017-12-21 18:41           ` René Scharfe
2017-12-24 14:22             ` Jeff King
2017-12-25 17:36               ` René Scharfe

git@vger.kernel.org mailing list mirror (one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/
       or Tor2web: https://www.tor2web.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox