git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] delta-islands: free island_marks and bitmaps
@ 2023-02-02  1:03 Eric Wong
  2023-02-02  1:45 ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Wong @ 2023-02-02  1:03 UTC (permalink / raw)
  To: git

On my mirror of linux.git forkgroup with 780 islands, this saves
nearly 4G of heap memory in pack-objects.  This savings only
benefits delta island users of pack bitmaps, as the process
would otherwise be exiting anyways.

However, there's probably not many delta island users, but the
majority of delta island users would also be pack bitmaps users.

Signed-off-by: Eric Wong <e@80x24.org>
---
 Slowly chipping away, this is low-hanging fruit.

 delta-islands.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/delta-islands.c b/delta-islands.c
index 90c0d6958f..7052a3a6ba 100644
--- a/delta-islands.c
+++ b/delta-islands.c
@@ -513,12 +513,29 @@ void propagate_island_marks(struct commit *commit)
 	}
 }
 
+static void free_island_marks(void)
+{
+	struct island_bitmap *bitmap;
+
+	kh_foreach_value(island_marks, bitmap, {
+		if (--bitmap->refcount == 0)
+			free(bitmap);
+	});
+	kh_destroy_oid_map(island_marks);
+	island_marks = (void *)1; /* crash on unintended future use */
+}
+
 int compute_pack_layers(struct packing_data *to_pack)
 {
 	uint32_t i;
 
-	if (!core_island_name || !island_marks)
+	if (!island_marks)
+		return 1;
+
+	if (!core_island_name) {
+		free_island_marks();
 		return 1;
+	}
 
 	for (i = 0; i < to_pack->nr_objects; ++i) {
 		struct object_entry *entry = &to_pack->objects[i];
@@ -533,6 +550,7 @@ int compute_pack_layers(struct packing_data *to_pack)
 				oe_set_layer(to_pack, entry, 0);
 		}
 	}
+	free_island_marks();
 
 	return 2;
 }

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] delta-islands: free island_marks and bitmaps
  2023-02-02  1:03 [PATCH] delta-islands: free island_marks and bitmaps Eric Wong
@ 2023-02-02  1:45 ` Ævar Arnfjörð Bjarmason
  2023-02-02  9:42   ` [PATCH v2] " Eric Wong
  0 siblings, 1 reply; 6+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2023-02-02  1:45 UTC (permalink / raw)
  To: Eric Wong; +Cc: git


On Thu, Feb 02 2023, Eric Wong wrote:

> +	kh_foreach_value(island_marks, bitmap, {
> +		if (--bitmap->refcount == 0)

Style nit: if (!--x) rather than if (--x == 0) ?

> +			free(bitmap);
> +	});
> +	kh_destroy_oid_map(island_marks);
> +	island_marks = (void *)1; /* crash on unintended future use */

This seems counter-productive. If you just leave it free'd then various
analysis tools will spot a use-after-free earlier, won't they?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v2] delta-islands: free island_marks and bitmaps
  2023-02-02  1:45 ` Ævar Arnfjörð Bjarmason
@ 2023-02-02  9:42   ` Eric Wong
  2023-02-03 18:11     ` Jeff King
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Wong @ 2023-02-02  9:42 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git

Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> On Thu, Feb 02 2023, Eric Wong wrote:
> 
> > +	kh_foreach_value(island_marks, bitmap, {
> > +		if (--bitmap->refcount == 0)
> 
> Style nit: if (!--x) rather than if (--x == 0) ?

Oops, fixed in v2

> > +			free(bitmap);
> > +	});
> > +	kh_destroy_oid_map(island_marks);
> > +	island_marks = (void *)1; /* crash on unintended future use */
> 
> This seems counter-productive. If you just leave it free'd then various
> analysis tools will spot a use-after-free earlier, won't they?

*shrug*  I thought it might be better to use an explicitly bad
pointer to catch lifetime issues that I might've missed from
reading the code.  Since I've run and tested at this point,
it probably doesn't matter, now.  So I'll just omit the
assignment and save some icache footprint.

Thanks for the review!

--------8<-------
Subject: [PATCH] delta-islands: free island_marks and bitmaps

On my mirror of linux.git forkgroup with 780 islands, this saves
nearly 4G of heap memory in pack-objects.  This savings only
benefits delta island users of pack bitmaps, as the process
would otherwise be exiting anyways.

However, there's probably not many delta island users, but the
majority of delta island users would also be pack bitmaps users.

Signed-off-by: Eric Wong <e@80x24.org>
---
 delta-islands.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/delta-islands.c b/delta-islands.c
index 90c0d6958f..c09dab31a4 100644
--- a/delta-islands.c
+++ b/delta-islands.c
@@ -513,12 +513,28 @@ void propagate_island_marks(struct commit *commit)
 	}
 }
 
+static void free_island_marks(void)
+{
+	struct island_bitmap *bitmap;
+
+	kh_foreach_value(island_marks, bitmap, {
+		if (!--bitmap->refcount)
+			free(bitmap);
+	});
+	kh_destroy_oid_map(island_marks);
+}
+
 int compute_pack_layers(struct packing_data *to_pack)
 {
 	uint32_t i;
 
-	if (!core_island_name || !island_marks)
+	if (!island_marks)
+		return 1;
+
+	if (!core_island_name) {
+		free_island_marks();
 		return 1;
+	}
 
 	for (i = 0; i < to_pack->nr_objects; ++i) {
 		struct object_entry *entry = &to_pack->objects[i];
@@ -533,6 +549,7 @@ int compute_pack_layers(struct packing_data *to_pack)
 				oe_set_layer(to_pack, entry, 0);
 		}
 	}
+	free_island_marks();
 
 	return 2;
 }

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] delta-islands: free island_marks and bitmaps
  2023-02-02  9:42   ` [PATCH v2] " Eric Wong
@ 2023-02-03 18:11     ` Jeff King
  2023-02-03 23:44       ` [PATCH v3] " Eric Wong
  0 siblings, 1 reply; 6+ messages in thread
From: Jeff King @ 2023-02-03 18:11 UTC (permalink / raw)
  To: Eric Wong; +Cc: Ævar Arnfjörð Bjarmason, git

On Thu, Feb 02, 2023 at 09:42:17AM +0000, Eric Wong wrote:

> > > +			free(bitmap);
> > > +	});
> > > +	kh_destroy_oid_map(island_marks);
> > > +	island_marks = (void *)1; /* crash on unintended future use */
> > 
> > This seems counter-productive. If you just leave it free'd then various
> > analysis tools will spot a use-after-free earlier, won't they?
> 
> *shrug*  I thought it might be better to use an explicitly bad
> pointer to catch lifetime issues that I might've missed from
> reading the code.  Since I've run and tested at this point,
> it probably doesn't matter, now.  So I'll just omit the
> assignment and save some icache footprint.

I think it is still worth protecting ourselves. You've run and tested
the current code, but the risk remains that somebody will later change
the code and introduce a regression.

Yes, analysis tools may spot it. But only if we actually trigger the
case in question via the test suite, and while running those tools. We
can get more consistent behavior by overwriting the pointer.

The usual thing, of course, would just be to set it to NULL. I guess you
used "(void *)1" in the original because code like in_same_island()
checks for NULL. And so it becomes indistinguishable whether we intended
not to use islands, or if there's a bug in the program.

If we are going to reduce the lifetime of island_marks, I think it would
make things cleaner to push the "do we expect to have islands" boolean
into its own variable, something like:

diff --git a/delta-islands.c b/delta-islands.c
index c09dab31a4..7dd4e1419a 100644
--- a/delta-islands.c
+++ b/delta-islands.c
@@ -22,6 +22,7 @@
 
 KHASH_INIT(str, const char *, void *, 1, kh_str_hash_func, kh_str_hash_equal)
 
+static int using_island_marks;
 static kh_oid_map_t *island_marks;
 static unsigned island_counter;
 static unsigned island_counter_core;
@@ -96,7 +97,7 @@ int in_same_island(const struct object_id *trg_oid, const struct object_id *src_
 	khiter_t trg_pos, src_pos;
 
 	/* If we aren't using islands, assume everything goes together. */
-	if (!island_marks)
+	if (!using_island_marks)
 		return 1;
 
 	/*
@@ -124,7 +125,7 @@ int island_delta_cmp(const struct object_id *a, const struct object_id *b)
 	khiter_t a_pos, b_pos;
 	struct island_bitmap *a_bitmap = NULL, *b_bitmap = NULL;
 
-	if (!island_marks)
+	if (!using_island_marks)
 		return 0;
 
 	a_pos = kh_get_oid_map(island_marks, *a);
@@ -485,6 +486,7 @@ void load_delta_islands(struct repository *r, int progress)
 {
 	struct island_load_data ild = { 0 };
 
+	using_island_marks = 1;
 	island_marks = kh_init_oid_map();
 
 	git_config(island_config_callback, &ild);
@@ -522,6 +524,7 @@ static void free_island_marks(void)
 			free(bitmap);
 	});
 	kh_destroy_oid_map(island_marks);
+	island_marks = NULL;
 }
 
 int compute_pack_layers(struct packing_data *to_pack)

Of course it would be cleaner still if there was a "struct
delta_islands" that encapsulated the variables, and then the caller
could free it when they're done. That's more work to retro-fit, though.

>  int compute_pack_layers(struct packing_data *to_pack)
>  {
>  	uint32_t i;
>  
> -	if (!core_island_name || !island_marks)
> +	if (!island_marks)
> +		return 1;
> +
> +	if (!core_island_name) {
> +		free_island_marks();
>  		return 1;
> +	}
>  
>  	for (i = 0; i < to_pack->nr_objects; ++i) {
>  		struct object_entry *entry = &to_pack->objects[i];
> @@ -533,6 +549,7 @@ int compute_pack_layers(struct packing_data *to_pack)
>  				oe_set_layer(to_pack, entry, 0);
>  		}
>  	}
> +	free_island_marks();
>  
>  	return 2;
>  }

This is pretty subtle. It implicitly assumes that compute_pack_layers()
is the last thing that will ever need to look at islands. Which is true
now (I think), but it's reasonable to think that somebody might use
island data during bitmap generation, too.

It seems like it would be a lot more obvious if the sole caller did the
free explicitly, like:

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index cabace4abb..3395f63aba 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -929,8 +929,10 @@ static struct object_entry **compute_write_order(void)
 	 */
 	for_each_tag_ref(mark_tagged, NULL);
 
-	if (use_delta_islands)
+	if (use_delta_islands) {
 		max_layers = compute_pack_layers(&to_pack);
+		free_island_marks();
+	}
 
 	ALLOC_ARRAY(wo, to_pack.nr_objects);
 	wo_end = 0;

And of course that would also be a tiny step in the right direction if
the delta islands API learned to use a struct (this would be the same
spot where we'd say "we're done with islands; free the struct").

-Peff

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v3] delta-islands: free island_marks and bitmaps
  2023-02-03 18:11     ` Jeff King
@ 2023-02-03 23:44       ` Eric Wong
  2023-02-04 11:14         ` Jeff King
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Wong @ 2023-02-03 23:44 UTC (permalink / raw)
  To: Jeff King; +Cc: Ævar Arnfjörð Bjarmason, git

Jeff King <peff@peff.net> wrote:
> On Thu, Feb 02, 2023 at 09:42:17AM +0000, Eric Wong wrote:
> 
> > > > +			free(bitmap);
> > > > +	});
> > > > +	kh_destroy_oid_map(island_marks);
> > > > +	island_marks = (void *)1; /* crash on unintended future use */
> > > 
> > > This seems counter-productive. If you just leave it free'd then various
> > > analysis tools will spot a use-after-free earlier, won't they?
> > 
> > *shrug*  I thought it might be better to use an explicitly bad
> > pointer to catch lifetime issues that I might've missed from
> > reading the code.  Since I've run and tested at this point,
> > it probably doesn't matter, now.  So I'll just omit the
> > assignment and save some icache footprint.
> 
> I think it is still worth protecting ourselves. You've run and tested
> the current code, but the risk remains that somebody will later change
> the code and introduce a regression.
> 
> Yes, analysis tools may spot it. But only if we actually trigger the
> case in question via the test suite, and while running those tools. We
> can get more consistent behavior by overwriting the pointer.

OK.

> The usual thing, of course, would just be to set it to NULL. I guess you
> used "(void *)1" in the original because code like in_same_island()
> checks for NULL. And so it becomes indistinguishable whether we intended
> not to use islands, or if there's a bug in the program.

Yes, I used `(void *)1' because of the existing NULL checks.  I
should have used `(void *)-1' (MAP_FAILED) instead since
attempts to dereference 1 could conceivably point to a valid
address, whereas using a maximum value would not.

> If we are going to reduce the lifetime of island_marks, I think it would
> make things cleaner to push the "do we expect to have islands" boolean
> into its own variable, something like:
> 
> diff --git a/delta-islands.c b/delta-islands.c
> index c09dab31a4..7dd4e1419a 100644
> --- a/delta-islands.c
> +++ b/delta-islands.c
> @@ -22,6 +22,7 @@
>  
>  KHASH_INIT(str, const char *, void *, 1, kh_str_hash_func, kh_str_hash_equal)
>  
> +static int using_island_marks;
>  static kh_oid_map_t *island_marks;
>  static unsigned island_counter;
>  static unsigned island_counter_core;
> @@ -96,7 +97,7 @@ int in_same_island(const struct object_id *trg_oid, const struct object_id *src_
>  	khiter_t trg_pos, src_pos;
>  
>  	/* If we aren't using islands, assume everything goes together. */
> -	if (!island_marks)
> +	if (!using_island_marks)
>  		return 1;

I much prefer to rely on invalid pointers than extra flags since
having multiple sources of truth confuses me[1].

> Of course it would be cleaner still if there was a "struct
> delta_islands" that encapsulated the variables, and then the caller
> could free it when they're done. That's more work to retro-fit, though.

Yes and yes :>

> >  int compute_pack_layers(struct packing_data *to_pack)
> >  {
> >  	uint32_t i;
> >  
> > -	if (!core_island_name || !island_marks)
> > +	if (!island_marks)
> > +		return 1;
> > +
> > +	if (!core_island_name) {
> > +		free_island_marks();
> >  		return 1;
> > +	}
> >  
> >  	for (i = 0; i < to_pack->nr_objects; ++i) {
> >  		struct object_entry *entry = &to_pack->objects[i];
> > @@ -533,6 +549,7 @@ int compute_pack_layers(struct packing_data *to_pack)
> >  				oe_set_layer(to_pack, entry, 0);
> >  		}
> >  	}
> > +	free_island_marks();
> >  
> >  	return 2;
> >  }
> 
> This is pretty subtle. It implicitly assumes that compute_pack_layers()
> is the last thing that will ever need to look at islands. Which is true
> now (I think), but it's reasonable to think that somebody might use
> island data during bitmap generation, too.

Yeah.  The reason I used an invalid pointer in the original was
because I wasn't certain compute_pack_layers() was the last user.

> It seems like it would be a lot more obvious if the sole caller did the
> free explicitly, like:
> 
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index cabace4abb..3395f63aba 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -929,8 +929,10 @@ static struct object_entry **compute_write_order(void)
>  	 */
>  	for_each_tag_ref(mark_tagged, NULL);
>  
> -	if (use_delta_islands)
> +	if (use_delta_islands) {
>  		max_layers = compute_pack_layers(&to_pack);
> +		free_island_marks();
> +	}

OK, I've done that for v3.  Thanks.

>  	ALLOC_ARRAY(wo, to_pack.nr_objects);
>  	wo_end = 0;
> 
> And of course that would also be a tiny step in the right direction if
> the delta islands API learned to use a struct (this would be the same
> spot where we'd say "we're done with islands; free the struct").

I do wonder about performance on register-starved systems,
though, especially if stuff like island_delta_cmp gets called
frequently.  I already have enough performance problems atm :<

[1] to go farther, I might even eliminate `int use_delta_islands' as
    a global from builtin/pack-objects.c and just have that become a
    `struct delta_islands_foo *' or something.  But I have more
    pressing performance problems to figure out :<

--------8<-------
Subject: [PATCH v3] delta-islands: free island_marks and bitmaps

On my mirror of linux.git forkgroup with 780 islands, this saves
nearly 4G of heap memory in pack-objects.  This savings only
benefits delta island users of pack bitmaps, as the process
would otherwise be exiting anyways.

However, there's probably not many delta island users, but the
majority of delta island users would also be pack bitmaps users.

Signed-off-by: Eric Wong <e@80x24.org>
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Helped-by: Jeff King <peff@peff.net>
---
 v3: use (void *)-1 to detect use-after-free,
     hoist out free_island_marks() call for readability

 builtin/pack-objects.c |  4 +++-
 delta-islands.c        | 14 ++++++++++++++
 delta-islands.h        |  1 +
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index cabace4abb..3395f63aba 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -929,8 +929,10 @@ static struct object_entry **compute_write_order(void)
 	 */
 	for_each_tag_ref(mark_tagged, NULL);
 
-	if (use_delta_islands)
+	if (use_delta_islands) {
 		max_layers = compute_pack_layers(&to_pack);
+		free_island_marks();
+	}
 
 	ALLOC_ARRAY(wo, to_pack.nr_objects);
 	wo_end = 0;
diff --git a/delta-islands.c b/delta-islands.c
index 90c0d6958f..8b234cb85b 100644
--- a/delta-islands.c
+++ b/delta-islands.c
@@ -513,6 +513,20 @@ void propagate_island_marks(struct commit *commit)
 	}
 }
 
+void free_island_marks(void)
+{
+	struct island_bitmap *bitmap;
+
+	kh_foreach_value(island_marks, bitmap, {
+		if (!--bitmap->refcount)
+			free(bitmap);
+	});
+	kh_destroy_oid_map(island_marks);
+
+	/* detect use-after-free with a an address which is never valid: */
+	island_marks = (void *)-1;
+}
+
 int compute_pack_layers(struct packing_data *to_pack)
 {
 	uint32_t i;
diff --git a/delta-islands.h b/delta-islands.h
index eb0f952629..8d1591ae28 100644
--- a/delta-islands.h
+++ b/delta-islands.h
@@ -14,5 +14,6 @@ void resolve_tree_islands(struct repository *r,
 void load_delta_islands(struct repository *r, int progress);
 void propagate_island_marks(struct commit *commit);
 int compute_pack_layers(struct packing_data *to_pack);
+void free_island_marks(void);
 
 #endif /* DELTA_ISLANDS_H */

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v3] delta-islands: free island_marks and bitmaps
  2023-02-03 23:44       ` [PATCH v3] " Eric Wong
@ 2023-02-04 11:14         ` Jeff King
  0 siblings, 0 replies; 6+ messages in thread
From: Jeff King @ 2023-02-04 11:14 UTC (permalink / raw)
  To: Eric Wong; +Cc: Ævar Arnfjörð Bjarmason, git

On Fri, Feb 03, 2023 at 11:44:30PM +0000, Eric Wong wrote:

> >  	/* If we aren't using islands, assume everything goes together. */
> > -	if (!island_marks)
> > +	if (!using_island_marks)
> >  		return 1;
> 
> I much prefer to rely on invalid pointers than extra flags since
> having multiple sources of truth confuses me[1].

That's kind of my point, though. It's not multiple sources of truth, but
rather there are two bits "did we ask to use islands" and "is
island_marks still valid". You are shoving both bits into the same
variable by using a special sentinel pointer.

> > And of course that would also be a tiny step in the right direction if
> > the delta islands API learned to use a struct (this would be the same
> > spot where we'd say "we're done with islands; free the struct").
> 
> I do wonder about performance on register-starved systems,
> though, especially if stuff like island_delta_cmp gets called
> frequently.  I already have enough performance problems atm :<

Calling in_same_island() is pretty heavy-weight (it's multiple hash
lookups, and then an arbitrary-length bit-string comparison). I'd be
shocked if replacing a global with a struct pointer is even measurable.

> [1] to go farther, I might even eliminate `int use_delta_islands' as
>     a global from builtin/pack-objects.c and just have that become a
>     `struct delta_islands_foo *' or something.  But I have more
>     pressing performance problems to figure out :<

Right, that's along the same lines I was thinking. But I don't blame
you for not tackling it. The upside is fairly minimal.

> +	/* detect use-after-free with a an address which is never valid: */
> +	island_marks = (void *)-1;

I still hate how magical this line is, but I don't that it's worth
arguing about more.

-Peff

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-02-04 11:15 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-02  1:03 [PATCH] delta-islands: free island_marks and bitmaps Eric Wong
2023-02-02  1:45 ` Ævar Arnfjörð Bjarmason
2023-02-02  9:42   ` [PATCH v2] " Eric Wong
2023-02-03 18:11     ` Jeff King
2023-02-03 23:44       ` [PATCH v3] " Eric Wong
2023-02-04 11:14         ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).