git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] midx: traverse the local MIDX first
@ 2020-08-28 18:06 Taylor Blau
  2020-08-28 18:27 ` Derrick Stolee
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Taylor Blau @ 2020-08-28 18:06 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, peff

When a repository has an alternate object directory configured, callers
can traverse through each alternate's MIDX by walking the '->next'
pointer.

But, when 'prepare_multi_pack_index_one()' loads multiple MIDXs, it
places the new ones at the front of this pointer chain, not at the end.
This can be confusing for callers such as 'git repack -ad', causing test
failures like in t7700.6 with 'GIT_TEST_MULTI_PACK_INDEX=1'.

The occurs when dropping a pack known to the local MIDX with alternates
configured that have their own MIDX. Since the alternate's MIDX is
returned via 'get_multi_pack_index()', 'midx_contains_pack()' returns
true (which is correct, since it traverses through the '->next' pointer
to find the MIDX in the chain that does contain the requested object).
But, we call 'clear_midx_file()' on 'the_repository', which drops the
MIDX at the path of the first MIDX in the chain, which (in the case of
t7700.6 is the one in the alternate).

This patch bandaids that situation by trying to place the local MIDX
first in the chain when calling 'prepare_multi_pack_index_one()'.
Specifically, it always inserts all MIDXs loads subsequent to the local
one as the head of the tail of the MIDX chain. This makes it so that we
don't have a quadratic insertion while still keeping the local MIDX at
the front of the list. Likewise, it avoids an 'O(m*n)' lookup in
'remove_redundant_pack()' where 'm' is the number of redundant packs and
'n' is the number of alternates.

Protect 'remove_redundant_pack()' against a case where alternates with
MIDXs exist, but no local MIDX exists by first checking that 'm->local'
before asking whether or not it contains the requested pack.

This invariant is only preserved by the insertion order in
'prepare_packed_git()', which traverses through the ODB's '->next'
pointer, meaning we visit the local object store first. This fragility
makes this an undesirable long-term solution, but it is acceptable for
now since this is the only caller.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
This is kind of a hack, but the order that we call
'prepare_multi_pack_index_one()' from 'prepare_packed_git()' makes it
acceptable, at least in my own assessment.

It is fixing a breakage in 'next', so I'd be inclined to merge this up.
But, if this is too gross, I'd just as soon revert what's in 'next' and
try again later.

 builtin/repack.c | 2 +-
 midx.c           | 8 ++++++--
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 98fac03946..5661d69c16 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -135,7 +135,7 @@ static void remove_redundant_pack(const char *dir_name, const char *base_name)
 	struct strbuf buf = STRBUF_INIT;
 	struct multi_pack_index *m = get_multi_pack_index(the_repository);
 	strbuf_addf(&buf, "%s.pack", base_name);
-	if (m && midx_contains_pack(m, buf.buf))
+	if (m && m->local && midx_contains_pack(m, buf.buf))
 		clear_midx_file(the_repository);
 	strbuf_insertf(&buf, 0, "%s/", dir_name);
 	unlink_pack_path(buf.buf, 1);
diff --git a/midx.c b/midx.c
index e9b2e1253a..cc19b66152 100644
--- a/midx.c
+++ b/midx.c
@@ -416,8 +416,12 @@ int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, i
 	m = load_multi_pack_index(object_dir, local);

 	if (m) {
-		m->next = r->objects->multi_pack_index;
-		r->objects->multi_pack_index = m;
+		struct multi_pack_index *mp = r->objects->multi_pack_index;
+		if (mp) {
+			m->next = mp->next;
+			mp->next = m;
+		} else
+			r->objects->multi_pack_index = m;
 		return 1;
 	}

--
2.28.0.338.g87a3b7a5a2.dirty

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] midx: traverse the local MIDX first
  2020-08-28 18:06 [PATCH] midx: traverse the local MIDX first Taylor Blau
@ 2020-08-28 18:27 ` Derrick Stolee
  2020-08-28 18:50 ` Jeff King
  2020-08-28 20:22 ` [PATCH v2] " Taylor Blau
  2 siblings, 0 replies; 10+ messages in thread
From: Derrick Stolee @ 2020-08-28 18:27 UTC (permalink / raw)
  To: Taylor Blau, git; +Cc: dstolee, gitster, peff

On 8/28/2020 2:06 PM, Taylor Blau wrote:
> This invariant is only preserved by the insertion order in
> 'prepare_packed_git()', which traverses through the ODB's '->next'
> pointer, meaning we visit the local object store first. This fragility
> makes this an undesirable long-term solution, but it is acceptable for
> now since this is the only caller.
> 
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
> This is kind of a hack, but the order that we call
> 'prepare_multi_pack_index_one()' from 'prepare_packed_git()' makes it
> acceptable, at least in my own assessment.

The natural alternative would be to scan the list _after_ all are
inserted and pull any MIDX marked "local" to the front of the list.
Such a check would need to happen in the same method that iterates
over all alternates, so that seems a bit redundant.

While perhaps a bit hack-ish, I think this is a sound approach.
And, we have a test that will detect change in behavior here!

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] midx: traverse the local MIDX first
  2020-08-28 18:06 [PATCH] midx: traverse the local MIDX first Taylor Blau
  2020-08-28 18:27 ` Derrick Stolee
@ 2020-08-28 18:50 ` Jeff King
  2020-08-28 18:55   ` Jeff King
  2020-08-28 18:55   ` Taylor Blau
  2020-08-28 20:22 ` [PATCH v2] " Taylor Blau
  2 siblings, 2 replies; 10+ messages in thread
From: Jeff King @ 2020-08-28 18:50 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster

On Fri, Aug 28, 2020 at 02:06:21PM -0400, Taylor Blau wrote:

> The occurs when dropping a pack known to the local MIDX with alternates
> configured that have their own MIDX. Since the alternate's MIDX is
> returned via 'get_multi_pack_index()', 'midx_contains_pack()' returns
> true (which is correct, since it traverses through the '->next' pointer
> to find the MIDX in the chain that does contain the requested object).
> But, we call 'clear_midx_file()' on 'the_repository', which drops the
> MIDX at the path of the first MIDX in the chain, which (in the case of
> t7700.6 is the one in the alternate).

OK, that makes sense for what we're seeing.

> This patch bandaids that situation by trying to place the local MIDX
> first in the chain when calling 'prepare_multi_pack_index_one()'.
> Specifically, it always inserts all MIDXs loads subsequent to the local
> one as the head of the tail of the MIDX chain. This makes it so that we
> don't have a quadratic insertion while still keeping the local MIDX at
> the front of the list. Likewise, it avoids an 'O(m*n)' lookup in
> 'remove_redundant_pack()' where 'm' is the number of redundant packs and
> 'n' is the number of alternates.
> 
> Protect 'remove_redundant_pack()' against a case where alternates with
> MIDXs exist, but no local MIDX exists by first checking that 'm->local'
> before asking whether or not it contains the requested pack.

It seems like the root of the problem is that get_multi_pack_index() is
being used for two different things:

  - most callers want to iterate over all of the possible midx files,
    because they're trying to look up an object.

  - this caller wants the single midx for the local repository (I
    wondered if there might be more of these that we just never noticed
    because the test suite doesn't use alternates, but having just
    audited them all, the answer is no).

So I'd be tempted to say that the latter callers should be using a
separate function that gives them what they want. That lets them avoid
being too intimate with the details of how we order things.

The patch below illustrates that.  It also changes the existing function
name to avoid confusion and to help audit the existing callers, but
that's optional and maybe not worth it.

It does do the linear lookup, so it has the O(m*n) you mentioned. But
the number of alternates is generally small, and I'd be shocked if this
was the first m*n loop over the number of alternates. However, we could
still do the ordering thing and just keep the details inside the new
function.

---
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 7016b28485..a9be7480df 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1246,7 +1246,7 @@ static int want_object_in_pack(const struct object_id *oid,
 			return want;
 	}
 
-	for (m = get_multi_pack_index(the_repository); m; m = m->next) {
+	for (m = get_all_multi_pack_index(the_repository); m; m = m->next) {
 		struct pack_entry e;
 		if (fill_midx_entry(the_repository, oid, &e, m)) {
 			struct packed_git *p = e.p;
diff --git a/builtin/repack.c b/builtin/repack.c
index f10f52779c..60cb196956 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -133,7 +133,7 @@ static void get_non_kept_pack_filenames(struct string_list *fname_list,
 static void remove_redundant_pack(const char *dir_name, const char *base_name)
 {
 	struct strbuf buf = STRBUF_INIT;
-	struct multi_pack_index *m = get_multi_pack_index(the_repository);
+	struct multi_pack_index *m = get_local_multi_pack_index(the_repository);
 	strbuf_addf(&buf, "%s.pack", base_name);
 	if (m && midx_contains_pack(m, buf.buf))
 		clear_midx_file(the_repository);
diff --git a/packfile.c b/packfile.c
index 6ab5233613..ece317642a 100644
--- a/packfile.c
+++ b/packfile.c
@@ -915,7 +915,7 @@ unsigned long repo_approximate_object_count(struct repository *r)
 
 		prepare_packed_git(r);
 		count = 0;
-		for (m = get_multi_pack_index(r); m; m = m->next)
+		for (m = get_all_multi_pack_index(r); m; m = m->next)
 			count += m->num_objects;
 		for (p = r->objects->packed_git; p; p = p->next) {
 			if (open_pack_index(p))
@@ -1021,12 +1021,24 @@ struct packed_git *get_packed_git(struct repository *r)
 	return r->objects->packed_git;
 }
 
-struct multi_pack_index *get_multi_pack_index(struct repository *r)
+struct multi_pack_index *get_all_multi_pack_index(struct repository *r)
 {
 	prepare_packed_git(r);
 	return r->objects->multi_pack_index;
 }
 
+struct multi_pack_index *get_local_multi_pack_index(struct repository *r)
+{
+	struct multi_pack_index *m;
+
+	for (m = get_all_multi_pack_index(r); m; m = m->next) {
+		if (m->local)
+			break;
+	}
+
+	return m;
+}
+
 struct packed_git *get_all_packs(struct repository *r)
 {
 	struct multi_pack_index *m;
diff --git a/packfile.h b/packfile.h
index 240aa73b95..2516eb4667 100644
--- a/packfile.h
+++ b/packfile.h
@@ -56,7 +56,8 @@ void install_packed_git(struct repository *r, struct packed_git *pack);
 
 struct packed_git *get_packed_git(struct repository *r);
 struct list_head *get_packed_git_mru(struct repository *r);
-struct multi_pack_index *get_multi_pack_index(struct repository *r);
+struct multi_pack_index *get_all_multi_pack_index(struct repository *r);
+struct multi_pack_index *get_local_multi_pack_index(struct repository *r);
 struct packed_git *get_all_packs(struct repository *r);
 
 /*
diff --git a/sha1-name.c b/sha1-name.c
index 0b8cb5247a..be6b675953 100644
--- a/sha1-name.c
+++ b/sha1-name.c
@@ -184,7 +184,7 @@ static void find_short_packed_object(struct disambiguate_state *ds)
 	struct multi_pack_index *m;
 	struct packed_git *p;
 
-	for (m = get_multi_pack_index(ds->repo); m && !ds->ambiguous;
+	for (m = get_all_multi_pack_index(ds->repo); m && !ds->ambiguous;
 	     m = m->next)
 		unique_in_midx(m, ds);
 	for (p = get_packed_git(ds->repo); p && !ds->ambiguous;
@@ -660,7 +660,7 @@ static void find_abbrev_len_packed(struct min_abbrev_data *mad)
 	struct multi_pack_index *m;
 	struct packed_git *p;
 
-	for (m = get_multi_pack_index(mad->repo); m; m = m->next)
+	for (m = get_all_multi_pack_index(mad->repo); m; m = m->next)
 		find_abbrev_len_for_midx(m, mad);
 	for (p = get_packed_git(mad->repo); p; p = p->next)
 		find_abbrev_len_for_pack(p, mad);

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] midx: traverse the local MIDX first
  2020-08-28 18:50 ` Jeff King
@ 2020-08-28 18:55   ` Jeff King
  2020-08-28 19:03     ` Derrick Stolee
  2020-08-28 18:55   ` Taylor Blau
  1 sibling, 1 reply; 10+ messages in thread
From: Jeff King @ 2020-08-28 18:55 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster

On Fri, Aug 28, 2020 at 02:50:39PM -0400, Jeff King wrote:

> So I'd be tempted to say that the latter callers should be using a
> separate function that gives them what they want. That lets them avoid
> being too intimate with the details of how we order things.
> 
> The patch below illustrates that.  It also changes the existing function
> name to avoid confusion and to help audit the existing callers, but
> that's optional and maybe not worth it.

And here's the same concept as a more minimal change, suitable for
squashing into yours. The advantage is that it keeps the "the local one
goes first" logic in one abstracted spot.

diff --git a/builtin/repack.c b/builtin/repack.c
index 28b0c1bf1b..60cb196956 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -133,9 +133,9 @@ static void get_non_kept_pack_filenames(struct string_list *fname_list,
 static void remove_redundant_pack(const char *dir_name, const char *base_name)
 {
 	struct strbuf buf = STRBUF_INIT;
-	struct multi_pack_index *m = get_multi_pack_index(the_repository);
+	struct multi_pack_index *m = get_local_multi_pack_index(the_repository);
 	strbuf_addf(&buf, "%s.pack", base_name);
-	if (m && m->local && midx_contains_pack(m, buf.buf))
+	if (m && midx_contains_pack(m, buf.buf))
 		clear_midx_file(the_repository);
 	strbuf_insertf(&buf, 0, "%s/", dir_name);
 	unlink_pack_path(buf.buf, 1);
diff --git a/packfile.c b/packfile.c
index 6ab5233613..9ef27508f2 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1027,6 +1027,17 @@ struct multi_pack_index *get_multi_pack_index(struct repository *r)
 	return r->objects->multi_pack_index;
 }
 
+struct multi_pack_index *get_local_multi_pack_index(struct repository *r)
+{
+	struct multi_pack_index *m = get_multi_pack_index(r);
+
+	/* no need to iterate; we always put the local one first (if any) */
+	if (m && m->local)
+		return m;
+
+	return NULL;
+}
+
 struct packed_git *get_all_packs(struct repository *r)
 {
 	struct multi_pack_index *m;
diff --git a/packfile.h b/packfile.h
index 240aa73b95..a58fc738e0 100644
--- a/packfile.h
+++ b/packfile.h
@@ -57,6 +57,7 @@ void install_packed_git(struct repository *r, struct packed_git *pack);
 struct packed_git *get_packed_git(struct repository *r);
 struct list_head *get_packed_git_mru(struct repository *r);
 struct multi_pack_index *get_multi_pack_index(struct repository *r);
+struct multi_pack_index *get_local_multi_pack_index(struct repository *r);
 struct packed_git *get_all_packs(struct repository *r);
 
 /*

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] midx: traverse the local MIDX first
  2020-08-28 18:50 ` Jeff King
  2020-08-28 18:55   ` Jeff King
@ 2020-08-28 18:55   ` Taylor Blau
  1 sibling, 0 replies; 10+ messages in thread
From: Taylor Blau @ 2020-08-28 18:55 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, dstolee, gitster

On Fri, Aug 28, 2020 at 02:50:39PM -0400, Jeff King wrote:
> On Fri, Aug 28, 2020 at 02:06:21PM -0400, Taylor Blau wrote:
> [ snip ]
>
> > This patch bandaids that situation by trying to place the local MIDX
> > first in the chain when calling 'prepare_multi_pack_index_one()'.
> > Specifically, it always inserts all MIDXs loads subsequent to the local
> > one as the head of the tail of the MIDX chain. This makes it so that we
> > don't have a quadratic insertion while still keeping the local MIDX at
> > the front of the list. Likewise, it avoids an 'O(m*n)' lookup in
> > 'remove_redundant_pack()' where 'm' is the number of redundant packs and
> > 'n' is the number of alternates.
> >
> > Protect 'remove_redundant_pack()' against a case where alternates with
> > MIDXs exist, but no local MIDX exists by first checking that 'm->local'
> > before asking whether or not it contains the requested pack.
>
> It seems like the root of the problem is that get_multi_pack_index() is
> being used for two different things:
>
>   - most callers want to iterate over all of the possible midx files,
>     because they're trying to look up an object.
>
>   - this caller wants the single midx for the local repository (I
>     wondered if there might be more of these that we just never noticed
>     because the test suite doesn't use alternates, but having just
>     audited them all, the answer is no).
>
> So I'd be tempted to say that the latter callers should be using a
> separate function that gives them what they want. That lets them avoid
> being too intimate with the details of how we order things.
>
> The patch below illustrates that.  It also changes the existing function
> name to avoid confusion and to help audit the existing callers, but
> that's optional and maybe not worth it.
>
> It does do the linear lookup, so it has the O(m*n) you mentioned. But
> the number of alternates is generally small, and I'd be shocked if this
> was the first m*n loop over the number of alternates. However, we could
> still do the ordering thing and just keep the details inside the new
> function.

I'd be much happier with this patch.

If you wanted to go further, we could do both, and tighten up the
insertion code to make sure that the local MIDX is always first from the
perspective of 'r->objects->multi_pack_index' so that the linear lookup
drops to constant.

But, it might be overkill, since I also tend to think that the number of
alternates is small, and we're not even talking about a difference of
milliseconds here.

So, I'm happy to clean it up for you and forge your sign-off with
permission, or otherwise you're welcome to do so, too.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] midx: traverse the local MIDX first
  2020-08-28 18:55   ` Jeff King
@ 2020-08-28 19:03     ` Derrick Stolee
  2020-08-28 19:07       ` Taylor Blau
  0 siblings, 1 reply; 10+ messages in thread
From: Derrick Stolee @ 2020-08-28 19:03 UTC (permalink / raw)
  To: Jeff King, Taylor Blau; +Cc: git, dstolee, gitster

On 8/28/2020 2:55 PM, Jeff King wrote:
> On Fri, Aug 28, 2020 at 02:50:39PM -0400, Jeff King wrote:
> 
>> So I'd be tempted to say that the latter callers should be using a
>> separate function that gives them what they want. That lets them avoid
>> being too intimate with the details of how we order things.
>>
>> The patch below illustrates that.  It also changes the existing function
>> name to avoid confusion and to help audit the existing callers, but
>> that's optional and maybe not worth it.
> 
> And here's the same concept as a more minimal change, suitable for
> squashing into yours. The advantage is that it keeps the "the local one
> goes first" logic in one abstracted spot.

This is nice because it is more future-proof: if we needed to
change the order of the midx list, then we could update the
implementation of this method instead of every caller.

Personally, I prefer this one (squashed).

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] midx: traverse the local MIDX first
  2020-08-28 19:03     ` Derrick Stolee
@ 2020-08-28 19:07       ` Taylor Blau
  2020-08-28 19:51         ` Jeff King
  0 siblings, 1 reply; 10+ messages in thread
From: Taylor Blau @ 2020-08-28 19:07 UTC (permalink / raw)
  To: Derrick Stolee, Jeff King; +Cc: git, dstolee, gitster

On Fri, Aug 28, 2020 at 03:03:09PM -0400, Derrick Stolee wrote:
> On 8/28/2020 2:55 PM, Jeff King wrote:
> > On Fri, Aug 28, 2020 at 02:50:39PM -0400, Jeff King wrote:
> >
> >> So I'd be tempted to say that the latter callers should be using a
> >> separate function that gives them what they want. That lets them avoid
> >> being too intimate with the details of how we order things.
> >>
> >> The patch below illustrates that.  It also changes the existing function
> >> name to avoid confusion and to help audit the existing callers, but
> >> that's optional and maybe not worth it.
> >
> > And here's the same concept as a more minimal change, suitable for
> > squashing into yours. The advantage is that it keeps the "the local one
> > goes first" logic in one abstracted spot.
>
> This is nice because it is more future-proof: if we needed to
> change the order of the midx list, then we could update the
> implementation of this method instead of every caller.
>
> Personally, I prefer this one (squashed).

Ditto. Peff and I crossed emails, so I was talking about tidying up his
earlier patch before I even had seen the second one.

Peff -- any objections to me squashing this into mine and sending that
for queueing?

> Thanks,
> -Stolee

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] midx: traverse the local MIDX first
  2020-08-28 19:07       ` Taylor Blau
@ 2020-08-28 19:51         ` Jeff King
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff King @ 2020-08-28 19:51 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Derrick Stolee, git, dstolee, gitster

On Fri, Aug 28, 2020 at 03:07:52PM -0400, Taylor Blau wrote:

> Ditto. Peff and I crossed emails, so I was talking about tidying up his
> earlier patch before I even had seen the second one.
> 
> Peff -- any objections to me squashing this into mine and sending that
> for queueing?

Nope, please do (and feel free to forge my signoff).

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2] midx: traverse the local MIDX first
  2020-08-28 18:06 [PATCH] midx: traverse the local MIDX first Taylor Blau
  2020-08-28 18:27 ` Derrick Stolee
  2020-08-28 18:50 ` Jeff King
@ 2020-08-28 20:22 ` Taylor Blau
  2020-08-28 21:19   ` Jeff King
  2 siblings, 1 reply; 10+ messages in thread
From: Taylor Blau @ 2020-08-28 20:22 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, peff

When a repository has an alternate object directory configured, callers
can traverse through each alternate's MIDX by walking the '->next'
pointer.

But, when 'prepare_multi_pack_index_one()' loads multiple MIDXs, it
places the new ones at the front of this pointer chain, not at the end.
This can be confusing for callers such as 'git repack -ad', causing test
failures like in t7700.6 with 'GIT_TEST_MULTI_PACK_INDEX=1'.

The occurs when dropping a pack known to the local MIDX with alternates
configured that have their own MIDX. Since the alternate's MIDX is
returned via 'get_multi_pack_index()', 'midx_contains_pack()' returns
true (which is correct, since it traverses through the '->next' pointer
to find the MIDX in the chain that does contain the requested object).
But, we call 'clear_midx_file()' on 'the_repository', which drops the
MIDX at the path of the first MIDX in the chain, which (in the case of
t7700.6 is the one in the alternate).

This patch addresses that by:

  - placing the local MIDX first in the chain when calling
    'prepare_multi_pack_index_one()', and

  - introducing a new 'get_local_multi_pack_index()', which explicitly
    returns the repository-local MIDX, if any.

Don't impose an additional order on the MIDX's '->next' pointer beyond
that the first item in the chain must be local if one exists so that we
avoid a quadratic insertion.

Likewise, use 'get_local_multi_pack_index()' in
'remove_redundant_pack()' to fix the formerly broken t7700.6 when run
with 'GIT_TEST_MULTI_PACK_INDEX=1'.

Finally, note that the MIDX ordering invariant is only preserved by the
insertion order in 'prepare_packed_git()', which traverses through the
ODB's '->next' pointer, meaning we visit the local object store first.
This fragility makes this an undesirable long-term solution if more
callers are added, but it is acceptable for now since this is the only
caller.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/repack.c |  2 +-
 midx.c           |  8 ++++++--
 packfile.c       | 11 +++++++++++
 packfile.h       |  1 +
 4 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 98fac03946..01e7767c79 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -133,7 +133,7 @@ static void get_non_kept_pack_filenames(struct string_list *fname_list,
 static void remove_redundant_pack(const char *dir_name, const char *base_name)
 {
 	struct strbuf buf = STRBUF_INIT;
-	struct multi_pack_index *m = get_multi_pack_index(the_repository);
+	struct multi_pack_index *m = get_local_multi_pack_index(the_repository);
 	strbuf_addf(&buf, "%s.pack", base_name);
 	if (m && midx_contains_pack(m, buf.buf))
 		clear_midx_file(the_repository);
diff --git a/midx.c b/midx.c
index e9b2e1253a..cc19b66152 100644
--- a/midx.c
+++ b/midx.c
@@ -416,8 +416,12 @@ int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, i
 	m = load_multi_pack_index(object_dir, local);
 
 	if (m) {
-		m->next = r->objects->multi_pack_index;
-		r->objects->multi_pack_index = m;
+		struct multi_pack_index *mp = r->objects->multi_pack_index;
+		if (mp) {
+			m->next = mp->next;
+			mp->next = m;
+		} else
+			r->objects->multi_pack_index = m;
 		return 1;
 	}
 
diff --git a/packfile.c b/packfile.c
index 6ab5233613..9ef27508f2 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1027,6 +1027,17 @@ struct multi_pack_index *get_multi_pack_index(struct repository *r)
 	return r->objects->multi_pack_index;
 }
 
+struct multi_pack_index *get_local_multi_pack_index(struct repository *r)
+{
+	struct multi_pack_index *m = get_multi_pack_index(r);
+
+	/* no need to iterate; we always put the local one first (if any) */
+	if (m && m->local)
+		return m;
+
+	return NULL;
+}
+
 struct packed_git *get_all_packs(struct repository *r)
 {
 	struct multi_pack_index *m;
diff --git a/packfile.h b/packfile.h
index 240aa73b95..a58fc738e0 100644
--- a/packfile.h
+++ b/packfile.h
@@ -57,6 +57,7 @@ void install_packed_git(struct repository *r, struct packed_git *pack);
 struct packed_git *get_packed_git(struct repository *r);
 struct list_head *get_packed_git_mru(struct repository *r);
 struct multi_pack_index *get_multi_pack_index(struct repository *r);
+struct multi_pack_index *get_local_multi_pack_index(struct repository *r);
 struct packed_git *get_all_packs(struct repository *r);
 
 /*
-- 
2.28.0.338.g87a3b7a5a2.dirty

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] midx: traverse the local MIDX first
  2020-08-28 20:22 ` [PATCH v2] " Taylor Blau
@ 2020-08-28 21:19   ` Jeff King
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff King @ 2020-08-28 21:19 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster

On Fri, Aug 28, 2020 at 04:22:13PM -0400, Taylor Blau wrote:

> This patch addresses that by:
> 
>   - placing the local MIDX first in the chain when calling
>     'prepare_multi_pack_index_one()', and
> 
>   - introducing a new 'get_local_multi_pack_index()', which explicitly
>     returns the repository-local MIDX, if any.
> 
> Don't impose an additional order on the MIDX's '->next' pointer beyond
> that the first item in the chain must be local if one exists so that we
> avoid a quadratic insertion.

This version looks fine to me.

Thinking on it a bit more, we probably want this "local one is first"
behavior even without it fixing this bug. For normal packs, we always
prefer local copies over alternates, under the assumption that
alternates are likely to be more expensive to access (e.g., shared nfs).

Of course that somewhat goes out the window since we re-order lookups
these days based on where we've found previous objects (my mru stuff,
but even before that the single "last pack" variable). But it makes
sense to at least start with the local ones being given priority.

I don't think we do any mru tricks with the midx list, so it would stay
in local-first mode always, which is reasonable (you probably don't have
so many midx's that any reordering is worth it anyway).

It does mean we may consult an alternates midx before any local non-midx
packs, which _could_ be slower. But since this is all guesses and
heuristics anyway, I'd wait until somebody has a concrete case where
they can demonstrate a different ordering scheme works better.

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-08-28 21:22 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-28 18:06 [PATCH] midx: traverse the local MIDX first Taylor Blau
2020-08-28 18:27 ` Derrick Stolee
2020-08-28 18:50 ` Jeff King
2020-08-28 18:55   ` Jeff King
2020-08-28 19:03     ` Derrick Stolee
2020-08-28 19:07       ` Taylor Blau
2020-08-28 19:51         ` Jeff King
2020-08-28 18:55   ` Taylor Blau
2020-08-28 20:22 ` [PATCH v2] " Taylor Blau
2020-08-28 21:19   ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).