* [PATCH 0/3] delta-island fixes
@ 2018-11-20 9:44 Jeff King
2018-11-20 9:46 ` [PATCH 1/3] pack-objects: fix tree_depth and layer invariants Jeff King
` (3 more replies)
0 siblings, 4 replies; 9+ messages in thread
From: Jeff King @ 2018-11-20 9:44 UTC (permalink / raw)
To: git; +Cc: Christian Couder, Derrick Stolee
This fixes a few bugs in the cc/delta-islands topic: one major, and two
minor.
Sadly, the major one was not caught by our test suite, and I'm not sure
how to remedy that. The problem is a memory management one, and happens
only when you have a reasonably large number of objects. So it triggers
readily when run on a real repository, but not on the toy one in t5320.
Creating a much larger repository there would make the test suite more
expensive.
In cases like this I think it's often a good idea to have a perf test.
Those are expensive anyway, and we'd have the double benefit of
exercising the code and showing off the performance improvement. But the
delta-island code only makes sense on a very specialized repo: one where
you have multiple related but diverging histories. You could simulate
that by picking two branches in a repository, but the effect is going to
be miniscule.
Anyway, here are the fixes without tests. We should at least apply these
before v2.20 ships with the bugs.
[1/3]: pack-objects: fix tree_depth and layer invariants
[2/3]: pack-objects: zero-initialize tree_depth/layer arrays
[3/3]: pack-objects: fix off-by-one in delta-island tree-depth computation
builtin/pack-objects.c | 4 +++-
git-compat-util.h | 1 +
pack-objects.h | 4 ++--
3 files changed, 6 insertions(+), 3 deletions(-)
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 1/3] pack-objects: fix tree_depth and layer invariants
2018-11-20 9:44 [PATCH 0/3] delta-island fixes Jeff King
@ 2018-11-20 9:46 ` Jeff King
2018-11-20 16:37 ` Duy Nguyen
2018-11-21 4:52 ` Junio C Hamano
2018-11-20 9:48 ` [PATCH 2/3] pack-objects: zero-initialize tree_depth/layer arrays Jeff King
` (2 subsequent siblings)
3 siblings, 2 replies; 9+ messages in thread
From: Jeff King @ 2018-11-20 9:46 UTC (permalink / raw)
To: git; +Cc: Christian Couder, Derrick Stolee
Commit 108f530385 (pack-objects: move tree_depth into 'struct
packing_data', 2018-08-16) dynamically manages a tree_depth array in
packing_data that maintains one of these invariants:
1. tree_depth is NULL (i.e., the requested options don't require us to
track tree depths)
2. tree_depth is non-NULL and has as many entries as the "objects"
array
We maintain (2) by:
a. When the objects array grows, grow tree_depth to the same size
(unless it's NULL, in which case we can leave it).
b. When a caller asks to set a depth via oe_set_tree_depth(), if
tree_depth is NULL we allocate it.
But in (b), we use the number of stored objects, _not_ the allocated
size of the objects array. So we can run into a situation like this:
1. packlist_alloc() needs to store the Nth object, so it grows the
objects array to M, where M > N.
2. oe_set_tree_depth() wants to store a depth, so it allocates an
array of length N. Now we've violated our invariant.
3. packlist_alloc() needs to store the N+1th object. But it _doesn't_
grow the objects array, since N <= M still holds. We try to assign
to tree_depth[N+1], which is out of bounds.
That doesn't happen in our test scripts, because the repositories they
use are so small, but it's easy to trigger by running:
echo HEAD | git pack-objects --revs --delta-islands --stdout >/dev/null
in any reasonably-sized repo (like git.git).
We can fix it by always growing the array to match pack->nr_alloc, not
pack->nr_objects. Likewise for the "layer" array from fe0ac2fb7f
(pack-objects: move 'layer' into 'struct packing_data', 2018-08-16),
which has the same bug.
Signed-off-by: Jeff King <peff@peff.net>
---
pack-objects.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/pack-objects.h b/pack-objects.h
index feb6a6a05e..f31ac1c81c 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -412,7 +412,7 @@ static inline void oe_set_tree_depth(struct packing_data *pack,
unsigned int tree_depth)
{
if (!pack->tree_depth)
- ALLOC_ARRAY(pack->tree_depth, pack->nr_objects);
+ ALLOC_ARRAY(pack->tree_depth, pack->nr_alloc);
pack->tree_depth[e - pack->objects] = tree_depth;
}
@@ -429,7 +429,7 @@ static inline void oe_set_layer(struct packing_data *pack,
unsigned char layer)
{
if (!pack->layer)
- ALLOC_ARRAY(pack->layer, pack->nr_objects);
+ ALLOC_ARRAY(pack->layer, pack->nr_alloc);
pack->layer[e - pack->objects] = layer;
}
--
2.20.0.rc0.715.gf6b01ab3e1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 2/3] pack-objects: zero-initialize tree_depth/layer arrays
2018-11-20 9:44 [PATCH 0/3] delta-island fixes Jeff King
2018-11-20 9:46 ` [PATCH 1/3] pack-objects: fix tree_depth and layer invariants Jeff King
@ 2018-11-20 9:48 ` Jeff King
2018-11-20 9:50 ` [PATCH 3/3] pack-objects: fix off-by-one in delta-island tree-depth computation Jeff King
2018-11-20 15:06 ` [PATCH 0/3] delta-island fixes Derrick Stolee
3 siblings, 0 replies; 9+ messages in thread
From: Jeff King @ 2018-11-20 9:48 UTC (permalink / raw)
To: git; +Cc: Christian Couder, Derrick Stolee
Commit 108f530385 (pack-objects: move tree_depth into 'struct
packing_data', 2018-08-16) started maintaining a tree_depth array that
matches the "objects" array. We extend the array when:
1. The objects array is extended, in which case we use realloc to
extend the tree_depth array.
2. A caller asks to store a tree_depth for object N, and this is the
first such request; we create the array from scratch and store the
value for N.
In the latter case, though, we use regular xmalloc(), and the depth
values for any objects besides N is undefined. This happens to not
trigger a bug with the current code, but the reasons are quite subtle:
- we never ask about the depth for any object with index i < N. This is
because we store the depth immediately for all trees and blobs. So
any such "i" must be a non-tree, and therefore we will never need to
care about its depth (in fact, we really only care about the depth of
trees).
- there are no objects at this point with index i > N, because we
always fill in the depth for a tree immediately after its object
entry is created (we may still allocate uninitialized depth entries,
but they'll be initialized by packlist_alloc() when it initializes
the entry in the "objects" array).
So it works, but only by chance. To be defensive, let's zero the array,
which matches the "unset" values which would be handed out by
oe_tree_depth() already.
Signed-off-by: Jeff King <peff@peff.net>
---
git-compat-util.h | 1 +
pack-objects.h | 4 ++--
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/git-compat-util.h b/git-compat-util.h
index f16058182f..09b0102cae 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -861,6 +861,7 @@ extern FILE *fopen_or_warn(const char *path, const char *mode);
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)
#define ALLOC_ARRAY(x, alloc) (x) = xmalloc(st_mult(sizeof(*(x)), (alloc)))
+#define CALLOC_ARRAY(x, alloc) (x) = xcalloc((alloc), sizeof(*(x)));
#define REALLOC_ARRAY(x, alloc) (x) = xrealloc((x), st_mult(sizeof(*(x)), (alloc)))
#define COPY_ARRAY(dst, src, n) copy_array((dst), (src), (n), sizeof(*(dst)) + \
diff --git a/pack-objects.h b/pack-objects.h
index f31ac1c81c..dc869f26c2 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -412,7 +412,7 @@ static inline void oe_set_tree_depth(struct packing_data *pack,
unsigned int tree_depth)
{
if (!pack->tree_depth)
- ALLOC_ARRAY(pack->tree_depth, pack->nr_alloc);
+ CALLOC_ARRAY(pack->tree_depth, pack->nr_alloc);
pack->tree_depth[e - pack->objects] = tree_depth;
}
@@ -429,7 +429,7 @@ static inline void oe_set_layer(struct packing_data *pack,
unsigned char layer)
{
if (!pack->layer)
- ALLOC_ARRAY(pack->layer, pack->nr_alloc);
+ CALLOC_ARRAY(pack->layer, pack->nr_alloc);
pack->layer[e - pack->objects] = layer;
}
--
2.20.0.rc0.715.gf6b01ab3e1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 3/3] pack-objects: fix off-by-one in delta-island tree-depth computation
2018-11-20 9:44 [PATCH 0/3] delta-island fixes Jeff King
2018-11-20 9:46 ` [PATCH 1/3] pack-objects: fix tree_depth and layer invariants Jeff King
2018-11-20 9:48 ` [PATCH 2/3] pack-objects: zero-initialize tree_depth/layer arrays Jeff King
@ 2018-11-20 9:50 ` Jeff King
2018-11-20 15:06 ` [PATCH 0/3] delta-island fixes Derrick Stolee
3 siblings, 0 replies; 9+ messages in thread
From: Jeff King @ 2018-11-20 9:50 UTC (permalink / raw)
To: git; +Cc: Christian Couder, Derrick Stolee
When delta-islands are in use, we need to record the deepest path at
which we find each tree and blob. Our loop to do so counts slashes, so
"foo" is depth 0, "foo/bar" is depth 1, and so on.
However, this neglects root trees, which are represented by the empty
string. Those also have depth 0, but are at a layer above "foo". Thus,
"foo" should be 1, "foo/bar" at 2, and so on. We use this depth to
topo-sort the trees in resolve_tree_islands(). As a result, we may fail
to visit a root tree before the sub-trees it contains, and therefore not
correctly pass down the island marks.
That in turn could lead to missing some delta opportunities (objects are
in the same island, but we didn't realize it) or creating unwanted
cross-island deltas (one object is in an island another isn't, but we
don't realize). In practice, it seems to have only a small effect. Some
experiments on the real-world git/git fork network at GitHub showed an
improvement of only 0.14% in the resulting clone size.
Signed-off-by: Jeff King <peff@peff.net>
---
builtin/pack-objects.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index e7ea206c08..411aefd687 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2786,9 +2786,11 @@ static void show_object(struct object *obj, const char *name, void *data)
if (use_delta_islands) {
const char *p;
- unsigned depth = 0;
+ unsigned depth;
struct object_entry *ent;
+ /* the empty string is a root tree, which is depth 0 */
+ depth = *name ? 1 : 0;
for (p = strchr(name, '/'); p; p = strchr(p + 1, '/'))
depth++;
--
2.20.0.rc0.715.gf6b01ab3e1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 0/3] delta-island fixes
2018-11-20 9:44 [PATCH 0/3] delta-island fixes Jeff King
` (2 preceding siblings ...)
2018-11-20 9:50 ` [PATCH 3/3] pack-objects: fix off-by-one in delta-island tree-depth computation Jeff King
@ 2018-11-20 15:06 ` Derrick Stolee
2018-11-22 16:43 ` Jeff King
3 siblings, 1 reply; 9+ messages in thread
From: Derrick Stolee @ 2018-11-20 15:06 UTC (permalink / raw)
To: Jeff King, git; +Cc: Christian Couder, Derrick Stolee
On 11/20/2018 4:44 AM, Jeff King wrote:
> In cases like this I think it's often a good idea to have a perf test.
> Those are expensive anyway, and we'd have the double benefit of
> exercising the code and showing off the performance improvement. But the
> delta-island code only makes sense on a very specialized repo: one where
> you have multiple related but diverging histories. You could simulate
> that by picking two branches in a repository, but the effect is going to
> be miniscule.
The changes in this series look correct. Too bad it is difficult to test.
Perhaps we should add a performance test for the --delta-islands check
that would trigger the failure (and maybe a clone afterwards). There are
lots of freely available forks of git.git that present interesting fork
structure. Here are three that would suffice to make this interesting:
https://github.com/git/git
https://github.com/git-for-windows/git
https://github.com/microsoft/git
Of course, running it on a specific repo is up to the person running the
perf suite.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/3] pack-objects: fix tree_depth and layer invariants
2018-11-20 9:46 ` [PATCH 1/3] pack-objects: fix tree_depth and layer invariants Jeff King
@ 2018-11-20 16:37 ` Duy Nguyen
2018-11-22 16:50 ` Jeff King
2018-11-21 4:52 ` Junio C Hamano
1 sibling, 1 reply; 9+ messages in thread
From: Duy Nguyen @ 2018-11-20 16:37 UTC (permalink / raw)
To: Jeff King; +Cc: Git Mailing List, Christian Couder, Derrick Stolee
On Tue, Nov 20, 2018 at 11:04 AM Jeff King <peff@peff.net> wrote:
>
> Commit 108f530385 (pack-objects: move tree_depth into 'struct
> packing_data', 2018-08-16) dynamically manages a tree_depth array in
> packing_data that maintains one of these invariants:
>
> 1. tree_depth is NULL (i.e., the requested options don't require us to
> track tree depths)
>
> 2. tree_depth is non-NULL and has as many entries as the "objects"
> array
>
> We maintain (2) by:
>
> a. When the objects array grows, grow tree_depth to the same size
> (unless it's NULL, in which case we can leave it).
>
> b. When a caller asks to set a depth via oe_set_tree_depth(), if
> tree_depth is NULL we allocate it.
>
> But in (b), we use the number of stored objects, _not_ the allocated
> size of the objects array. So we can run into a situation like this:
>
> 1. packlist_alloc() needs to store the Nth object, so it grows the
> objects array to M, where M > N.
>
> 2. oe_set_tree_depth() wants to store a depth, so it allocates an
> array of length N. Now we've violated our invariant.
>
> 3. packlist_alloc() needs to store the N+1th object. But it _doesn't_
> grow the objects array, since N <= M still holds. We try to assign
> to tree_depth[N+1], which is out of bounds.
Do you think if this splitting data to packing_data is too fragile
that we should just scrape the whole thing and move all data back to
object_entry[]? We would use more memory of course but higher memory
usage is still better than more bugs (if these are likely to show up
again).
--
Duy
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/3] pack-objects: fix tree_depth and layer invariants
2018-11-20 9:46 ` [PATCH 1/3] pack-objects: fix tree_depth and layer invariants Jeff King
2018-11-20 16:37 ` Duy Nguyen
@ 2018-11-21 4:52 ` Junio C Hamano
1 sibling, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2018-11-21 4:52 UTC (permalink / raw)
To: Jeff King; +Cc: git, Christian Couder, Derrick Stolee
Jeff King <peff@peff.net> writes:
> But in (b), we use the number of stored objects, _not_ the allocated
> size of the objects array. So we can run into a situation like this:
>
> 1. packlist_alloc() needs to store the Nth object, so it grows the
> objects array to M, where M > N.
>
> 2. oe_set_tree_depth() wants to store a depth, so it allocates an
> array of length N. Now we've violated our invariant.
>
> 3. packlist_alloc() needs to store the N+1th object. But it _doesn't_
> grow the objects array, since N <= M still holds. We try to assign
> to tree_depth[N+1], which is out of bounds.
Ouch. I see counting and allocationg is hard (I think I spotted a
bug in another area that comes from the same "count while filtering
and then allocate" pattern during this cycle). Thanks for spotting.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/3] delta-island fixes
2018-11-20 15:06 ` [PATCH 0/3] delta-island fixes Derrick Stolee
@ 2018-11-22 16:43 ` Jeff King
0 siblings, 0 replies; 9+ messages in thread
From: Jeff King @ 2018-11-22 16:43 UTC (permalink / raw)
To: Derrick Stolee; +Cc: git, Christian Couder, Derrick Stolee
On Tue, Nov 20, 2018 at 10:06:57AM -0500, Derrick Stolee wrote:
> On 11/20/2018 4:44 AM, Jeff King wrote:
> > In cases like this I think it's often a good idea to have a perf test.
> > Those are expensive anyway, and we'd have the double benefit of
> > exercising the code and showing off the performance improvement. But the
> > delta-island code only makes sense on a very specialized repo: one where
> > you have multiple related but diverging histories. You could simulate
> > that by picking two branches in a repository, but the effect is going to
> > be miniscule.
>
> The changes in this series look correct. Too bad it is difficult to test.
>
> Perhaps we should add a performance test for the --delta-islands check that
> would trigger the failure (and maybe a clone afterwards). There are lots of
> freely available forks of git.git that present interesting fork structure.
> Here are three that would suffice to make this interesting:
>
> https://github.com/git/git
> https://github.com/git-for-windows/git
> https://github.com/microsoft/git
>
> Of course, running it on a specific repo is up to the person running the
> perf suite.
I hadn't really considered the possibility of reconstructing a fork
network repository from public repos. It probably would be possible to
include a script which does so, although:
- I suspect it's going to be pretty expensive. We can use --reference
to reduce the size of subsequent clones, but just the repacks you
have to do to assemble the final shared repo can get pretty
expensive.
- That's 3 forks. Our real-world case has over 4000. I'm not sure of
the size of the effects for smaller cases.
So in short, I think it's an interesting avenue to explore, but it might
turn out to be a dead-end. I'm not going to prioritize it right now, but
if somebody wants to play with it, I'd be happy to look over the
results.
-Peff
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/3] pack-objects: fix tree_depth and layer invariants
2018-11-20 16:37 ` Duy Nguyen
@ 2018-11-22 16:50 ` Jeff King
0 siblings, 0 replies; 9+ messages in thread
From: Jeff King @ 2018-11-22 16:50 UTC (permalink / raw)
To: Duy Nguyen; +Cc: Git Mailing List, Christian Couder, Derrick Stolee
On Tue, Nov 20, 2018 at 05:37:18PM +0100, Duy Nguyen wrote:
> > But in (b), we use the number of stored objects, _not_ the allocated
> > size of the objects array. So we can run into a situation like this:
> >
> > 1. packlist_alloc() needs to store the Nth object, so it grows the
> > objects array to M, where M > N.
> >
> > 2. oe_set_tree_depth() wants to store a depth, so it allocates an
> > array of length N. Now we've violated our invariant.
> >
> > 3. packlist_alloc() needs to store the N+1th object. But it _doesn't_
> > grow the objects array, since N <= M still holds. We try to assign
> > to tree_depth[N+1], which is out of bounds.
>
> Do you think if this splitting data to packing_data is too fragile
> that we should just scrape the whole thing and move all data back to
> object_entry[]? We would use more memory of course but higher memory
> usage is still better than more bugs (if these are likely to show up
> again).
Certainly that thought crossed my mind while working on these patches. :)
Especially given the difficulties it introduced into the recent
bitmap-reuse topic, and the size fixes we had to deal with in v2.19.
Overall, though, I dunno. This fix, while subtle, turned out not to be
too complicated. And the memory savings are real. I consider 100M
objects to be on the large size of feasible for stock Git these days,
and I think we are talking about on the order of 4GB memory savings
there. You need a big machine to handle a repository of that size, but
4GB is still appreciable.
So I guess at this point, with all (known) bugs fixed, we should stick
with it for now. If it becomes a problem for development of a future
feature, then we can re-evaluate then.
-Peff
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2018-11-22 16:50 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-20 9:44 [PATCH 0/3] delta-island fixes Jeff King
2018-11-20 9:46 ` [PATCH 1/3] pack-objects: fix tree_depth and layer invariants Jeff King
2018-11-20 16:37 ` Duy Nguyen
2018-11-22 16:50 ` Jeff King
2018-11-21 4:52 ` Junio C Hamano
2018-11-20 9:48 ` [PATCH 2/3] pack-objects: zero-initialize tree_depth/layer arrays Jeff King
2018-11-20 9:50 ` [PATCH 3/3] pack-objects: fix off-by-one in delta-island tree-depth computation Jeff King
2018-11-20 15:06 ` [PATCH 0/3] delta-island fixes Derrick Stolee
2018-11-22 16:43 ` Jeff King
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).