* [PATCH] notes: Fix note_tree_consolidate not to break the note_tree structure
@ 2017-03-26 1:52 Mike Hommey
2017-04-07 14:21 ` Johan Herland
0 siblings, 1 reply; 2+ messages in thread
From: Mike Hommey @ 2017-03-26 1:52 UTC (permalink / raw)
To: git; +Cc: johan, gitster
After a note is removed, note_tree_consolidate is called to eliminate
some useless nodes. The typical case is that if you had an int_node
with 2 PTR_TYPE_NOTEs in it, and remove one of them, then the
PTR_TYPE_INTERNAL pointer in the parent tree can be replaced with the
remaining PTR_TYPE_NOTE.
This works fine when PTR_TYPE_NOTEs are involved, but falls flat when
other types are involved.
To put things in more practical terms, let's say we start from an empty
notes tree, and add 3 notes:
- one for a sha1 that starts with 424
- one for a sha1 that starts with 428
- one for a sha1 that starts with 4c
To keep track of this, note_tree.root will have a PTR_TYPE_INTERNAL at
a[4], pointing to an int_node*.
In turn, that int_node* will have a PTR_TYPE_NOTE at a[0xc], pointing to
the leaf_node* with the key and value, and a PTR_TYPE_INTERNAL at a[2],
pointing to another int_node*.
That other int_node* will have 2 PTR_TYPE_NOTE, one at a[4] and the
other at a[8].
When looking for the note for the sha1 starting with 428, get_note() will
recurse through (simplified) root.a[4].a[2].a[8].
Now, if we remove the note for the sha1 that starts with 4c, we're left
with a int_node* with only one PTR_TYPE_INTERNAL entry in it. After
note_tree_consolidate runs, root.a[4] now points to what used to be
pointed at by root.a[4].a[2].
Which means looking up for the note for the sha1 starting with 428 now
fails because there is nothing at root.a[4].a[2] anymore: there is only
root.a[4].a[4] and root.a[4].a[8], which don't match the expected
structure for the lookup.
So if all there is left in an int_node* is a PTR_TYPE_INTERNAL pointer,
we can't safely remove it. I think the same applies for PTR_TYPE_SUBTREE
pointers. IOW, only PTR_TYPE_NOTEs are safe to be moved to the parent
int_node*.
This doesn't have a practical effect on git because all that happens
after a remove_note is a write_notes_tree, which just iterates the entire
note tree, but this affects anything using libgit.a that would try to do
lookups after removing notes.
Signed-off-by: Mike Hommey <mh@glandium.org>
---
notes.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/notes.c b/notes.c
index 2bab961ac..542563b28 100644
--- a/notes.c
+++ b/notes.c
@@ -153,8 +153,8 @@ static struct leaf_node *note_tree_find(struct notes_tree *t,
* How to consolidate an int_node:
* If there are > 1 non-NULL entries, give up and return non-zero.
* Otherwise replace the int_node at the given index in the given parent node
- * with the only entry (or a NULL entry if no entries) from the given tree,
- * and return 0.
+ * with the only NOTE entry (or a NULL entry if no entries) from the given
+ * tree, and return 0.
*/
static int note_tree_consolidate(struct int_node *tree,
struct int_node *parent, unsigned char index)
@@ -173,6 +173,8 @@ static int note_tree_consolidate(struct int_node *tree,
}
}
+ if (p && (GET_PTR_TYPE(p) != PTR_TYPE_NOTE))
+ return -2;
/* replace tree with p in parent[index] */
parent->a[index] = p;
free(tree);
--
2.12.1.dirty
^ permalink raw reply related [flat|nested] 2+ messages in thread
* Re: [PATCH] notes: Fix note_tree_consolidate not to break the note_tree structure
2017-03-26 1:52 [PATCH] notes: Fix note_tree_consolidate not to break the note_tree structure Mike Hommey
@ 2017-04-07 14:21 ` Johan Herland
0 siblings, 0 replies; 2+ messages in thread
From: Johan Herland @ 2017-04-07 14:21 UTC (permalink / raw)
To: Mike Hommey; +Cc: Git mailing list, Junio C Hamano
On Sun, Mar 26, 2017 at 3:52 AM, Mike Hommey <mh@glandium.org> wrote:
> After a note is removed, note_tree_consolidate is called to eliminate
> some useless nodes. The typical case is that if you had an int_node
> with 2 PTR_TYPE_NOTEs in it, and remove one of them, then the
> PTR_TYPE_INTERNAL pointer in the parent tree can be replaced with the
> remaining PTR_TYPE_NOTE.
>
> This works fine when PTR_TYPE_NOTEs are involved, but falls flat when
> other types are involved.
>
> To put things in more practical terms, let's say we start from an empty
> notes tree, and add 3 notes:
> - one for a sha1 that starts with 424
> - one for a sha1 that starts with 428
> - one for a sha1 that starts with 4c
>
> To keep track of this, note_tree.root will have a PTR_TYPE_INTERNAL at
> a[4], pointing to an int_node*.
> In turn, that int_node* will have a PTR_TYPE_NOTE at a[0xc], pointing to
> the leaf_node* with the key and value, and a PTR_TYPE_INTERNAL at a[2],
> pointing to another int_node*.
> That other int_node* will have 2 PTR_TYPE_NOTE, one at a[4] and the
> other at a[8].
>
> When looking for the note for the sha1 starting with 428, get_note() will
> recurse through (simplified) root.a[4].a[2].a[8].
>
> Now, if we remove the note for the sha1 that starts with 4c, we're left
> with a int_node* with only one PTR_TYPE_INTERNAL entry in it. After
> note_tree_consolidate runs, root.a[4] now points to what used to be
> pointed at by root.a[4].a[2].
>
> Which means looking up for the note for the sha1 starting with 428 now
> fails because there is nothing at root.a[4].a[2] anymore: there is only
> root.a[4].a[4] and root.a[4].a[8], which don't match the expected
> structure for the lookup.
>
> So if all there is left in an int_node* is a PTR_TYPE_INTERNAL pointer,
> we can't safely remove it.
Correct.
> I think the same applies for PTR_TYPE_SUBTREE pointers.
I disagree. The subtree nodes contain the entire key prefix for that subtree
in their .key_sha1, and before descending into a subtree (causing it to be
unpacked), we always prefix-compare the key we're looking for against the
subtree's .key_sha1. This behavior, as well as the subtree's .key_sha1 is
unaffected by which level of the notes tree hold the subtree node (provided
- obviously - that the location is along the prefix path).
Adding a variation to your example: Assume that the 424 and 428 nodes are
packed inside a subtree node. We then have: root.a[4] points to an int_node
whose .a[2] points to a subtree node, and .a[0xc] points to a note node. The
subtree node at root.a[4].a[2] represents a key-prefix of "42" (i.e.
its .key_sha1
equals [0x42, 0x0, ..., 0x0, 0x2]).
Now, after removing the 4c note, we consider consolidating the int_node at
root.a[4]. At this point, its only member is the single subtree entry at a[2].
By removing the int_node and moving the subtree node to root.a[4], we
have simply reorganized from
root.a[4].a[2] -> subtree("42")
to
root.a[4] -> subtree("42")
AFAICS, this layout is equally valid (and more optimal) than before we
consolidated. We will still always perform the complete prefix-compare
against "42" before descending into the subtree.
In summary, the bug that really needs fixing is the relocation of existing
int_nodes. Otherwise, leaf_nodes (whether they be PTR_TYPE_NOTE or
PTR_TYPE_SUBTREE) can be freely relocated within the tree as long as the
path leading to them (root.a[x].a[y].a[z]) prefix-matches their .key_sha1.
> IOW, only PTR_TYPE_NOTEs are safe to be moved to the parent
> int_node*.
This holds for both PTR_TYPE_NOTEs and PTR_TYPE_SUBTREEs, IMHO.
>
> This doesn't have a practical effect on git because all that happens
> after a remove_note is a write_notes_tree, which just iterates the entire
> note tree, but this affects anything using libgit.a that would try to do
> lookups after removing notes.
>
> Signed-off-by: Mike Hommey <mh@glandium.org>
Regardless of the PTR_TYPE_SUBTREE optimization discussed above,
I believe your patch is safe, and can be used as-is.
> ---
> notes.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/notes.c b/notes.c
> index 2bab961ac..542563b28 100644
> --- a/notes.c
> +++ b/notes.c
> @@ -153,8 +153,8 @@ static struct leaf_node *note_tree_find(struct notes_tree *t,
> * How to consolidate an int_node:
> * If there are > 1 non-NULL entries, give up and return non-zero.
> * Otherwise replace the int_node at the given index in the given parent node
> - * with the only entry (or a NULL entry if no entries) from the given tree,
> - * and return 0.
> + * with the only NOTE entry (or a NULL entry if no entries) from the given
> + * tree, and return 0.
> */
> static int note_tree_consolidate(struct int_node *tree,
> struct int_node *parent, unsigned char index)
> @@ -173,6 +173,8 @@ static int note_tree_consolidate(struct int_node *tree,
> }
> }
>
> + if (p && (GET_PTR_TYPE(p) != PTR_TYPE_NOTE))
> + return -2;
Based on the above discussion, I believe this can be rewritten to:
+ if (p && (GET_PTR_TYPE(p) == PTR_TYPE_INTERNAL))
+ return -2; /* Cannot move int_nodes within the tree. */
for a more optimal handling of subtree nodes in this scenario.
Have fun! :)
...Johan
> /* replace tree with p in parent[index] */
> parent->a[index] = p;
> free(tree);
> --
> 2.12.1.dirty
>
--
Johan Herland, <johan@herland.net>
www.herland.net
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2017-04-07 14:53 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-26 1:52 [PATCH] notes: Fix note_tree_consolidate not to break the note_tree structure Mike Hommey
2017-04-07 14:21 ` Johan Herland
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).