git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] notes: Fix note_tree_consolidate not to break the note_tree structure
@ 2017-03-26  1:52 Mike Hommey
  2017-04-07 14:21 ` Johan Herland
  0 siblings, 1 reply; 2+ messages in thread
From: Mike Hommey @ 2017-03-26  1:52 UTC (permalink / raw)
  To: git; +Cc: johan, gitster

After a note is removed, note_tree_consolidate is called to eliminate
some useless nodes. The typical case is that if you had an int_node
with 2 PTR_TYPE_NOTEs in it, and remove one of them, then the
PTR_TYPE_INTERNAL pointer in the parent tree can be replaced with the
remaining PTR_TYPE_NOTE.

This works fine when PTR_TYPE_NOTEs are involved, but falls flat when
other types are involved.

To put things in more practical terms, let's say we start from an empty
notes tree, and add 3 notes:
- one for a sha1 that starts with 424
- one for a sha1 that starts with 428
- one for a sha1 that starts with 4c

To keep track of this, note_tree.root will have a PTR_TYPE_INTERNAL at
a[4], pointing to an int_node*.
In turn, that int_node* will have a PTR_TYPE_NOTE at a[0xc], pointing to
the leaf_node* with the key and value, and a PTR_TYPE_INTERNAL at a[2],
pointing to another int_node*.
That other int_node* will have 2 PTR_TYPE_NOTE, one at a[4] and the
other at a[8].

When looking for the note for the sha1 starting with 428, get_note() will
recurse through (simplified) root.a[4].a[2].a[8].

Now, if we remove the note for the sha1 that starts with 4c, we're left
with a int_node* with only one PTR_TYPE_INTERNAL entry in it. After
note_tree_consolidate runs, root.a[4] now points to what used to be
pointed at by root.a[4].a[2].

Which means looking up for the note for the sha1 starting with 428 now
fails because there is nothing at root.a[4].a[2] anymore: there is only
root.a[4].a[4] and root.a[4].a[8], which don't match the expected
structure for the lookup.

So if all there is left in an int_node* is a PTR_TYPE_INTERNAL pointer,
we can't safely remove it. I think the same applies for PTR_TYPE_SUBTREE
pointers. IOW, only PTR_TYPE_NOTEs are safe to be moved to the parent
int_node*.

This doesn't have a practical effect on git because all that happens
after a remove_note is a write_notes_tree, which just iterates the entire
note tree, but this affects anything using libgit.a that would try to do
lookups after removing notes.

Signed-off-by: Mike Hommey <mh@glandium.org>
---
 notes.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/notes.c b/notes.c
index 2bab961ac..542563b28 100644
--- a/notes.c
+++ b/notes.c
@@ -153,8 +153,8 @@ static struct leaf_node *note_tree_find(struct notes_tree *t,
  * How to consolidate an int_node:
  * If there are > 1 non-NULL entries, give up and return non-zero.
  * Otherwise replace the int_node at the given index in the given parent node
- * with the only entry (or a NULL entry if no entries) from the given tree,
- * and return 0.
+ * with the only NOTE entry (or a NULL entry if no entries) from the given
+ * tree, and return 0.
  */
 static int note_tree_consolidate(struct int_node *tree,
 	struct int_node *parent, unsigned char index)
@@ -173,6 +173,8 @@ static int note_tree_consolidate(struct int_node *tree,
 		}
 	}
 
+	if (p && (GET_PTR_TYPE(p) != PTR_TYPE_NOTE))
+		return -2;
 	/* replace tree with p in parent[index] */
 	parent->a[index] = p;
 	free(tree);
-- 
2.12.1.dirty


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH] notes: Fix note_tree_consolidate not to break the note_tree structure
  2017-03-26  1:52 [PATCH] notes: Fix note_tree_consolidate not to break the note_tree structure Mike Hommey
@ 2017-04-07 14:21 ` Johan Herland
  0 siblings, 0 replies; 2+ messages in thread
From: Johan Herland @ 2017-04-07 14:21 UTC (permalink / raw)
  To: Mike Hommey; +Cc: Git mailing list, Junio C Hamano

On Sun, Mar 26, 2017 at 3:52 AM, Mike Hommey <mh@glandium.org> wrote:
> After a note is removed, note_tree_consolidate is called to eliminate
> some useless nodes. The typical case is that if you had an int_node
> with 2 PTR_TYPE_NOTEs in it, and remove one of them, then the
> PTR_TYPE_INTERNAL pointer in the parent tree can be replaced with the
> remaining PTR_TYPE_NOTE.
>
> This works fine when PTR_TYPE_NOTEs are involved, but falls flat when
> other types are involved.
>
> To put things in more practical terms, let's say we start from an empty
> notes tree, and add 3 notes:
> - one for a sha1 that starts with 424
> - one for a sha1 that starts with 428
> - one for a sha1 that starts with 4c
>
> To keep track of this, note_tree.root will have a PTR_TYPE_INTERNAL at
> a[4], pointing to an int_node*.
> In turn, that int_node* will have a PTR_TYPE_NOTE at a[0xc], pointing to
> the leaf_node* with the key and value, and a PTR_TYPE_INTERNAL at a[2],
> pointing to another int_node*.
> That other int_node* will have 2 PTR_TYPE_NOTE, one at a[4] and the
> other at a[8].
>
> When looking for the note for the sha1 starting with 428, get_note() will
> recurse through (simplified) root.a[4].a[2].a[8].
>
> Now, if we remove the note for the sha1 that starts with 4c, we're left
> with a int_node* with only one PTR_TYPE_INTERNAL entry in it. After
> note_tree_consolidate runs, root.a[4] now points to what used to be
> pointed at by root.a[4].a[2].
>
> Which means looking up for the note for the sha1 starting with 428 now
> fails because there is nothing at root.a[4].a[2] anymore: there is only
> root.a[4].a[4] and root.a[4].a[8], which don't match the expected
> structure for the lookup.
>
> So if all there is left in an int_node* is a PTR_TYPE_INTERNAL pointer,
> we can't safely remove it.

Correct.

> I think the same applies for PTR_TYPE_SUBTREE pointers.

I disagree. The subtree nodes contain the entire key prefix for that subtree
in their .key_sha1, and before descending into a subtree (causing it to be
unpacked), we always prefix-compare the key we're looking for against the
subtree's .key_sha1. This behavior, as well as the subtree's .key_sha1 is
unaffected by which level of the notes tree hold the subtree node (provided
- obviously - that the location is along the prefix path).

Adding a variation to your example: Assume that the 424 and 428 nodes are
packed inside a subtree node. We then have: root.a[4] points to an int_node
whose .a[2] points to a subtree node, and .a[0xc] points to a note node. The
subtree node at root.a[4].a[2] represents a key-prefix of "42" (i.e.
its .key_sha1
equals [0x42, 0x0, ..., 0x0, 0x2]).

Now, after removing the 4c note, we consider consolidating the int_node at
root.a[4]. At this point, its only member is the single subtree entry at a[2].
By removing the int_node and moving the subtree node to root.a[4], we
have simply reorganized from

  root.a[4].a[2] -> subtree("42")

to

  root.a[4] -> subtree("42")

AFAICS, this layout is equally valid (and more optimal) than before we
consolidated. We will still always perform the complete prefix-compare
against "42" before descending into the subtree.

In summary, the bug that really needs fixing is the relocation of existing
int_nodes. Otherwise, leaf_nodes (whether they be PTR_TYPE_NOTE or
PTR_TYPE_SUBTREE) can be freely relocated within the tree as long as the
path leading to them (root.a[x].a[y].a[z]) prefix-matches their .key_sha1.

> IOW, only PTR_TYPE_NOTEs are safe to be moved to the parent
> int_node*.

This holds for both PTR_TYPE_NOTEs and PTR_TYPE_SUBTREEs, IMHO.

>
> This doesn't have a practical effect on git because all that happens
> after a remove_note is a write_notes_tree, which just iterates the entire
> note tree, but this affects anything using libgit.a that would try to do
> lookups after removing notes.
>
> Signed-off-by: Mike Hommey <mh@glandium.org>

Regardless of the PTR_TYPE_SUBTREE optimization discussed above,
I believe your patch is safe, and can be used as-is.

> ---
>  notes.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/notes.c b/notes.c
> index 2bab961ac..542563b28 100644
> --- a/notes.c
> +++ b/notes.c
> @@ -153,8 +153,8 @@ static struct leaf_node *note_tree_find(struct notes_tree *t,
>   * How to consolidate an int_node:
>   * If there are > 1 non-NULL entries, give up and return non-zero.
>   * Otherwise replace the int_node at the given index in the given parent node
> - * with the only entry (or a NULL entry if no entries) from the given tree,
> - * and return 0.
> + * with the only NOTE entry (or a NULL entry if no entries) from the given
> + * tree, and return 0.
>   */
>  static int note_tree_consolidate(struct int_node *tree,
>         struct int_node *parent, unsigned char index)
> @@ -173,6 +173,8 @@ static int note_tree_consolidate(struct int_node *tree,
>                 }
>         }
>
> +       if (p && (GET_PTR_TYPE(p) != PTR_TYPE_NOTE))
> +               return -2;

Based on the above discussion, I believe this can be rewritten to:

+       if (p && (GET_PTR_TYPE(p) == PTR_TYPE_INTERNAL))
+               return -2; /* Cannot move int_nodes within the tree. */

for a more optimal handling of subtree nodes in this scenario.


Have fun! :)

...Johan


>         /* replace tree with p in parent[index] */
>         parent->a[index] = p;
>         free(tree);
> --
> 2.12.1.dirty
>



-- 
Johan Herland, <johan@herland.net>
www.herland.net

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2017-04-07 14:53 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-26  1:52 [PATCH] notes: Fix note_tree_consolidate not to break the note_tree structure Mike Hommey
2017-04-07 14:21 ` Johan Herland

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).