git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 1/2] t3305: check notes fanout more carefully and robustly
@ 2020-02-03 21:04 Johan Herland
  2020-02-03 21:04 ` [PATCH 2/2] notes.c: fix off-by-one error when decreasing notes fanout Johan Herland
  0 siblings, 1 reply; 4+ messages in thread
From: Johan Herland @ 2020-02-03 21:04 UTC (permalink / raw)
  To: git; +Cc: Johan Herland, Johannes Schindelin, Brian M . Carlson

In short, before this patch, this test script:
 - creates many notes
 - verifies that all notes in the notes tree has a fanout of 1
 - removes most notes
 - verifies that the notes in the notes tree now has a fanout of 0

The fanout verification only happened twice: after creating all the
notes, and after removing most of them.

This patch strengthens the test by checking the fanout after _each_
added/removed note: We assert that the switch from fanout 0 -> 1
happens exactly once while adding notes (and that the switch pervades
the entire notes tree). Likewise, we assert that the switch from
fanout 1 -> 0 happens exactly once while removing notes.

Additionally, we decrease the number of notes left after removal,
from 50 to 15 notes, in order to ensure that fanout 1 -> 0 transition
keeps happening regardless of external factors[1].

[1]: Currently (with the SHA1 hash function and the deterministic
object ids of the test environment) the fanout heuristic in the notes
code happens to switch from 0 -> 1 at 109 notes, and from 1 -> 0 at
59 notes. However, changing the hash function or other external
factors will vary these numbers, and the latter may - in theory - go
as low as 15. For more details, please see the discussion at
https://public-inbox.org/git/20200125230035.136348-4-sandals@crustytoothpaste.net/

Cc: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: Brian M. Carlson <sandals@crustytoothpaste.net>
Signed-off-by: Johan Herland <johan@herland.net>
---
 t/t3305-notes-fanout.sh | 101 ++++++++++++++++++++++++++++++----------
 1 file changed, 76 insertions(+), 25 deletions(-)

diff --git a/t/t3305-notes-fanout.sh b/t/t3305-notes-fanout.sh
index 831f83d211..402057c83a 100755
--- a/t/t3305-notes-fanout.sh
+++ b/t/t3305-notes-fanout.sh
@@ -4,6 +4,32 @@ test_description='Test that adding/removing many notes triggers automatic fanout
 
 . ./test-lib.sh
 
+path_has_fanout() {
+	path=$1 &&
+	fanout=$2 &&
+	after_last_slash=$((40 - $fanout * 2)) &&
+	echo $path | grep -q "^\([0-9a-f]\{2\}/\)\{$fanout\}[0-9a-f]\{$after_last_slash\}$"
+}
+
+touched_one_note_with_fanout() {
+	notes_commit=$1 &&
+	modification=$2 &&  # 'A' for addition, 'D' for deletion
+	fanout=$3 &&
+	diff=$(git diff-tree --no-commit-id --name-status --root -r $notes_commit) &&
+	path=$(echo $diff | sed -e "s/^$modification[\t ]//") &&
+	path_has_fanout "$path" $fanout;
+}
+
+all_notes_have_fanout() {
+	notes_commit=$1 &&
+	fanout=$2 &&
+	git ls-tree -r --name-only $notes_commit 2>/dev/null |
+	while read path
+	do
+		path_has_fanout $path $fanout || return 1
+	done
+}
+
 test_expect_success 'creating many notes with git-notes' '
 	num_notes=300 &&
 	i=0 &&
@@ -20,7 +46,7 @@ test_expect_success 'creating many notes with git-notes' '
 
 test_expect_success 'many notes created correctly with git-notes' '
 	git log | grep "^    " > output &&
-	i=300 &&
+	i=$num_notes &&
 	while test $i -gt 0
 	do
 		echo "    commit #$i" &&
@@ -30,34 +56,46 @@ test_expect_success 'many notes created correctly with git-notes' '
 	test_cmp expect output
 '
 
-test_expect_success 'many notes created with git-notes triggers fanout' '
-	# Expect entire notes tree to have a fanout == 1
-	git ls-tree -r --name-only refs/notes/commits |
-	while read path
+test_expect_success 'stable fanout 0 is followed by stable fanout 1' '
+	i=$num_notes &&
+	fanout=0 &&
+	while test $i -gt 0
 	do
-		echo $path | grep "^../[0-9a-f]*$" || {
-			echo "Invalid path \"$path\"" &&
-			return 1;
-		}
-	done
+		i=$(($i - 1)) &&
+		if touched_one_note_with_fanout refs/notes/commits~$i A $fanout
+		then
+			continue
+		elif test $fanout -eq 0
+		then
+			fanout=1 &&
+			if all_notes_have_fanout refs/notes/commits~$i $fanout
+			then
+				echo "Fanout 0 -> 1 at refs/notes/commits~$i" &&
+				continue
+			fi
+		fi &&
+		echo "Failed fanout=$fanout check at refs/notes/commits~$i" &&
+		git ls-tree -r --name-only refs/notes/commits~$i &&
+		return 1
+	done &&
+	all_notes_have_fanout refs/notes/commits 1
 '
 
 test_expect_success 'deleting most notes with git-notes' '
-	num_notes=250 &&
+	remove_notes=285 &&
 	i=0 &&
 	git rev-list HEAD |
-	while test $i -lt $num_notes && read sha1
+	while test $i -lt $remove_notes && read sha1
 	do
 		i=$(($i + 1)) &&
 		test_tick &&
-		git notes remove "$sha1" ||
-		exit 1
+		git notes remove "$sha1" 2>/dev/null || return 1
 	done
 '
 
 test_expect_success 'most notes deleted correctly with git-notes' '
-	git log HEAD~250 | grep "^    " > output &&
-	i=50 &&
+	git log HEAD~$remove_notes | grep "^    " > output &&
+	i=$(($num_notes - $remove_notes)) &&
 	while test $i -gt 0
 	do
 		echo "    commit #$i" &&
@@ -67,16 +105,29 @@ test_expect_success 'most notes deleted correctly with git-notes' '
 	test_cmp expect output
 '
 
-test_expect_success 'deleting most notes triggers fanout consolidation' '
-	# Expect entire notes tree to have a fanout == 0
-	git ls-tree -r --name-only refs/notes/commits |
-	while read path
+test_expect_success 'stable fanout 1 is followed by stable fanout 0' '
+	i=$remove_notes &&
+	fanout=1 &&
+	while test $i -gt 0
 	do
-		echo $path | grep -v "^../.*" || {
-			echo "Invalid path \"$path\"" &&
-			return 1;
-		}
-	done
+		i=$(($i - 1)) &&
+		if touched_one_note_with_fanout refs/notes/commits~$i D $fanout
+		then
+			continue
+		elif test $fanout -eq 1
+		then
+			fanout=0 &&
+			if all_notes_have_fanout refs/notes/commits~$i $fanout
+			then
+				echo "Fanout 1 -> 0 at refs/notes/commits~$i" &&
+				continue
+			fi
+		fi &&
+		echo "Failed fanout=$fanout check at refs/notes/commits~$i" &&
+		git ls-tree -r --name-only refs/notes/commits~$i &&
+		return 1
+	done &&
+	all_notes_have_fanout refs/notes/commits 0
 '
 
 test_done
-- 
2.23.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH 2/2] notes.c: fix off-by-one error when decreasing notes fanout
  2020-02-03 21:04 [PATCH 1/2] t3305: check notes fanout more carefully and robustly Johan Herland
@ 2020-02-03 21:04 ` Johan Herland
  2020-02-04  1:05   ` brian m. carlson
  0 siblings, 1 reply; 4+ messages in thread
From: Johan Herland @ 2020-02-03 21:04 UTC (permalink / raw)
  To: git; +Cc: Johan Herland, Johannes Schindelin, Brian M . Carlson

As noted in the previous commit, the nature of the fanout heuristic
in the notes code causes the exact point at which we increase or
decrease the notes fanout to vary with the objects being annotated.
Since the object ids generated by the test environment are
deterministic (by design), the notes generated and tested by t3305
are always the same, and we therefore happen to see the same fanout
behavior from one run to the next.

Coincidentally, if we were to change the test environment slightly
(say by making a test commit on an unrelated branch before we start
the t3305 test proper), we not only see the fanout switch happen at
different points, we also manage to trigger a _bug_ in the notes
code where the fanout 1 -> 0 switch is not applied uniformly across
the notes tree, but instead yields a notes tree like this:

  ...
  bdeafb301e44b0e4db0f738a2d2a7beefdb70b70
  bff2d39b4f7122bd4c5caee3de353a774d1e632a
  d3/8ec8f851adf470131178085bfbaab4b12ad2a7
  e0b173960431a3e692ae929736df3c9b73a11d5b
  eb3c3aede523d729990ac25c62a93eb47c21e2e3
  ...

The bug occurs when we are writing out a notes tree with a newly
decreased fanout, and the notes tree contains unexpanded subtrees
that should be consolidated into the parent tree as a consequence of
the decreased fanout):

Subtrees that happen to sit at an _even_ level in the internal notes
16-tree structure (in other words: subtrees whose path - "d3" in the
example above - is unique in the first nibble - i.e. there are no
other note paths that start with "d") are _not_ unpacked as part of
the tree writeout. This error will repeat itself in subsequent note
trees until the subtree is forced to be unpacked. In t3305 this only
happens when the d38ec8f8 note is itself removed from the tree.

The error is not severe (no information is lost, and the notes code
is able to read/decode this tree and manipulate it correctly), but
this is nonetheless a bug in the current implementation that should
be fixed.

That said, fixing the off-by-one error is not without complications:
We must take into account that the load_subtree() call from
for_each_note_helper() (that is now done to correctly unpack the
subtree while we're writing out the notes tree) may end up inserting
unpacked non-notes into the linked list of non_note entries held by
the struct notes_tree. Since we are in the process of writing out the
notes tree, this linked list is currently in the process of being
traversed by write_each_non_note_until(). The unpacked non-notes are
necessarily inserted between the last non-note we wrote out, and the
next non-note to be written. Hence, we cannot simply hold the
next_non_note to write in struct write_each_note_data (as we would
then silently skip these newly inserted notes), but must instead
always follow the ->next pointer from the last non-note we wrote.
(This part is was caught by an existing test in t3304.)

Cc: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: Brian M. Carlson <sandals@crustytoothpaste.net>
Signed-off-by: Johan Herland <johan@herland.net>
---
 notes.c                 | 20 ++++++++++++--------
 t/t3305-notes-fanout.sh |  6 ++++++
 2 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/notes.c b/notes.c
index 0c79964c26..2de7f4bcfb 100644
--- a/notes.c
+++ b/notes.c
@@ -576,16 +576,16 @@ static int for_each_note_helper(struct notes_tree *t, struct int_node *tree,
 			 * the note tree that have not yet been explored. There
 			 * is a direct relationship between subtree entries at
 			 * level 'n' in the tree, and the 'fanout' variable:
-			 * Subtree entries at level 'n <= 2 * fanout' should be
+			 * Subtree entries at level 'n < 2 * fanout' should be
 			 * preserved, since they correspond exactly to a fanout
 			 * directory in the on-disk structure. However, subtree
-			 * entries at level 'n > 2 * fanout' should NOT be
+			 * entries at level 'n >= 2 * fanout' should NOT be
 			 * preserved, but rather consolidated into the above
 			 * notes tree level. We achieve this by unconditionally
 			 * unpacking subtree entries that exist below the
 			 * threshold level at 'n = 2 * fanout'.
 			 */
-			if (n <= 2 * fanout &&
+			if (n < 2 * fanout &&
 			    flags & FOR_EACH_NOTE_YIELD_SUBTREES) {
 				/* invoke callback with subtree */
 				unsigned int path_len =
@@ -602,7 +602,7 @@ static int for_each_note_helper(struct notes_tree *t, struct int_node *tree,
 					 path,
 					 cb_data);
 			}
-			if (n > fanout * 2 ||
+			if (n >= 2 * fanout ||
 			    !(flags & FOR_EACH_NOTE_DONT_UNPACK_SUBTREES)) {
 				/* unpack subtree and resume traversal */
 				tree->a[i] = NULL;
@@ -723,13 +723,15 @@ static int write_each_note_helper(struct tree_write_stack *tws,
 
 struct write_each_note_data {
 	struct tree_write_stack *root;
-	struct non_note *next_non_note;
+	struct non_note **nn_list;
+	struct non_note *nn_prev;
 };
 
 static int write_each_non_note_until(const char *note_path,
 		struct write_each_note_data *d)
 {
-	struct non_note *n = d->next_non_note;
+	struct non_note *p = d->nn_prev;
+	struct non_note *n = p ? p->next : *d->nn_list;
 	int cmp = 0, ret;
 	while (n && (!note_path || (cmp = strcmp(n->path, note_path)) <= 0)) {
 		if (note_path && cmp == 0)
@@ -740,9 +742,10 @@ static int write_each_non_note_until(const char *note_path,
 			if (ret)
 				return ret;
 		}
+		p = n;
 		n = n->next;
 	}
-	d->next_non_note = n;
+	d->nn_prev = p;
 	return 0;
 }
 
@@ -1177,7 +1180,8 @@ int write_notes_tree(struct notes_tree *t, struct object_id *result)
 	strbuf_init(&root.buf, 256 * (32 + the_hash_algo->hexsz)); /* assume 256 entries */
 	root.path[0] = root.path[1] = '\0';
 	cb_data.root = &root;
-	cb_data.next_non_note = t->first_non_note;
+	cb_data.nn_list = &(t->first_non_note);
+	cb_data.nn_prev = NULL;
 
 	/* Write tree objects representing current notes tree */
 	flags = FOR_EACH_NOTE_DONT_UNPACK_SUBTREES |
diff --git a/t/t3305-notes-fanout.sh b/t/t3305-notes-fanout.sh
index 402057c83a..3b4753e1b4 100755
--- a/t/t3305-notes-fanout.sh
+++ b/t/t3305-notes-fanout.sh
@@ -30,6 +30,12 @@ all_notes_have_fanout() {
 	done
 }
 
+test_expect_success 'tweak test environment' '
+	git checkout -b nondeterminism &&
+	test_commit A &&
+	git checkout --orphan with_notes;
+'
+
 test_expect_success 'creating many notes with git-notes' '
 	num_notes=300 &&
 	i=0 &&
-- 
2.23.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH 2/2] notes.c: fix off-by-one error when decreasing notes fanout
  2020-02-03 21:04 ` [PATCH 2/2] notes.c: fix off-by-one error when decreasing notes fanout Johan Herland
@ 2020-02-04  1:05   ` brian m. carlson
  2020-02-05 17:16     ` Johan Herland
  0 siblings, 1 reply; 4+ messages in thread
From: brian m. carlson @ 2020-02-04  1:05 UTC (permalink / raw)
  To: Johan Herland; +Cc: git, Johannes Schindelin

[-- Attachment #1: Type: text/plain, Size: 3959 bytes --]

On 2020-02-03 at 21:04:45, Johan Herland wrote:
> As noted in the previous commit, the nature of the fanout heuristic
> in the notes code causes the exact point at which we increase or
> decrease the notes fanout to vary with the objects being annotated.
> Since the object ids generated by the test environment are
> deterministic (by design), the notes generated and tested by t3305
> are always the same, and we therefore happen to see the same fanout
> behavior from one run to the next.
> 
> Coincidentally, if we were to change the test environment slightly
> (say by making a test commit on an unrelated branch before we start
> the t3305 test proper), we not only see the fanout switch happen at
> different points, we also manage to trigger a _bug_ in the notes
> code where the fanout 1 -> 0 switch is not applied uniformly across
> the notes tree, but instead yields a notes tree like this:
> 
>   ...
>   bdeafb301e44b0e4db0f738a2d2a7beefdb70b70
>   bff2d39b4f7122bd4c5caee3de353a774d1e632a
>   d3/8ec8f851adf470131178085bfbaab4b12ad2a7
>   e0b173960431a3e692ae929736df3c9b73a11d5b
>   eb3c3aede523d729990ac25c62a93eb47c21e2e3
>   ...
> 
> The bug occurs when we are writing out a notes tree with a newly
> decreased fanout, and the notes tree contains unexpanded subtrees
> that should be consolidated into the parent tree as a consequence of
> the decreased fanout):
> 
> Subtrees that happen to sit at an _even_ level in the internal notes
> 16-tree structure (in other words: subtrees whose path - "d3" in the
> example above - is unique in the first nibble - i.e. there are no
> other note paths that start with "d") are _not_ unpacked as part of
> the tree writeout. This error will repeat itself in subsequent note
> trees until the subtree is forced to be unpacked. In t3305 this only
> happens when the d38ec8f8 note is itself removed from the tree.
> 
> The error is not severe (no information is lost, and the notes code
> is able to read/decode this tree and manipulate it correctly), but
> this is nonetheless a bug in the current implementation that should
> be fixed.
> 
> That said, fixing the off-by-one error is not without complications:
> We must take into account that the load_subtree() call from
> for_each_note_helper() (that is now done to correctly unpack the
> subtree while we're writing out the notes tree) may end up inserting
> unpacked non-notes into the linked list of non_note entries held by
> the struct notes_tree. Since we are in the process of writing out the
> notes tree, this linked list is currently in the process of being
> traversed by write_each_non_note_until(). The unpacked non-notes are
> necessarily inserted between the last non-note we wrote out, and the
> next non-note to be written. Hence, we cannot simply hold the
> next_non_note to write in struct write_each_note_data (as we would
> then silently skip these newly inserted notes), but must instead
> always follow the ->next pointer from the last non-note we wrote.
> (This part is was caught by an existing test in t3304.)

I think you have "is was" here.  You probably want one or the other.

> Cc: Johannes Schindelin <Johannes.Schindelin@gmx.de>
> Cc: Brian M. Carlson <sandals@crustytoothpaste.net>

I generally write my name in lower case, but I think typically we prefer
to omit Cc lines in patches (unlike LKML), so it may just be better to
remove these lines.

> Signed-off-by: Johan Herland <johan@herland.net>

Patch 1 looked good to me.  Your explanation here makes sense, but I
must admit that I still don't understand this code, so I can't give an
outright approval.  I do appreciate that it comes with a test, though.

I haven't tested, but I expect this series will make Dscho's patch
unnecessary, so I'll drop it in my reroll unless one of you tells me to
keep it.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 2/2] notes.c: fix off-by-one error when decreasing notes fanout
  2020-02-04  1:05   ` brian m. carlson
@ 2020-02-05 17:16     ` Johan Herland
  0 siblings, 0 replies; 4+ messages in thread
From: Johan Herland @ 2020-02-05 17:16 UTC (permalink / raw)
  To: brian m. carlson, Johan Herland, Git mailing list,
	Johannes Schindelin

On Tue, Feb 4, 2020 at 2:05 AM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> On 2020-02-03 at 21:04:45, Johan Herland wrote:
> > always follow the ->next pointer from the last non-note we wrote.
> > (This part is was caught by an existing test in t3304.)
>
> I think you have "is was" here.  You probably want one or the other.

Will fix.

> > Cc: Johannes Schindelin <Johannes.Schindelin@gmx.de>
> > Cc: Brian M. Carlson <sandals@crustytoothpaste.net>
>
> I generally write my name in lower case, but I think typically we prefer
> to omit Cc lines in patches (unlike LKML), so it may just be better to
> remove these lines.

Ack. Will remove.

> > Signed-off-by: Johan Herland <johan@herland.net>
>
> Patch 1 looked good to me.  Your explanation here makes sense, but I
> must admit that I still don't understand this code, so I can't give an
> outright approval.  I do appreciate that it comes with a test, though.

Thanks for having a look. I must admit it's hard to get back into this
code even though I originally wrote it. I've been trying to prepare a
patch #3 which decreases the fanout more aggressively on notes removal
(having a fanout of 1 in a tree of 17 notes is  bit ridiculous, IMHO),
but I'm not yet able to figure something out that behaves in a stable
manner. (I find scenarios where removing a note will switch fanout
from 1 to 0, but removing another note will then switch fanout from 0
back to 1, and so on, and the current notes code does not have these
problems, AFAICS.)

> I haven't tested, but I expect this series will make Dscho's patch
> unnecessary, so I'll drop it in my reroll unless one of you tells me to
> keep it.

Yes, my patch includes Dscho's change. I don't particularly care
whichever lands first, and I can easily rebase on top of yours.

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-02-05 17:17 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-03 21:04 [PATCH 1/2] t3305: check notes fanout more carefully and robustly Johan Herland
2020-02-03 21:04 ` [PATCH 2/2] notes.c: fix off-by-one error when decreasing notes fanout Johan Herland
2020-02-04  1:05   ` brian m. carlson
2020-02-05 17:16     ` Johan Herland

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).