git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Justin Tobler via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Justin Tobler <jltobler@gmail.com>, Justin Tobler <jltobler@gmail.com>
Subject: [PATCH] reftable/stack: use geometric table compaction
Date: Tue, 05 Mar 2024 20:03:45 +0000	[thread overview]
Message-ID: <pull.1683.git.1709669025722.gitgitgadget@gmail.com> (raw)

From: Justin Tobler <jltobler@gmail.com>

To reduce the number of on-disk reftables, compaction is performed.
Contiguous tables with the same binary log value of size are grouped
into segments. The segment that has both the lowest binary log value and
contains more than one table is set as the starting point when
identifying the compaction segment.

Since segments containing a single table are not initially considered
for compaction, if the table appended to the list does not match the
previous table log value, no compaction occurs for the new table. It is
therefore possible for unbounded growth of the table list. This can be
demonstrated by repeating the following sequence:

git branch -f foo
git branch -d foo

Each operation results in a new table being written with no compaction
occurring until a separate operation produces a table matching the
previous table log value.

To avoid unbounded growth of the table list, walk through each table and
evaluate if it needs to be included in the compaction segment to restore
a geometric sequence.

Some tests in `t0610-reftable-basics.sh` assert the on-disk state of
tables and are therefore updated to specify the correct new table count.
Since compaction is more aggressive in ensuring tables maintain a
geometric sequence, the expected table count is reduced in these tests.
In `reftable/stack_test.c` tests related to `sizes_to_segments()` are
removed because the function is no longer needed. Also, the
`test_suggest_compaction_segment()` test is updated to better showcase
and reflect the new geometric compaction behavior.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
    reftable/stack: use geometric table compaction

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1683%2Fjltobler%2Fjt%2Freftable-geometric-compaction-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1683/jltobler/jt/reftable-geometric-compaction-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1683

 reftable/stack.c           | 106 +++++++++++++++----------------------
 reftable/stack.h           |   3 --
 reftable/stack_test.c      |  66 +++++------------------
 t/t0610-reftable-basics.sh |  24 ++++-----
 4 files changed, 70 insertions(+), 129 deletions(-)

diff --git a/reftable/stack.c b/reftable/stack.c
index b64e55648aa..e4ea8753977 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -1214,75 +1214,57 @@ static int segment_size(struct segment *s)
 	return s->end - s->start;
 }
 
-int fastlog2(uint64_t sz)
-{
-	int l = 0;
-	if (sz == 0)
-		return 0;
-	for (; sz; sz /= 2) {
-		l++;
-	}
-	return l - 1;
-}
-
-struct segment *sizes_to_segments(size_t *seglen, uint64_t *sizes, size_t n)
-{
-	struct segment *segs = reftable_calloc(n, sizeof(*segs));
-	struct segment cur = { 0 };
-	size_t next = 0, i;
-
-	if (n == 0) {
-		*seglen = 0;
-		return segs;
-	}
-	for (i = 0; i < n; i++) {
-		int log = fastlog2(sizes[i]);
-		if (cur.log != log && cur.bytes > 0) {
-			struct segment fresh = {
-				.start = i,
-			};
-
-			segs[next++] = cur;
-			cur = fresh;
-		}
-
-		cur.log = log;
-		cur.end = i + 1;
-		cur.bytes += sizes[i];
-	}
-	segs[next++] = cur;
-	*seglen = next;
-	return segs;
-}
-
 struct segment suggest_compaction_segment(uint64_t *sizes, size_t n)
 {
-	struct segment min_seg = {
-		.log = 64,
-	};
-	struct segment *segs;
-	size_t seglen = 0, i;
-
-	segs = sizes_to_segments(&seglen, sizes, n);
-	for (i = 0; i < seglen; i++) {
-		if (segment_size(&segs[i]) == 1)
-			continue;
+	struct segment seg = { 0 };
+	uint64_t bytes;
+	size_t i;
 
-		if (segs[i].log < min_seg.log)
-			min_seg = segs[i];
-	}
+	/*
+	 * If there are no tables or only a single one then we don't have to
+	 * compact anything. The sequence is geometric by definition already.
+	 */
+	if (n <= 1)
+		return seg;
 
-	while (min_seg.start > 0) {
-		size_t prev = min_seg.start - 1;
-		if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev]))
+	/*
+	 * Find the ending table of the compaction segment needed to restore the
+	 * geometric sequence.
+	 *
+	 * To do so, we iterate backwards starting from the most recent table
+	 * until a valid segment end is found. If the preceding table is smaller
+	 * than the current table multiplied by the geometric factor (2), the
+	 * current table is set as the compaction segment end.
+	 */
+	for (i = n - 1; i > 0; i--) {
+		if (sizes[i - 1] < sizes[i] * 2) {
+			seg.end = i;
+			bytes = sizes[i];
 			break;
+		}
+	}
+
+	/*
+	 * Find the starting table of the compaction segment by iterating
+	 * through the remaing tables and keeping track of the accumulated size
+	 * of all tables seen from the segment end table.
+	 *
+	 * Note that we keep iterating even after we have found the first
+	 * first starting point. This is because there may be tables in the
+	 * stack preceding that first starting point which violate the geometric
+	 * sequence.
+	 */
+	for (; i > 0; i--) {
+		uint64_t curr = bytes;
+		bytes += sizes[i - 1];
 
-		min_seg.start = prev;
-		min_seg.bytes += sizes[prev];
+		if (sizes[i - 1] < curr * 2) {
+			seg.start = i - 1;
+			seg.bytes = bytes;
+		}
 	}
 
-	reftable_free(segs);
-	return min_seg;
+	return seg;
 }
 
 static uint64_t *stack_table_sizes_for_compaction(struct reftable_stack *st)
@@ -1305,7 +1287,7 @@ int reftable_stack_auto_compact(struct reftable_stack *st)
 		suggest_compaction_segment(sizes, st->merged->stack_len);
 	reftable_free(sizes);
 	if (segment_size(&seg) > 0)
-		return stack_compact_range_stats(st, seg.start, seg.end - 1,
+		return stack_compact_range_stats(st, seg.start, seg.end,
 						 NULL);
 
 	return 0;
diff --git a/reftable/stack.h b/reftable/stack.h
index d919455669e..656f896cc28 100644
--- a/reftable/stack.h
+++ b/reftable/stack.h
@@ -33,12 +33,9 @@ int read_lines(const char *filename, char ***lines);
 
 struct segment {
 	size_t start, end;
-	int log;
 	uint64_t bytes;
 };
 
-int fastlog2(uint64_t sz);
-struct segment *sizes_to_segments(size_t *seglen, uint64_t *sizes, size_t n);
 struct segment suggest_compaction_segment(uint64_t *sizes, size_t n);
 
 #endif
diff --git a/reftable/stack_test.c b/reftable/stack_test.c
index 509f4866236..85600a9573e 100644
--- a/reftable/stack_test.c
+++ b/reftable/stack_test.c
@@ -720,59 +720,14 @@ static void test_reftable_stack_hash_id(void)
 	clear_dir(dir);
 }
 
-static void test_log2(void)
-{
-	EXPECT(1 == fastlog2(3));
-	EXPECT(2 == fastlog2(4));
-	EXPECT(2 == fastlog2(5));
-}
-
-static void test_sizes_to_segments(void)
-{
-	uint64_t sizes[] = { 2, 3, 4, 5, 7, 9 };
-	/* .................0  1  2  3  4  5 */
-
-	size_t seglen = 0;
-	struct segment *segs =
-		sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes));
-	EXPECT(segs[2].log == 3);
-	EXPECT(segs[2].start == 5);
-	EXPECT(segs[2].end == 6);
-
-	EXPECT(segs[1].log == 2);
-	EXPECT(segs[1].start == 2);
-	EXPECT(segs[1].end == 5);
-	reftable_free(segs);
-}
-
-static void test_sizes_to_segments_empty(void)
-{
-	size_t seglen = 0;
-	struct segment *segs = sizes_to_segments(&seglen, NULL, 0);
-	EXPECT(seglen == 0);
-	reftable_free(segs);
-}
-
-static void test_sizes_to_segments_all_equal(void)
-{
-	uint64_t sizes[] = { 5, 5 };
-	size_t seglen = 0;
-	struct segment *segs =
-		sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes));
-	EXPECT(seglen == 1);
-	EXPECT(segs[0].start == 0);
-	EXPECT(segs[0].end == 2);
-	reftable_free(segs);
-}
-
 static void test_suggest_compaction_segment(void)
 {
-	uint64_t sizes[] = { 128, 64, 17, 16, 9, 9, 9, 16, 16 };
+	uint64_t sizes[] = { 512, 64, 17, 16, 9, 9, 9, 16, 2, 16 };
 	/* .................0    1    2  3   4  5  6 */
 	struct segment min =
 		suggest_compaction_segment(sizes, ARRAY_SIZE(sizes));
-	EXPECT(min.start == 2);
-	EXPECT(min.end == 7);
+	EXPECT(min.start == 1);
+	EXPECT(min.end == 9);
 }
 
 static void test_suggest_compaction_segment_nothing(void)
@@ -884,6 +839,17 @@ static void test_empty_add(void)
 	reftable_stack_destroy(st2);
 }
 
+static int fastlog2(uint64_t sz)
+{
+	int l = 0;
+	if (sz == 0)
+		return 0;
+	for (; sz; sz /= 2) {
+		l++;
+	}
+	return l - 1;
+}
+
 static void test_reftable_stack_auto_compaction(void)
 {
 	struct reftable_write_options cfg = { 0 };
@@ -1072,7 +1038,6 @@ static void test_reftable_stack_compaction_concurrent_clean(void)
 int stack_test_main(int argc, const char *argv[])
 {
 	RUN_TEST(test_empty_add);
-	RUN_TEST(test_log2);
 	RUN_TEST(test_names_equal);
 	RUN_TEST(test_parse_names);
 	RUN_TEST(test_read_file);
@@ -1092,9 +1057,6 @@ int stack_test_main(int argc, const char *argv[])
 	RUN_TEST(test_reftable_stack_update_index_check);
 	RUN_TEST(test_reftable_stack_uptodate);
 	RUN_TEST(test_reftable_stack_validate_refname);
-	RUN_TEST(test_sizes_to_segments);
-	RUN_TEST(test_sizes_to_segments_all_equal);
-	RUN_TEST(test_sizes_to_segments_empty);
 	RUN_TEST(test_suggest_compaction_segment);
 	RUN_TEST(test_suggest_compaction_segment_nothing);
 	return 0;
diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
index 6a131e40b81..a3b1a04123e 100755
--- a/t/t0610-reftable-basics.sh
+++ b/t/t0610-reftable-basics.sh
@@ -293,7 +293,7 @@ test_expect_success 'ref transaction: writes cause auto-compaction' '
 	test_line_count = 1 repo/.git/reftable/tables.list &&
 
 	test_commit -C repo --no-tag A &&
-	test_line_count = 2 repo/.git/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list &&
 
 	test_commit -C repo --no-tag B &&
 	test_line_count = 1 repo/.git/reftable/tables.list
@@ -324,7 +324,7 @@ test_expect_success 'ref transaction: writes are synced' '
 		git -C repo -c core.fsync=reference \
 		-c core.fsyncMethod=fsync update-ref refs/heads/branch HEAD &&
 	check_fsync_events trace2.txt <<-EOF
-	"name":"hardware-flush","count":2
+	"name":"hardware-flush","count":4
 	EOF
 '
 
@@ -334,8 +334,8 @@ test_expect_success 'pack-refs: compacts tables' '
 
 	test_commit -C repo A &&
 	ls -1 repo/.git/reftable >table-files &&
-	test_line_count = 4 table-files &&
-	test_line_count = 3 repo/.git/reftable/tables.list &&
+	test_line_count = 3 table-files &&
+	test_line_count = 2 repo/.git/reftable/tables.list &&
 
 	git -C repo pack-refs &&
 	ls -1 repo/.git/reftable >table-files &&
@@ -367,7 +367,7 @@ do
 			umask $umask &&
 			git init --shared=true repo &&
 			test_commit -C repo A &&
-			test_line_count = 3 repo/.git/reftable/tables.list
+			test_line_count = 2 repo/.git/reftable/tables.list
 		) &&
 		git -C repo pack-refs &&
 		test_expect_perms "-rw-rw-r--" repo/.git/reftable/tables.list &&
@@ -737,10 +737,10 @@ test_expect_success 'worktree: pack-refs in main repo packs main refs' '
 	test_commit -C repo A &&
 	git -C repo worktree add ../worktree &&
 
-	test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
-	test_line_count = 4 repo/.git/reftable/tables.list &&
+	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list &&
 	git -C repo pack-refs &&
-	test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
 	test_line_count = 1 repo/.git/reftable/tables.list
 '
 
@@ -750,11 +750,11 @@ test_expect_success 'worktree: pack-refs in worktree packs worktree refs' '
 	test_commit -C repo A &&
 	git -C repo worktree add ../worktree &&
 
-	test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
-	test_line_count = 4 repo/.git/reftable/tables.list &&
+	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list &&
 	git -C worktree pack-refs &&
 	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
-	test_line_count = 4 repo/.git/reftable/tables.list
+	test_line_count = 1 repo/.git/reftable/tables.list
 '
 
 test_expect_success 'worktree: creating shared ref updates main stack' '
@@ -770,7 +770,7 @@ test_expect_success 'worktree: creating shared ref updates main stack' '
 
 	git -C worktree update-ref refs/heads/shared HEAD &&
 	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
-	test_line_count = 2 repo/.git/reftable/tables.list
+	test_line_count = 1 repo/.git/reftable/tables.list
 '
 
 test_expect_success 'worktree: creating per-worktree ref updates worktree stack' '

base-commit: b387623c12f3f4a376e4d35a610fd3e55d7ea907
-- 
gitgitgadget


             reply	other threads:[~2024-03-05 20:05 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-05 20:03 Justin Tobler via GitGitGadget [this message]
2024-03-06 12:30 ` [PATCH] reftable/stack: use geometric table compaction Patrick Steinhardt
2024-03-06 12:37 ` Patrick Steinhardt
2024-03-21 22:48   ` Justin Tobler
2024-03-21 22:40 ` [PATCH v2 0/3] " Justin Tobler via GitGitGadget
2024-03-21 22:40   ` [PATCH v2 1/3] reftable/stack: add env to disable autocompaction Justin Tobler via GitGitGadget
2024-03-22  1:25     ` Patrick Steinhardt
2024-03-21 22:40   ` [PATCH v2 2/3] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
2024-03-22  1:25     ` Patrick Steinhardt
2024-03-27 13:24     ` Karthik Nayak
2024-03-21 22:40   ` [PATCH v2 3/3] reftable/segment: make segment end inclusive Justin Tobler via GitGitGadget
2024-03-22  1:25   ` [PATCH v2 0/3] reftable/stack: use geometric table compaction Patrick Steinhardt
2024-04-03 10:13     ` Han-Wen Nienhuys
2024-04-03 10:18       ` Patrick Steinhardt
2024-04-03 15:14         ` Justin Tobler
2024-04-03 16:40         ` Junio C Hamano
2024-03-29  4:16   ` [PATCH v3 " Justin Tobler via GitGitGadget
2024-03-29  4:16     ` [PATCH v3 1/3] reftable/stack: add env to disable autocompaction Justin Tobler via GitGitGadget
2024-03-29 18:25       ` Junio C Hamano
2024-03-29 21:56       ` Junio C Hamano
2024-04-02  7:23       ` Patrick Steinhardt
2024-04-02 17:23         ` Junio C Hamano
2024-03-29  4:16     ` [PATCH v3 2/3] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
2024-04-02  7:23       ` Patrick Steinhardt
2024-03-29  4:16     ` [PATCH v3 3/3] reftable/stack: make segment end inclusive Justin Tobler via GitGitGadget
2024-03-29 18:36       ` Junio C Hamano
2024-04-02  7:23         ` Patrick Steinhardt
2024-04-03  0:20     ` [PATCH v4 0/2] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
2024-04-03  0:20       ` [PATCH v4 1/2] reftable/stack: add env to disable autocompaction Justin Tobler via GitGitGadget
2024-04-03  0:20       ` [PATCH v4 2/2] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
2024-04-03  4:47       ` [PATCH v4 0/2] " Patrick Steinhardt
2024-04-03 11:12       ` Karthik Nayak
2024-04-03 16:56         ` Junio C Hamano
2024-04-04 18:29       ` [PATCH v5 0/3] " Justin Tobler via GitGitGadget
2024-04-04 18:29         ` [PATCH v5 1/3] reftable/stack: allow disabling of auto-compaction Justin Tobler via GitGitGadget
2024-04-08  6:12           ` Patrick Steinhardt
2024-04-04 18:29         ` [PATCH v5 2/3] reftable/stack: add env to disable autocompaction Justin Tobler via GitGitGadget
2024-04-08  6:12           ` Patrick Steinhardt
2024-04-08 16:18             ` Junio C Hamano
2024-04-04 18:29         ` [PATCH v5 3/3] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
2024-04-08  6:12         ` [PATCH v5 0/3] " Patrick Steinhardt
2024-04-08 16:17           ` Justin Tobler
2024-04-08 16:16         ` [PATCH v6 " Justin Tobler via GitGitGadget
2024-04-08 16:16           ` [PATCH v6 1/3] reftable/stack: expose option to disable auto-compaction Justin Tobler via GitGitGadget
2024-04-08 16:16           ` [PATCH v6 2/3] reftable/stack: add env to disable autocompaction Justin Tobler via GitGitGadget
2024-04-08 16:16           ` [PATCH v6 3/3] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
2024-04-08 16:20           ` [PATCH v6 0/3] " Patrick Steinhardt
2024-04-08 19:12             ` Junio C Hamano
2024-04-03 19:12   ` [PATCH v2 " Junio C Hamano
2024-04-03 19:30     ` Patrick Steinhardt
2024-04-04  5:34       ` Patrick Steinhardt
2024-04-04 18:28         ` Justin Tobler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.1683.git.1709669025722.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jltobler@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).