git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: git@vger.kernel.org
Subject: [PATCH 6/9] reftable/reader: iterate to next block in place
Date: Wed, 27 Mar 2024 07:37:17 +0100	[thread overview]
Message-ID: <ae359cb714faa550b585af4a002ad84b01c2b576.1711519925.git.ps@pks.im> (raw)
In-Reply-To: <cover.1711519925.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 4330 bytes --]

The table iterator has to iterate towards the next block once it has
yielded all records of the current block. This is done by creating a new
table iterator, initializing it to the next block, releasing the old
iterator and then copying over the data.

Refactor the code to instead advance the table iterator in place. This
is simpler and unlocks some optimizations in subsequent patches. Also,
it allows us to avoid some allocations.

The following measurements show a single matching ref out of 1 million
refs. Before this change:

  HEAP SUMMARY:
      in use at exit: 13,603 bytes in 125 blocks
    total heap usage: 7,235 allocs, 7,110 frees, 301,481 bytes allocated

After:

  HEAP SUMMARY:
      in use at exit: 13,603 bytes in 125 blocks
    total heap usage: 315 allocs, 190 frees, 107,027 bytes allocated

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/block.c  |  2 ++
 reftable/reader.c | 47 ++++++++++++++++++++++++++---------------------
 2 files changed, 28 insertions(+), 21 deletions(-)

diff --git a/reftable/block.c b/reftable/block.c
index 8f5dfe10bf..471ebd8580 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -188,6 +188,8 @@ int block_reader_init(struct block_reader *br, struct reftable_block *block,
 	uint8_t *restart_bytes = NULL;
 	uint8_t *uncompressed = NULL;
 
+	reftable_block_done(&br->block);
+
 	if (!reftable_is_block_type(typ)) {
 		err =  REFTABLE_FORMAT_ERROR;
 		goto done;
diff --git a/reftable/reader.c b/reftable/reader.c
index b77b639751..dd4de294a1 100644
--- a/reftable/reader.c
+++ b/reftable/reader.c
@@ -312,26 +312,20 @@ static void table_iter_close(struct table_iter *ti)
 	block_iter_close(&ti->bi);
 }
 
-static int table_iter_next_block(struct table_iter *dest,
-				 struct table_iter *src)
+static int table_iter_next_block(struct table_iter *ti)
 {
-	uint64_t next_block_off = src->block_off + src->br.full_block_size;
+	uint64_t next_block_off = ti->block_off + ti->br.full_block_size;
 	int err;
 
-	dest->r = src->r;
-	dest->typ = src->typ;
-	dest->block_off = next_block_off;
-
-	err = reader_init_block_reader(src->r, &dest->br, next_block_off, src->typ);
+	err = reader_init_block_reader(ti->r, &ti->br, next_block_off, ti->typ);
 	if (err > 0)
-		dest->is_finished = 1;
-	if (err) {
-		table_iter_block_done(dest);
+		ti->is_finished = 1;
+	if (err)
 		return err;
-	}
 
-	dest->is_finished = 0;
-	block_iter_seek_start(&dest->bi, &dest->br);
+	ti->block_off = next_block_off;
+	ti->is_finished = 0;
+	block_iter_seek_start(&ti->bi, &ti->br);
 
 	return 0;
 }
@@ -342,7 +336,6 @@ static int table_iter_next(struct table_iter *ti, struct reftable_record *rec)
 		return REFTABLE_API_ERROR;
 
 	while (1) {
-		struct table_iter next = TABLE_ITER_INIT;
 		int err;
 
 		if (ti->is_finished)
@@ -362,14 +355,11 @@ static int table_iter_next(struct table_iter *ti, struct reftable_record *rec)
 		 * table and retry. If there are no more blocks then the
 		 * iterator is drained.
 		 */
-		err = table_iter_next_block(&next, ti);
+		err = table_iter_next_block(ti);
 		if (err) {
 			ti->is_finished = 1;
 			return err;
 		}
-
-		table_iter_close(ti);
-		*ti = next;
 	}
 }
 
@@ -453,9 +443,24 @@ static int reader_seek_linear(struct table_iter *ti,
 	 * have no other way to do this.
 	 */
 	while (1) {
-		struct table_iter next = TABLE_ITER_INIT;
+		struct table_iter next = *ti;
+
+		/*
+		 * We must be careful to not modify underlying data of `ti`
+		 * because we may find that `next` does not contain our desired
+		 * block, but that `ti` does. In that case, we would discard
+		 * `next` and continue with `ti`.
+		 *
+		 * This also means that we cannot reuse allocated memory for
+		 * `next` here. While it would be great if we could, it should
+		 * in practice not be too bad given that we should only ever
+		 * end up doing linear seeks with at most three blocks. As soon
+		 * as we have more than three blocks we would have an index, so
+		 * we would not do a linear search there anymore.
+		 */
+		memset(&next.br.block, 0, sizeof(next.br.block));
 
-		err = table_iter_next_block(&next, ti);
+		err = table_iter_next_block(&next);
 		if (err < 0)
 			goto done;
 		if (err > 0)
-- 
2.44.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  parent reply	other threads:[~2024-03-27  6:38 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-27  6:36 [PATCH 0/9] reftable: optimize table and block iterators Patrick Steinhardt
2024-03-27  6:36 ` [PATCH 1/9] reftable/block: rename `block_reader_start()` Patrick Steinhardt
2024-03-27  6:37 ` [PATCH 2/9] reftable/block: merge `block_iter_seek()` and `block_reader_seek()` Patrick Steinhardt
2024-03-27  6:37 ` [PATCH 3/9] reftable/block: better grouping of functions Patrick Steinhardt
2024-03-27  6:37 ` [PATCH 4/9] reftable/block: introduce `block_reader_release()` Patrick Steinhardt
2024-04-03 13:16   ` Karthik Nayak
2024-04-08 12:10     ` Patrick Steinhardt
2024-03-27  6:37 ` [PATCH 5/9] reftable/block: move ownership of block reader into `struct table_iter` Patrick Steinhardt
2024-04-03  4:52   ` Justin Tobler
2024-04-03 13:10     ` Patrick Steinhardt
2024-03-27  6:37 ` Patrick Steinhardt [this message]
2024-03-27  6:37 ` [PATCH 7/9] reftable/block: reuse uncompressed blocks Patrick Steinhardt
2024-03-27  6:37 ` [PATCH 8/9] reftable/block: open-code call to `uncompress2()` Patrick Steinhardt
2024-03-27  6:37 ` [PATCH 9/9] reftable/block: reuse `zstream` state on inflation Patrick Steinhardt
2024-04-03 13:33 ` [PATCH 0/9] reftable: optimize table and block iterators Karthik Nayak
2024-04-08 12:16 ` [PATCH v2 00/10] " Patrick Steinhardt
2024-04-08 12:16   ` [PATCH v2 01/10] reftable/block: rename `block_reader_start()` Patrick Steinhardt
2024-04-08 12:16   ` [PATCH v2 02/10] reftable/block: merge `block_iter_seek()` and `block_reader_seek()` Patrick Steinhardt
2024-04-08 12:16   ` [PATCH v2 03/10] reftable/block: better grouping of functions Patrick Steinhardt
2024-04-08 12:16   ` [PATCH v2 04/10] reftable/block: introduce `block_reader_release()` Patrick Steinhardt
2024-04-08 12:16   ` [PATCH v2 05/10] reftable/block: move ownership of block reader into `struct table_iter` Patrick Steinhardt
2024-04-08 12:16   ` [PATCH v2 06/10] reftable/reader: iterate to next block in place Patrick Steinhardt
2024-04-08 12:16   ` [PATCH v2 07/10] reftable/block: reuse uncompressed blocks Patrick Steinhardt
2024-04-08 12:16   ` [PATCH v2 08/10] reftable/block: open-code call to `uncompress2()` Patrick Steinhardt
2024-04-08 12:17   ` [PATCH v2 09/10] reftable/block: reuse `zstream` state on inflation Patrick Steinhardt
2024-04-10 10:15     ` Karthik Nayak
2024-04-08 12:17   ` [PATCH v2 10/10] reftable/block: avoid copying block iterators on seek Patrick Steinhardt
2024-04-09  1:29     ` Justin Tobler
2024-04-09  3:18       ` Patrick Steinhardt
2024-04-09  1:32   ` [PATCH v2 00/10] reftable: optimize table and block iterators Justin Tobler
2024-04-10 11:35   ` Karthik Nayak

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ae359cb714faa550b585af4a002ad84b01c2b576.1711519925.git.ps@pks.im \
    --to=ps@pks.im \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).