git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: git@vger.kernel.org
Cc: Han-Wen Nienhuys <hanwenn@gmail.com>,
	Karthik Nayak <karthik.188@gmail.com>,
	Justin Tobler <jltobler@gmail.com>
Subject: [PATCH v2 10/10] reftable/block: avoid copying block iterators on seek
Date: Mon, 8 Apr 2024 14:17:08 +0200	[thread overview]
Message-ID: <cc5ff0d5988691043206f9e912f5ffa1bcfee94e.1712578376.git.ps@pks.im> (raw)
In-Reply-To: <cover.1712578376.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 4548 bytes --]

When seeking a reftable record in a block we need to position the
iterator _before_ the sought-after record so that the next call to
`block_iter_next()` would yield that record. To achieve this, the loop
that performs the linear needs to restore the previous position once it
has found the record.

This is done by advancing two `block_iter`s: one to check whether the
next record is our sought-after record, and one that we update after
every iteration. This of course involves quite a lot of copying and also
leads to needless memory allocations.

Refactor the code to get rid of the `next` iterator and the copying this
involves. Instead, we can restore the previous offset such that the call
to `next` will return the correct record.

Next to being simpler conceptually this also leads to a nice speedup.
The following benchmark parser 10k refs out of 100k existing refs via
`git-rev-list --no-walk`:

  Benchmark 1: rev-list: print many refs (HEAD~)
    Time (mean ± σ):     170.2 ms ±   1.7 ms    [User: 86.1 ms, System: 83.6 ms]
    Range (min … max):   166.4 ms … 180.3 ms    500 runs

  Benchmark 2: rev-list: print many refs (HEAD~)
    Time (mean ± σ):     161.6 ms ±   1.6 ms    [User: 78.1 ms, System: 83.0 ms]
    Range (min … max):   158.4 ms … 172.3 ms    500 runs

  Summary
    rev-list: print many refs (HEAD) ran
      1.05 ± 0.01 times faster than rev-list: print many refs (HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/block.c | 32 ++++++++++++++------------------
 reftable/block.h |  2 --
 2 files changed, 14 insertions(+), 20 deletions(-)

diff --git a/reftable/block.c b/reftable/block.c
index c6c4a68ea1..3e87460cba 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -365,16 +365,6 @@ static int restart_needle_less(size_t idx, void *_args)
 	return args->needle.len < suffix_len;
 }
 
-void block_iter_copy_from(struct block_iter *dest, const struct block_iter *src)
-{
-	dest->block = src->block;
-	dest->block_len = src->block_len;
-	dest->hash_size = src->hash_size;
-	dest->next_off = src->next_off;
-	strbuf_reset(&dest->last_key);
-	strbuf_addbuf(&dest->last_key, &src->last_key);
-}
-
 int block_iter_next(struct block_iter *it, struct reftable_record *rec)
 {
 	struct string_view in = {
@@ -427,7 +417,6 @@ int block_iter_seek_key(struct block_iter *it, const struct block_reader *br,
 		.needle = *want,
 		.reader = br,
 	};
-	struct block_iter next = BLOCK_ITER_INIT;
 	struct reftable_record rec;
 	int err = 0;
 	size_t i;
@@ -486,11 +475,13 @@ int block_iter_seek_key(struct block_iter *it, const struct block_reader *br,
 	 * far and then back up.
 	 */
 	while (1) {
-		block_iter_copy_from(&next, it);
-		err = block_iter_next(&next, &rec);
+		size_t prev_off = it->next_off;
+
+		err = block_iter_next(it, &rec);
 		if (err < 0)
 			goto done;
 		if (err > 0) {
+			it->next_off = prev_off;
 			err = 0;
 			goto done;
 		}
@@ -501,18 +492,23 @@ int block_iter_seek_key(struct block_iter *it, const struct block_reader *br,
 		 * record does not exist in the block and can thus abort early.
 		 * In case it is equal to the sought-after key we have found
 		 * the desired record.
+		 *
+		 * Note that we store the next record's key record directly in
+		 * `last_key` without restoring the key of the preceding record
+		 * in case we need to go one record back. This is safe to do as
+		 * `block_iter_next()` would return the ref whose key is equal
+		 * to `last_key` now, and naturally all keys share a prefix
+		 * with themselves.
 		 */
 		reftable_record_key(&rec, &it->last_key);
-		if (strbuf_cmp(&it->last_key, want) >= 0)
+		if (strbuf_cmp(&it->last_key, want) >= 0) {
+			it->next_off = prev_off;
 			goto done;
-
-		block_iter_copy_from(it, &next);
+		}
 	}
 
 done:
-	block_iter_close(&next);
 	reftable_record_release(&rec);
-
 	return err;
 }
 
diff --git a/reftable/block.h b/reftable/block.h
index c1bd1892cb..ea4384a7e2 100644
--- a/reftable/block.h
+++ b/reftable/block.h
@@ -121,8 +121,6 @@ void block_iter_seek_start(struct block_iter *it, const struct block_reader *br)
 int block_iter_seek_key(struct block_iter *it, const struct block_reader *br,
 			struct strbuf *want);
 
-void block_iter_copy_from(struct block_iter *dest, const struct block_iter *src);
-
 /* return < 0 for error, 0 for OK, > 0 for EOF. */
 int block_iter_next(struct block_iter *it, struct reftable_record *rec);
 
-- 
2.44.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  parent reply	other threads:[~2024-04-08 12:17 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-27  6:36 [PATCH 0/9] reftable: optimize table and block iterators Patrick Steinhardt
2024-03-27  6:36 ` [PATCH 1/9] reftable/block: rename `block_reader_start()` Patrick Steinhardt
2024-03-27  6:37 ` [PATCH 2/9] reftable/block: merge `block_iter_seek()` and `block_reader_seek()` Patrick Steinhardt
2024-03-27  6:37 ` [PATCH 3/9] reftable/block: better grouping of functions Patrick Steinhardt
2024-03-27  6:37 ` [PATCH 4/9] reftable/block: introduce `block_reader_release()` Patrick Steinhardt
2024-04-03 13:16   ` Karthik Nayak
2024-04-08 12:10     ` Patrick Steinhardt
2024-03-27  6:37 ` [PATCH 5/9] reftable/block: move ownership of block reader into `struct table_iter` Patrick Steinhardt
2024-04-03  4:52   ` Justin Tobler
2024-04-03 13:10     ` Patrick Steinhardt
2024-03-27  6:37 ` [PATCH 6/9] reftable/reader: iterate to next block in place Patrick Steinhardt
2024-03-27  6:37 ` [PATCH 7/9] reftable/block: reuse uncompressed blocks Patrick Steinhardt
2024-03-27  6:37 ` [PATCH 8/9] reftable/block: open-code call to `uncompress2()` Patrick Steinhardt
2024-03-27  6:37 ` [PATCH 9/9] reftable/block: reuse `zstream` state on inflation Patrick Steinhardt
2024-04-03 13:33 ` [PATCH 0/9] reftable: optimize table and block iterators Karthik Nayak
2024-04-08 12:16 ` [PATCH v2 00/10] " Patrick Steinhardt
2024-04-08 12:16   ` [PATCH v2 01/10] reftable/block: rename `block_reader_start()` Patrick Steinhardt
2024-04-08 12:16   ` [PATCH v2 02/10] reftable/block: merge `block_iter_seek()` and `block_reader_seek()` Patrick Steinhardt
2024-04-08 12:16   ` [PATCH v2 03/10] reftable/block: better grouping of functions Patrick Steinhardt
2024-04-08 12:16   ` [PATCH v2 04/10] reftable/block: introduce `block_reader_release()` Patrick Steinhardt
2024-04-08 12:16   ` [PATCH v2 05/10] reftable/block: move ownership of block reader into `struct table_iter` Patrick Steinhardt
2024-04-08 12:16   ` [PATCH v2 06/10] reftable/reader: iterate to next block in place Patrick Steinhardt
2024-04-08 12:16   ` [PATCH v2 07/10] reftable/block: reuse uncompressed blocks Patrick Steinhardt
2024-04-08 12:16   ` [PATCH v2 08/10] reftable/block: open-code call to `uncompress2()` Patrick Steinhardt
2024-04-08 12:17   ` [PATCH v2 09/10] reftable/block: reuse `zstream` state on inflation Patrick Steinhardt
2024-04-10 10:15     ` Karthik Nayak
2024-04-08 12:17   ` Patrick Steinhardt [this message]
2024-04-09  1:29     ` [PATCH v2 10/10] reftable/block: avoid copying block iterators on seek Justin Tobler
2024-04-09  3:18       ` Patrick Steinhardt
2024-04-09  1:32   ` [PATCH v2 00/10] reftable: optimize table and block iterators Justin Tobler
2024-04-10 11:35   ` Karthik Nayak

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cc5ff0d5988691043206f9e912f5ffa1bcfee94e.1712578376.git.ps@pks.im \
    --to=ps@pks.im \
    --cc=git@vger.kernel.org \
    --cc=hanwenn@gmail.com \
    --cc=jltobler@gmail.com \
    --cc=karthik.188@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).