git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: git@vger.kernel.org
Subject: [PATCH 7/7] refs/reftable: track last log record name via strbuf
Date: Tue, 5 Mar 2024 13:11:20 +0100	[thread overview]
Message-ID: <f56dd2177e42927dd4cfa88c2adc344f42dbefac.1709640322.git.ps@pks.im> (raw)
In-Reply-To: <cover.1709640322.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 3335 bytes --]

The reflog iterator enumerates all reflogs known to a ref backend. In
the "reftable" backend there is no way to list all existing reflogs
directly. Instead, we have to iterate through all reflog entries and
discard all those redundant entries for which we have already returned a
reflog entry.

This logic is implemented by tracking the last reflog name that we have
emitted to the iterator's user. If the next log record has the same name
we simply skip it until we find another record with a different refname.

This last reflog name is stored in a simple C string, which requires us
to free and reallocate it whenever we need to update the reflog name.
Convert it to use a `struct strbuf` instead, which reduces the number of
allocations. Before:

    HEAP SUMMARY:
        in use at exit: 13,473 bytes in 122 blocks
      total heap usage: 1,068,485 allocs, 1,068,363 frees, 281,122,886 bytes allocated

After:

    HEAP SUMMARY:
        in use at exit: 13,473 bytes in 122 blocks
      total heap usage: 68,485 allocs, 68,363 frees, 256,234,072 bytes allocated

Note that even after this change we still allocate quite a lot of data,
even though the number of allocations does not scale with the number of
log records anymore. This remainder comes mostly from decompressing the
log blocks, where we decompress each block into newly allocated memory.
This will be addressed at a later point in time.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/reftable-backend.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 2de57c047a..12960d93ff 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -1575,7 +1575,7 @@ struct reftable_reflog_iterator {
 	struct reftable_ref_store *refs;
 	struct reftable_iterator iter;
 	struct reftable_log_record log;
-	char *last_name;
+	struct strbuf last_name;
 	int err;
 };
 
@@ -1594,15 +1594,15 @@ static int reftable_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 		 * we've already produced this name. This could be faster by
 		 * seeking directly to reflog@update_index==0.
 		 */
-		if (iter->last_name && !strcmp(iter->log.refname, iter->last_name))
+		if (!strcmp(iter->log.refname, iter->last_name.buf))
 			continue;
 
 		if (check_refname_format(iter->log.refname,
 					 REFNAME_ALLOW_ONELEVEL))
 			continue;
 
-		free(iter->last_name);
-		iter->last_name = xstrdup(iter->log.refname);
+		strbuf_reset(&iter->last_name);
+		strbuf_addstr(&iter->last_name, iter->log.refname);
 		iter->base.refname = iter->log.refname;
 
 		break;
@@ -1635,7 +1635,7 @@ static int reftable_reflog_iterator_abort(struct ref_iterator *ref_iterator)
 		(struct reftable_reflog_iterator *)ref_iterator;
 	reftable_log_record_release(&iter->log);
 	reftable_iterator_destroy(&iter->iter);
-	free(iter->last_name);
+	strbuf_release(&iter->last_name);
 	free(iter);
 	return ITER_DONE;
 }
@@ -1655,6 +1655,7 @@ static struct reftable_reflog_iterator *reflog_iterator_for_stack(struct reftabl
 
 	iter = xcalloc(1, sizeof(*iter));
 	base_ref_iterator_init(&iter->base, &reftable_reflog_iterator_vtable);
+	strbuf_init(&iter->last_name, 0);
 	iter->refs = refs;
 
 	ret = refs->err;
-- 
2.44.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  parent reply	other threads:[~2024-03-05 12:11 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-05 12:10 [PATCH 0/7] reftable: memory optimizations for reflog iteration Patrick Steinhardt
2024-03-05 12:10 ` [PATCH 1/7] refs/reftable: reload correct stack when creating reflog iter Patrick Steinhardt
2024-03-06 16:13   ` Karthik Nayak
2024-03-06 17:49     ` Junio C Hamano
2024-03-07  6:00     ` Patrick Steinhardt
2024-03-11 18:34   ` Josh Steadmon
2024-03-11 23:24     ` Patrick Steinhardt
2024-03-05 12:10 ` [PATCH 2/7] reftable/record: convert old and new object IDs to arrays Patrick Steinhardt
2024-03-05 12:11 ` [PATCH 3/7] reftable/record: avoid copying author info Patrick Steinhardt
2024-03-13  1:09   ` James Liu
2024-03-21 13:10     ` Patrick Steinhardt
2024-03-05 12:11 ` [PATCH 4/7] reftable/record: reuse refnames when decoding log records Patrick Steinhardt
2024-03-05 12:11 ` [PATCH 5/7] reftable/record: reuse message " Patrick Steinhardt
2024-03-05 12:11 ` [PATCH 6/7] reftable/record: use scratch buffer when decoding records Patrick Steinhardt
2024-03-11 19:31   ` Josh Steadmon
2024-03-11 23:25     ` Patrick Steinhardt
2024-03-11 19:40   ` Josh Steadmon
2024-03-05 12:11 ` Patrick Steinhardt [this message]
2024-03-11 19:41 ` [PATCH 0/7] reftable: memory optimizations for reflog iteration Josh Steadmon
2024-03-11 23:25   ` Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f56dd2177e42927dd4cfa88c2adc344f42dbefac.1709640322.git.ps@pks.im \
    --to=ps@pks.im \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).