git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: git@vger.kernel.org
Subject: [PATCH 3/7] reftable/record: avoid copying author info
Date: Tue, 5 Mar 2024 13:11:03 +0100	[thread overview]
Message-ID: <6f568e4ccb67a7af8279352153d052c5f9a88234.1709640322.git.ps@pks.im> (raw)
In-Reply-To: <cover.1709640322.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 3516 bytes --]

Each reflog entry contains information regarding the authorship of who
has made the change. This authorship information is not the same as that
of any of the commits that the reflog entry references, but instead
corresponds to the local user that has executed the command. Thus, it is
almost always the case that all reflog entries have the same author.

We can make use of this fact when decoding reftable records: instead of
freeing and then reallocating the authorship information of log records,
we can special-case when the next record during an iteration has the
exact same authorship as the preceding record. If so, then there is no
need to reallocate the respective fields.

This change results in two allocations less per log record that we're
iterating over in the most common case. Before:

    HEAP SUMMARY:
        in use at exit: 13,473 bytes in 122 blocks
      total heap usage: 6,068,489 allocs, 6,068,367 frees, 361,011,822 bytes allocated

After:

    HEAP SUMMARY:
        in use at exit: 13,473 bytes in 122 blocks
      total heap usage: 4,068,487 allocs, 4,068,365 frees, 332,011,793 bytes allocated

An alternative would be to store the capacity of both name and email and
then use `REFTABLE_ALLOC_GROW()` to conditionally reallocate the array.
But reftable records are copied around quite a lot, and thus we need to
be a bit mindful of the overall record size. Furthermore, a memory
comparison should also be more efficient than having to copy over memory
even if we wouldn't have to allocate a new array every time.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/record.c | 29 +++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/reftable/record.c b/reftable/record.c
index 8d2cd6b081..d816de6d93 100644
--- a/reftable/record.c
+++ b/reftable/record.c
@@ -896,10 +896,19 @@ static int reftable_log_record_decode(void *rec, struct strbuf key,
 		goto done;
 	string_view_consume(&in, n);
 
-	r->value.update.name =
-		reftable_realloc(r->value.update.name, dest.len + 1);
-	memcpy(r->value.update.name, dest.buf, dest.len);
-	r->value.update.name[dest.len] = 0;
+	/*
+	 * In almost all cases we can expect the reflog name to not change for
+	 * reflog entries as they are tied to the local identity, not to the
+	 * target commits. As an optimization for this common case we can thus
+	 * skip copying over the name in case it's accurate already.
+	 */
+	if (!r->value.update.name ||
+	    strcmp(r->value.update.name, dest.buf)) {
+		r->value.update.name =
+			reftable_realloc(r->value.update.name, dest.len + 1);
+		memcpy(r->value.update.name, dest.buf, dest.len);
+		r->value.update.name[dest.len] = 0;
+	}
 
 	strbuf_reset(&dest);
 	n = decode_string(&dest, in);
@@ -907,10 +916,14 @@ static int reftable_log_record_decode(void *rec, struct strbuf key,
 		goto done;
 	string_view_consume(&in, n);
 
-	r->value.update.email =
-		reftable_realloc(r->value.update.email, dest.len + 1);
-	memcpy(r->value.update.email, dest.buf, dest.len);
-	r->value.update.email[dest.len] = 0;
+	/* Same as above, but for the reflog email. */
+	if (!r->value.update.email ||
+	    strcmp(r->value.update.email, dest.buf)) {
+		r->value.update.email =
+			reftable_realloc(r->value.update.email, dest.len + 1);
+		memcpy(r->value.update.email, dest.buf, dest.len);
+		r->value.update.email[dest.len] = 0;
+	}
 
 	ts = 0;
 	n = get_var_int(&ts, &in);
-- 
2.44.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  parent reply	other threads:[~2024-03-05 12:11 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-05 12:10 [PATCH 0/7] reftable: memory optimizations for reflog iteration Patrick Steinhardt
2024-03-05 12:10 ` [PATCH 1/7] refs/reftable: reload correct stack when creating reflog iter Patrick Steinhardt
2024-03-06 16:13   ` Karthik Nayak
2024-03-06 17:49     ` Junio C Hamano
2024-03-07  6:00     ` Patrick Steinhardt
2024-03-11 18:34   ` Josh Steadmon
2024-03-11 23:24     ` Patrick Steinhardt
2024-03-05 12:10 ` [PATCH 2/7] reftable/record: convert old and new object IDs to arrays Patrick Steinhardt
2024-03-05 12:11 ` Patrick Steinhardt [this message]
2024-03-13  1:09   ` [PATCH 3/7] reftable/record: avoid copying author info James Liu
2024-03-21 13:10     ` Patrick Steinhardt
2024-03-05 12:11 ` [PATCH 4/7] reftable/record: reuse refnames when decoding log records Patrick Steinhardt
2024-03-05 12:11 ` [PATCH 5/7] reftable/record: reuse message " Patrick Steinhardt
2024-03-05 12:11 ` [PATCH 6/7] reftable/record: use scratch buffer when decoding records Patrick Steinhardt
2024-03-11 19:31   ` Josh Steadmon
2024-03-11 23:25     ` Patrick Steinhardt
2024-03-11 19:40   ` Josh Steadmon
2024-03-05 12:11 ` [PATCH 7/7] refs/reftable: track last log record name via strbuf Patrick Steinhardt
2024-03-11 19:41 ` [PATCH 0/7] reftable: memory optimizations for reflog iteration Josh Steadmon
2024-03-11 23:25   ` Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6f568e4ccb67a7af8279352153d052c5f9a88234.1709640322.git.ps@pks.im \
    --to=ps@pks.im \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).