From: Patrick Steinhardt <ps@pks.im>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>,
Han-Wen Nienhuys <hanwen@google.com>,
Karthik Nayak <karthik.188@gmail.com>
Subject: [PATCH v2 0/2] refs: introduce reftable backend
Date: Fri, 2 Feb 2024 09:38:25 +0100 [thread overview]
Message-ID: <cover.1706862705.git.ps@pks.im> (raw)
In-Reply-To: <cover.1706601199.git.ps@pks.im>
[-- Attachment #1: Type: text/plain, Size: 18602 bytes --]
Hi,
this is the second version of my patch series that introduces the
reftable backend.
This version addresses the feedback by Karthik. I don't think it really
makes sense to spell out every change here -- the range diff should
likely be easier to digest.
Thanks!
Patrick
Patrick Steinhardt (2):
refs: introduce reftable backend
ci: add jobs to test with the reftable backend
.github/workflows/main.yml | 9 +
.gitlab-ci.yml | 9 +
Documentation/ref-storage-format.txt | 2 +
.../technical/repository-version.txt | 5 +-
Makefile | 1 +
ci/lib.sh | 2 +-
ci/run-build-and-tests.sh | 3 +
contrib/workdir/git-new-workdir | 2 +-
path.c | 2 +-
path.h | 1 +
refs.c | 1 +
refs/refs-internal.h | 1 +
refs/reftable-backend.c | 2295 +++++++++++++++++
repository.h | 5 +-
t/t0610-reftable-basics.sh | 887 +++++++
t/t0611-reftable-httpd.sh | 26 +
t/test-lib.sh | 2 +
17 files changed, 3246 insertions(+), 7 deletions(-)
create mode 100644 refs/reftable-backend.c
create mode 100755 t/t0610-reftable-basics.sh
create mode 100755 t/t0611-reftable-httpd.sh
Range-diff against v1:
1: 5598cd1307 ! 1: d6548dcfad refs: introduce reftable backend
@@ Commit message
previously been deleting reflogs together with their refs to avoid
file/directory conflicts, which is not necessary anymore.
+ - We can properly enumerate all refs. With the "files" backend it is
+ not easily possible to distinguish between refs and non-refs because
+ they may live side by side in the gitdir.
+
Not all of these improvements are realized with the current "reftable"
backend implementation. At this point, the new backend is supposed to be
a drop-in replacement for the "files" backend that is used by basically
@@ Commit message
workloads we would likely also want to perform pack loose refs,
which would likely change the picture.
- Benchmark 1: update-ref: create refs sequentially (refformat = files)
+ Benchmark 1: update-ref: create refs sequentially (refformat = files, refcount = 1)
Time (mean ± σ): 2.1 ms ± 0.3 ms [User: 0.6 ms, System: 1.7 ms]
Range (min … max): 1.8 ms … 4.3 ms 133 runs
- Benchmark 2: update-ref: create refs sequentially (refformat = reftable)
+ Benchmark 2: update-ref: create refs sequentially (refformat = reftable, refcount = 1)
Time (mean ± σ): 2.7 ms ± 0.1 ms [User: 0.6 ms, System: 2.2 ms]
Range (min … max): 2.4 ms … 2.9 ms 132 runs
- Benchmark 3: update-ref: create refs sequentially (refformat = files)
+ Benchmark 3: update-ref: create refs sequentially (refformat = files, refcount = 1000)
Time (mean ± σ): 1.975 s ± 0.006 s [User: 0.437 s, System: 1.535 s]
Range (min … max): 1.969 s … 1.980 s 3 runs
- Benchmark 4: update-ref: create refs sequentially (refformat = reftable)
+ Benchmark 4: update-ref: create refs sequentially (refformat = reftable, refcount = 1000)
Time (mean ± σ): 2.611 s ± 0.013 s [User: 0.782 s, System: 1.825 s]
Range (min … max): 2.597 s … 2.622 s 3 runs
- Benchmark 5: update-ref: create refs sequentially (refformat = files)
+ Benchmark 5: update-ref: create refs sequentially (refformat = files, refcount = 100000)
Time (mean ± σ): 198.442 s ± 0.241 s [User: 43.051 s, System: 155.250 s]
Range (min … max): 198.189 s … 198.670 s 3 runs
- Benchmark 6: update-ref: create refs sequentially (refformat = reftable)
+ Benchmark 6: update-ref: create refs sequentially (refformat = reftable, refcount = 100000)
Time (mean ± σ): 294.509 s ± 4.269 s [User: 104.046 s, System: 190.326 s]
Range (min … max): 290.223 s … 298.761 s 3 runs
@@ refs/reftable-backend.c (new)
+#include "../git-compat-util.h"
+#include "../abspath.h"
+#include "../chdir-notify.h"
-+#include "../config.h"
+#include "../environment.h"
+#include "../gettext.h"
+#include "../hash.h"
@@ refs/reftable-backend.c (new)
+#include "../reftable/reftable-stack.h"
+#include "../reftable/reftable-record.h"
+#include "../reftable/reftable-error.h"
-+#include "../reftable/reftable-blocksource.h"
-+#include "../reftable/reftable-reader.h"
+#include "../reftable/reftable-iterator.h"
+#include "../reftable/reftable-merged.h"
-+#include "../reftable/reftable-generic.h"
+#include "../setup.h"
+#include "../strmap.h"
-+#include "../worktree.h"
+#include "refs-internal.h"
+
+/*
@@ refs/reftable-backend.c (new)
+struct reftable_ref_store {
+ struct ref_store base;
+
++ /*
++ * The main stack refers to the common dir and thus contains common
++ * refs as well as refs of the main repository.
++ */
+ struct reftable_stack *main_stack;
++ /*
++ * The worktree stack refers to the gitdir in case the refdb is opened
++ * via a worktree. It thus contains the per-worktree refs.
++ */
+ struct reftable_stack *worktree_stack;
++ /*
++ * Map of worktree stacks by their respective worktree names. The map
++ * is populated lazily when we try to resolve `worktrees/$worktree` refs.
++ */
+ struct strmap worktree_stacks;
+ struct reftable_write_options write_options;
+
@@ refs/reftable-backend.c (new)
+ /*
+ * Set up the main reftable stack that is hosted in GIT_COMMON_DIR.
+ * This stack contains both the shared and the main worktree refs.
++ *
++ * Note that we don't try to resolve the path in case we have a
++ * worktree because `get_common_dir_noenv()` already does it for us.
+ */
+ is_worktree = get_common_dir_noenv(&path, gitdir);
+ if (!is_worktree) {
@@ refs/reftable-backend.c (new)
+ /*
+ * If we're in a worktree we also need to set up the worktree reftable
+ * stack that is contained in the per-worktree GIT_DIR.
++ *
++ * Ideally, we would also add the stack to our worktree stack map. But
++ * we have no way to figure out the worktree name here and thus can't
++ * do it efficiently.
+ */
+ if (is_worktree) {
+ strbuf_reset(&path);
@@ refs/reftable-backend.c (new)
+ }
+
+ /*
-+ * Otherwise, if we either have no worktree refs anymore or if
-+ * the common ref sorts before the next worktree ref, we need
-+ * to figure out whether the next common ref belongs to the
-+ * main worktree. In that case, it should be ignored.
++ * We now know that the lexicographically-next ref is a common
++ * ref. When the common ref is a shared one we return it.
+ */
+ if (parse_worktree_ref(iter_common->refname, NULL, NULL,
+ NULL) == REF_WORKTREE_SHARED)
+ return ITER_SELECT_1;
+
++ /*
++ * Otherwise, if the common ref is a per-worktree ref we skip
++ * it because it would belong to the main worktree, not ours.
++ */
+ return ITER_SKIP_1;
+ } else {
+ return ITER_DONE;
@@ refs/reftable-backend.c (new)
+ }
+
+ if (u->type & REF_ISSYMREF) {
++ /*
++ * The reftable stack is locked at this point already,
++ * so it is safe to call `refs_resolve_ref_unsafe()`
++ * here without causing races.
++ */
+ const char *resolved = refs_resolve_ref_unsafe(&refs->base, u->refname, 0,
+ ¤t_oid, NULL);
+
+ if (u->flags & REF_NO_DEREF) {
-+ /*
-+ * The reftable stack is locked at this point
-+ * already, so it should be safe to call
-+ * `refs_resolve_ref_unsafe()` here.
-+ */
+ if (u->flags & REF_HAVE_OLD && !resolved) {
+ strbuf_addf(err, _("cannot lock ref '%s': "
+ "error reading reference"), u->refname);
@@ refs/reftable-backend.c (new)
+ ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
+ tombstone = &logs[logs_nr++];
+ tombstone->refname = xstrdup(u->refname);
-+ tombstone->value_type = REFTABLE_LOG_DELETION,
++ tombstone->value_type = REFTABLE_LOG_DELETION;
+ tombstone->update_index = log.update_index;
+ }
+
@@ refs/reftable-backend.c (new)
+
+ /*
+ * When deleting the old reference we have to use two update indices:
-+ * one to delete the old ref and its reflog, and once to create the new
-+ * ref and its reflog. They need to be staged with two separate indices
-+ * because the new reflog needs to encode both the deletion of the old
-+ * branch and the creation of the new branch, and we cannot do two
-+ * changes to a reflog in a single update.
++ * once to delete the old ref and its reflog, and once to create the
++ * new ref and its reflog. They need to be staged with two separate
++ * indices because the new reflog needs to encode both the deletion of
++ * the old branch and the creation of the new branch, and we cannot do
++ * two changes to a reflog in a single update.
+ */
+ deletion_ts = creation_ts = reftable_stack_next_update_index(arg->stack);
+ if (arg->delete_old)
@@ refs/reftable-backend.c (new)
+ struct reftable_reflog_iterator *iter =
+ (struct reftable_reflog_iterator *)ref_iterator;
+
-+ if (iter->err) {
-+ ref_iterator_abort(ref_iterator);
-+ return iter->err;
-+ }
-+
-+ while (1) {
-+ int flags, ret;
++ while (!iter->err) {
++ int flags;
+
-+ ret = reftable_iterator_next_log(&iter->iter, &iter->log);
-+ if (ret < 0) {
-+ ref_iterator_abort(ref_iterator);
-+ return ITER_ERROR;
-+ }
-+ if (ret > 0) {
-+ if (ref_iterator_abort(ref_iterator) != ITER_DONE)
-+ return ITER_ERROR;
-+ return ITER_DONE;
-+ }
++ iter->err = reftable_iterator_next_log(&iter->iter, &iter->log);
++ if (iter->err)
++ break;
+
+ /*
+ * We want the refnames that we have reflogs for, so we skip if
@@ refs/reftable-backend.c (new)
+ iter->base.refname = iter->log.refname;
+ iter->base.oid = &iter->oid;
+ iter->base.flags = flags;
-+ return ITER_OK;
++
++ break;
+ }
++
++ if (iter->err > 0) {
++ if (ref_iterator_abort(ref_iterator) != ITER_DONE)
++ return ITER_ERROR;
++ return ITER_DONE;
++ }
++
++ if (iter->err < 0) {
++ ref_iterator_abort(ref_iterator);
++ return ITER_ERROR;
++ }
++
++ return ITER_OK;
+}
+
+static int reftable_reflog_iterator_peel(struct ref_iterator *ref_iterator,
@@ refs/reftable-backend.c (new)
+static struct reftable_reflog_iterator *reflog_iterator_for_stack(struct reftable_ref_store *refs,
+ struct reftable_stack *stack)
+{
++ struct reftable_merged_table *merged_table;
+ struct reftable_reflog_iterator *iter;
-+ struct reftable_merged_table *mt;
+ int ret;
+
+ iter = xcalloc(1, sizeof(*iter));
@@ refs/reftable-backend.c (new)
+ iter->refs = refs;
+ iter->base.oid = &iter->oid;
+
++ ret = refs->err;
++ if (ret)
++ goto done;
++
+ ret = reftable_stack_reload(refs->main_stack);
+ if (ret < 0)
+ goto done;
+
-+ mt = reftable_stack_merged_table(stack);
-+ ret = reftable_merged_table_seek_log(mt, &iter->iter, "");
++ merged_table = reftable_stack_merged_table(stack);
++
++ ret = reftable_merged_table_seek_log(merged_table, &iter->iter, "");
+ if (ret < 0)
+ goto done;
+
@@ refs/reftable-backend.c (new)
+ iterator_select, NULL);
+}
+
++static int yield_log_record(struct reftable_log_record *log,
++ each_reflog_ent_fn fn,
++ void *cb_data)
++{
++ struct object_id old_oid, new_oid;
++ const char *full_committer;
++
++ oidread(&old_oid, log->value.update.old_hash);
++ oidread(&new_oid, log->value.update.new_hash);
++
++ /*
++ * When both the old object ID and the new object ID are null
++ * then this is the reflog existence marker. The caller must
++ * not be aware of it.
++ */
++ if (is_null_oid(&old_oid) && is_null_oid(&new_oid))
++ return 0;
++
++ full_committer = fmt_ident(log->value.update.name, log->value.update.email,
++ WANT_COMMITTER_IDENT, NULL, IDENT_NO_DATE);
++ return fn(&old_oid, &new_oid, full_committer,
++ log->value.update.time, log->value.update.tz_offset,
++ log->value.update.message, cb_data);
++}
++
+static int reftable_be_for_each_reflog_ent_reverse(struct ref_store *ref_store,
+ const char *refname,
+ each_reflog_ent_fn fn,
@@ refs/reftable-backend.c (new)
+ mt = reftable_stack_merged_table(stack);
+ ret = reftable_merged_table_seek_log(mt, &it, refname);
+ while (!ret) {
-+ struct object_id old_oid, new_oid;
-+ const char *full_committer;
-+
+ ret = reftable_iterator_next_log(&it, &log);
+ if (ret < 0)
+ break;
@@ refs/reftable-backend.c (new)
+ break;
+ }
+
-+ oidread(&old_oid, log.value.update.old_hash);
-+ oidread(&new_oid, log.value.update.new_hash);
-+
-+ /*
-+ * When both the old object ID and the new object ID are null
-+ * then this is the reflog existence marker. The caller must
-+ * not be aware of it.
-+ */
-+ if (is_null_oid(&old_oid) && is_null_oid(&new_oid))
-+ continue;
-+
-+ full_committer = fmt_ident(log.value.update.name, log.value.update.email,
-+ WANT_COMMITTER_IDENT, NULL, IDENT_NO_DATE);
-+ ret = fn(&old_oid, &new_oid, full_committer,
-+ log.value.update.time, log.value.update.tz_offset,
-+ log.value.update.message, cb_data);
++ ret = yield_log_record(&log, fn, cb_data);
+ if (ret)
+ break;
+ }
@@ refs/reftable-backend.c (new)
+
+ ret = reftable_iterator_next_log(&it, &log);
+ if (ret < 0)
-+ break;
++ goto done;
+ if (ret > 0 || strcmp(log.refname, refname)) {
+ reftable_log_record_release(&log);
+ ret = 0;
@@ refs/reftable-backend.c (new)
+ }
+
+ for (i = logs_nr; i--;) {
-+ struct reftable_log_record *log = &logs[i];
-+ struct object_id old_oid, new_oid;
-+ const char *full_committer = "";
-+
-+ oidread(&old_oid, log->value.update.old_hash);
-+ oidread(&new_oid, log->value.update.new_hash);
-+
-+ /*
-+ * When both the old object ID and the new object ID are null
-+ * then this is the reflog existence marker. The caller must
-+ * not be aware of it.
-+ */
-+ if (is_null_oid(&old_oid) && is_null_oid(&new_oid))
-+ continue;
-+
-+ full_committer = fmt_ident(log->value.update.name, log->value.update.email,
-+ WANT_COMMITTER_IDENT, NULL, IDENT_NO_DATE);
-+ ret = fn(&old_oid, &new_oid, full_committer,
-+ log->value.update.time, log->value.update.tz_offset,
-+ log->value.update.message, cb_data);
++ ret = yield_log_record(&logs[i], fn, cb_data);
+ if (ret)
-+ break;
++ goto done;
+ }
+
++done:
+ reftable_iterator_destroy(&it);
+ for (i = 0; i < logs_nr; i++)
+ reftable_log_record_release(&logs[i]);
@@ refs/reftable-backend.c (new)
+ goto done;
+
+ ret = reftable_stack_reload(stack);
-+ if (ret)
++ if (ret < 0)
+ goto done;
+
+ ret = reftable_merged_table_seek_log(mt, &it, refname);
-+ if (ret)
++ if (ret < 0)
+ goto done;
+
+ /*
-+ * Seek the reflog to see whether it contains any reflog entries which
-+ * aren't marked for deletion.
++ * Check whether we get at least one log record for the given ref name.
++ * If so, the reflog exists, otherwise it doesn't.
+ */
-+ while (1) {
-+ ret = reftable_iterator_next_log(&it, &log);
-+ if (ret < 0)
-+ goto done;
-+ if (ret > 0 || strcmp(log.refname, refname)) {
-+ ret = 0;
-+ goto done;
-+ }
-+
-+ ret = 1;
-+ break;
++ ret = reftable_iterator_next_log(&it, &log);
++ if (ret < 0)
++ goto done;
++ if (ret > 0) {
++ ret = 0;
++ goto done;
+ }
+
++ ret = strcmp(log.refname, refname) == 0;
++
+done:
+ reftable_iterator_destroy(&it);
+ reftable_log_record_release(&log);
@@ refs/reftable-backend.c (new)
+ break;
+ }
+
-+ tombstone.refname = (char *)arg->refname,
-+ tombstone.value_type = REFTABLE_LOG_DELETION,
++ tombstone.refname = (char *)arg->refname;
++ tombstone.value_type = REFTABLE_LOG_DELETION;
+ tombstone.update_index = log.update_index;
+
+ ret = reftable_writer_add_log(writer, &tombstone);
2: bb694fa132 = 2: 63eafc9f5b ci: add jobs to test with the reftable backend
--
2.43.GIT
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2024-02-02 8:38 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-30 8:05 [PATCH 0/2] refs: introduce reftable backend Patrick Steinhardt
2024-01-30 8:05 ` [PATCH 1/2] " Patrick Steinhardt
2024-02-01 15:17 ` Karthik Nayak
2024-02-02 8:30 ` Patrick Steinhardt
2024-02-02 10:52 ` Patrick Steinhardt
2024-01-30 8:05 ` [PATCH 2/2] ci: add jobs to test with the " Patrick Steinhardt
2024-01-30 22:08 ` [PATCH 0/2] refs: introduce " Junio C Hamano
2024-02-02 8:38 ` Patrick Steinhardt [this message]
2024-02-02 8:38 ` [PATCH v2 1/2] " Patrick Steinhardt
2024-02-02 8:38 ` [PATCH v2 2/2] ci: add jobs to test with the " Patrick Steinhardt
2024-02-02 13:02 ` [PATCH v2 0/2] refs: introduce " Karthik Nayak
2024-02-03 20:41 ` Junio C Hamano
2024-02-04 6:00 ` Patrick Steinhardt
2024-02-05 6:02 ` [PATCH v3 " Patrick Steinhardt
2024-02-05 6:02 ` [PATCH v3 1/2] " Patrick Steinhardt
2024-02-05 6:02 ` [PATCH v3 2/2] ci: add jobs to test with the " Patrick Steinhardt
2024-02-05 13:10 ` [PATCH v3 0/2] refs: introduce " Karthik Nayak
2024-02-07 7:20 ` [PATCH v4 " Patrick Steinhardt
2024-02-07 7:20 ` [PATCH v4 1/2] " Patrick Steinhardt
2024-02-07 22:31 ` Jeff King
2024-02-08 5:11 ` Patrick Steinhardt
2024-02-15 5:39 ` Jeff King
2024-02-07 7:20 ` [PATCH v4 2/2] ci: add jobs to test with the " Patrick Steinhardt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cover.1706862705.git.ps@pks.im \
--to=ps@pks.im \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=hanwen@google.com \
--cc=karthik.188@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).