git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>,
	Han-Wen Nienhuys <hanwen@google.com>,
	Karthik Nayak <karthik.188@gmail.com>
Subject: [PATCH v2 0/2] refs: introduce reftable backend
Date: Fri, 2 Feb 2024 09:38:25 +0100	[thread overview]
Message-ID: <cover.1706862705.git.ps@pks.im> (raw)
In-Reply-To: <cover.1706601199.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 18602 bytes --]

Hi,

this is the second version of my patch series that introduces the
reftable backend.

This version addresses the feedback by Karthik. I don't think it really
makes sense to spell out every change here -- the range diff should
likely be easier to digest.

Thanks!

Patrick

Patrick Steinhardt (2):
  refs: introduce reftable backend
  ci: add jobs to test with the reftable backend

 .github/workflows/main.yml                    |    9 +
 .gitlab-ci.yml                                |    9 +
 Documentation/ref-storage-format.txt          |    2 +
 .../technical/repository-version.txt          |    5 +-
 Makefile                                      |    1 +
 ci/lib.sh                                     |    2 +-
 ci/run-build-and-tests.sh                     |    3 +
 contrib/workdir/git-new-workdir               |    2 +-
 path.c                                        |    2 +-
 path.h                                        |    1 +
 refs.c                                        |    1 +
 refs/refs-internal.h                          |    1 +
 refs/reftable-backend.c                       | 2295 +++++++++++++++++
 repository.h                                  |    5 +-
 t/t0610-reftable-basics.sh                    |  887 +++++++
 t/t0611-reftable-httpd.sh                     |   26 +
 t/test-lib.sh                                 |    2 +
 17 files changed, 3246 insertions(+), 7 deletions(-)
 create mode 100644 refs/reftable-backend.c
 create mode 100755 t/t0610-reftable-basics.sh
 create mode 100755 t/t0611-reftable-httpd.sh

Range-diff against v1:
1:  5598cd1307 ! 1:  d6548dcfad refs: introduce reftable backend
    @@ Commit message
             previously been deleting reflogs together with their refs to avoid
             file/directory conflicts, which is not necessary anymore.
     
    +      - We can properly enumerate all refs. With the "files" backend it is
    +        not easily possible to distinguish between refs and non-refs because
    +        they may live side by side in the gitdir.
    +
         Not all of these improvements are realized with the current "reftable"
         backend implementation. At this point, the new backend is supposed to be
         a drop-in replacement for the "files" backend that is used by basically
    @@ Commit message
             workloads we would likely also want to perform pack loose refs,
             which would likely change the picture.
     
    -            Benchmark 1: update-ref: create refs sequentially (refformat = files)
    +            Benchmark 1: update-ref: create refs sequentially (refformat = files, refcount = 1)
                   Time (mean ± σ):       2.1 ms ±   0.3 ms    [User: 0.6 ms, System: 1.7 ms]
                   Range (min … max):     1.8 ms …   4.3 ms    133 runs
     
    -            Benchmark 2: update-ref: create refs sequentially (refformat = reftable)
    +            Benchmark 2: update-ref: create refs sequentially (refformat = reftable, refcount = 1)
                   Time (mean ± σ):       2.7 ms ±   0.1 ms    [User: 0.6 ms, System: 2.2 ms]
                   Range (min … max):     2.4 ms …   2.9 ms    132 runs
     
    -            Benchmark 3: update-ref: create refs sequentially (refformat = files)
    +            Benchmark 3: update-ref: create refs sequentially (refformat = files, refcount = 1000)
                   Time (mean ± σ):      1.975 s ±  0.006 s    [User: 0.437 s, System: 1.535 s]
                   Range (min … max):    1.969 s …  1.980 s    3 runs
     
    -            Benchmark 4: update-ref: create refs sequentially (refformat = reftable)
    +            Benchmark 4: update-ref: create refs sequentially (refformat = reftable, refcount = 1000)
                   Time (mean ± σ):      2.611 s ±  0.013 s    [User: 0.782 s, System: 1.825 s]
                   Range (min … max):    2.597 s …  2.622 s    3 runs
     
    -            Benchmark 5: update-ref: create refs sequentially (refformat = files)
    +            Benchmark 5: update-ref: create refs sequentially (refformat = files, refcount = 100000)
                   Time (mean ± σ):     198.442 s ±  0.241 s    [User: 43.051 s, System: 155.250 s]
                   Range (min … max):   198.189 s … 198.670 s    3 runs
     
    -            Benchmark 6: update-ref: create refs sequentially (refformat = reftable)
    +            Benchmark 6: update-ref: create refs sequentially (refformat = reftable, refcount = 100000)
                   Time (mean ± σ):     294.509 s ±  4.269 s    [User: 104.046 s, System: 190.326 s]
                   Range (min … max):   290.223 s … 298.761 s    3 runs
     
    @@ refs/reftable-backend.c (new)
     +#include "../git-compat-util.h"
     +#include "../abspath.h"
     +#include "../chdir-notify.h"
    -+#include "../config.h"
     +#include "../environment.h"
     +#include "../gettext.h"
     +#include "../hash.h"
    @@ refs/reftable-backend.c (new)
     +#include "../reftable/reftable-stack.h"
     +#include "../reftable/reftable-record.h"
     +#include "../reftable/reftable-error.h"
    -+#include "../reftable/reftable-blocksource.h"
    -+#include "../reftable/reftable-reader.h"
     +#include "../reftable/reftable-iterator.h"
     +#include "../reftable/reftable-merged.h"
    -+#include "../reftable/reftable-generic.h"
     +#include "../setup.h"
     +#include "../strmap.h"
    -+#include "../worktree.h"
     +#include "refs-internal.h"
     +
     +/*
    @@ refs/reftable-backend.c (new)
     +struct reftable_ref_store {
     +	struct ref_store base;
     +
    ++	/*
    ++	 * The main stack refers to the common dir and thus contains common
    ++	 * refs as well as refs of the main repository.
    ++	 */
     +	struct reftable_stack *main_stack;
    ++	/*
    ++	 * The worktree stack refers to the gitdir in case the refdb is opened
    ++	 * via a worktree. It thus contains the per-worktree refs.
    ++	 */
     +	struct reftable_stack *worktree_stack;
    ++	/*
    ++	 * Map of worktree stacks by their respective worktree names. The map
    ++	 * is populated lazily when we try to resolve `worktrees/$worktree` refs.
    ++	 */
     +	struct strmap worktree_stacks;
     +	struct reftable_write_options write_options;
     +
    @@ refs/reftable-backend.c (new)
     +	/*
     +	 * Set up the main reftable stack that is hosted in GIT_COMMON_DIR.
     +	 * This stack contains both the shared and the main worktree refs.
    ++	 *
    ++	 * Note that we don't try to resolve the path in case we have a
    ++	 * worktree because `get_common_dir_noenv()` already does it for us.
     +	 */
     +	is_worktree = get_common_dir_noenv(&path, gitdir);
     +	if (!is_worktree) {
    @@ refs/reftable-backend.c (new)
     +	/*
     +	 * If we're in a worktree we also need to set up the worktree reftable
     +	 * stack that is contained in the per-worktree GIT_DIR.
    ++	 *
    ++	 * Ideally, we would also add the stack to our worktree stack map. But
    ++	 * we have no way to figure out the worktree name here and thus can't
    ++	 * do it efficiently.
     +	 */
     +	if (is_worktree) {
     +		strbuf_reset(&path);
    @@ refs/reftable-backend.c (new)
     +		}
     +
     +		 /*
    -+		  * Otherwise, if we either have no worktree refs anymore or if
    -+		  * the common ref sorts before the next worktree ref, we need
    -+		  * to figure out whether the next common ref belongs to the
    -+		  * main worktree. In that case, it should be ignored.
    ++		  * We now know that the lexicographically-next ref is a common
    ++		  * ref. When the common ref is a shared one we return it.
     +		  */
     +		if (parse_worktree_ref(iter_common->refname, NULL, NULL,
     +				       NULL) == REF_WORKTREE_SHARED)
     +			return ITER_SELECT_1;
     +
    ++		/*
    ++		 * Otherwise, if the common ref is a per-worktree ref we skip
    ++		 * it because it would belong to the main worktree, not ours.
    ++		 */
     +		return ITER_SKIP_1;
     +	} else {
     +		return ITER_DONE;
    @@ refs/reftable-backend.c (new)
     +		}
     +
     +		if (u->type & REF_ISSYMREF) {
    ++			/*
    ++			 * The reftable stack is locked at this point already,
    ++			 * so it is safe to call `refs_resolve_ref_unsafe()`
    ++			 * here without causing races.
    ++			 */
     +			const char *resolved = refs_resolve_ref_unsafe(&refs->base, u->refname, 0,
     +								       &current_oid, NULL);
     +
     +			if (u->flags & REF_NO_DEREF) {
    -+				/*
    -+				 * The reftable stack is locked at this point
    -+				 * already, so it should be safe to call
    -+				 * `refs_resolve_ref_unsafe()` here.
    -+				 */
     +				if (u->flags & REF_HAVE_OLD && !resolved) {
     +					strbuf_addf(err, _("cannot lock ref '%s': "
     +						    "error reading reference"), u->refname);
    @@ refs/reftable-backend.c (new)
     +				ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
     +				tombstone = &logs[logs_nr++];
     +				tombstone->refname = xstrdup(u->refname);
    -+				tombstone->value_type = REFTABLE_LOG_DELETION,
    ++				tombstone->value_type = REFTABLE_LOG_DELETION;
     +				tombstone->update_index = log.update_index;
     +			}
     +
    @@ refs/reftable-backend.c (new)
     +
     +	/*
     +	 * When deleting the old reference we have to use two update indices:
    -+	 * one to delete the old ref and its reflog, and once to create the new
    -+	 * ref and its reflog. They need to be staged with two separate indices
    -+	 * because the new reflog needs to encode both the deletion of the old
    -+	 * branch and the creation of the new branch, and we cannot do two
    -+	 * changes to a reflog in a single update.
    ++	 * once to delete the old ref and its reflog, and once to create the
    ++	 * new ref and its reflog. They need to be staged with two separate
    ++	 * indices because the new reflog needs to encode both the deletion of
    ++	 * the old branch and the creation of the new branch, and we cannot do
    ++	 * two changes to a reflog in a single update.
     +	 */
     +	deletion_ts = creation_ts = reftable_stack_next_update_index(arg->stack);
     +	if (arg->delete_old)
    @@ refs/reftable-backend.c (new)
     +	struct reftable_reflog_iterator *iter =
     +		(struct reftable_reflog_iterator *)ref_iterator;
     +
    -+	if (iter->err) {
    -+		ref_iterator_abort(ref_iterator);
    -+		return iter->err;
    -+	}
    -+
    -+	while (1) {
    -+		int flags, ret;
    ++	while (!iter->err) {
    ++		int flags;
     +
    -+		ret = reftable_iterator_next_log(&iter->iter, &iter->log);
    -+		if (ret < 0) {
    -+			ref_iterator_abort(ref_iterator);
    -+			return ITER_ERROR;
    -+		}
    -+		if (ret > 0) {
    -+			if (ref_iterator_abort(ref_iterator) != ITER_DONE)
    -+				return ITER_ERROR;
    -+			return ITER_DONE;
    -+		}
    ++		iter->err = reftable_iterator_next_log(&iter->iter, &iter->log);
    ++		if (iter->err)
    ++			break;
     +
     +		/*
     +		 * We want the refnames that we have reflogs for, so we skip if
    @@ refs/reftable-backend.c (new)
     +		iter->base.refname = iter->log.refname;
     +		iter->base.oid = &iter->oid;
     +		iter->base.flags = flags;
    -+		return ITER_OK;
    ++
    ++		break;
     +	}
    ++
    ++	if (iter->err > 0) {
    ++		if (ref_iterator_abort(ref_iterator) != ITER_DONE)
    ++			return ITER_ERROR;
    ++		return ITER_DONE;
    ++	}
    ++
    ++	if (iter->err < 0) {
    ++		ref_iterator_abort(ref_iterator);
    ++		return ITER_ERROR;
    ++	}
    ++
    ++	return ITER_OK;
     +}
     +
     +static int reftable_reflog_iterator_peel(struct ref_iterator *ref_iterator,
    @@ refs/reftable-backend.c (new)
     +static struct reftable_reflog_iterator *reflog_iterator_for_stack(struct reftable_ref_store *refs,
     +								  struct reftable_stack *stack)
     +{
    ++	struct reftable_merged_table *merged_table;
     +	struct reftable_reflog_iterator *iter;
    -+	struct reftable_merged_table *mt;
     +	int ret;
     +
     +	iter = xcalloc(1, sizeof(*iter));
    @@ refs/reftable-backend.c (new)
     +	iter->refs = refs;
     +	iter->base.oid = &iter->oid;
     +
    ++	ret = refs->err;
    ++	if (ret)
    ++		goto done;
    ++
     +	ret = reftable_stack_reload(refs->main_stack);
     +	if (ret < 0)
     +		goto done;
     +
    -+	mt = reftable_stack_merged_table(stack);
    -+	ret = reftable_merged_table_seek_log(mt, &iter->iter, "");
    ++	merged_table = reftable_stack_merged_table(stack);
    ++
    ++	ret = reftable_merged_table_seek_log(merged_table, &iter->iter, "");
     +	if (ret < 0)
     +		goto done;
     +
    @@ refs/reftable-backend.c (new)
     +					iterator_select, NULL);
     +}
     +
    ++static int yield_log_record(struct reftable_log_record *log,
    ++			    each_reflog_ent_fn fn,
    ++			    void *cb_data)
    ++{
    ++	struct object_id old_oid, new_oid;
    ++	const char *full_committer;
    ++
    ++	oidread(&old_oid, log->value.update.old_hash);
    ++	oidread(&new_oid, log->value.update.new_hash);
    ++
    ++	/*
    ++	 * When both the old object ID and the new object ID are null
    ++	 * then this is the reflog existence marker. The caller must
    ++	 * not be aware of it.
    ++	 */
    ++	if (is_null_oid(&old_oid) && is_null_oid(&new_oid))
    ++		return 0;
    ++
    ++	full_committer = fmt_ident(log->value.update.name, log->value.update.email,
    ++				   WANT_COMMITTER_IDENT, NULL, IDENT_NO_DATE);
    ++	return fn(&old_oid, &new_oid, full_committer,
    ++		  log->value.update.time, log->value.update.tz_offset,
    ++		  log->value.update.message, cb_data);
    ++}
    ++
     +static int reftable_be_for_each_reflog_ent_reverse(struct ref_store *ref_store,
     +						   const char *refname,
     +						   each_reflog_ent_fn fn,
    @@ refs/reftable-backend.c (new)
     +	mt = reftable_stack_merged_table(stack);
     +	ret = reftable_merged_table_seek_log(mt, &it, refname);
     +	while (!ret) {
    -+		struct object_id old_oid, new_oid;
    -+		const char *full_committer;
    -+
     +		ret = reftable_iterator_next_log(&it, &log);
     +		if (ret < 0)
     +			break;
    @@ refs/reftable-backend.c (new)
     +			break;
     +		}
     +
    -+		oidread(&old_oid, log.value.update.old_hash);
    -+		oidread(&new_oid, log.value.update.new_hash);
    -+
    -+		/*
    -+		 * When both the old object ID and the new object ID are null
    -+		 * then this is the reflog existence marker. The caller must
    -+		 * not be aware of it.
    -+		 */
    -+		if (is_null_oid(&old_oid) && is_null_oid(&new_oid))
    -+			continue;
    -+
    -+		full_committer = fmt_ident(log.value.update.name, log.value.update.email,
    -+					   WANT_COMMITTER_IDENT, NULL, IDENT_NO_DATE);
    -+		ret = fn(&old_oid, &new_oid, full_committer,
    -+			 log.value.update.time, log.value.update.tz_offset,
    -+			 log.value.update.message, cb_data);
    ++		ret = yield_log_record(&log, fn, cb_data);
     +		if (ret)
     +			break;
     +	}
    @@ refs/reftable-backend.c (new)
     +
     +		ret = reftable_iterator_next_log(&it, &log);
     +		if (ret < 0)
    -+			break;
    ++			goto done;
     +		if (ret > 0 || strcmp(log.refname, refname)) {
     +			reftable_log_record_release(&log);
     +			ret = 0;
    @@ refs/reftable-backend.c (new)
     +	}
     +
     +	for (i = logs_nr; i--;) {
    -+		struct reftable_log_record *log = &logs[i];
    -+		struct object_id old_oid, new_oid;
    -+		const char *full_committer = "";
    -+
    -+		oidread(&old_oid, log->value.update.old_hash);
    -+		oidread(&new_oid, log->value.update.new_hash);
    -+
    -+		/*
    -+		 * When both the old object ID and the new object ID are null
    -+		 * then this is the reflog existence marker. The caller must
    -+		 * not be aware of it.
    -+		 */
    -+		if (is_null_oid(&old_oid) && is_null_oid(&new_oid))
    -+			continue;
    -+
    -+		full_committer = fmt_ident(log->value.update.name, log->value.update.email,
    -+					   WANT_COMMITTER_IDENT, NULL, IDENT_NO_DATE);
    -+		ret = fn(&old_oid, &new_oid, full_committer,
    -+			 log->value.update.time, log->value.update.tz_offset,
    -+			 log->value.update.message, cb_data);
    ++		ret = yield_log_record(&logs[i], fn, cb_data);
     +		if (ret)
    -+			break;
    ++			goto done;
     +	}
     +
    ++done:
     +	reftable_iterator_destroy(&it);
     +	for (i = 0; i < logs_nr; i++)
     +		reftable_log_record_release(&logs[i]);
    @@ refs/reftable-backend.c (new)
     +		goto done;
     +
     +	ret = reftable_stack_reload(stack);
    -+	if (ret)
    ++	if (ret < 0)
     +		goto done;
     +
     +	ret = reftable_merged_table_seek_log(mt, &it, refname);
    -+	if (ret)
    ++	if (ret < 0)
     +		goto done;
     +
     +	/*
    -+	 * Seek the reflog to see whether it contains any reflog entries which
    -+	 * aren't marked for deletion.
    ++	 * Check whether we get at least one log record for the given ref name.
    ++	 * If so, the reflog exists, otherwise it doesn't.
     +	 */
    -+	while (1) {
    -+		ret = reftable_iterator_next_log(&it, &log);
    -+		if (ret < 0)
    -+			goto done;
    -+		if (ret > 0 || strcmp(log.refname, refname)) {
    -+			ret = 0;
    -+			goto done;
    -+		}
    -+
    -+		ret = 1;
    -+		break;
    ++	ret = reftable_iterator_next_log(&it, &log);
    ++	if (ret < 0)
    ++		goto done;
    ++	if (ret > 0) {
    ++		ret = 0;
    ++		goto done;
     +	}
     +
    ++	ret = strcmp(log.refname, refname) == 0;
    ++
     +done:
     +	reftable_iterator_destroy(&it);
     +	reftable_log_record_release(&log);
    @@ refs/reftable-backend.c (new)
     +			break;
     +		}
     +
    -+		tombstone.refname = (char *)arg->refname,
    -+		tombstone.value_type = REFTABLE_LOG_DELETION,
    ++		tombstone.refname = (char *)arg->refname;
    ++		tombstone.value_type = REFTABLE_LOG_DELETION;
     +		tombstone.update_index = log.update_index;
     +
     +		ret = reftable_writer_add_log(writer, &tombstone);
2:  bb694fa132 = 2:  63eafc9f5b ci: add jobs to test with the reftable backend
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  parent reply	other threads:[~2024-02-02  8:38 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-30  8:05 [PATCH 0/2] refs: introduce reftable backend Patrick Steinhardt
2024-01-30  8:05 ` [PATCH 1/2] " Patrick Steinhardt
2024-02-01 15:17   ` Karthik Nayak
2024-02-02  8:30     ` Patrick Steinhardt
2024-02-02 10:52       ` Patrick Steinhardt
2024-01-30  8:05 ` [PATCH 2/2] ci: add jobs to test with the " Patrick Steinhardt
2024-01-30 22:08 ` [PATCH 0/2] refs: introduce " Junio C Hamano
2024-02-02  8:38 ` Patrick Steinhardt [this message]
2024-02-02  8:38   ` [PATCH v2 1/2] " Patrick Steinhardt
2024-02-02  8:38   ` [PATCH v2 2/2] ci: add jobs to test with the " Patrick Steinhardt
2024-02-02 13:02   ` [PATCH v2 0/2] refs: introduce " Karthik Nayak
2024-02-03 20:41   ` Junio C Hamano
2024-02-04  6:00     ` Patrick Steinhardt
2024-02-05  6:02 ` [PATCH v3 " Patrick Steinhardt
2024-02-05  6:02   ` [PATCH v3 1/2] " Patrick Steinhardt
2024-02-05  6:02   ` [PATCH v3 2/2] ci: add jobs to test with the " Patrick Steinhardt
2024-02-05 13:10   ` [PATCH v3 0/2] refs: introduce " Karthik Nayak
2024-02-07  7:20 ` [PATCH v4 " Patrick Steinhardt
2024-02-07  7:20   ` [PATCH v4 1/2] " Patrick Steinhardt
2024-02-07 22:31     ` Jeff King
2024-02-08  5:11       ` Patrick Steinhardt
2024-02-15  5:39         ` Jeff King
2024-02-07  7:20   ` [PATCH v4 2/2] ci: add jobs to test with the " Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cover.1706862705.git.ps@pks.im \
    --to=ps@pks.im \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=hanwen@google.com \
    --cc=karthik.188@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).