git@vger.kernel.org mailing list mirror (one of many)
* [PATCH 00/14] refs: batch refname availability checks
@ 2025-02-17 15:50 Patrick Steinhardt
  2025-02-17 15:50 ` [PATCH 01/14] object-name: introduce `repo_get_oid_with_flags()` Patrick Steinhardt
                   ` (19 more replies)
  0 siblings, 20 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-17 15:50 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

Hi,

this patch series was inspired by brian's report [1] that the reftable
backend is significantly slower than the "files" backend when writing
many references. As explained in that thread, the underlying issue is
the design of tombstone references: when we first delete all references
in a repository and then recreate them, we still have all the tombstones
and thus we need to churn through all of them to figure out that they
have been deleted in the first place. The files backend does not have
this issue.

I consider the benchmark itself to be somewhat broken, as it first
deletes all refs and then recreates them. If you pack refs in between,
the "reftable" backend actually outperforms the "files" backend.

But there are a couple of opportunities here anyway. While we cannot
make the underlying inefficiency of tombstones go away, the report
prompted me to take a deeper look at where we spend all the time. There
are three ideas in this series:

  - git-update-ref(1) performs ambiguity checks for any full-size object
    ID, which triggers a lot of reads. This is somewhat pointless: the
    manpage explicitly states that the command operates on object IDs,
    even though it does know how to parse refs. And being part of the
    plumbing layer, emitting the warning here does not make a ton of
    sense, so favoring object IDs over references in these cases is the
    obvious thing to do anyway.

  - For each ref "refs/heads/bar", we need to verify that neither
    "refs/heads" nor "refs" exists. This check was repeated for every
    refname, but because most refnames share common prefixes we ended up
    re-checking the same prefixes over and over. This is addressed by
    tracking already-checked prefixes in a `strset`.

  - For each ref "refs/heads/bar", we need to verify that no ref
    "refs/heads/bar/*" exists. We used to create a new ref iterator for
    each of these checks, which required us to discard all internal
    state and then recreate it. The reftable library has already been
    refactored to support reseekable iterators, so we backfill this
    functionality to all the other iterators and then reuse a single
    iterator.

With the (somewhat broken) benchmark we see a small speedup with the
"files" backend:

    Benchmark 1: update-ref (refformat = files, revision = master)
      Time (mean ± σ):     233.8 ms ±   1.4 ms    [User: 81.2 ms, System: 151.5 ms]
      Range (min … max):   232.2 ms … 236.0 ms    10 runs

    Benchmark 2: update-ref (refformat = files, revision = HEAD)
      Time (mean ± σ):     192.3 ms ±   1.5 ms    [User: 67.2 ms, System: 124.0 ms]
      Range (min … max):   190.1 ms … 194.2 ms    10 runs

    Summary
      update-ref (refformat = files, revision = HEAD) ran
        1.22 ± 0.01 times faster than update-ref (refformat = files, revision = master)

And a huge speedup with the "reftable" backend:

    Benchmark 1: update-ref (refformat = reftable, revision = master)
      Time (mean ± σ):     16.852 s ±  0.061 s    [User: 16.754 s, System: 0.059 s]
      Range (min … max):   16.785 s … 16.982 s    10 runs

    Benchmark 2: update-ref (refformat = reftable, revision = HEAD)
      Time (mean ± σ):      2.230 s ±  0.009 s    [User: 2.192 s, System: 0.029 s]
      Range (min … max):    2.215 s …  2.244 s    10 runs

    Summary
      update-ref (refformat = reftable, revision = HEAD) ran
        7.56 ± 0.04 times faster than update-ref (refformat = reftable, revision = master)

We're still not up to speed with the "files" backend, but this is
considerably better. Given that this is an extreme edge case and not
reflective of the general case, I'm okay with this result for now.

But more importantly, this refactoring also has a positive effect when
updating references in a repository with preexisting refs, which I
consider to be the more realistic scenario. The following benchmark
creates 10k refs with 100k preexisting refs.

With the "files" backend we see a modest improvement:

    Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = master)
      Time (mean ± σ):     470.1 ms ±   5.4 ms    [User: 104.5 ms, System: 363.1 ms]
      Range (min … max):   465.7 ms … 484.3 ms    10 runs

    Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):     407.8 ms ±   5.4 ms    [User: 66.0 ms, System: 340.0 ms]
      Range (min … max):   399.9 ms … 417.6 ms    10 runs

    Summary
      update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
        1.15 ± 0.02 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = master)

But with the "reftable" backend we see an almost 5x improvement, which
also makes it roughly 13x faster than the "files" backend at HEAD:

    Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = master)
      Time (mean ± σ):     153.9 ms ±   2.0 ms    [User: 96.5 ms, System: 56.6 ms]
      Range (min … max):   150.5 ms … 158.4 ms    18 runs

    Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):      32.2 ms ±   1.2 ms    [User: 27.6 ms, System: 4.3 ms]
      Range (min … max):    29.8 ms …  38.6 ms    71 runs

    Summary
      update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
        4.78 ± 0.19 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = master)

The series is structured as follows:

  - Patches 1 to 3 implement the logic to skip ambiguity checks in
    git-update-ref(1).

  - Patches 4 and 5 introduce batched refname availability checks.

  - Patch 6 deduplicates the ref prefix checks.

  - Patches 7 to 13 implement the infrastructure to reseek iterators.

  - Patch 14 starts to reuse iterators for nested ref checks.

Thanks!

Patrick

[1]: <Z602dzQggtDdcgCX@tapette.crustytoothpaste.net>

---
Patrick Steinhardt (14):
      object-name: introduce `repo_get_oid_with_flags()`
      object-name: allow skipping ambiguity checks in `get_oid()` family
      builtin/update-ref: skip ambiguity checks when parsing object IDs
      refs: introduce function to batch refname availability checks
      refs/reftable: start using `refs_verify_refnames_available()`
      refs: stop re-verifying common prefixes for availability
      refs/iterator: separate lifecycle from iteration
      refs/iterator: provide infrastructure to re-seek iterators
      refs/iterator: implement seeking for merged iterators
      refs/iterator: implement seeking for reftable iterators
      refs/iterator: implement seeking for ref-cache iterators
      refs/iterator: implement seeking for `packed-ref` iterators
      refs/iterator: implement seeking for "files" iterators
      refs: reuse iterators when determining refname availability

 builtin/clone.c              |   2 +
 builtin/update-ref.c         |  12 ++-
 dir-iterator.c               |  24 +++---
 dir-iterator.h               |  13 +--
 hash.h                       |   1 +
 object-name.c                |  18 +++--
 object-name.h                |   6 ++
 refs.c                       | 186 ++++++++++++++++++++++++++-----------------
 refs.h                       |  12 +++
 refs/debug.c                 |  20 +++--
 refs/files-backend.c         |  52 ++++++------
 refs/iterator.c              | 145 +++++++++++++++++----------------
 refs/packed-backend.c        |  89 ++++++++++++---------
 refs/ref-cache.c             |  83 +++++++++++--------
 refs/refs-internal.h         |  54 ++++++++-----
 refs/reftable-backend.c      |  85 +++++++++++---------
 t/helper/test-dir-iterator.c |   1 +
 17 files changed, 471 insertions(+), 332 deletions(-)


---
base-commit: e2067b49ecaef9b7f51a17ce251f9207f72ef52d
change-id: 20250217-pks-update-ref-optimization-15c795e66e2b




* [PATCH 01/14] object-name: introduce `repo_get_oid_with_flags()`
  2025-02-17 15:50 [PATCH 00/14] refs: batch refname availability checks Patrick Steinhardt
@ 2025-02-17 15:50 ` Patrick Steinhardt
  2025-02-17 15:50 ` [PATCH 02/14] object-name: allow skipping ambiguity checks in `get_oid()` family Patrick Steinhardt
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-17 15:50 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

Introduce a new function `repo_get_oid_with_flags()`. This function
behaves the same as `repo_get_oid()`, except that it takes an extra
`flags` parameter that it ends up passing to `get_oid_with_context()`.

This function will be used in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 object-name.c | 14 ++++++++------
 object-name.h |  6 ++++++
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/object-name.c b/object-name.c
index 945d5bdef25..bc0265ad2a1 100644
--- a/object-name.c
+++ b/object-name.c
@@ -1794,18 +1794,20 @@ void object_context_release(struct object_context *ctx)
 	strbuf_release(&ctx->symlink_path);
 }
 
-/*
- * This is like "get_oid_basic()", except it allows "object ID expressions",
- * notably "xyz^" for "parent of xyz"
- */
-int repo_get_oid(struct repository *r, const char *name, struct object_id *oid)
+int repo_get_oid_with_flags(struct repository *r, const char *name, struct object_id *oid,
+			    unsigned flags)
 {
 	struct object_context unused;
-	int ret = get_oid_with_context(r, name, 0, oid, &unused);
+	int ret = get_oid_with_context(r, name, flags, oid, &unused);
 	object_context_release(&unused);
 	return ret;
 }
 
+int repo_get_oid(struct repository *r, const char *name, struct object_id *oid)
+{
+	return repo_get_oid_with_flags(r, name, oid, 0);
+}
+
 /*
  * This returns a non-zero value if the string (built using printf
  * format and the given arguments) is not a valid object.
diff --git a/object-name.h b/object-name.h
index 8dba4a47a47..fb5a97b2c8e 100644
--- a/object-name.h
+++ b/object-name.h
@@ -51,6 +51,12 @@ void strbuf_repo_add_unique_abbrev(struct strbuf *sb, struct repository *repo,
 void strbuf_add_unique_abbrev(struct strbuf *sb, const struct object_id *oid,
 			      int abbrev_len);
 
+/*
+ * This is like "get_oid_basic()", except it allows "object ID expressions",
+ * notably "xyz^" for "parent of xyz". Accepts GET_OID_* flags.
+ */
+int repo_get_oid_with_flags(struct repository *r, const char *str, struct object_id *oid,
+			    unsigned flags);
 int repo_get_oid(struct repository *r, const char *str, struct object_id *oid);
 __attribute__((format (printf, 2, 3)))
 int get_oidf(struct object_id *oid, const char *fmt, ...);

-- 
2.48.1.666.gff9fcf71b7.dirty




* [PATCH 02/14] object-name: allow skipping ambiguity checks in `get_oid()` family
  2025-02-17 15:50 [PATCH 00/14] refs: batch refname availability checks Patrick Steinhardt
  2025-02-17 15:50 ` [PATCH 01/14] object-name: introduce `repo_get_oid_with_flags()` Patrick Steinhardt
@ 2025-02-17 15:50 ` Patrick Steinhardt
  2025-02-17 15:50 ` [PATCH 03/14] builtin/update-ref: skip ambiguity checks when parsing object IDs Patrick Steinhardt
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-17 15:50 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

When reading an object ID via `get_oid_basic()` or any of its related
functions, we check whether the object ID is ambiguous, which can be the
case when a reference with the same name exists. While the check is
generally helpful, there are cases where it only adds runtime overhead
without providing much of a benefit.

Add a new flag that allows us to disable the check. The flag will be
used in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 hash.h        | 1 +
 object-name.c | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/hash.h b/hash.h
index 4367acfec50..aeb705d6850 100644
--- a/hash.h
+++ b/hash.h
@@ -204,6 +204,7 @@ struct object_id {
 #define GET_OID_ONLY_TO_DIE    04000
 #define GET_OID_REQUIRE_PATH  010000
 #define GET_OID_HASH_ANY      020000
+#define GET_OID_HASH_SKIP_AMBIGUITY_CHECK 040000
 
 #define GET_OID_DISAMBIGUATORS \
 	(GET_OID_COMMIT | GET_OID_COMMITTISH | \
diff --git a/object-name.c b/object-name.c
index bc0265ad2a1..0b23a65d2ed 100644
--- a/object-name.c
+++ b/object-name.c
@@ -961,7 +961,9 @@ static int get_oid_basic(struct repository *r, const char *str, int len,
 	int fatal = !(flags & GET_OID_QUIETLY);
 
 	if (len == r->hash_algo->hexsz && !get_oid_hex(str, oid)) {
-		if (repo_settings_get_warn_ambiguous_refs(r) && warn_on_object_refname_ambiguity) {
+		if (!(flags & GET_OID_HASH_SKIP_AMBIGUITY_CHECK) &&
+		    repo_settings_get_warn_ambiguous_refs(r) &&
+		    warn_on_object_refname_ambiguity) {
 			refs_found = repo_dwim_ref(r, str, len, &tmp_oid, &real_ref, 0);
 			if (refs_found > 0) {
 				warning(warn_msg, len, str);

-- 
2.48.1.666.gff9fcf71b7.dirty




* [PATCH 03/14] builtin/update-ref: skip ambiguity checks when parsing object IDs
  2025-02-17 15:50 [PATCH 00/14] refs: batch refname availability checks Patrick Steinhardt
  2025-02-17 15:50 ` [PATCH 01/14] object-name: introduce `repo_get_oid_with_flags()` Patrick Steinhardt
  2025-02-17 15:50 ` [PATCH 02/14] object-name: allow skipping ambiguity checks in `get_oid()` family Patrick Steinhardt
@ 2025-02-17 15:50 ` Patrick Steinhardt
  2025-02-18 16:04   ` Karthik Nayak
  2025-02-17 15:50 ` [PATCH 04/14] refs: introduce function to batch refname availability checks Patrick Steinhardt
                   ` (16 subsequent siblings)
  19 siblings, 1 reply; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-17 15:50 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

Most of the commands in git-update-ref(1) accept an old and/or new
object ID to update a specific reference to. These object IDs get parsed
via `repo_get_oid()`, which not only handles plain object IDs, but also
those that have a suffix like "~" or "^2". More surprisingly, it even
knows how to resolve references, even though its manpage does not
mention this at all.

One consequence of this is that we also check for ambiguous references:
when parsing a full object ID where the DWIM mechanism would also cause
us to resolve it as a branch, we'd end up printing a warning. While this
check makes sense to have in general, it is arguably less useful in the
context of git-update-ref(1), for two reasons:

  - The manpage is explicitly structured around object IDs. So if we see
    a full object ID, the intent should be quite clear in general.

  - The command is part of our plumbing layer and not a tool that users
    would generally use in interactive workflows. As such, the warning
    will likely not be visible to anybody in the first place.

Furthermore, this check can be quite expensive when updating lots of
references via `--stdin`, because we try to read multiple references per
object ID that we parse according to the DWIM rules. This effect can be
seen both with the "files" and "reftable" backend.

Disable the warning in git-update-ref(1), which provides a significant
speedup with both backends. The following benchmark creates 10000 new
references in a repository with 100000 preexisting refs, first with the
"files" backend:

    Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
      Time (mean ± σ):     467.3 ms ±   5.1 ms    [User: 100.0 ms, System: 365.1 ms]
      Range (min … max):   461.9 ms … 479.3 ms    10 runs

    Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):     394.1 ms ±   5.8 ms    [User: 63.3 ms, System: 327.6 ms]
      Range (min … max):   384.9 ms … 405.7 ms    10 runs

    Summary
      update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
        1.19 ± 0.02 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)

And with the "reftable" backend:

    Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
      Time (mean ± σ):     146.9 ms ±   2.2 ms    [User: 90.4 ms, System: 56.0 ms]
      Range (min … max):   142.7 ms … 150.8 ms    19 runs

    Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):      63.2 ms ±   1.1 ms    [User: 41.0 ms, System: 21.8 ms]
      Range (min … max):    61.1 ms …  66.6 ms    41 runs

    Summary
      update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
        2.32 ± 0.05 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)

Note that the absolute improvement with both backends is roughly in the
same ballpark, but the relative improvement for the "reftable" backend
is more significant because writing the new table to disk is faster in
the first place.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/update-ref.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/builtin/update-ref.c b/builtin/update-ref.c
index 4d35bdc4b4b..ac330748244 100644
--- a/builtin/update-ref.c
+++ b/builtin/update-ref.c
@@ -179,7 +179,8 @@ static int parse_next_oid(const char **next, const char *end,
 		(*next)++;
 		*next = parse_arg(*next, &arg);
 		if (arg.len) {
-			if (repo_get_oid(the_repository, arg.buf, oid))
+			if (repo_get_oid_with_flags(the_repository, arg.buf, oid,
+						    GET_OID_HASH_SKIP_AMBIGUITY_CHECK))
 				goto invalid;
 		} else {
 			/* Without -z, an empty value means all zeros: */
@@ -197,7 +198,8 @@ static int parse_next_oid(const char **next, const char *end,
 		*next += arg.len;
 
 		if (arg.len) {
-			if (repo_get_oid(the_repository, arg.buf, oid))
+			if (repo_get_oid_with_flags(the_repository, arg.buf, oid,
+						    GET_OID_HASH_SKIP_AMBIGUITY_CHECK))
 				goto invalid;
 		} else if (flags & PARSE_SHA1_ALLOW_EMPTY) {
 			/* With -z, treat an empty value as all zeros: */
@@ -772,7 +774,8 @@ int cmd_update_ref(int argc,
 		refname = argv[0];
 		value = argv[1];
 		oldval = argv[2];
-		if (repo_get_oid(the_repository, value, &oid))
+		if (repo_get_oid_with_flags(the_repository, value, &oid,
+					    GET_OID_HASH_SKIP_AMBIGUITY_CHECK))
 			die("%s: not a valid SHA1", value);
 	}
 
@@ -783,7 +786,8 @@ int cmd_update_ref(int argc,
 			 * must not already exist:
 			 */
 			oidclr(&oldoid, the_repository->hash_algo);
-		else if (repo_get_oid(the_repository, oldval, &oldoid))
+		else if (repo_get_oid_with_flags(the_repository, oldval, &oldoid,
+						 GET_OID_HASH_SKIP_AMBIGUITY_CHECK))
 			die("%s: not a valid old SHA1", oldval);
 	}
 

-- 
2.48.1.666.gff9fcf71b7.dirty




* [PATCH 04/14] refs: introduce function to batch refname availability checks
  2025-02-17 15:50 [PATCH 00/14] refs: batch refname availability checks Patrick Steinhardt
                   ` (2 preceding siblings ...)
  2025-02-17 15:50 ` [PATCH 03/14] builtin/update-ref: skip ambiguity checks when parsing object IDs Patrick Steinhardt
@ 2025-02-17 15:50 ` Patrick Steinhardt
  2025-02-17 15:50 ` [PATCH 05/14] refs/reftable: start using `refs_verify_refnames_available()` Patrick Steinhardt
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-17 15:50 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

The `refs_verify_refname_available()` function checks whether a
reference update can be committed or whether it would conflict with an
existing reference that is either a prefix or a suffix thereof. This
function needs to be called once per reference that one wants to check,
which requires us to redo a couple of checks every time it is called.

Introduce a new function `refs_verify_refnames_available()` that does
the same, but for a list of references. For now, the new function uses
the exact same implementation, except that we loop through all refnames
provided by the caller. This will be tuned in subsequent commits.

The existing `refs_verify_refname_available()` function is reimplemented
on top of the new function. As such, the diff is best viewed with the
`--ignore-space-change` option.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs.c | 169 +++++++++++++++++++++++++++++++++++++----------------------------
 refs.h |  12 +++++
 2 files changed, 109 insertions(+), 72 deletions(-)

diff --git a/refs.c b/refs.c
index f4094a326a9..5a9b0f2fa1e 100644
--- a/refs.c
+++ b/refs.c
@@ -2467,19 +2467,15 @@ int ref_transaction_commit(struct ref_transaction *transaction,
 	return ret;
 }
 
-int refs_verify_refname_available(struct ref_store *refs,
-				  const char *refname,
-				  const struct string_list *extras,
-				  const struct string_list *skip,
-				  unsigned int initial_transaction,
-				  struct strbuf *err)
+int refs_verify_refnames_available(struct ref_store *refs,
+				   const struct string_list *refnames,
+				   const struct string_list *extras,
+				   const struct string_list *skip,
+				   unsigned int initial_transaction,
+				   struct strbuf *err)
 {
-	const char *slash;
-	const char *extra_refname;
 	struct strbuf dirname = STRBUF_INIT;
 	struct strbuf referent = STRBUF_INIT;
-	struct object_id oid;
-	unsigned int type;
 	int ret = -1;
 
 	/*
@@ -2489,79 +2485,91 @@ int refs_verify_refname_available(struct ref_store *refs,
 
 	assert(err);
 
-	strbuf_grow(&dirname, strlen(refname) + 1);
-	for (slash = strchr(refname, '/'); slash; slash = strchr(slash + 1, '/')) {
-		/*
-		 * Just saying "Is a directory" when we e.g. can't
-		 * lock some multi-level ref isn't very informative,
-		 * the user won't be told *what* is a directory, so
-		 * let's not use strerror() below.
-		 */
-		int ignore_errno;
-		/* Expand dirname to the new prefix, not including the trailing slash: */
-		strbuf_add(&dirname, refname + dirname.len, slash - refname - dirname.len);
+	for (size_t i = 0; i < refnames->nr; i++) {
+		const char *refname = refnames->items[i].string;
+		const char *extra_refname;
+		struct object_id oid;
+		unsigned int type;
+		const char *slash;
+
+		strbuf_reset(&dirname);
+
+		for (slash = strchr(refname, '/'); slash; slash = strchr(slash + 1, '/')) {
+			/*
+			 * Just saying "Is a directory" when we e.g. can't
+			 * lock some multi-level ref isn't very informative,
+			 * the user won't be told *what* is a directory, so
+			 * let's not use strerror() below.
+			 */
+			int ignore_errno;
+
+			/* Expand dirname to the new prefix, not including the trailing slash: */
+			strbuf_add(&dirname, refname + dirname.len, slash - refname - dirname.len);
+
+			/*
+			 * We are still at a leading dir of the refname (e.g.,
+			 * "refs/foo"; if there is a reference with that name,
+			 * it is a conflict, *unless* it is in skip.
+			 */
+			if (skip && string_list_has_string(skip, dirname.buf))
+				continue;
+
+			if (!initial_transaction &&
+			    !refs_read_raw_ref(refs, dirname.buf, &oid, &referent,
+					       &type, &ignore_errno)) {
+				strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
+					    dirname.buf, refname);
+				goto cleanup;
+			}
+
+			if (extras && string_list_has_string(extras, dirname.buf)) {
+				strbuf_addf(err, _("cannot process '%s' and '%s' at the same time"),
+					    refname, dirname.buf);
+				goto cleanup;
+			}
+		}
 
 		/*
-		 * We are still at a leading dir of the refname (e.g.,
-		 * "refs/foo"; if there is a reference with that name,
-		 * it is a conflict, *unless* it is in skip.
+		 * We are at the leaf of our refname (e.g., "refs/foo/bar").
+		 * There is no point in searching for a reference with that
+		 * name, because a refname isn't considered to conflict with
+		 * itself. But we still need to check for references whose
+		 * names are in the "refs/foo/bar/" namespace, because they
+		 * *do* conflict.
 		 */
-		if (skip && string_list_has_string(skip, dirname.buf))
-			continue;
+		strbuf_addstr(&dirname, refname + dirname.len);
+		strbuf_addch(&dirname, '/');
+
+		if (!initial_transaction) {
+			struct ref_iterator *iter;
+			int ok;
+
+			iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
+						       DO_FOR_EACH_INCLUDE_BROKEN);
+			while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
+				if (skip &&
+				    string_list_has_string(skip, iter->refname))
+					continue;
+
+				strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
+					    iter->refname, refname);
+				ref_iterator_abort(iter);
+				goto cleanup;
+			}
 
-		if (!initial_transaction &&
-		    !refs_read_raw_ref(refs, dirname.buf, &oid, &referent,
-				       &type, &ignore_errno)) {
-			strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
-				    dirname.buf, refname);
-			goto cleanup;
+			if (ok != ITER_DONE)
+				BUG("error while iterating over references");
 		}
 
-		if (extras && string_list_has_string(extras, dirname.buf)) {
+		extra_refname = find_descendant_ref(dirname.buf, extras, skip);
+		if (extra_refname) {
 			strbuf_addf(err, _("cannot process '%s' and '%s' at the same time"),
-				    refname, dirname.buf);
+				    refname, extra_refname);
 			goto cleanup;
 		}
 	}
 
-	/*
-	 * We are at the leaf of our refname (e.g., "refs/foo/bar").
-	 * There is no point in searching for a reference with that
-	 * name, because a refname isn't considered to conflict with
-	 * itself. But we still need to check for references whose
-	 * names are in the "refs/foo/bar/" namespace, because they
-	 * *do* conflict.
-	 */
-	strbuf_addstr(&dirname, refname + dirname.len);
-	strbuf_addch(&dirname, '/');
-
-	if (!initial_transaction) {
-		struct ref_iterator *iter;
-		int ok;
-
-		iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
-					       DO_FOR_EACH_INCLUDE_BROKEN);
-		while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
-			if (skip &&
-			    string_list_has_string(skip, iter->refname))
-				continue;
-
-			strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
-				    iter->refname, refname);
-			ref_iterator_abort(iter);
-			goto cleanup;
-		}
-
-		if (ok != ITER_DONE)
-			BUG("error while iterating over references");
-	}
-
-	extra_refname = find_descendant_ref(dirname.buf, extras, skip);
-	if (extra_refname)
-		strbuf_addf(err, _("cannot process '%s' and '%s' at the same time"),
-			    refname, extra_refname);
-	else
-		ret = 0;
+	ret = 0;
 
 cleanup:
 	strbuf_release(&referent);
@@ -2569,6 +2577,23 @@ int refs_verify_refname_available(struct ref_store *refs,
 	return ret;
 }
 
+int refs_verify_refname_available(struct ref_store *refs,
+				  const char *refname,
+				  const struct string_list *extras,
+				  const struct string_list *skip,
+				  unsigned int initial_transaction,
+				  struct strbuf *err)
+{
+	struct string_list_item item = { .string = (char *) refname };
+	struct string_list refnames = {
+		.items = &item,
+		.nr = 1,
+	};
+
+	return refs_verify_refnames_available(refs, &refnames, extras, skip,
+					      initial_transaction, err);
+}
+
 struct do_for_each_reflog_help {
 	each_reflog_fn *fn;
 	void *cb_data;
diff --git a/refs.h b/refs.h
index a0cdd99250e..185aed5a461 100644
--- a/refs.h
+++ b/refs.h
@@ -124,6 +124,18 @@ int refs_verify_refname_available(struct ref_store *refs,
 				  unsigned int initial_transaction,
 				  struct strbuf *err);
 
+/*
+ * Same as `refs_verify_refname_available()`, but checking for a list of
+ * refnames instead of only a single item. This is more efficient in the case
+ * where one needs to check multiple refnames.
+ */
+int refs_verify_refnames_available(struct ref_store *refs,
+				   const struct string_list *refnames,
+				   const struct string_list *extras,
+				   const struct string_list *skip,
+				   unsigned int initial_transaction,
+				   struct strbuf *err);
+
 int refs_ref_exists(struct ref_store *refs, const char *refname);
 
 int should_autocreate_reflog(enum log_refs_config log_all_ref_updates,

-- 
2.48.1.666.gff9fcf71b7.dirty




* [PATCH 05/14] refs/reftable: start using `refs_verify_refnames_available()`
  2025-02-17 15:50 [PATCH 00/14] refs: batch refname availability checks Patrick Steinhardt
                   ` (3 preceding siblings ...)
  2025-02-17 15:50 ` [PATCH 04/14] refs: introduce function to batch refname availability checks Patrick Steinhardt
@ 2025-02-17 15:50 ` Patrick Steinhardt
  2025-02-17 15:50 ` [PATCH 06/14] refs: stop re-verifying common prefixes for availability Patrick Steinhardt
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-17 15:50 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

Refactor the "reftable" backend to batch the availability check for
refnames. This does not yet have an effect on performance as we
essentially still call `refs_verify_refname_available()` in a loop, but
this will change in subsequent commits.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/reftable-backend.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index d39a14c5a46..2a90e7cb391 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -1069,6 +1069,7 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
 		reftable_be_downcast(ref_store, REF_STORE_WRITE|REF_STORE_MAIN, "ref_transaction_prepare");
 	struct strbuf referent = STRBUF_INIT, head_referent = STRBUF_INIT;
 	struct string_list affected_refnames = STRING_LIST_INIT_NODUP;
+	struct string_list refnames_to_check = STRING_LIST_INIT_NODUP;
 	struct reftable_transaction_data *tx_data = NULL;
 	struct reftable_backend *be;
 	struct object_id head_oid;
@@ -1224,12 +1225,7 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
 			 * can output a proper error message instead of failing
 			 * at a later point.
 			 */
-			ret = refs_verify_refname_available(ref_store, u->refname,
-							    &affected_refnames, NULL,
-							    transaction->flags & REF_TRANSACTION_FLAG_INITIAL,
-							    err);
-			if (ret < 0)
-				goto done;
+			string_list_append(&refnames_to_check, u->refname);
 
 			/*
 			 * There is no need to write the reference deletion
@@ -1379,6 +1375,13 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
 		}
 	}
 
+	string_list_sort(&refnames_to_check);
+	ret = refs_verify_refnames_available(ref_store, &refnames_to_check, &affected_refnames, NULL,
+					     transaction->flags & REF_TRANSACTION_FLAG_INITIAL,
+					     err);
+	if (ret < 0)
+		goto done;
+
 	transaction->backend_data = tx_data;
 	transaction->state = REF_TRANSACTION_PREPARED;
 
@@ -1394,6 +1397,7 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
 	string_list_clear(&affected_refnames, 0);
 	strbuf_release(&referent);
 	strbuf_release(&head_referent);
+	string_list_clear(&refnames_to_check, 0);
 
 	return ret;
 }

-- 
2.48.1.666.gff9fcf71b7.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH 06/14] refs: stop re-verifying common prefixes for availability
  2025-02-17 15:50 [PATCH 00/14] refs: batch refname availability checks Patrick Steinhardt
                   ` (4 preceding siblings ...)
  2025-02-17 15:50 ` [PATCH 05/14] refs/reftable: start using `refs_verify_refnames_available()` Patrick Steinhardt
@ 2025-02-17 15:50 ` Patrick Steinhardt
  2025-02-18 16:12   ` Karthik Nayak
  2025-02-17 15:50 ` [PATCH 07/14] refs/iterator: separate lifecycle from iteration Patrick Steinhardt
                   ` (13 subsequent siblings)
  19 siblings, 1 reply; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-17 15:50 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

One of the checks done by `refs_verify_refnames_available()` is whether
any of the prefixes of a reference already exists. For example, given a
reference "refs/heads/main", we'd check whether "refs/heads" or "refs"
already exist, and if so we'd abort the transaction.

When updating multiple references at once, this check is performed for
each of the references individually. Consequently, because references
tend to have common prefixes like "refs/heads/" or "refs/tags/", we
evaluate the availability of these prefixes repeatedly. Naturally this
is a waste of compute, as the availability of those prefixes should in
general not change in the middle of a transaction. And if it did,
backends would notice at a later point in time anyway.

Optimize this pattern by storing prefixes in a `strset` so that we can
trivially track those prefixes that we have already checked. This leads
to a significant speedup when creating many references that all share a
common prefix:

    Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
      Time (mean ± σ):      63.1 ms ±   1.8 ms    [User: 41.0 ms, System: 21.6 ms]
      Range (min … max):    60.6 ms …  69.5 ms    38 runs

    Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):      40.0 ms ±   1.3 ms    [User: 29.3 ms, System: 10.3 ms]
      Range (min … max):    38.1 ms …  47.3 ms    61 runs

    Summary
      update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
        1.58 ± 0.07 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)

Note that the same speedup cannot be observed for the "files" backend
because it still performs the availability check once per reference.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/refs.c b/refs.c
index 5a9b0f2fa1e..eaf41421f50 100644
--- a/refs.c
+++ b/refs.c
@@ -2476,6 +2476,7 @@ int refs_verify_refnames_available(struct ref_store *refs,
 {
 	struct strbuf dirname = STRBUF_INIT;
 	struct strbuf referent = STRBUF_INIT;
+	struct strset dirnames;
 	int ret = -1;
 
 	/*
@@ -2485,6 +2486,8 @@ int refs_verify_refnames_available(struct ref_store *refs,
 
 	assert(err);
 
+	strset_init(&dirnames);
+
 	for (size_t i = 0; i < refnames->nr; i++) {
 		const char *refname = refnames->items[i].string;
 		const char *extra_refname;
@@ -2514,6 +2517,14 @@ int refs_verify_refnames_available(struct ref_store *refs,
 			if (skip && string_list_has_string(skip, dirname.buf))
 				continue;
 
+			/*
+			 * If we've already seen the directory we don't need to
+			 * process it again. Skip it to avoid checking
+			 * common prefixes like "refs/heads/" repeatedly.
+			 */
+			if (!strset_add(&dirnames, dirname.buf))
+				continue;
+
 			if (!initial_transaction &&
 			    !refs_read_raw_ref(refs, dirname.buf, &oid, &referent,
 					       &type, &ignore_errno)) {
@@ -2574,6 +2585,7 @@ int refs_verify_refnames_available(struct ref_store *refs,
 cleanup:
 	strbuf_release(&referent);
 	strbuf_release(&dirname);
+	strset_clear(&dirnames);
 	return ret;
 }
 

-- 
2.48.1.666.gff9fcf71b7.dirty




* [PATCH 07/14] refs/iterator: separate lifecycle from iteration
  2025-02-17 15:50 [PATCH 00/14] refs: batch refname availability checks Patrick Steinhardt
                   ` (5 preceding siblings ...)
  2025-02-17 15:50 ` [PATCH 06/14] refs: stop re-verifying common prefixes for availability Patrick Steinhardt
@ 2025-02-17 15:50 ` Patrick Steinhardt
  2025-02-18 16:52   ` shejialuo
  2025-02-18 17:13   ` Karthik Nayak
  2025-02-17 15:50 ` [PATCH 08/14] refs/iterator: provide infrastructure to re-seek iterators Patrick Steinhardt
                   ` (12 subsequent siblings)
  19 siblings, 2 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-17 15:50 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

The ref and reflog iterators have their lifecycle attached to iteration:
once the iterator reaches its end, it is automatically released and the
caller doesn't have to care about that anymore. When the iterator should
be released before it has been exhausted, callers must explicitly abort
the iterator via `ref_iterator_abort()`.

This lifecycle is somewhat unusual in the Git codebase and creates two
problems:

  - Callsites need to be very careful about when exactly they call
    `ref_iterator_abort()`, as calling the function is only valid while
    the iterator itself still exists. This leads to somewhat awkward
    calling patterns in some situations.

  - It is impossible to reuse iterators and re-seek them to a different
    prefix. This feature isn't supported by any iterator implementation
    except for the reftable iterators anyway, but if it were implemented
    it would allow us to optimize cases where we need to search for
    specific references repeatedly by reusing internal state.

Detangle the lifecycle from iteration so that we don't deallocate the
iterator anymore once it is exhausted. Instead, callers are now expected
to always call a newly introduced `ref_iterator_free()` function that
deallocates the iterator and its internal state.

While at it, drop the return value of `ref_iterator_abort()`, which
wasn't really required by any of the iterator implementations anyway.
Furthermore, stop calling `base_ref_iterator_free()` in any of the
backends, but instead call it in `ref_iterator_free()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/clone.c              |  2 +
 dir-iterator.c               | 24 +++++------
 dir-iterator.h               | 13 ++----
 refs.c                       |  7 +++-
 refs/debug.c                 |  9 ++---
 refs/files-backend.c         | 36 +++++------------
 refs/iterator.c              | 95 ++++++++++++++------------------------------
 refs/packed-backend.c        | 27 ++++++-------
 refs/ref-cache.c             |  9 ++---
 refs/refs-internal.h         | 31 +++++----------
 refs/reftable-backend.c      | 34 ++++------------
 t/helper/test-dir-iterator.c |  1 +
 12 files changed, 99 insertions(+), 189 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index fd001d800c6..ac3e84b2b18 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -426,6 +426,8 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 		strbuf_setlen(src, src_len);
 		die(_("failed to iterate over '%s'"), src->buf);
 	}
+
+	dir_iterator_free(iter);
 }
 
 static void clone_local(const char *src_repo, const char *dest_repo)
diff --git a/dir-iterator.c b/dir-iterator.c
index de619846f29..857e1d9bdaf 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -193,9 +193,9 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 
 	if (S_ISDIR(iter->base.st.st_mode) && push_level(iter)) {
 		if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
-			goto error_out;
+			return ITER_ERROR;
 		if (iter->levels_nr == 0)
-			goto error_out;
+			return ITER_ERROR;
 	}
 
 	/* Loop until we find an entry that we can give back to the caller. */
@@ -211,11 +211,11 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			int ret = next_directory_entry(level->dir, iter->base.path.buf, &de);
 			if (ret < 0) {
 				if (iter->flags & DIR_ITERATOR_PEDANTIC)
-					goto error_out;
+					return ITER_ERROR;
 				continue;
 			} else if (ret > 0) {
 				if (pop_level(iter) == 0)
-					return dir_iterator_abort(dir_iterator);
+					return ITER_DONE;
 				continue;
 			}
 
@@ -223,7 +223,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 		} else {
 			if (level->entries_idx >= level->entries.nr) {
 				if (pop_level(iter) == 0)
-					return dir_iterator_abort(dir_iterator);
+					return ITER_DONE;
 				continue;
 			}
 
@@ -232,22 +232,21 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 
 		if (prepare_next_entry_data(iter, name)) {
 			if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
-				goto error_out;
+				return ITER_ERROR;
 			continue;
 		}
 
 		return ITER_OK;
 	}
-
-error_out:
-	dir_iterator_abort(dir_iterator);
-	return ITER_ERROR;
 }
 
-int dir_iterator_abort(struct dir_iterator *dir_iterator)
+void dir_iterator_free(struct dir_iterator *dir_iterator)
 {
 	struct dir_iterator_int *iter = (struct dir_iterator_int *)dir_iterator;
 
+	if (!iter)
+		return;
+
 	for (; iter->levels_nr; iter->levels_nr--) {
 		struct dir_iterator_level *level =
 			&iter->levels[iter->levels_nr - 1];
@@ -266,7 +265,6 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
 	free(iter->levels);
 	strbuf_release(&iter->base.path);
 	free(iter);
-	return ITER_DONE;
 }
 
 struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags)
@@ -301,7 +299,7 @@ struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags)
 	return dir_iterator;
 
 error_out:
-	dir_iterator_abort(dir_iterator);
+	dir_iterator_free(dir_iterator);
 	errno = saved_errno;
 	return NULL;
 }
diff --git a/dir-iterator.h b/dir-iterator.h
index 6d438809b6e..01f51f6bac1 100644
--- a/dir-iterator.h
+++ b/dir-iterator.h
@@ -27,10 +27,8 @@
  *             goto error_handler;
  *
  *     while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
- *             if (want_to_stop_iteration()) {
- *                     ok = dir_iterator_abort(iter);
+ *             if (want_to_stop_iteration())
  *                     break;
- *             }
  *
  *             // Access information about the current path:
  *             if (S_ISDIR(iter->st.st_mode))
@@ -39,6 +37,7 @@
  *
  *     if (ok != ITER_DONE)
  *             handle_error();
+ *     dir_iterator_free(iter);
  *
  * Callers are allowed to modify iter->path while they are working,
  * but they must restore it to its original contents before calling
@@ -107,11 +106,7 @@ struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags);
  */
 int dir_iterator_advance(struct dir_iterator *iterator);
 
-/*
- * End the iteration before it has been exhausted. Free the
- * dir_iterator and any associated resources and return ITER_DONE. On
- * error, free the dir_iterator and return ITER_ERROR.
- */
-int dir_iterator_abort(struct dir_iterator *iterator);
+/* Free the dir_iterator and any associated resources. */
+void dir_iterator_free(struct dir_iterator *iterator);
 
 #endif
diff --git a/refs.c b/refs.c
index eaf41421f50..8eff60a2186 100644
--- a/refs.c
+++ b/refs.c
@@ -2476,6 +2476,7 @@ int refs_verify_refnames_available(struct ref_store *refs,
 {
 	struct strbuf dirname = STRBUF_INIT;
 	struct strbuf referent = STRBUF_INIT;
+	struct ref_iterator *iter = NULL;
 	struct strset dirnames;
 	int ret = -1;
 
@@ -2552,7 +2553,6 @@ int refs_verify_refnames_available(struct ref_store *refs,
 		strbuf_addch(&dirname, '/');
 
 		if (!initial_transaction) {
-			struct ref_iterator *iter;
 			int ok;
 
 			iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
@@ -2564,12 +2564,14 @@ int refs_verify_refnames_available(struct ref_store *refs,
 
 				strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
 					    iter->refname, refname);
-				ref_iterator_abort(iter);
 				goto cleanup;
 			}
 
 			if (ok != ITER_DONE)
 				BUG("error while iterating over references");
+
+			ref_iterator_free(iter);
+			iter = NULL;
 		}
 
 		extra_refname = find_descendant_ref(dirname.buf, extras, skip);
@@ -2586,6 +2588,7 @@ int refs_verify_refnames_available(struct ref_store *refs,
 	strbuf_release(&referent);
 	strbuf_release(&dirname);
 	strset_clear(&dirnames);
+	ref_iterator_free(iter);
 	return ret;
 }
 
diff --git a/refs/debug.c b/refs/debug.c
index fbc4df08b43..a9786da4ba1 100644
--- a/refs/debug.c
+++ b/refs/debug.c
@@ -179,19 +179,18 @@ static int debug_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return res;
 }
 
-static int debug_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void debug_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct debug_ref_iterator *diter =
 		(struct debug_ref_iterator *)ref_iterator;
-	int res = diter->iter->vtable->abort(diter->iter);
-	trace_printf_key(&trace_refs, "iterator_abort: %d\n", res);
-	return res;
+	diter->iter->vtable->release(diter->iter);
+	trace_printf_key(&trace_refs, "iterator_release\n");
 }
 
 static struct ref_iterator_vtable debug_ref_iterator_vtable = {
 	.advance = debug_ref_iterator_advance,
 	.peel = debug_ref_iterator_peel,
-	.abort = debug_ref_iterator_abort,
+	.release = debug_ref_iterator_release,
 };
 
 static struct ref_iterator *
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 29f08dced40..9511b6f3448 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -919,10 +919,6 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		return ITER_OK;
 	}
 
-	iter->iter0 = NULL;
-	if (ref_iterator_abort(ref_iterator) != ITER_DONE)
-		ok = ITER_ERROR;
-
 	return ok;
 }
 
@@ -935,23 +931,17 @@ static int files_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return ref_iterator_peel(iter->iter0, peeled);
 }
 
-static int files_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void files_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct files_ref_iterator *iter =
 		(struct files_ref_iterator *)ref_iterator;
-	int ok = ITER_DONE;
-
-	if (iter->iter0)
-		ok = ref_iterator_abort(iter->iter0);
-
-	base_ref_iterator_free(ref_iterator);
-	return ok;
+	ref_iterator_free(iter->iter0);
 }
 
 static struct ref_iterator_vtable files_ref_iterator_vtable = {
 	.advance = files_ref_iterator_advance,
 	.peel = files_ref_iterator_peel,
-	.abort = files_ref_iterator_abort,
+	.release = files_ref_iterator_release,
 };
 
 static struct ref_iterator *files_ref_iterator_begin(
@@ -1382,7 +1372,7 @@ static int should_pack_refs(struct files_ref_store *refs,
 				    iter->flags, opts))
 			refcount++;
 		if (refcount >= limit) {
-			ref_iterator_abort(iter);
+			ref_iterator_free(iter);
 			return 1;
 		}
 	}
@@ -1390,6 +1380,7 @@ static int should_pack_refs(struct files_ref_store *refs,
 	if (ret != ITER_DONE)
 		die("error while iterating over references");
 
+	ref_iterator_free(iter);
 	return 0;
 }
 
@@ -1456,6 +1447,7 @@ static int files_pack_refs(struct ref_store *ref_store,
 	packed_refs_unlock(refs->packed_ref_store);
 
 	prune_refs(refs, &refs_to_prune);
+	ref_iterator_free(iter);
 	strbuf_release(&err);
 	return 0;
 }
@@ -2303,9 +2295,6 @@ static int files_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 		return ITER_OK;
 	}
 
-	iter->dir_iterator = NULL;
-	if (ref_iterator_abort(ref_iterator) == ITER_ERROR)
-		ok = ITER_ERROR;
 	return ok;
 }
 
@@ -2315,23 +2304,17 @@ static int files_reflog_iterator_peel(struct ref_iterator *ref_iterator UNUSED,
 	BUG("ref_iterator_peel() called for reflog_iterator");
 }
 
-static int files_reflog_iterator_abort(struct ref_iterator *ref_iterator)
+static void files_reflog_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct files_reflog_iterator *iter =
 		(struct files_reflog_iterator *)ref_iterator;
-	int ok = ITER_DONE;
-
-	if (iter->dir_iterator)
-		ok = dir_iterator_abort(iter->dir_iterator);
-
-	base_ref_iterator_free(ref_iterator);
-	return ok;
+	dir_iterator_free(iter->dir_iterator);
 }
 
 static struct ref_iterator_vtable files_reflog_iterator_vtable = {
 	.advance = files_reflog_iterator_advance,
 	.peel = files_reflog_iterator_peel,
-	.abort = files_reflog_iterator_abort,
+	.release = files_reflog_iterator_release,
 };
 
 static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,
@@ -3808,6 +3791,7 @@ static int files_fsck_refs_dir(struct ref_store *ref_store,
 		ret = error(_("failed to iterate over '%s'"), sb.buf);
 
 out:
+	dir_iterator_free(iter);
 	strbuf_release(&sb);
 	strbuf_release(&refname);
 	return ret;
diff --git a/refs/iterator.c b/refs/iterator.c
index d25e568bf0b..aaeff270437 100644
--- a/refs/iterator.c
+++ b/refs/iterator.c
@@ -21,9 +21,14 @@ int ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return ref_iterator->vtable->peel(ref_iterator, peeled);
 }
 
-int ref_iterator_abort(struct ref_iterator *ref_iterator)
+void ref_iterator_free(struct ref_iterator *ref_iterator)
 {
-	return ref_iterator->vtable->abort(ref_iterator);
+	if (ref_iterator) {
+		ref_iterator->vtable->release(ref_iterator);
+		/* Help make use-after-free bugs fail quickly: */
+		ref_iterator->vtable = NULL;
+		free(ref_iterator);
+	}
 }
 
 void base_ref_iterator_init(struct ref_iterator *iter,
@@ -36,20 +41,13 @@ void base_ref_iterator_init(struct ref_iterator *iter,
 	iter->flags = 0;
 }
 
-void base_ref_iterator_free(struct ref_iterator *iter)
-{
-	/* Help make use-after-free bugs fail quickly: */
-	iter->vtable = NULL;
-	free(iter);
-}
-
 struct empty_ref_iterator {
 	struct ref_iterator base;
 };
 
-static int empty_ref_iterator_advance(struct ref_iterator *ref_iterator)
+static int empty_ref_iterator_advance(struct ref_iterator *ref_iterator UNUSED)
 {
-	return ref_iterator_abort(ref_iterator);
+	return ITER_DONE;
 }
 
 static int empty_ref_iterator_peel(struct ref_iterator *ref_iterator UNUSED,
@@ -58,16 +56,14 @@ static int empty_ref_iterator_peel(struct ref_iterator *ref_iterator UNUSED,
 	BUG("peel called for empty iterator");
 }
 
-static int empty_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void empty_ref_iterator_release(struct ref_iterator *ref_iterator UNUSED)
 {
-	base_ref_iterator_free(ref_iterator);
-	return ITER_DONE;
 }
 
 static struct ref_iterator_vtable empty_ref_iterator_vtable = {
 	.advance = empty_ref_iterator_advance,
 	.peel = empty_ref_iterator_peel,
-	.abort = empty_ref_iterator_abort,
+	.release = empty_ref_iterator_release,
 };
 
 struct ref_iterator *empty_ref_iterator_begin(void)
@@ -151,11 +147,13 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	if (!iter->current) {
 		/* Initialize: advance both iterators to their first entries */
 		if ((ok = ref_iterator_advance(iter->iter0)) != ITER_OK) {
+			ref_iterator_free(iter->iter0);
 			iter->iter0 = NULL;
 			if (ok == ITER_ERROR)
 				goto error;
 		}
 		if ((ok = ref_iterator_advance(iter->iter1)) != ITER_OK) {
+			ref_iterator_free(iter->iter1);
 			iter->iter1 = NULL;
 			if (ok == ITER_ERROR)
 				goto error;
@@ -166,6 +164,7 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		 * entry:
 		 */
 		if ((ok = ref_iterator_advance(*iter->current)) != ITER_OK) {
+			ref_iterator_free(*iter->current);
 			*iter->current = NULL;
 			if (ok == ITER_ERROR)
 				goto error;
@@ -179,9 +178,8 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 			iter->select(iter->iter0, iter->iter1, iter->cb_data);
 
 		if (selection == ITER_SELECT_DONE) {
-			return ref_iterator_abort(ref_iterator);
+			return ITER_DONE;
 		} else if (selection == ITER_SELECT_ERROR) {
-			ref_iterator_abort(ref_iterator);
 			return ITER_ERROR;
 		}
 
@@ -195,6 +193,7 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 
 		if (selection & ITER_SKIP_SECONDARY) {
 			if ((ok = ref_iterator_advance(*secondary)) != ITER_OK) {
+				ref_iterator_free(*secondary);
 				*secondary = NULL;
 				if (ok == ITER_ERROR)
 					goto error;
@@ -211,7 +210,6 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	}
 
 error:
-	ref_iterator_abort(ref_iterator);
 	return ITER_ERROR;
 }
 
@@ -227,28 +225,18 @@ static int merge_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return ref_iterator_peel(*iter->current, peeled);
 }
 
-static int merge_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void merge_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct merge_ref_iterator *iter =
 		(struct merge_ref_iterator *)ref_iterator;
-	int ok = ITER_DONE;
-
-	if (iter->iter0) {
-		if (ref_iterator_abort(iter->iter0) != ITER_DONE)
-			ok = ITER_ERROR;
-	}
-	if (iter->iter1) {
-		if (ref_iterator_abort(iter->iter1) != ITER_DONE)
-			ok = ITER_ERROR;
-	}
-	base_ref_iterator_free(ref_iterator);
-	return ok;
+	ref_iterator_free(iter->iter0);
+	ref_iterator_free(iter->iter1);
 }
 
 static struct ref_iterator_vtable merge_ref_iterator_vtable = {
 	.advance = merge_ref_iterator_advance,
 	.peel = merge_ref_iterator_peel,
-	.abort = merge_ref_iterator_abort,
+	.release = merge_ref_iterator_release,
 };
 
 struct ref_iterator *merge_ref_iterator_begin(
@@ -310,10 +298,10 @@ struct ref_iterator *overlay_ref_iterator_begin(
 	 * them.
 	 */
 	if (is_empty_ref_iterator(front)) {
-		ref_iterator_abort(front);
+		ref_iterator_free(front);
 		return back;
 	} else if (is_empty_ref_iterator(back)) {
-		ref_iterator_abort(back);
+		ref_iterator_free(back);
 		return front;
 	}
 
@@ -350,19 +338,10 @@ static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator)
 
 	while ((ok = ref_iterator_advance(iter->iter0)) == ITER_OK) {
 		int cmp = compare_prefix(iter->iter0->refname, iter->prefix);
-
 		if (cmp < 0)
 			continue;
-
-		if (cmp > 0) {
-			/*
-			 * As the source iterator is ordered, we
-			 * can stop the iteration as soon as we see a
-			 * refname that comes after the prefix:
-			 */
-			ok = ref_iterator_abort(iter->iter0);
-			break;
-		}
+		if (cmp > 0)
+			return ITER_DONE;
 
 		if (iter->trim) {
 			/*
@@ -386,9 +365,6 @@ static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		return ITER_OK;
 	}
 
-	iter->iter0 = NULL;
-	if (ref_iterator_abort(ref_iterator) != ITER_DONE)
-		return ITER_ERROR;
 	return ok;
 }
 
@@ -401,23 +377,18 @@ static int prefix_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return ref_iterator_peel(iter->iter0, peeled);
 }
 
-static int prefix_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void prefix_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct prefix_ref_iterator *iter =
 		(struct prefix_ref_iterator *)ref_iterator;
-	int ok = ITER_DONE;
-
-	if (iter->iter0)
-		ok = ref_iterator_abort(iter->iter0);
+	ref_iterator_free(iter->iter0);
 	free(iter->prefix);
-	base_ref_iterator_free(ref_iterator);
-	return ok;
 }
 
 static struct ref_iterator_vtable prefix_ref_iterator_vtable = {
 	.advance = prefix_ref_iterator_advance,
 	.peel = prefix_ref_iterator_peel,
-	.abort = prefix_ref_iterator_abort,
+	.release = prefix_ref_iterator_release,
 };
 
 struct ref_iterator *prefix_ref_iterator_begin(struct ref_iterator *iter0,
@@ -453,20 +424,14 @@ int do_for_each_ref_iterator(struct ref_iterator *iter,
 	current_ref_iter = iter;
 	while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
 		retval = fn(iter->refname, iter->referent, iter->oid, iter->flags, cb_data);
-		if (retval) {
-			/*
-			 * If ref_iterator_abort() returns ITER_ERROR,
-			 * we ignore that error in deference to the
-			 * callback function's return value.
-			 */
-			ref_iterator_abort(iter);
+		if (retval)
 			goto out;
-		}
 	}
 
 out:
 	current_ref_iter = old_ref_iter;
 	if (ok == ITER_ERROR)
-		return -1;
+		retval = -1;
+	ref_iterator_free(iter);
 	return retval;
 }
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index a7b6f74b6e3..38a1956d1a8 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -954,9 +954,6 @@ static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		return ITER_OK;
 	}
 
-	if (ref_iterator_abort(ref_iterator) != ITER_DONE)
-		ok = ITER_ERROR;
-
 	return ok;
 }
 
@@ -976,23 +973,19 @@ static int packed_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	}
 }
 
-static int packed_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void packed_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct packed_ref_iterator *iter =
 		(struct packed_ref_iterator *)ref_iterator;
-	int ok = ITER_DONE;
-
 	strbuf_release(&iter->refname_buf);
 	free(iter->jump);
 	release_snapshot(iter->snapshot);
-	base_ref_iterator_free(ref_iterator);
-	return ok;
 }
 
 static struct ref_iterator_vtable packed_ref_iterator_vtable = {
 	.advance = packed_ref_iterator_advance,
 	.peel = packed_ref_iterator_peel,
-	.abort = packed_ref_iterator_abort
+	.release = packed_ref_iterator_release,
 };
 
 static int jump_list_entry_cmp(const void *va, const void *vb)
@@ -1362,8 +1355,10 @@ static int write_with_updates(struct packed_ref_store *refs,
 	 */
 	iter = packed_ref_iterator_begin(&refs->base, "", NULL,
 					 DO_FOR_EACH_INCLUDE_BROKEN);
-	if ((ok = ref_iterator_advance(iter)) != ITER_OK)
+	if ((ok = ref_iterator_advance(iter)) != ITER_OK) {
+		ref_iterator_free(iter);
 		iter = NULL;
+	}
 
 	i = 0;
 
@@ -1411,8 +1406,10 @@ static int write_with_updates(struct packed_ref_store *refs,
 				 * the iterator over the unneeded
 				 * value.
 				 */
-				if ((ok = ref_iterator_advance(iter)) != ITER_OK)
+				if ((ok = ref_iterator_advance(iter)) != ITER_OK) {
+					ref_iterator_free(iter);
 					iter = NULL;
+				}
 				cmp = +1;
 			} else {
 				/*
@@ -1449,8 +1446,10 @@ static int write_with_updates(struct packed_ref_store *refs,
 					       peel_error ? NULL : &peeled))
 				goto write_error;
 
-			if ((ok = ref_iterator_advance(iter)) != ITER_OK)
+			if ((ok = ref_iterator_advance(iter)) != ITER_OK) {
+				ref_iterator_free(iter);
 				iter = NULL;
+			}
 		} else if (is_null_oid(&update->new_oid)) {
 			/*
 			 * The update wants to delete the reference,
@@ -1499,9 +1498,7 @@ static int write_with_updates(struct packed_ref_store *refs,
 		    get_tempfile_path(refs->tempfile), strerror(errno));
 
 error:
-	if (iter)
-		ref_iterator_abort(iter);
-
+	ref_iterator_free(iter);
 	delete_tempfile(&refs->tempfile);
 	return -1;
 }
diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index 02f09e4df88..6457e02c1ea 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -409,7 +409,7 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		if (++level->index == level->dir->nr) {
 			/* This level is exhausted; pop up a level */
 			if (--iter->levels_nr == 0)
-				return ref_iterator_abort(ref_iterator);
+				return ITER_DONE;
 
 			continue;
 		}
@@ -452,21 +452,18 @@ static int cache_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return peel_object(iter->repo, ref_iterator->oid, peeled) ? -1 : 0;
 }
 
-static int cache_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void cache_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct cache_ref_iterator *iter =
 		(struct cache_ref_iterator *)ref_iterator;
-
 	free((char *)iter->prefix);
 	free(iter->levels);
-	base_ref_iterator_free(ref_iterator);
-	return ITER_DONE;
 }
 
 static struct ref_iterator_vtable cache_ref_iterator_vtable = {
 	.advance = cache_ref_iterator_advance,
 	.peel = cache_ref_iterator_peel,
-	.abort = cache_ref_iterator_abort
+	.release = cache_ref_iterator_release,
 };
 
 struct ref_iterator *cache_ref_iterator_begin(struct ref_cache *cache,
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index aaab711bb96..27ff822cf43 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -273,11 +273,11 @@ enum do_for_each_ref_flags {
  * the next reference and returns ITER_OK. The data pointed at by
  * refname and oid belong to the iterator; if you want to retain them
  * after calling ref_iterator_advance() again or calling
- * ref_iterator_abort(), you must make a copy. When the iteration has
+ * ref_iterator_free(), you must make a copy. When the iteration has
  * been exhausted, ref_iterator_advance() releases any resources
  * associated with the iteration, frees the ref_iterator object, and
  * returns ITER_DONE. If you want to abort the iteration early, call
- * ref_iterator_abort(), which also frees the ref_iterator object and
+ * ref_iterator_free(), which also frees the ref_iterator object and
  * any associated resources. If there was an internal error advancing
  * to the next entry, ref_iterator_advance() aborts the iteration,
  * frees the ref_iterator, and returns ITER_ERROR.
@@ -292,10 +292,8 @@ enum do_for_each_ref_flags {
  *     struct ref_iterator *iter = ...;
  *
  *     while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
- *             if (want_to_stop_iteration()) {
- *                     ok = ref_iterator_abort(iter);
+ *             if (want_to_stop_iteration())
  *                     break;
- *             }
  *
  *             // Access information about the current reference:
  *             if (!(iter->flags & REF_ISSYMREF))
@@ -307,6 +305,7 @@ enum do_for_each_ref_flags {
  *
  *     if (ok != ITER_DONE)
  *             handle_error();
+ *     ref_iterator_free(iter);
  */
 struct ref_iterator {
 	struct ref_iterator_vtable *vtable;
@@ -333,12 +332,8 @@ int ref_iterator_advance(struct ref_iterator *ref_iterator);
 int ref_iterator_peel(struct ref_iterator *ref_iterator,
 		      struct object_id *peeled);
 
-/*
- * End the iteration before it has been exhausted, freeing the
- * reference iterator and any associated resources and returning
- * ITER_DONE. If the abort itself failed, return ITER_ERROR.
- */
-int ref_iterator_abort(struct ref_iterator *ref_iterator);
+/* Free the reference iterator and any associated resources. */
+void ref_iterator_free(struct ref_iterator *ref_iterator);
 
 /*
  * An iterator over nothing (its first ref_iterator_advance() call
@@ -438,13 +433,6 @@ struct ref_iterator *prefix_ref_iterator_begin(struct ref_iterator *iter0,
 void base_ref_iterator_init(struct ref_iterator *iter,
 			    struct ref_iterator_vtable *vtable);
 
-/*
- * Base class destructor for ref_iterators. Destroy the ref_iterator
- * part of iter and shallow-free the object. This is meant to be
- * called only by the destructors of derived classes.
- */
-void base_ref_iterator_free(struct ref_iterator *iter);
-
 /* Virtual function declarations for ref_iterators: */
 
 /*
@@ -463,15 +451,14 @@ typedef int ref_iterator_peel_fn(struct ref_iterator *ref_iterator,
 
 /*
  * Implementations of this function should free any resources specific
- * to the derived class, then call base_ref_iterator_free() to clean
- * up and free the ref_iterator object.
+ * to the derived class.
  */
-typedef int ref_iterator_abort_fn(struct ref_iterator *ref_iterator);
+typedef void ref_iterator_release_fn(struct ref_iterator *ref_iterator);
 
 struct ref_iterator_vtable {
 	ref_iterator_advance_fn *advance;
 	ref_iterator_peel_fn *peel;
-	ref_iterator_abort_fn *abort;
+	ref_iterator_release_fn *release;
 };
 
 /*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 2a90e7cb391..06543f79c64 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -711,17 +711,10 @@ static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		break;
 	}
 
-	if (iter->err > 0) {
-		if (ref_iterator_abort(ref_iterator) != ITER_DONE)
-			return ITER_ERROR;
+	if (iter->err > 0)
 		return ITER_DONE;
-	}
-
-	if (iter->err < 0) {
-		ref_iterator_abort(ref_iterator);
+	if (iter->err < 0)
 		return ITER_ERROR;
-	}
-
 	return ITER_OK;
 }
 
@@ -740,7 +733,7 @@ static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return -1;
 }
 
-static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void reftable_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct reftable_ref_iterator *iter =
 		(struct reftable_ref_iterator *)ref_iterator;
@@ -751,14 +744,12 @@ static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
 			free(iter->exclude_patterns[i]);
 		free(iter->exclude_patterns);
 	}
-	free(iter);
-	return ITER_DONE;
 }
 
 static struct ref_iterator_vtable reftable_ref_iterator_vtable = {
 	.advance = reftable_ref_iterator_advance,
 	.peel = reftable_ref_iterator_peel,
-	.abort = reftable_ref_iterator_abort
+	.release = reftable_ref_iterator_release,
 };
 
 static int qsort_strcmp(const void *va, const void *vb)
@@ -2017,17 +2008,10 @@ static int reftable_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 		break;
 	}
 
-	if (iter->err > 0) {
-		if (ref_iterator_abort(ref_iterator) != ITER_DONE)
-			return ITER_ERROR;
+	if (iter->err > 0)
 		return ITER_DONE;
-	}
-
-	if (iter->err < 0) {
-		ref_iterator_abort(ref_iterator);
+	if (iter->err < 0)
 		return ITER_ERROR;
-	}
-
 	return ITER_OK;
 }
 
@@ -2038,21 +2022,19 @@ static int reftable_reflog_iterator_peel(struct ref_iterator *ref_iterator UNUSE
 	return -1;
 }
 
-static int reftable_reflog_iterator_abort(struct ref_iterator *ref_iterator)
+static void reftable_reflog_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct reftable_reflog_iterator *iter =
 		(struct reftable_reflog_iterator *)ref_iterator;
 	reftable_log_record_release(&iter->log);
 	reftable_iterator_destroy(&iter->iter);
 	strbuf_release(&iter->last_name);
-	free(iter);
-	return ITER_DONE;
 }
 
 static struct ref_iterator_vtable reftable_reflog_iterator_vtable = {
 	.advance = reftable_reflog_iterator_advance,
 	.peel = reftable_reflog_iterator_peel,
-	.abort = reftable_reflog_iterator_abort
+	.release = reftable_reflog_iterator_release,
 };
 
 static struct reftable_reflog_iterator *reflog_iterator_for_stack(struct reftable_ref_store *refs,
diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
index 6b297bd7536..8d46e8ba409 100644
--- a/t/helper/test-dir-iterator.c
+++ b/t/helper/test-dir-iterator.c
@@ -53,6 +53,7 @@ int cmd__dir_iterator(int argc, const char **argv)
 		printf("(%s) [%s] %s\n", diter->relative_path, diter->basename,
 		       diter->path.buf);
 	}
+	dir_iterator_free(diter);
 
 	if (iter_status != ITER_DONE) {
 		printf("dir_iterator_advance failure\n");

-- 
2.48.1.666.gff9fcf71b7.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH 08/14] refs/iterator: provide infrastructure to re-seek iterators
  2025-02-17 15:50 [PATCH 00/14] refs: batch refname availability checks Patrick Steinhardt
                   ` (6 preceding siblings ...)
  2025-02-17 15:50 ` [PATCH 07/14] refs/iterator: separate lifecycle from iteration Patrick Steinhardt
@ 2025-02-17 15:50 ` Patrick Steinhardt
  2025-02-17 15:50 ` [PATCH 09/14] refs/iterator: implement seeking for merged iterators Patrick Steinhardt
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-17 15:50 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

Reftable iterators need to be scrapped after they have either been
exhausted or are no longer useful to the caller, and it is explicitly
not possible to reuse them for subsequent iterations. But enabling reuse
of iterators may allow us to tune them by retaining the internal state
of an iterator. The reftable iterators, for example, can already be
reused internally, but we are not able to expose this to any users
outside of the reftable backend.

Introduce a new `.seek` function in the ref iterator vtable that allows
callers to re-seek an iterator. It is expected to be functionally the
same as calling `refs_ref_iterator_begin()` with a different (or the
same) prefix.

Implement the callback for trivial cases. The other iterators will be
implemented in subsequent commits.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/debug.c         | 11 +++++++++++
 refs/iterator.c      | 24 ++++++++++++++++++++++++
 refs/refs-internal.h | 23 +++++++++++++++++++++++
 3 files changed, 58 insertions(+)

diff --git a/refs/debug.c b/refs/debug.c
index a9786da4ba1..5390fa9c187 100644
--- a/refs/debug.c
+++ b/refs/debug.c
@@ -169,6 +169,16 @@ static int debug_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	return res;
 }
 
+static int debug_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				   const char *prefix)
+{
+	struct debug_ref_iterator *diter =
+		(struct debug_ref_iterator *)ref_iterator;
+	int res = diter->iter->vtable->seek(diter->iter, prefix);
+	trace_printf_key(&trace_refs, "iterator_seek: %s: %d\n", prefix ? prefix : "", res);
+	return res;
+}
+
 static int debug_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				   struct object_id *peeled)
 {
@@ -189,6 +199,7 @@ static void debug_ref_iterator_release(struct ref_iterator *ref_iterator)
 
 static struct ref_iterator_vtable debug_ref_iterator_vtable = {
 	.advance = debug_ref_iterator_advance,
+	.seek = debug_ref_iterator_seek,
 	.peel = debug_ref_iterator_peel,
 	.release = debug_ref_iterator_release,
 };
diff --git a/refs/iterator.c b/refs/iterator.c
index aaeff270437..757b105261a 100644
--- a/refs/iterator.c
+++ b/refs/iterator.c
@@ -15,6 +15,12 @@ int ref_iterator_advance(struct ref_iterator *ref_iterator)
 	return ref_iterator->vtable->advance(ref_iterator);
 }
 
+int ref_iterator_seek(struct ref_iterator *ref_iterator,
+		      const char *prefix)
+{
+	return ref_iterator->vtable->seek(ref_iterator, prefix);
+}
+
 int ref_iterator_peel(struct ref_iterator *ref_iterator,
 		      struct object_id *peeled)
 {
@@ -50,6 +56,12 @@ static int empty_ref_iterator_advance(struct ref_iterator *ref_iterator UNUSED)
 	return ITER_DONE;
 }
 
+static int empty_ref_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
+				   const char *prefix UNUSED)
+{
+	return 0;
+}
+
 static int empty_ref_iterator_peel(struct ref_iterator *ref_iterator UNUSED,
 				   struct object_id *peeled UNUSED)
 {
@@ -62,6 +74,7 @@ static void empty_ref_iterator_release(struct ref_iterator *ref_iterator UNUSED)
 
 static struct ref_iterator_vtable empty_ref_iterator_vtable = {
 	.advance = empty_ref_iterator_advance,
+	.seek = empty_ref_iterator_seek,
 	.peel = empty_ref_iterator_peel,
 	.release = empty_ref_iterator_release,
 };
@@ -368,6 +381,16 @@ static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	return ok;
 }
 
+static int prefix_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				    const char *prefix)
+{
+	struct prefix_ref_iterator *iter =
+		(struct prefix_ref_iterator *)ref_iterator;
+	free(iter->prefix);
+	iter->prefix = xstrdup_or_null(prefix);
+	return ref_iterator_seek(iter->iter0, prefix);
+}
+
 static int prefix_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				    struct object_id *peeled)
 {
@@ -387,6 +410,7 @@ static void prefix_ref_iterator_release(struct ref_iterator *ref_iterator)
 
 static struct ref_iterator_vtable prefix_ref_iterator_vtable = {
 	.advance = prefix_ref_iterator_advance,
+	.seek = prefix_ref_iterator_seek,
 	.peel = prefix_ref_iterator_peel,
 	.release = prefix_ref_iterator_release,
 };
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 27ff822cf43..5fade7e8408 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -325,6 +325,21 @@ struct ref_iterator {
  */
 int ref_iterator_advance(struct ref_iterator *ref_iterator);
 
+/*
+ * Seek the iterator to the first reference with the given prefix.
+ * The prefix is matched as a literal string, without regard for path
+ * separators. If prefix is NULL or the empty string, seek the iterator to the
+ * first reference again.
+ *
+ * This function is expected to behave as if a new ref iterator with the same
+ * prefix had been created, but allows reuse of iterators and thus may allow
+ * the backend to optimize.
+ *
+ * Returns 0 on success, a negative error code otherwise.
+ */
+int ref_iterator_seek(struct ref_iterator *ref_iterator,
+		      const char *prefix);
+
 /*
  * If possible, peel the reference currently being viewed by the
  * iterator. Return 0 on success.
@@ -443,6 +458,13 @@ void base_ref_iterator_init(struct ref_iterator *iter,
  */
 typedef int ref_iterator_advance_fn(struct ref_iterator *ref_iterator);
 
+/*
+ * Seek the iterator to the first reference matching the given prefix. Should
+ * behave the same as if a new iterator was created with the same prefix.
+ */
+typedef int ref_iterator_seek_fn(struct ref_iterator *ref_iterator,
+				 const char *prefix);
+
 /*
  * Peels the current ref, returning 0 for success or -1 for failure.
  */
@@ -457,6 +479,7 @@ typedef void ref_iterator_release_fn(struct ref_iterator *ref_iterator);
 
 struct ref_iterator_vtable {
 	ref_iterator_advance_fn *advance;
+	ref_iterator_seek_fn *seek;
 	ref_iterator_peel_fn *peel;
 	ref_iterator_release_fn *release;
 };

-- 
2.48.1.666.gff9fcf71b7.dirty




* [PATCH 09/14] refs/iterator: implement seeking for merged iterators
  2025-02-17 15:50 [PATCH 00/14] refs: batch refname availability checks Patrick Steinhardt
                   ` (7 preceding siblings ...)
  2025-02-17 15:50 ` [PATCH 08/14] refs/iterator: provide infrastructure to re-seek iterators Patrick Steinhardt
@ 2025-02-17 15:50 ` Patrick Steinhardt
  2025-02-19 20:10   ` Karthik Nayak
  2025-02-17 15:50 ` [PATCH 10/14] refs/iterator: implement seeking for reftable iterators Patrick Steinhardt
                   ` (10 subsequent siblings)
  19 siblings, 1 reply; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-17 15:50 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

Implement seeking on merged iterators. The implementation is rather
straightforward, with the one exception that we must not deallocate the
underlying iterators once they have been exhausted.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/iterator.c | 38 +++++++++++++++++++++++++++++---------
 1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/refs/iterator.c b/refs/iterator.c
index 757b105261a..63608ef9907 100644
--- a/refs/iterator.c
+++ b/refs/iterator.c
@@ -96,7 +96,8 @@ int is_empty_ref_iterator(struct ref_iterator *ref_iterator)
 struct merge_ref_iterator {
 	struct ref_iterator base;
 
-	struct ref_iterator *iter0, *iter1;
+	struct ref_iterator *iter0, *iter0_owned;
+	struct ref_iterator *iter1, *iter1_owned;
 
 	ref_iterator_select_fn *select;
 	void *cb_data;
@@ -160,13 +161,11 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	if (!iter->current) {
 		/* Initialize: advance both iterators to their first entries */
 		if ((ok = ref_iterator_advance(iter->iter0)) != ITER_OK) {
-			ref_iterator_free(iter->iter0);
 			iter->iter0 = NULL;
 			if (ok == ITER_ERROR)
 				goto error;
 		}
 		if ((ok = ref_iterator_advance(iter->iter1)) != ITER_OK) {
-			ref_iterator_free(iter->iter1);
 			iter->iter1 = NULL;
 			if (ok == ITER_ERROR)
 				goto error;
@@ -177,7 +176,6 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		 * entry:
 		 */
 		if ((ok = ref_iterator_advance(*iter->current)) != ITER_OK) {
-			ref_iterator_free(*iter->current);
 			*iter->current = NULL;
 			if (ok == ITER_ERROR)
 				goto error;
@@ -206,7 +204,6 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 
 		if (selection & ITER_SKIP_SECONDARY) {
 			if ((ok = ref_iterator_advance(*secondary)) != ITER_OK) {
-				ref_iterator_free(*secondary);
 				*secondary = NULL;
 				if (ok == ITER_ERROR)
 					goto error;
@@ -226,6 +223,28 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	return ITER_ERROR;
 }
 
+static int merge_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				   const char *prefix)
+{
+	struct merge_ref_iterator *iter =
+		(struct merge_ref_iterator *)ref_iterator;
+	int ret;
+
+	iter->current = NULL;
+	iter->iter0 = iter->iter0_owned;
+	iter->iter1 = iter->iter1_owned;
+
+	ret = ref_iterator_seek(iter->iter0, prefix);
+	if (ret < 0)
+		return ret;
+
+	ret = ref_iterator_seek(iter->iter1, prefix);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
 static int merge_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				   struct object_id *peeled)
 {
@@ -242,12 +261,13 @@ static void merge_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct merge_ref_iterator *iter =
 		(struct merge_ref_iterator *)ref_iterator;
-	ref_iterator_free(iter->iter0);
-	ref_iterator_free(iter->iter1);
+	ref_iterator_free(iter->iter0_owned);
+	ref_iterator_free(iter->iter1_owned);
 }
 
 static struct ref_iterator_vtable merge_ref_iterator_vtable = {
 	.advance = merge_ref_iterator_advance,
+	.seek = merge_ref_iterator_seek,
 	.peel = merge_ref_iterator_peel,
 	.release = merge_ref_iterator_release,
 };
@@ -268,8 +288,8 @@ struct ref_iterator *merge_ref_iterator_begin(
 	 */
 
 	base_ref_iterator_init(ref_iterator, &merge_ref_iterator_vtable);
-	iter->iter0 = iter0;
-	iter->iter1 = iter1;
+	iter->iter0 = iter->iter0_owned = iter0;
+	iter->iter1 = iter->iter1_owned = iter1;
 	iter->select = select;
 	iter->cb_data = cb_data;
 	iter->current = NULL;

-- 
2.48.1.666.gff9fcf71b7.dirty




* [PATCH 10/14] refs/iterator: implement seeking for reftable iterators
  2025-02-17 15:50 [PATCH 00/14] refs: batch refname availability checks Patrick Steinhardt
                   ` (8 preceding siblings ...)
  2025-02-17 15:50 ` [PATCH 09/14] refs/iterator: implement seeking for merged iterators Patrick Steinhardt
@ 2025-02-17 15:50 ` Patrick Steinhardt
  2025-02-19 20:13   ` Karthik Nayak
  2025-02-17 15:50 ` [PATCH 11/14] refs/iterator: implement seeking for ref-cache iterators Patrick Steinhardt
                   ` (9 subsequent siblings)
  19 siblings, 1 reply; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-17 15:50 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

Implement seeking of reftable iterators. As the low-level reftable
iterators already support seeking, this change is straightforward. Two
notes though:

  - We do not support seeking on reflog iterators.

  - We start to check whether `reftable_stack_init_ref_iterator()` is
    successful.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/reftable-backend.c | 35 ++++++++++++++++++++++++++++++-----
 1 file changed, 30 insertions(+), 5 deletions(-)

diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 06543f79c64..b0c09f34433 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -547,7 +547,7 @@ struct reftable_ref_iterator {
 	struct reftable_ref_record ref;
 	struct object_id oid;
 
-	const char *prefix;
+	char *prefix;
 	size_t prefix_len;
 	char **exclude_patterns;
 	size_t exclude_patterns_index;
@@ -718,6 +718,20 @@ static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	return ITER_OK;
 }
 
+static int reftable_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				      const char *prefix)
+{
+	struct reftable_ref_iterator *iter =
+		(struct reftable_ref_iterator *)ref_iterator;
+
+	free(iter->prefix);
+	iter->prefix = xstrdup_or_null(prefix);
+	iter->prefix_len = prefix ? strlen(prefix) : 0;
+	iter->err = reftable_iterator_seek_ref(&iter->iter, prefix);
+
+	return iter->err;
+}
+
 static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				      struct object_id *peeled)
 {
@@ -744,10 +758,12 @@ static void reftable_ref_iterator_release(struct ref_iterator *ref_iterator)
 			free(iter->exclude_patterns[i]);
 		free(iter->exclude_patterns);
 	}
+	free(iter->prefix);
 }
 
 static struct ref_iterator_vtable reftable_ref_iterator_vtable = {
 	.advance = reftable_ref_iterator_advance,
+	.seek = reftable_ref_iterator_seek,
 	.peel = reftable_ref_iterator_peel,
 	.release = reftable_ref_iterator_release,
 };
@@ -806,8 +822,6 @@ static struct reftable_ref_iterator *ref_iterator_for_stack(struct reftable_ref_
 
 	iter = xcalloc(1, sizeof(*iter));
 	base_ref_iterator_init(&iter->base, &reftable_ref_iterator_vtable);
-	iter->prefix = prefix;
-	iter->prefix_len = prefix ? strlen(prefix) : 0;
 	iter->base.oid = &iter->oid;
 	iter->flags = flags;
 	iter->refs = refs;
@@ -821,8 +835,11 @@ static struct reftable_ref_iterator *ref_iterator_for_stack(struct reftable_ref_
 	if (ret)
 		goto done;
 
-	reftable_stack_init_ref_iterator(stack, &iter->iter);
-	ret = reftable_iterator_seek_ref(&iter->iter, prefix);
+	ret = reftable_stack_init_ref_iterator(stack, &iter->iter);
+	if (ret)
+		goto done;
+
+	ret = reftable_ref_iterator_seek(&iter->base, prefix);
 	if (ret)
 		goto done;
 
@@ -2015,6 +2032,13 @@ static int reftable_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 	return ITER_OK;
 }
 
+static int reftable_reflog_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
+					 const char *prefix UNUSED)
+{
+	BUG("reftable reflog iterator cannot be seeked");
+	return -1;
+}
+
 static int reftable_reflog_iterator_peel(struct ref_iterator *ref_iterator UNUSED,
 					 struct object_id *peeled UNUSED)
 {
@@ -2033,6 +2057,7 @@ static void reftable_reflog_iterator_release(struct ref_iterator *ref_iterator)
 
 static struct ref_iterator_vtable reftable_reflog_iterator_vtable = {
 	.advance = reftable_reflog_iterator_advance,
+	.seek = reftable_reflog_iterator_seek,
 	.peel = reftable_reflog_iterator_peel,
 	.release = reftable_reflog_iterator_release,
 };

-- 
2.48.1.666.gff9fcf71b7.dirty




* [PATCH 11/14] refs/iterator: implement seeking for ref-cache iterators
  2025-02-17 15:50 [PATCH 00/14] refs: batch refname availability checks Patrick Steinhardt
                   ` (9 preceding siblings ...)
  2025-02-17 15:50 ` [PATCH 10/14] refs/iterator: implement seeking for reftable iterators Patrick Steinhardt
@ 2025-02-17 15:50 ` Patrick Steinhardt
  2025-02-17 15:50 ` [PATCH 12/14] refs/iterator: implement seeking for `packed-ref` iterators Patrick Steinhardt
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-17 15:50 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

Implement seeking of ref-cache iterators. This is done by splitting most
of the logic to seek iterators out of `cache_ref_iterator_begin()` and
putting it into `cache_ref_iterator_seek()` so that it can be reused.

Note that we cannot use the optimization anymore where we return an
empty ref iterator when there aren't any references, as otherwise it
wouldn't be possible to reseek the iterator to a different prefix that
may exist. This shouldn't be much of a performance concern though, as we
now bail out early when `advance()` sees that there are no more
directories to be searched.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/ref-cache.c | 74 ++++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 48 insertions(+), 26 deletions(-)

diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index 6457e02c1ea..b54547d71ee 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -362,9 +362,7 @@ struct cache_ref_iterator {
 	struct ref_iterator base;
 
 	/*
-	 * The number of levels currently on the stack. This is always
-	 * at least 1, because when it becomes zero the iteration is
-	 * ended and this struct is freed.
+	 * The number of levels currently on the stack.
 	 */
 	size_t levels_nr;
 
@@ -389,6 +387,9 @@ struct cache_ref_iterator {
 	struct cache_ref_iterator_level *levels;
 
 	struct repository *repo;
+	struct ref_cache *cache;
+
+	int prime_dir;
 };
 
 static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
@@ -396,6 +397,9 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	struct cache_ref_iterator *iter =
 		(struct cache_ref_iterator *)ref_iterator;
 
+	if (!iter->levels_nr)
+		return ITER_DONE;
+
 	while (1) {
 		struct cache_ref_iterator_level *level =
 			&iter->levels[iter->levels_nr - 1];
@@ -444,6 +448,40 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	}
 }
 
+static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				   const char *prefix)
+{
+	struct cache_ref_iterator *iter =
+		(struct cache_ref_iterator *)ref_iterator;
+	struct ref_dir *dir;
+
+	dir = get_ref_dir(iter->cache->root);
+	if (prefix && *prefix)
+		dir = find_containing_dir(dir, prefix);
+
+	if (dir) {
+		struct cache_ref_iterator_level *level;
+
+		if (iter->prime_dir)
+			prime_ref_dir(dir, prefix);
+		iter->levels_nr = 1;
+		level = &iter->levels[0];
+		level->index = -1;
+		level->dir = dir;
+
+		if (prefix && *prefix) {
+			iter->prefix = xstrdup(prefix);
+			level->prefix_state = PREFIX_WITHIN_DIR;
+		} else {
+			level->prefix_state = PREFIX_CONTAINS_DIR;
+		}
+	} else {
+		iter->levels_nr = 0;
+	}
+
+	return 0;
+}
+
 static int cache_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				   struct object_id *peeled)
 {
@@ -462,6 +500,7 @@ static void cache_ref_iterator_release(struct ref_iterator *ref_iterator)
 
 static struct ref_iterator_vtable cache_ref_iterator_vtable = {
 	.advance = cache_ref_iterator_advance,
+	.seek = cache_ref_iterator_seek,
 	.peel = cache_ref_iterator_peel,
 	.release = cache_ref_iterator_release,
 };
@@ -471,39 +510,22 @@ struct ref_iterator *cache_ref_iterator_begin(struct ref_cache *cache,
 					      struct repository *repo,
 					      int prime_dir)
 {
-	struct ref_dir *dir;
 	struct cache_ref_iterator *iter;
 	struct ref_iterator *ref_iterator;
-	struct cache_ref_iterator_level *level;
-
-	dir = get_ref_dir(cache->root);
-	if (prefix && *prefix)
-		dir = find_containing_dir(dir, prefix);
-	if (!dir)
-		/* There's nothing to iterate over. */
-		return empty_ref_iterator_begin();
-
-	if (prime_dir)
-		prime_ref_dir(dir, prefix);
 
 	CALLOC_ARRAY(iter, 1);
 	ref_iterator = &iter->base;
 	base_ref_iterator_init(ref_iterator, &cache_ref_iterator_vtable);
 	ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
 
-	iter->levels_nr = 1;
-	level = &iter->levels[0];
-	level->index = -1;
-	level->dir = dir;
+	iter->repo = repo;
+	iter->cache = cache;
+	iter->prime_dir = prime_dir;
 
-	if (prefix && *prefix) {
-		iter->prefix = xstrdup(prefix);
-		level->prefix_state = PREFIX_WITHIN_DIR;
-	} else {
-		level->prefix_state = PREFIX_CONTAINS_DIR;
+	if (cache_ref_iterator_seek(&iter->base, prefix) < 0) {
+		ref_iterator_free(&iter->base);
+		return NULL;
 	}
 
-	iter->repo = repo;
-
 	return ref_iterator;
 }

-- 
2.48.1.666.gff9fcf71b7.dirty




* [PATCH 12/14] refs/iterator: implement seeking for `packed-ref` iterators
  2025-02-17 15:50 [PATCH 00/14] refs: batch refname availability checks Patrick Steinhardt
                   ` (10 preceding siblings ...)
  2025-02-17 15:50 ` [PATCH 11/14] refs/iterator: implement seeking for ref-cache iterators Patrick Steinhardt
@ 2025-02-17 15:50 ` Patrick Steinhardt
  2025-02-17 15:50 ` [PATCH 13/14] refs/iterator: implement seeking for "files" iterators Patrick Steinhardt
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-17 15:50 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

Implement seeking of `packed-ref` iterators. The implementation is again
straightforward, except that we cannot continue to use the prefix
iterator, as we would otherwise not be able to reseek the iterator
anymore in case one first asks for an empty and then for a non-empty
prefix. Instead, we open-code the logic in `advance()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/packed-backend.c | 62 +++++++++++++++++++++++++++++++++------------------
 1 file changed, 40 insertions(+), 22 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 38a1956d1a8..71a38acfedc 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -819,6 +819,8 @@ struct packed_ref_iterator {
 
 	struct snapshot *snapshot;
 
+	char *prefix;
+
 	/* The current position in the snapshot's buffer: */
 	const char *pos;
 
@@ -841,11 +843,9 @@ struct packed_ref_iterator {
 };
 
 /*
- * Move the iterator to the next record in the snapshot, without
- * respect for whether the record is actually required by the current
- * iteration. Adjust the fields in `iter` and return `ITER_OK` or
- * `ITER_DONE`. This function does not free the iterator in the case
- * of `ITER_DONE`.
+ * Move the iterator to the next record in the snapshot. Adjust the fields in
+ * `iter` and return `ITER_OK` or `ITER_DONE`. This function does not free the
+ * iterator in the case of `ITER_DONE`.
  */
 static int next_record(struct packed_ref_iterator *iter)
 {
@@ -942,6 +942,9 @@ static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	int ok;
 
 	while ((ok = next_record(iter)) == ITER_OK) {
+		const char *refname = iter->base.refname;
+		const char *prefix = iter->prefix;
+
 		if (iter->flags & DO_FOR_EACH_PER_WORKTREE_ONLY &&
 		    !is_per_worktree_ref(iter->base.refname))
 			continue;
@@ -951,12 +954,41 @@ static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator)
 					    &iter->oid, iter->flags))
 			continue;
 
+		while (prefix && *prefix) {
+			if (*refname < *prefix)
+				BUG("packed-refs backend yielded reference preceding its prefix");
+			else if (*refname > *prefix)
+				return ITER_DONE;
+			prefix++;
+			refname++;
+		}
+
 		return ITER_OK;
 	}
 
 	return ok;
 }
 
+static int packed_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				    const char *prefix)
+{
+	struct packed_ref_iterator *iter =
+		(struct packed_ref_iterator *)ref_iterator;
+	const char *start;
+
+	if (prefix && *prefix)
+		start = find_reference_location(iter->snapshot, prefix, 0);
+	else
+		start = iter->snapshot->start;
+
+	free(iter->prefix);
+	iter->prefix = xstrdup_or_null(prefix);
+	iter->pos = start;
+	iter->eof = iter->snapshot->eof;
+
+	return 0;
+}
+
 static int packed_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				   struct object_id *peeled)
 {
@@ -979,11 +1011,13 @@ static void packed_ref_iterator_release(struct ref_iterator *ref_iterator)
 		(struct packed_ref_iterator *)ref_iterator;
 	strbuf_release(&iter->refname_buf);
 	free(iter->jump);
+	free(iter->prefix);
 	release_snapshot(iter->snapshot);
 }
 
 static struct ref_iterator_vtable packed_ref_iterator_vtable = {
 	.advance = packed_ref_iterator_advance,
+	.seek = packed_ref_iterator_seek,
 	.peel = packed_ref_iterator_peel,
 	.release = packed_ref_iterator_release,
 };
@@ -1097,7 +1131,6 @@ static struct ref_iterator *packed_ref_iterator_begin(
 {
 	struct packed_ref_store *refs;
 	struct snapshot *snapshot;
-	const char *start;
 	struct packed_ref_iterator *iter;
 	struct ref_iterator *ref_iterator;
 	unsigned int required_flags = REF_STORE_READ;
@@ -1113,14 +1146,6 @@ static struct ref_iterator *packed_ref_iterator_begin(
 	 */
 	snapshot = get_snapshot(refs);
 
-	if (prefix && *prefix)
-		start = find_reference_location(snapshot, prefix, 0);
-	else
-		start = snapshot->start;
-
-	if (start == snapshot->eof)
-		return empty_ref_iterator_begin();
-
 	CALLOC_ARRAY(iter, 1);
 	ref_iterator = &iter->base;
 	base_ref_iterator_init(ref_iterator, &packed_ref_iterator_vtable);
@@ -1130,19 +1155,12 @@ static struct ref_iterator *packed_ref_iterator_begin(
 
 	iter->snapshot = snapshot;
 	acquire_snapshot(snapshot);
-
-	iter->pos = start;
-	iter->eof = snapshot->eof;
 	strbuf_init(&iter->refname_buf, 0);
-
 	iter->base.oid = &iter->oid;
-
 	iter->repo = ref_store->repo;
 	iter->flags = flags;
 
-	if (prefix && *prefix)
-		/* Stop iteration after we've gone *past* prefix: */
-		ref_iterator = prefix_ref_iterator_begin(ref_iterator, prefix, 0);
+	packed_ref_iterator_seek(&iter->base, prefix);
 
 	return ref_iterator;
 }

-- 
2.48.1.666.gff9fcf71b7.dirty




* [PATCH 13/14] refs/iterator: implement seeking for "files" iterators
  2025-02-17 15:50 [PATCH 00/14] refs: batch refname availability checks Patrick Steinhardt
                   ` (11 preceding siblings ...)
  2025-02-17 15:50 ` [PATCH 12/14] refs/iterator: implement seeking for `packed-ref` iterators Patrick Steinhardt
@ 2025-02-17 15:50 ` Patrick Steinhardt
  2025-02-17 15:50 ` [PATCH 14/14] refs: reuse iterators when determining refname availability Patrick Steinhardt
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-17 15:50 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

Implement seeking for "files" iterators. As we simply use a ref-cache
iterator under the hood, the implementation is straightforward. Note
that we do not implement seeking on reflog iterators, the same as with
the "reftable" backend.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/files-backend.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 9511b6f3448..acc28e1ad81 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -922,6 +922,14 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	return ok;
 }
 
+static int files_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				   const char *prefix)
+{
+	struct files_ref_iterator *iter =
+		(struct files_ref_iterator *)ref_iterator;
+	return ref_iterator_seek(iter->iter0, prefix);
+}
+
 static int files_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				   struct object_id *peeled)
 {
@@ -940,6 +948,7 @@ static void files_ref_iterator_release(struct ref_iterator *ref_iterator)
 
 static struct ref_iterator_vtable files_ref_iterator_vtable = {
 	.advance = files_ref_iterator_advance,
+	.seek = files_ref_iterator_seek,
 	.peel = files_ref_iterator_peel,
 	.release = files_ref_iterator_release,
 };
@@ -2298,6 +2307,12 @@ static int files_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 	return ok;
 }
 
+static int files_reflog_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
+				      const char *prefix UNUSED)
+{
+	BUG("ref_iterator_seek() called for reflog_iterator");
+}
+
 static int files_reflog_iterator_peel(struct ref_iterator *ref_iterator UNUSED,
 				      struct object_id *peeled UNUSED)
 {
@@ -2313,6 +2328,7 @@ static void files_reflog_iterator_release(struct ref_iterator *ref_iterator)
 
 static struct ref_iterator_vtable files_reflog_iterator_vtable = {
 	.advance = files_reflog_iterator_advance,
+	.seek = files_reflog_iterator_seek,
 	.peel = files_reflog_iterator_peel,
 	.release = files_reflog_iterator_release,
 };

-- 
2.48.1.666.gff9fcf71b7.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH 14/14] refs: reuse iterators when determining refname availability
  2025-02-17 15:50 [PATCH 00/14] refs: batch refname availability checks Patrick Steinhardt
                   ` (12 preceding siblings ...)
  2025-02-17 15:50 ` [PATCH 13/14] refs/iterator: implement seeking for "files" iterators Patrick Steinhardt
@ 2025-02-17 15:50 ` Patrick Steinhardt
  2025-02-18 17:10 ` [PATCH 00/14] refs: batch refname availability checks brian m. carlson
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-17 15:50 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

When checking whether refnames are available we have to verify that no
reference exists that is nested under the current reference. E.g.
given a reference "refs/heads/foo", we must make sure that there is no
other reference "refs/heads/foo/*".

This check is performed using a ref iterator with the prefix set to the
nested reference namespace. Until now it was not possible to reseek
iterators, so we always had to reallocate the iterator for every single
reference we were about to check. This kept us from reusing state that
the iterator may hold and that may make it work more efficiently.

Refactor the logic to reseek iterators. This leads to a speedup with the
reftable backend, which is the only backend that knows how to batch
refname availability checks:

    Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
      Time (mean ± σ):      39.8 ms ±   0.9 ms    [User: 29.7 ms, System: 9.8 ms]
      Range (min … max):    38.4 ms …  42.0 ms    62 runs

    Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):      31.9 ms ±   1.1 ms    [User: 27.0 ms, System: 4.5 ms]
      Range (min … max):    29.8 ms …  34.3 ms    74 runs

    Summary
      update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
        1.25 ± 0.05 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/refs.c b/refs.c
index 8eff60a2186..6cbb9decdb0 100644
--- a/refs.c
+++ b/refs.c
@@ -2555,8 +2555,13 @@ int refs_verify_refnames_available(struct ref_store *refs,
 		if (!initial_transaction) {
 			int ok;
 
-			iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
-						       DO_FOR_EACH_INCLUDE_BROKEN);
+			if (!iter) {
+				iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
+							       DO_FOR_EACH_INCLUDE_BROKEN);
+			} else if (ref_iterator_seek(iter, dirname.buf) < 0) {
+				goto cleanup;
+			}
+
 			while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
 				if (skip &&
 				    string_list_has_string(skip, iter->refname))
@@ -2569,9 +2574,6 @@ int refs_verify_refnames_available(struct ref_store *refs,
 
 			if (ok != ITER_DONE)
 				BUG("error while iterating over references");
-
-			ref_iterator_free(iter);
-			iter = NULL;
 		}
 
 		extra_refname = find_descendant_ref(dirname.buf, extras, skip);

-- 
2.48.1.666.gff9fcf71b7.dirty




* Re: [PATCH 03/14] builtin/update-ref: skip ambiguity checks when parsing object IDs
  2025-02-17 15:50 ` [PATCH 03/14] builtin/update-ref: skip ambiguity checks when parsing object IDs Patrick Steinhardt
@ 2025-02-18 16:04   ` Karthik Nayak
  0 siblings, 0 replies; 163+ messages in thread
From: Karthik Nayak @ 2025-02-18 16:04 UTC (permalink / raw)
  To: Patrick Steinhardt, git
  Cc: brian m. carlson, Jeff King, Junio C Hamano, Christian Couder


Patrick Steinhardt <ps@pks.im> writes:

[snip]

> diff --git a/builtin/update-ref.c b/builtin/update-ref.c
> index 4d35bdc4b4b..ac330748244 100644
> --- a/builtin/update-ref.c
> +++ b/builtin/update-ref.c
> @@ -179,7 +179,8 @@ static int parse_next_oid(const char **next, const char *end,
>  		(*next)++;
>  		*next = parse_arg(*next, &arg);
>  		if (arg.len) {
> -			if (repo_get_oid(the_repository, arg.buf, oid))
> +			if (repo_get_oid_with_flags(the_repository, arg.buf, oid,
> +						    GET_OID_HASH_SKIP_AMBIGUITY_CHECK))
>  				goto invalid;
>  		} else {
>  			/* Without -z, an empty value means all zeros: */
> @@ -197,7 +198,8 @@ static int parse_next_oid(const char **next, const char *end,
>  		*next += arg.len;
>
>  		if (arg.len) {
> -			if (repo_get_oid(the_repository, arg.buf, oid))
> +			if (repo_get_oid_with_flags(the_repository, arg.buf, oid,
> +						    GET_OID_HASH_SKIP_AMBIGUITY_CHECK))
>  				goto invalid;
>  		} else if (flags & PARSE_SHA1_ALLOW_EMPTY) {
>  			/* With -z, treat an empty value as all zeros: */

So the above two instances are used within the individual sub-commands
for `--stdin` mode. The symref commands use `parse_refname()` for
parsing refnames, so all good.

> @@ -772,7 +774,8 @@ int cmd_update_ref(int argc,
>  		refname = argv[0];
>  		value = argv[1];
>  		oldval = argv[2];
> -		if (repo_get_oid(the_repository, value, &oid))
> +		if (repo_get_oid_with_flags(the_repository, value, &oid,
> +					    GET_OID_HASH_SKIP_AMBIGUITY_CHECK))
>  			die("%s: not a valid SHA1", value);
>  	}
>
> @@ -783,7 +786,8 @@ int cmd_update_ref(int argc,
>  			 * must not already exist:
>  			 */
>  			oidclr(&oldoid, the_repository->hash_algo);
> -		else if (repo_get_oid(the_repository, oldval, &oldoid))
> +		else if (repo_get_oid_with_flags(the_repository, oldval, &oldoid,
> +						 GET_OID_HASH_SKIP_AMBIGUITY_CHECK))
>  			die("%s: not a valid old SHA1", oldval);
>  	}
>

This is when the user uses 'git update-ref' directly. Makes sense.

>
> --
> 2.48.1.666.gff9fcf71b7.dirty



* Re: [PATCH 06/14] refs: stop re-verifying common prefixes for availability
  2025-02-17 15:50 ` [PATCH 06/14] refs: stop re-verifying common prefixes for availability Patrick Steinhardt
@ 2025-02-18 16:12   ` Karthik Nayak
  2025-02-19 11:52     ` Patrick Steinhardt
  0 siblings, 1 reply; 163+ messages in thread
From: Karthik Nayak @ 2025-02-18 16:12 UTC (permalink / raw)
  To: Patrick Steinhardt, git
  Cc: brian m. carlson, Jeff King, Junio C Hamano, Christian Couder


Patrick Steinhardt <ps@pks.im> writes:

> One of the checks done by `refs_verify_refnames_available()` is whether
> any of the prefixes of a reference already exists. For example, given a
> reference "refs/heads/main", we'd check whether "refs/heads" or "refs"
> already exist, and if so we'd abort the transaction.
>
> When updating multiple references at once, this check is performed for
> each of the references individually. Consequently, because references
> tend to have common prefixes like "refs/heads/" or "refs/tags/", we
> evaluate the availability of these prefixes repeatedly. Naturally this
> is a waste of compute, as the availability of those prefixes should in
> general not change in the middle of a transaction. And if it would,
> backends would notice at a later point in time.
>
> Optimize this pattern by storing prefixes in a `strset` so that we can
> trivially track those prefixes that we have already checked. This leads
> to a significant speedup when creating many references that all share a
> common prefix:
>
>     Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
>       Time (mean ± σ):      63.1 ms ±   1.8 ms    [User: 41.0 ms, System: 21.6 ms]
>       Range (min … max):    60.6 ms …  69.5 ms    38 runs
>
>     Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
>       Time (mean ± σ):      40.0 ms ±   1.3 ms    [User: 29.3 ms, System: 10.3 ms]
>       Range (min … max):    38.1 ms …  47.3 ms    61 runs
>
>     Summary
>       update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
>         1.58 ± 0.07 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
>
> Note that the same speedup cannot be observed for the "files" backend
> because it still performs the availability check per reference.
>

In the previous commit you started using the new function in the
reftable backend; can we not make a similar change to the files backend?

> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  refs.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/refs.c b/refs.c
> index 5a9b0f2fa1e..eaf41421f50 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -2476,6 +2476,7 @@ int refs_verify_refnames_available(struct ref_store *refs,
>  {
>  	struct strbuf dirname = STRBUF_INIT;
>  	struct strbuf referent = STRBUF_INIT;
> +	struct strset dirnames;
>  	int ret = -1;
>
>  	/*
> @@ -2485,6 +2486,8 @@ int refs_verify_refnames_available(struct ref_store *refs,
>
>  	assert(err);
>
> +	strset_init(&dirnames);
> +
>  	for (size_t i = 0; i < refnames->nr; i++) {
>  		const char *refname = refnames->items[i].string;
>  		const char *extra_refname;
> @@ -2514,6 +2517,14 @@ int refs_verify_refnames_available(struct ref_store *refs,
>  			if (skip && string_list_has_string(skip, dirname.buf))
>  				continue;
>
> +			/*
> +			 * If we've already seen the directory we don't need to
> +			 * process it again. Skip it to avoid checking
> +			 * common prefixes like "refs/heads/" repeatedly.
> +			 */
> +			if (!strset_add(&dirnames, dirname.buf))
> +				continue;
> +

This was simple and neat. Nice.

>  			if (!initial_transaction &&
>  			    !refs_read_raw_ref(refs, dirname.buf, &oid, &referent,
>  					       &type, &ignore_errno)) {
> @@ -2574,6 +2585,7 @@ int refs_verify_refnames_available(struct ref_store *refs,
>  cleanup:
>  	strbuf_release(&referent);
>  	strbuf_release(&dirname);
> +	strset_clear(&dirnames);
>  	return ret;
>  }
>
>
> --
> 2.48.1.666.gff9fcf71b7.dirty



* Re: [PATCH 07/14] refs/iterator: separate lifecycle from iteration
  2025-02-17 15:50 ` [PATCH 07/14] refs/iterator: separate lifecycle from iteration Patrick Steinhardt
@ 2025-02-18 16:52   ` shejialuo
  2025-02-19 11:52     ` Patrick Steinhardt
  2025-02-18 17:13   ` Karthik Nayak
  1 sibling, 1 reply; 163+ messages in thread
From: shejialuo @ 2025-02-18 16:52 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

On Mon, Feb 17, 2025 at 04:50:21PM +0100, Patrick Steinhardt wrote:
> The ref and reflog iterators have their lifecycle attached to iteration:
> once the iterator reaches its end, it is automatically released and the
> caller doesn't have to care about that anymore. When the iterator should
> be released before it has been exhausted, callers must explicitly abort
> the iterator via `ref_iterator_abort()`.
> 
> This lifecycle is somewhat unusual in the Git codebase and creates two
> problems:
> 
>   - Callsites need to be very careful about when exactly they call
>     `ref_iterator_abort()`, as calling the function is only valid when
>     the iterator itself still is. This leads to somewhat awkward calling
>     patterns in some situations.
> 

In what situations has the iterator already disappeared by the time we
call `ref_iterator_abort()`? Why is this awkward? While reading, I was
really curious about this.

>   - It is impossible to reuse iterators and re-seek them to a different
>     prefix. This feature isn't supported by any iterator implementation
>     except for the reftable iterators anyway, but if it was implemented
>     it would allow us to optimize cases where we need to search for
>     specific references repeatedly by reusing internal state.
> 

So the reason we cannot reuse the iterator is that we deallocate it once
it is exhausted? And that is why below we want to detangle the
lifecycle. I don't know whether my understanding is correct.

> Detangle the lifecycle from iteration so that we don't deallocate the
> iterator anymore once it is exhausted. Instead, callers are now expected
> to always call a newly introduced `ref_iterator_free()` function that
> deallocates the iterator and its internal state.
> 

A design question: why not just introduce a variable, for example
`unsigned int free`, to indicate whether we need to free the iterator?

> While at it, drop the return value of `ref_iterator_abort()`, which
> wasn't really required by any of the iterator implementations anyway.
> Furthermore, stop calling `base_ref_iterator_free()` in any of the
> backends, but instead call it in `ref_iterator_free()`.
> 
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  builtin/clone.c              |  2 +
>  dir-iterator.c               | 24 +++++------
>  dir-iterator.h               | 13 ++----
>  refs.c                       |  7 +++-
>  refs/debug.c                 |  9 ++---
>  refs/files-backend.c         | 36 +++++------------
>  refs/iterator.c              | 95 ++++++++++++++------------------------------
>  refs/packed-backend.c        | 27 ++++++-------
>  refs/ref-cache.c             |  9 ++---
>  refs/refs-internal.h         | 31 +++++----------
>  refs/reftable-backend.c      | 34 ++++------------
>  t/helper/test-dir-iterator.c |  1 +
>  12 files changed, 99 insertions(+), 189 deletions(-)
> 
> diff --git a/builtin/clone.c b/builtin/clone.c
> index fd001d800c6..ac3e84b2b18 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -426,6 +426,8 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
>  		strbuf_setlen(src, src_len);
>  		die(_("failed to iterate over '%s'"), src->buf);
>  	}
> +
> +	dir_iterator_free(iter);

Here, we explicitly free the iterator. This is a must, because the
iterator will no longer free itself.

>  }
>  
>  static void clone_local(const char *src_repo, const char *dest_repo)
> diff --git a/dir-iterator.c b/dir-iterator.c
> index de619846f29..857e1d9bdaf 100644
> --- a/dir-iterator.c
> +++ b/dir-iterator.c
> @@ -193,9 +193,9 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
>  
>  	if (S_ISDIR(iter->base.st.st_mode) && push_level(iter)) {
>  		if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
> -			goto error_out;
> +			return ITER_ERROR;

Here, we just return `ITER_ERROR` instead of jumping to the `error_out`
label, which would call `dir_iterator_abort()` to free the iterator; we
now make the caller responsible for freeing it.

>  		if (iter->levels_nr == 0)
> -			goto error_out;
> +			return ITER_ERROR;
>  	}
>  
>  	/* Loop until we find an entry that we can give back to the caller. */
> @@ -211,11 +211,11 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
>  			int ret = next_directory_entry(level->dir, iter->base.path.buf, &de);
>  			if (ret < 0) {
>  				if (iter->flags & DIR_ITERATOR_PEDANTIC)
> -					goto error_out;
> +					return ITER_ERROR;
>  				continue;
>  			} else if (ret > 0) {
>  				if (pop_level(iter) == 0)
> -					return dir_iterator_abort(dir_iterator);
> +					return ITER_DONE;

Instead of calling `dir_iterator_abort`, we just return `ITER_DONE`.
However, this does not make sense: it breaks the semantics of
`ITER_DONE`. Let me cite the comment from "iterator.h":

    /*
     * The iterator is exhausted and has been freed.
     */
    #define ITER_DONE -1

However, we don't free the iterator here.

>  				continue;
>  			}
>  
> @@ -223,7 +223,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
>  		} else {
>  			if (level->entries_idx >= level->entries.nr) {
>  				if (pop_level(iter) == 0)
> -					return dir_iterator_abort(dir_iterator);
> +					return ITER_DONE;
>  				continue;
>  			}
>  
> @@ -232,22 +232,21 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
>  
>  		if (prepare_next_entry_data(iter, name)) {
>  			if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
> -				goto error_out;
> +				return ITER_ERROR;
>  			continue;
>  		}
>  
>  		return ITER_OK;
>  	}
> -
> -error_out:
> -	dir_iterator_abort(dir_iterator);
> -	return ITER_ERROR;
>  }
>  
> -int dir_iterator_abort(struct dir_iterator *dir_iterator)
> +void dir_iterator_free(struct dir_iterator *dir_iterator)

We rename `dir_iterator_abort` to `dir_iterator_free` and drop the
return value.

>  {
>  	struct dir_iterator_int *iter = (struct dir_iterator_int *)dir_iterator;
>  
> +	if (!iter)
> +		return;
> +

Makes sense: we have no idea whether the `dir_iterator` is exhausted,
so we need to check whether it is valid.

>  	for (; iter->levels_nr; iter->levels_nr--) {
>  		struct dir_iterator_level *level =
>  			&iter->levels[iter->levels_nr - 1];
> @@ -266,7 +265,6 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
>  	free(iter->levels);
>  	strbuf_release(&iter->base.path);
>  	free(iter);
> -	return ITER_DONE;

I am confused why we don't provide `dir_iterator_release` for the dir
iterator. For the other ref-related iterators we rename "*abort" to
"*release" and drop the code that frees the iterator, but for the dir
iterator we rename its "*abort" to "*free".

I think this does not make sense; it causes inconsistency. Although the
dir iterator does _not_ have a "peel" method, it does have "advance" and
"abort" methods just like the ref iterators.

I think you have already considered this problem. I guess that's the
reason why in the comment below you typed `dir_iterator_release` instead
of `dir_iterator_free`.

>  }
>  
>  struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags)
> @@ -301,7 +299,7 @@ struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags)
>  	return dir_iterator;
>  
>  error_out:
> -	dir_iterator_abort(dir_iterator);
> +	dir_iterator_free(dir_iterator);
>  	errno = saved_errno;
>  	return NULL;
>  }
> diff --git a/dir-iterator.h b/dir-iterator.h
> index 6d438809b6e..01f51f6bac1 100644
> --- a/dir-iterator.h
> +++ b/dir-iterator.h
> @@ -27,10 +27,8 @@
>   *             goto error_handler;
>   *
>   *     while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
> - *             if (want_to_stop_iteration()) {
> - *                     ok = dir_iterator_abort(iter);
> + *             if (want_to_stop_iteration())
>   *                     break;
> - *             }

Is this correct? If `want_to_stop_iteration()` is true, `ok` will still
be `ITER_OK` when we break out of the loop and jump to the check whether
`ok` is `ITER_DONE`. Of course it is not, so we will call
`handle_error()`. At the least, we should assign `ok = ITER_DONE`.

>   *
>   *             // Access information about the current path:
>   *             if (S_ISDIR(iter->st.st_mode))
> @@ -39,6 +37,7 @@
>   *
>   *     if (ok != ITER_DONE)
>   *             handle_error();
> + *     dir_iterator_release(iter);

I think this is a typo. Should `dir_iterator_release` be
`dir_iterator_free`?

>   *
>   * Callers are allowed to modify iter->path while they are working,
>   * but they must restore it to its original contents before calling
> @@ -107,11 +106,7 @@ struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags);
>   */
>  int dir_iterator_advance(struct dir_iterator *iterator);
>  
> -/*
> - * End the iteration before it has been exhausted. Free the
> - * dir_iterator and any associated resources and return ITER_DONE. On
> - * error, free the dir_iterator and return ITER_ERROR.
> - */
> -int dir_iterator_abort(struct dir_iterator *iterator);
> +/* Free the dir_iterator and any associated resources. */
> +void dir_iterator_free(struct dir_iterator *iterator);
>  
>  #endif
> diff --git a/refs.c b/refs.c
> index eaf41421f50..8eff60a2186 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -2476,6 +2476,7 @@ int refs_verify_refnames_available(struct ref_store *refs,
>  {
>  	struct strbuf dirname = STRBUF_INIT;
>  	struct strbuf referent = STRBUF_INIT;
> +	struct ref_iterator *iter = NULL;
>  	struct strset dirnames;
>  	int ret = -1;
>  
> @@ -2552,7 +2553,6 @@ int refs_verify_refnames_available(struct ref_store *refs,
>  		strbuf_addch(&dirname, '/');
>  
>  		if (!initial_transaction) {
> -			struct ref_iterator *iter;
>  			int ok;
>  
>  			iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
> @@ -2564,12 +2564,14 @@ int refs_verify_refnames_available(struct ref_store *refs,
>  
>  				strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
>  					    iter->refname, refname);
> -				ref_iterator_abort(iter);
>  				goto cleanup;
>  			}
>  
>  			if (ok != ITER_DONE)
>  				BUG("error while iterating over references");
> +
> +			ref_iterator_free(iter);
> +			iter = NULL;
>  		}
>  
>  		extra_refname = find_descendant_ref(dirname.buf, extras, skip);
> @@ -2586,6 +2588,7 @@ int refs_verify_refnames_available(struct ref_store *refs,
>  	strbuf_release(&referent);
>  	strbuf_release(&dirname);
>  	strset_clear(&dirnames);
> +	ref_iterator_free(iter);
>  	return ret;
>  }
>  
> diff --git a/refs/debug.c b/refs/debug.c
> index fbc4df08b43..a9786da4ba1 100644
> --- a/refs/debug.c
> +++ b/refs/debug.c
> @@ -179,19 +179,18 @@ static int debug_ref_iterator_peel(struct ref_iterator *ref_iterator,
>  	return res;
>  }
>  
> -static int debug_ref_iterator_abort(struct ref_iterator *ref_iterator)
> +static void debug_ref_iterator_release(struct ref_iterator *ref_iterator)

Here, we rename "*abort" to "*release" for the debug ref iterator and
call the corresponding `release` method.

>  {
>  	struct debug_ref_iterator *diter =
>  		(struct debug_ref_iterator *)ref_iterator;
> -	int res = diter->iter->vtable->abort(diter->iter);
> -	trace_printf_key(&trace_refs, "iterator_abort: %d\n", res);
> -	return res;
> +	diter->iter->vtable->release(diter->iter);
> +	trace_printf_key(&trace_refs, "iterator_abort\n");
>  }
>  
>  static struct ref_iterator_vtable debug_ref_iterator_vtable = {
>  	.advance = debug_ref_iterator_advance,
>  	.peel = debug_ref_iterator_peel,
> -	.abort = debug_ref_iterator_abort,
> +	.release = debug_ref_iterator_release,
>  };
>  
>  static struct ref_iterator *
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 29f08dced40..9511b6f3448 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -919,10 +919,6 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator)
>  		return ITER_OK;
>  	}
>  
> -	iter->iter0 = NULL;
> -	if (ref_iterator_abort(ref_iterator) != ITER_DONE)
> -		ok = ITER_ERROR;
> -

We don't abort the iterator in `advance`. Is this because we want to
reuse this iterator?

>  	return ok;
>  }
>  

[snip]

> @@ -935,23 +931,17 @@ static int files_ref_iterator_peel(struct ref_iterator *ref_iterator,
>  	return ref_iterator_peel(iter->iter0, peeled);
>  }
>  
> -int ref_iterator_abort(struct ref_iterator *ref_iterator)
> +void ref_iterator_free(struct ref_iterator *ref_iterator)
>  {
> -	return ref_iterator->vtable->abort(ref_iterator);
> +	if (ref_iterator) {
> +		ref_iterator->vtable->release(ref_iterator);
> +		/* Help make use-after-free bugs fail quickly: */
> +		ref_iterator->vtable = NULL;
> +		free(ref_iterator);
> +	}
>  }
>  

So, when calling `ref_iterator_free`, we call the corresponding
"release" method to release the associated resources and then free the
iterator. Isn't this too complicated? From my point of view, we could
keep the `ref_iterator_abort` name unchanged but add a new variable such
as `unsigned int free_iterator`, and change each "abort" callback so it
no longer frees the iterator. This would be much simpler.

[snip]

> diff --git a/refs/ref-cache.c b/refs/ref-cache.c
> index 02f09e4df88..6457e02c1ea 100644
> --- a/refs/ref-cache.c
> +++ b/refs/ref-cache.c
> @@ -409,7 +409,7 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
>  		if (++level->index == level->dir->nr) {
>  			/* This level is exhausted; pop up a level */
>  			if (--iter->levels_nr == 0)
> -				return ref_iterator_abort(ref_iterator);
> +				return ITER_DONE;

As I have said, simply returning `ITER_DONE` breaks the documented
semantics of `ITER_DONE`.

>  
>  			continue;
>  		}
> @@ -452,21 +452,18 @@ static int cache_ref_iterator_peel(struct ref_iterator *ref_iterator,
>  	return peel_object(iter->repo, ref_iterator->oid, peeled) ? -1 : 0;
>  }
>  
> -static int cache_ref_iterator_abort(struct ref_iterator *ref_iterator)
> +static void cache_ref_iterator_release(struct ref_iterator *ref_iterator)
>  {
>  	struct cache_ref_iterator *iter =
>  		(struct cache_ref_iterator *)ref_iterator;
> -
>  	free((char *)iter->prefix);
>  	free(iter->levels);
> -	base_ref_iterator_free(ref_iterator);
> -	return ITER_DONE;
>  }
>  
>  static struct ref_iterator_vtable cache_ref_iterator_vtable = {
>  	.advance = cache_ref_iterator_advance,
>  	.peel = cache_ref_iterator_peel,
> -	.abort = cache_ref_iterator_abort
> +	.release = cache_ref_iterator_release,
>  };
>  
>  struct ref_iterator *cache_ref_iterator_begin(struct ref_cache *cache,
> diff --git a/refs/refs-internal.h b/refs/refs-internal.h
> index aaab711bb96..27ff822cf43 100644
> --- a/refs/refs-internal.h
> +++ b/refs/refs-internal.h
> @@ -273,11 +273,11 @@ enum do_for_each_ref_flags {
>   * the next reference and returns ITER_OK. The data pointed at by
>   * refname and oid belong to the iterator; if you want to retain them
>   * after calling ref_iterator_advance() again or calling
> - * ref_iterator_abort(), you must make a copy. When the iteration has
> + * ref_iterator_free(), you must make a copy. When the iteration has
>   * been exhausted, ref_iterator_advance() releases any resources
>   * associated with the iteration, frees the ref_iterator object, and
>   * returns ITER_DONE. If you want to abort the iteration early, call
> - * ref_iterator_abort(), which also frees the ref_iterator object and
> + * ref_iterator_free(), which also frees the ref_iterator object and
>   * any associated resources. If there was an internal error advancing
>   * to the next entry, ref_iterator_advance() aborts the iteration,
>   * frees the ref_iterator, and returns ITER_ERROR.
> @@ -292,10 +292,8 @@ enum do_for_each_ref_flags {
>   *     struct ref_iterator *iter = ...;
>   *
>   *     while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
> - *             if (want_to_stop_iteration()) {
> - *                     ok = ref_iterator_abort(iter);
> + *             if (want_to_stop_iteration())
>   *                     break;
> - *             }

I also think we have a problem here; at the least, we should set
`ok = ITER_DONE`.

>   *
>   *             // Access information about the current reference:
>   *             if (!(iter->flags & REF_ISSYMREF))
> @@ -307,6 +305,7 @@ enum do_for_each_ref_flags {
>   *
>   *     if (ok != ITER_DONE)
>   *             handle_error();
> + *     ref_iterator_free(iter);
>   */
>  struct ref_iterator {
>  	struct ref_iterator_vtable *vtable;
> @@ -333,12 +332,8 @@ int ref_iterator_advance(struct ref_iterator *ref_iterator);
>  int ref_iterator_peel(struct ref_iterator *ref_iterator,
>  		      struct object_id *peeled);
>  
> -/*
> - * End the iteration before it has been exhausted, freeing the
> - * reference iterator and any associated resources and returning
> - * ITER_DONE. If the abort itself failed, return ITER_ERROR.
> - */
> -int ref_iterator_abort(struct ref_iterator *ref_iterator);
> +/* Free the reference iterator and any associated resources. */
> +void ref_iterator_free(struct ref_iterator *ref_iterator);
>  
>  /*
>   * An iterator over nothing (its first ref_iterator_advance() call
> @@ -438,13 +433,6 @@ struct ref_iterator *prefix_ref_iterator_begin(struct ref_iterator *iter0,
>  void base_ref_iterator_init(struct ref_iterator *iter,
>  			    struct ref_iterator_vtable *vtable);
>  
> -/*
> - * Base class destructor for ref_iterators. Destroy the ref_iterator
> - * part of iter and shallow-free the object. This is meant to be
> - * called only by the destructors of derived classes.
> - */
> -void base_ref_iterator_free(struct ref_iterator *iter);
> -

OK, here we delete `base_ref_iterator_free`, because we now use
`ref_iterator_free` instead.

>  /* Virtual function declarations for ref_iterators: */
>  
>  /*
> @@ -463,15 +451,14 @@ typedef int ref_iterator_peel_fn(struct ref_iterator *ref_iterator,
>  
>  /*
>   * Implementations of this function should free any resources specific
> - * to the derived class, then call base_ref_iterator_free() to clean
> - * up and free the ref_iterator object.
> + * to the derived class.
>   */
> -typedef int ref_iterator_abort_fn(struct ref_iterator *ref_iterator);
> +typedef void ref_iterator_release_fn(struct ref_iterator *ref_iterator);

So the reason we name this function `release` is that we want to
release any resources related to the iterator without freeing the
iterator itself.

[snip]


---

Some of my thoughts after reading this whole patch:

1. We somehow misuse "ITER_DONE". We either need to adjust its meaning
(just delete the "iterator is freed" part) or introduce a new state to
represent that the iteration is done while the resources have not been
released and the iterator itself has not been freed.

2. I don't think we need to make things complicated. From my
understanding, the motivation here is that we don't want the `advance`
callback to call the `abort` callback. I want to ask an _important_
question here: what is the motivation for renaming `abort` to `release`
in the first place? As far as I know, we only call this callback in the
newly created `ref_iterator_free()`. Although "release" may be more
accurate, this rename causes much of the churn in this patch.

3. If the motivation is that we don't want the `advance` callback to
call the `abort` callback, I think we could just let the user call the
`abort` callback in the following two situations:

    1. The iteration is exhausted and `advance` returned `ITER_DONE`.
    2. We encountered an error and `advance` returned `ITER_ERROR`.

And we give the freedom to the caller: it is their duty to call
`ref_iterator_abort()`, which cleans up the resources and frees the
iterator.

While writing this, I kept thinking there might be a situation where we
want to release the resources but not free the iterator itself; that's
why I wondered about just adding a new variable. However, if we only
want to take the lifecycle out, we can delete the code in each "abort"
callback that frees the iterator and free the iterator in
`ref_iterator_abort()` instead. Would this be enough?

Thanks,
Jialuo



* Re: [PATCH 00/14] refs: batch refname availability checks
  2025-02-17 15:50 [PATCH 00/14] refs: batch refname availability checks Patrick Steinhardt
                   ` (13 preceding siblings ...)
  2025-02-17 15:50 ` [PATCH 14/14] refs: reuse iterators when determining refname availability Patrick Steinhardt
@ 2025-02-18 17:10 ` brian m. carlson
  2025-02-19 13:23 ` [PATCH v2 00/16] " Patrick Steinhardt
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 163+ messages in thread
From: brian m. carlson @ 2025-02-18 17:10 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Karthik Nayak, Jeff King, Junio C Hamano, Christian Couder


On 2025-02-17 at 15:50:14, Patrick Steinhardt wrote:
> But more importantly, this refactoring also has a positive effect when
> updating references in a repository with preexisting refs, which I
> consider to be the more realistic scenario. The following benchmark
> creates 10k refs with 100k preexisting refs.
> 
> With the "files" backend we see a modest improvement:
> 
>     Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = master)
>       Time (mean ± σ):     470.1 ms ±   5.4 ms    [User: 104.5 ms, System: 363.1 ms]
>       Range (min … max):   465.7 ms … 484.3 ms    10 runs
> 
>     Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
>       Time (mean ± σ):     407.8 ms ±   5.4 ms    [User: 66.0 ms, System: 340.0 ms]
>       Range (min … max):   399.9 ms … 417.6 ms    10 runs
> 
>     Summary
>       update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
>         1.15 ± 0.02 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = master)
> 
> But with the "reftable" backend we see an almost 5x improvement, where
> it's now ~15x faster than the "files" backend:
> 
>     Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = master)
>       Time (mean ± σ):     153.9 ms ±   2.0 ms    [User: 96.5 ms, System: 56.6 ms]
>       Range (min … max):   150.5 ms … 158.4 ms    18 runs
> 
>     Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
>       Time (mean ± σ):      32.2 ms ±   1.2 ms    [User: 27.6 ms, System: 4.3 ms]
>       Range (min … max):    29.8 ms …  38.6 ms    71 runs
> 
>     Summary
>       update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
>         4.78 ± 0.19 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = master)

I'm glad to see this performance speedup.  That's a really nice
improvement.

> The series is structured as follows:
> 
>   - Patches 1 to 4 implement the logic to skip ambiguity checks in
>     git-update-ref(1).
> 
>   - Patches 5 and 6 introduce batched checks.
> 
>   - Patch 7 deduplicates the ref prefix checks.
> 
>   - Patches 8 to 14 implement the infrastructure to reseek iterators.
> 
>   - Patch 15 starts to reuse iterators for nested ref checks.

I took a look at this series and I didn't find anything that stood out
to me as a problem.  I will say that the reftable code isn't my forte,
so please don't take this as a formal review, but I am definitely
positive on the series in the general sense.
-- 
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH 07/14] refs/iterator: separate lifecycle from iteration
  2025-02-17 15:50 ` [PATCH 07/14] refs/iterator: separate lifecycle from iteration Patrick Steinhardt
  2025-02-18 16:52   ` shejialuo
@ 2025-02-18 17:13   ` Karthik Nayak
  2025-02-19 11:52     ` Patrick Steinhardt
  1 sibling, 1 reply; 163+ messages in thread
From: Karthik Nayak @ 2025-02-18 17:13 UTC (permalink / raw)
  To: Patrick Steinhardt, git
  Cc: brian m. carlson, Jeff King, Junio C Hamano, Christian Couder

[-- Attachment #1: Type: text/plain, Size: 7517 bytes --]

Patrick Steinhardt <ps@pks.im> writes:

> The ref and reflog iterators have their lifecycle attached to iteration:
> once the iterator reaches its end, it is automatically released and the
> caller doesn't have to care about that anymore. When the iterator should
> be released before it has been exhausted, callers must explicitly abort
> the iterator via `ref_iterator_abort()`.
>
> This lifecycle is somewhat unusual in the Git codebase and creates two
> problems:
>
>   - Callsites need to be very careful about when exactly they call
>     `ref_iterator_abort()`, as calling the function is only valid when
>     the iterator itself still is. This leads to somewhat awkward calling
>     patterns in some situations.
>
>   - It is impossible to reuse iterators and re-seek them to a different
>     prefix. This feature isn't supported by any iterator implementation
>     except for the reftable iterators anyway, but if it was implemented
>     it would allow us to optimize cases where we need to search for
>     specific references repeatedly by reusing internal state.
>
> Detangle the lifecycle from iteration so that we don't deallocate the
> iterator anymore once it is exhausted. Instead, callers are now expected
> to always call a newly introduced `ref_iterator_free()` function that
> deallocates the iterator and its internal state.
>
> While at it, drop the return value of `ref_iterator_abort()`, which
> wasn't really required by any of the iterator implementations anyway.
> Furthermore, stop calling `base_ref_iterator_free()` in any of the
> backends, but instead call it in `ref_iterator_free()`.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  builtin/clone.c              |  2 +
>  dir-iterator.c               | 24 +++++------
>  dir-iterator.h               | 13 ++----
>  refs.c                       |  7 +++-
>  refs/debug.c                 |  9 ++---
>  refs/files-backend.c         | 36 +++++------------
>  refs/iterator.c              | 95 ++++++++++++++------------------------------
>  refs/packed-backend.c        | 27 ++++++-------
>  refs/ref-cache.c             |  9 ++---
>  refs/refs-internal.h         | 31 +++++----------
>  refs/reftable-backend.c      | 34 ++++------------
>  t/helper/test-dir-iterator.c |  1 +
>  12 files changed, 99 insertions(+), 189 deletions(-)
>
> diff --git a/builtin/clone.c b/builtin/clone.c
> index fd001d800c6..ac3e84b2b18 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -426,6 +426,8 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
>  		strbuf_setlen(src, src_len);
>  		die(_("failed to iterate over '%s'"), src->buf);
>  	}
> +
> +	dir_iterator_free(iter);
>  }
>

A bit puzzled to see the `dir_iterator_*` change here; I'm assuming it's
linked to the 'files-backend' and similar to the changes described for
`ref_iterator_*` in the commit message. It would be nice to call this
out in the commit message too.

[snip]

> @@ -223,7 +223,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
>  		} else {
>  			if (level->entries_idx >= level->entries.nr) {
>  				if (pop_level(iter) == 0)
> -					return dir_iterator_abort(dir_iterator);
> +					return ITER_DONE;
>  				continue;
>  			}
>
> @@ -232,22 +232,21 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
>
>  		if (prepare_next_entry_data(iter, name)) {
>  			if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
> -				goto error_out;
> +				return ITER_ERROR;
>  			continue;
>  		}
>
>  		return ITER_OK;
>  	}
> -
> -error_out:
> -	dir_iterator_abort(dir_iterator);
> -	return ITER_ERROR;
>  }

Okay yeah, we're getting rid of `dir_iterator_abort`, so we potentially
add `dir_iterator_free` below.

>
> -int dir_iterator_abort(struct dir_iterator *dir_iterator)
> +void dir_iterator_free(struct dir_iterator *dir_iterator)
>  {
>  	struct dir_iterator_int *iter = (struct dir_iterator_int *)dir_iterator;
>
> +	if (!iter)
> +		return;
> +
>  	for (; iter->levels_nr; iter->levels_nr--) {
>  		struct dir_iterator_level *level =
>  			&iter->levels[iter->levels_nr - 1];
> @@ -266,7 +265,6 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
>  	free(iter->levels);
>  	strbuf_release(&iter->base.path);
>  	free(iter);
> -	return ITER_DONE;
>  }
>
>  struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags)

Okay this makes sense!

[snip]

> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 29f08dced40..9511b6f3448 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -919,10 +919,6 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator)
>  		return ITER_OK;
>  	}
>
> -	iter->iter0 = NULL;
> -	if (ref_iterator_abort(ref_iterator) != ITER_DONE)
> -		ok = ITER_ERROR;
> -
>

Since we're explicitly going to call `ref_iterator_free`, this makes sense.

>  	return ok;
>  }
>
> @@ -935,23 +931,17 @@ static int files_ref_iterator_peel(struct ref_iterator *ref_iterator,
>  	return ref_iterator_peel(iter->iter0, peeled);
>  }
>
> -static int files_ref_iterator_abort(struct ref_iterator *ref_iterator)
> +static void files_ref_iterator_release(struct ref_iterator *ref_iterator)
>  {
>  	struct files_ref_iterator *iter =
>  		(struct files_ref_iterator *)ref_iterator;
> -	int ok = ITER_DONE;
> -
> -	if (iter->iter0)
> -		ok = ref_iterator_abort(iter->iter0);
> -
> -	base_ref_iterator_free(ref_iterator);
> -	return ok;
> +	ref_iterator_free(iter->iter0);
>  }
>

I like how much cleaner this looks now.

[snip]

> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> index a7b6f74b6e3..38a1956d1a8 100644
> --- a/refs/packed-backend.c
> +++ b/refs/packed-backend.c
> @@ -954,9 +954,6 @@ static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator)
>  		return ITER_OK;
>  	}
>
> -	if (ref_iterator_abort(ref_iterator) != ITER_DONE)
> -		ok = ITER_ERROR;
> -
>  	return ok;
>  }
>

The merged_iterator is used to combine the files and packed backend
iterators to provide a uniform view over them. Likewise, the changes
here seem similar.

> @@ -976,23 +973,19 @@ static int packed_ref_iterator_peel(struct ref_iterator *ref_iterator,
>  	}
>  }
>
> -static int packed_ref_iterator_abort(struct ref_iterator *ref_iterator)
> +static void packed_ref_iterator_release(struct ref_iterator *ref_iterator)
>  {
>  	struct packed_ref_iterator *iter =
>  		(struct packed_ref_iterator *)ref_iterator;
> -	int ok = ITER_DONE;
> -
>  	strbuf_release(&iter->refname_buf);
>  	free(iter->jump);
>  	release_snapshot(iter->snapshot);
> -	base_ref_iterator_free(ref_iterator);
> -	return ok;
>  }
>
>  static struct ref_iterator_vtable packed_ref_iterator_vtable = {
>  	.advance = packed_ref_iterator_advance,
>  	.peel = packed_ref_iterator_peel,
> -	.abort = packed_ref_iterator_abort
> +	.release = packed_ref_iterator_release,
>  };
>
>  static int jump_list_entry_cmp(const void *va, const void *vb)
> @@ -1362,8 +1355,10 @@ static int write_with_updates(struct packed_ref_store *refs,
>  	 */
>  	iter = packed_ref_iterator_begin(&refs->base, "", NULL,
>  					 DO_FOR_EACH_INCLUDE_BROKEN);
> -	if ((ok = ref_iterator_advance(iter)) != ITER_OK)
> +	if ((ok = ref_iterator_advance(iter)) != ITER_OK) {
> +		ref_iterator_free(iter);

Nit: since we don't return early here, wouldn't the `ref_iterator_free`
at the end of the function be sufficient? I think the only early return,
when `iter == NULL`, is towards the end of the function, where it might
be better to add a `goto error`.

[snip]


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH 07/14] refs/iterator: separate lifecycle from iteration
  2025-02-18 16:52   ` shejialuo
@ 2025-02-19 11:52     ` Patrick Steinhardt
  2025-02-19 12:41       ` shejialuo
  0 siblings, 1 reply; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-19 11:52 UTC (permalink / raw)
  To: shejialuo
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

On Wed, Feb 19, 2025 at 12:52:23AM +0800, shejialuo wrote:
> On Mon, Feb 17, 2025 at 04:50:21PM +0100, Patrick Steinhardt wrote:
> > The ref and reflog iterators have their lifecycle attached to iteration:
> > once the iterator reaches its end, it is automatically released and the
> > caller doesn't have to care about that anymore. When the iterator should
> > be released before it has been exhausted, callers must explicitly abort
> > the iterator via `ref_iterator_abort()`.
> > 
> > This lifecycle is somewhat unusual in the Git codebase and creates two
> > problems:
> > 
> >   - Callsites need to be very careful about when exactly they call
> >     `ref_iterator_abort()`, as calling the function is only valid when
> >     the iterator itself still is. This leads to somewhat awkward calling
> >     patterns in some situations.
> > 
> 
> In what situations has the iterator already disappeared by the time we
> call `ref_iterator_abort`? Why is this awkward? While reading, I was
> really curious about this.

It leads to patterns where you have to call `ref_iterator_abort()`, but
only in a subset of code paths. You need to be really careful about the
lifetime of the iterator, and I had several occasions where I was doing
the wrong thing because I missed this subtlety. Compared to that, it is
trivial (and less code as demonstrated by the patch) to unconditionally
call `ref_iterator_free()`.

> >   - It is impossible to reuse iterators and re-seek them to a different
> >     prefix. This feature isn't supported by any iterator implementation
> >     except for the reftable iterators anyway, but if it was implemented
> >     it would allow us to optimize cases where we need to search for
> >     specific references repeatedly by reusing internal state.
> > 
> 
> So, the reason we cannot reuse the iterator is that we deallocate it?
> And that is why, below, we want to detangle the lifecycle. I don't know
> whether my understanding is correct.

Yup, your understanding is correct. A deallocated iterator cannot be
reused.

> > Detangle the lifecycle from iteration so that we don't deallocate the
> > iterator anymore once it is exhausted. Instead, callers are now expected
> > to always call a newly introduced `ref_iterator_free()` function that
> > deallocates the iterator and its internal state.
> > 
> 
> A design question: why not just introduce a variable, for example,
> `unsigned int free`, to indicate whether we need to free the iterator?

Because that would make the code even more subtle than it already is.
Doing things unconditionally is always easier to reason about than doing
them conditionally.

> > diff --git a/dir-iterator.c b/dir-iterator.c
> > index de619846f29..857e1d9bdaf 100644
> > --- a/dir-iterator.c
> > +++ b/dir-iterator.c
> > @@ -211,11 +211,11 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
> >  			int ret = next_directory_entry(level->dir, iter->base.path.buf, &de);
> >  			if (ret < 0) {
> >  				if (iter->flags & DIR_ITERATOR_PEDANTIC)
> > -					goto error_out;
> > +					return ITER_ERROR;
> >  				continue;
> >  			} else if (ret > 0) {
> >  				if (pop_level(iter) == 0)
> > -					return dir_iterator_abort(dir_iterator);
> > +					return ITER_DONE;
> 
> Instead of calling `dir_iterator_abort`, we just return `ITER_DONE`.
> However, this breaks the semantics of `ITER_DONE`. Let me cite the
> comment from "iterator.h":
> 
>     /*
>      * The iterator is exhausted and has been freed.
>      */
>     #define ITER_DONE -1
> 
> However, we don't free the iterator here.

Oh, yeah, I'll have to adapt that comment, thanks for pointing it out.

> > @@ -266,7 +265,6 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
> >  	free(iter->levels);
> >  	strbuf_release(&iter->base.path);
> >  	free(iter);
> > -	return ITER_DONE;
> 
> I am confused about why we don't provide `dir_iterator_release` for the
> dir iterator. For the other ref-related iterators, we rename "*abort" to
> "*release" and drop the operation that frees the iterator. However, for
> the dir iterator, we rename its "*abort" to "*free".
> 
> I think this is inconsistent. Although the dir iterator does _not_ have
> a "peel" method, it does have "advance" and "abort" methods just like
> the ref iterators.
> 
> I think you have already considered this problem. I guess that's the
> reason why in the comment below you typed `dir_iterator_release` instead
> of `dir_iterator_free`.

The `dir_iterator` itself is already inconsistent with every other
iterator that we have because it is not a `ref_iterator`. It is _used_
to implement ref iterators, but doesn't provide the same interface. And
because of that we cannot rely on `ref_iterator_free()` to free it, but
must instead provide `dir_iterator_free()` to do so.

I have amended the commit message to explain why this one is special.

> > diff --git a/dir-iterator.h b/dir-iterator.h
> > index 6d438809b6e..01f51f6bac1 100644
> > --- a/dir-iterator.h
> > +++ b/dir-iterator.h
> > @@ -27,10 +27,8 @@
> >   *             goto error_handler;
> >   *
> >   *     while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
> > - *             if (want_to_stop_iteration()) {
> > - *                     ok = dir_iterator_abort(iter);
> > + *             if (want_to_stop_iteration())
> >   *                     break;
> > - *             }
> 
> Is this correct? If `want_to_stop_iteration()` is true, `ok` will still
> be `ITER_OK` when we break out of the loop and jump to the check of
> whether `ok` is `ITER_DONE`. Since it is not, we will call
> `handle_error`. At the very least, we should assign `ok = ITER_DONE`.

True, fixed.

> >   *
> >   *             // Access information about the current path:
> >   *             if (S_ISDIR(iter->st.st_mode))
> > @@ -39,6 +37,7 @@
> >   *
> >   *     if (ok != ITER_DONE)
> >   *             handle_error();
> > + *     dir_iterator_release(iter);
> 
> I think this is a typo. Should `dir_iterator_release` be
> `dir_iterator_free`?

Yup, fixed.

> > diff --git a/refs/files-backend.c b/refs/files-backend.c
> > index 29f08dced40..9511b6f3448 100644
> > --- a/refs/files-backend.c
> > +++ b/refs/files-backend.c
> > @@ -919,10 +919,6 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator)
> >  		return ITER_OK;
> >  	}
> >  
> > -	iter->iter0 = NULL;
> > -	if (ref_iterator_abort(ref_iterator) != ITER_DONE)
> > -		ok = ITER_ERROR;
> > -
> 
> We don't abort the iterator in `advance`. Is this because we want to
> reuse this iterator?

Exactly, this is basically the gist of this whole patch.

> > @@ -935,23 +931,17 @@ static int files_ref_iterator_peel(struct ref_iterator *ref_iterator,
> >  	return ref_iterator_peel(iter->iter0, peeled);
> >  }
> >  
> > -int ref_iterator_abort(struct ref_iterator *ref_iterator)
> > +void ref_iterator_free(struct ref_iterator *ref_iterator)
> >  {
> > -	return ref_iterator->vtable->abort(ref_iterator);
> > +	if (ref_iterator) {
> > +		ref_iterator->vtable->release(ref_iterator);
> > +		/* Help make use-after-free bugs fail quickly: */
> > +		ref_iterator->vtable = NULL;
> > +		free(ref_iterator);
> > +	}
> >  }
> >  
> 
> So, when calling `ref_iterator_free`, we call the corresponding
> "release" method to release the associated resources and then free the
> iterator. Isn't this too complicated? From my view, we could keep the
> `ref_iterator_abort` name unchanged but add a new variable such as
> "unsigned int free_iterator", and change each "abort" callback to avoid
> freeing the iterator. This would be much simpler.

I think adding a separate variable to track whether or not things should
be freed would make things way more complicated. I would claim the
opposite: the fact that the patch removes 100 lines of code demonstrates
quite neatly that the new design is way simpler and needs less logic.

It's also simpler to reason about from my perspective: you allocate an
iterator, you free it. No conditionals, no nothing.

> [snip]
> 
> > diff --git a/refs/ref-cache.c b/refs/ref-cache.c
> > index 02f09e4df88..6457e02c1ea 100644
> > --- a/refs/ref-cache.c
> > +++ b/refs/ref-cache.c
> > @@ -409,7 +409,7 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
> >  		if (++level->index == level->dir->nr) {
> >  			/* This level is exhausted; pop up a level */
> >  			if (--iter->levels_nr == 0)
> > -				return ref_iterator_abort(ref_iterator);
> > +				return ITER_DONE;
> 
> As I have said, simply returning `ITER_DONE` breaks the semantics of
> `ITER_DONE`.

It doesn't: `ITER_DONE` indicates that the iterator has been exhausted.
The current comment is stale though, as it claims that the iterator will
also have been free'd.

> > diff --git a/refs/refs-internal.h b/refs/refs-internal.h
> > index aaab711bb96..27ff822cf43 100644
> > --- a/refs/refs-internal.h
> > +++ b/refs/refs-internal.h
> > @@ -292,10 +292,8 @@ enum do_for_each_ref_flags {
> >   *     struct ref_iterator *iter = ...;
> >   *
> >   *     while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
> > - *             if (want_to_stop_iteration()) {
> > - *                     ok = ref_iterator_abort(iter);
> > + *             if (want_to_stop_iteration())
> >   *                     break;
> > - *             }
> 
> I also think we have a problem here; at the very least we should set
> `ok = ITER_DONE`.

Yup, same as in the other comment indeed.

> Some my thinking after reading this whole patch:
> 
> 1. We somehow misuse "ITER_DONE". We either need to adjust its meaning
> (by deleting the "iterator is freed" part) or introduce a new state to
> represent that the iteration is done but we have neither released its
> resources nor freed the iterator itself.

Yup, addressed.

> 2. I don't think we need to make things complicated. From my
> understanding, the motivation here is that we don't want the `advance`
> callback to call the `abort` callback. I want to ask an _important_
> question here: what is the motivation for renaming `abort` to `release`
> in the first place? As far as I know, we only call this callback in the
> newly created "ref_iterator_free". Although `release` may be more
> accurate, this renaming is responsible for much of the churn in this
> patch.

We have to touch up all of these functions anyway, so renaming them at
the same point doesn't feel like it adds any more complexity.

> 3. If the motivation is that we don't want the `advance` callback to
> call the `abort` callback, I think we could just let the user call the
> `abort` callback in the following two situations:
> 
>     1. We have exhausted the iteration. It returns `ITER_OK`.

This is impossible because the caller wouldn't be able to discern an
exhausted iterator from an iterator that still has entries. We have to
discern `ITER_OK` and `ITER_DONE`.

>     2. We encountered an error. It returns `ITER_ERROR`.

Yup.

> And we give the freedom to the caller: it is their duty to call
> `ref_iterator_abort`, which cleans up the resources and frees the
> iterator.

Yup.

> While writing this, it occurred to me that there may be situations where
> we just want to release the resources but not free the iterator itself.
> That's why I am wondering why we don't simply add a new variable for
> this. However, if we only want to separate the lifecycle, we could just
> delete the code in each "abort" implementation that frees the iterator,
> and free the iterator in "ref_iterator_abort" instead. Wouldn't that be
> enough?

As mentioned, I don't think a new variable would lead to a simplified
architecture. With re-seekable iterators the caller is the one who needs
to control whether the iterator should or should not be free'd, as they
are the only one who knows whether they want to reuse it. So making it
the responsibility of the callers to release the iterator is the proper
way to do it.

I also don't quite see the complexity argument. The patch rather clearly
shows that we're _reducing_ complexity, as it allows us to drop around
100 lines of code. We can stop worrying about whether or not we have to
call `ref_iterator_abort()` and don't have to worry about any conditions
that leak from the iterator subsystem to the callers. We just free the
iterator once we're done with it and call it a day.

Thanks for your input!

Patrick


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH 07/14] refs/iterator: separate lifecycle from iteration
  2025-02-18 17:13   ` Karthik Nayak
@ 2025-02-19 11:52     ` Patrick Steinhardt
  0 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-19 11:52 UTC (permalink / raw)
  To: Karthik Nayak
  Cc: git, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

On Tue, Feb 18, 2025 at 09:13:55AM -0800, Karthik Nayak wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> > diff --git a/builtin/clone.c b/builtin/clone.c
> > index fd001d800c6..ac3e84b2b18 100644
> > --- a/builtin/clone.c
> > +++ b/builtin/clone.c
> > @@ -426,6 +426,8 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
> >  		strbuf_setlen(src, src_len);
> >  		die(_("failed to iterate over '%s'"), src->buf);
> >  	}
> > +
> > +	dir_iterator_free(iter);
> >  }
> >
> 
> A bit puzzled to see the `dir_iterator_*` change here; I'm assuming it's
> linked to the 'files-backend' and similar to the changes described for
> `ref_iterator_*` in the commit message. It would be nice to call this
> out in the commit message too.

Yeah, that's the reason. I've added a note to the commit message.

> > diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> > index a7b6f74b6e3..38a1956d1a8 100644
> > --- a/refs/packed-backend.c
> > +++ b/refs/packed-backend.c
> > @@ -1362,8 +1355,10 @@ static int write_with_updates(struct packed_ref_store *refs,
> >  	 */
> >  	iter = packed_ref_iterator_begin(&refs->base, "", NULL,
> >  					 DO_FOR_EACH_INCLUDE_BROKEN);
> > -	if ((ok = ref_iterator_advance(iter)) != ITER_OK)
> > +	if ((ok = ref_iterator_advance(iter)) != ITER_OK) {
> > +		ref_iterator_free(iter);
> 
> Nit: since we don't return early here, wouldn't the `ref_iterator_free`
> at the end of the function be sufficient? I think the only early return,
> when `iter == NULL`, is towards the end of the function, where it might
> be better to add a `goto error`.

Yeah, the code here could definitely be improved with the new semantics.
But I was aiming to keep the required refactoring work at callsites to
the bare minimum, so I'd prefer to keep this unchanged.

Patrick


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH 06/14] refs: stop re-verifying common prefixes for availability
  2025-02-18 16:12   ` Karthik Nayak
@ 2025-02-19 11:52     ` Patrick Steinhardt
  0 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-19 11:52 UTC (permalink / raw)
  To: Karthik Nayak
  Cc: git, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

On Tue, Feb 18, 2025 at 08:12:05AM -0800, Karthik Nayak wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > One of the checks done by `refs_verify_refnames_available()` is whether
> > any of the prefixes of a reference already exists. For example, given a
> > reference "refs/heads/main", we'd check whether "refs/heads" or "refs"
> > already exist, and if so we'd abort the transaction.
> >
> > When updating multiple references at once, this check is performed for
> > each of the references individually. Consequently, because references
> > tend to have common prefixes like "refs/heads/" or "refs/tags/", we
> > evaluate the availability of these prefixes repeatedly. Naturally this
> > is a waste of compute, as the availability of those prefixes should in
> > general not change in the middle of a transaction. And if it would,
> > backends would notice at a later point in time.
> >
> > Optimize this pattern by storing prefixes in a `strset` so that we can
> > trivially track those prefixes that we have already checked. This leads
> > to a significant speedup when creating many references that all share a
> > common prefix:
> >
> >     Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
> >       Time (mean ± σ):      63.1 ms ±   1.8 ms    [User: 41.0 ms, System: 21.6 ms]
> >       Range (min … max):    60.6 ms …  69.5 ms    38 runs
> >
> >     Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
> >       Time (mean ± σ):      40.0 ms ±   1.3 ms    [User: 29.3 ms, System: 10.3 ms]
> >       Range (min … max):    38.1 ms …  47.3 ms    61 runs
> >
> >     Summary
> >       update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
> >         1.58 ± 0.07 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
> >
> > Note that the same speedup cannot be observed for the "files" backend
> > because it still performs availability check per reference.
> >
> 
> In the previous commit, you started using the new function in the
> reftable backend, can we not make a similar change to the files backend?

It's quite a bit more intricate in the "files" backend because the
creation of the lockfiles and calls to `refs_verify_refname_available()`
are intertwined with one another:

  - `lock_raw_ref()` verifies availability when it hits either EEXISTS
    or EISDIR to generate error messages. This is probably nothing we
    have to care about too much, as these are irrelevant in the good
    path.

  - `lock_raw_ref()` also verifies availability in the case where it
    _could_ create the lockfile, to check whether it conflicts with any
    packed refs. This one could potentially be batched. It's a curious
    thing in the first place, as we do not have the packed refs locked
    at this point in time, so this check might even be racy.

  - `lock_ref_oid_basic()` also checks availability with packed refs, so
    this is another case where we might batch the checks. But the
    function is only used when copying/renaming references or when
    expiring reflogs, so it won't be called for many refs.

  - We call it in `refs_refname_ref_available()`, which is executed when
    copying/renaming references. Uninteresting for the same reason as
    the previous entry.

  - We call it in `files_transaction_finish_initial()`. This one should
    be rather trivial to batch. Again though, no locking with packed
    refs, so the checks are racy.

So... it's a bit more complicated here compared to the reftable backend,
and I didn't feel like opening a can of worms with the potentially-racy
checks with the packed backend.

Anyway, I think we still can and probably should use the new mechanism
in two cases:

  - During normal transactions to batch the availability checks with the
    packed backend. I will have to ignore the issue of a potential race,
    but other than that the change is straight forward and the result is
    a slight speedup:

      Benchmark 1: update-ref: create many refs (preexisting = 100000, new = 10000, revision = HEAD~)
         Time (mean ± σ):     393.4 ms ±   4.0 ms    [User: 64.1 ms, System: 327.5 ms]
         Range (min … max):   387.8 ms … 398.7 ms    10 runs

       Benchmark 2: update-ref: create many refs (preexisting = 100000, new = 10000, revision = HEAD)
         Time (mean ± σ):     373.3 ms ±   3.4 ms    [User: 48.8 ms, System: 322.7 ms]
         Range (min … max):   368.7 ms … 378.6 ms    10 runs

       Summary
         update-ref: create many refs (preexisting = 100000, new = 10000, revision = HEAD) ran
           1.05 ± 0.01 times faster than update-ref: create many refs (preexisting = 100000, new = 10000, revision = HEAD~)

  - During the initial transaction. Here the change is even more trivial
    and we can also fix the race as we eventually lock the packed-refs
    file anyway. This leads to a noticeable speedup when migrating from
    the reftable backend to the files backend:

      Benchmark 1: migrate reftable:files (refcount = 1000000, revision = HEAD~)
        Time (mean ± σ):     980.6 ms ±  10.9 ms    [User: 801.8 ms, System: 172.4 ms]
        Range (min … max):   964.7 ms … 995.3 ms    10 runs

      Benchmark 2: migrate reftable:files (refcount = 1000000, revision = HEAD)
        Time (mean ± σ):     739.7 ms ±   6.6 ms    [User: 551.9 ms, System: 181.9 ms]
        Range (min … max):   727.9 ms … 747.2 ms    10 runs

      Summary
        migrate reftable:files (refcount = 1000000, revision = HEAD) ran
          1.33 ± 0.02 times faster than migrate reftable:files (refcount = 1000000, revision = HEAD~)

I'll include these changes in the next version, thanks for questioning
why I skipped over the "files" backend.

Patrick


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH 07/14] refs/iterator: separate lifecycle from iteration
  2025-02-19 11:52     ` Patrick Steinhardt
@ 2025-02-19 12:41       ` shejialuo
  2025-02-19 12:59         ` Patrick Steinhardt
  0 siblings, 1 reply; 163+ messages in thread
From: shejialuo @ 2025-02-19 12:41 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

On Wed, Feb 19, 2025 at 12:52:06PM +0100, Patrick Steinhardt wrote:
> > > @@ -935,23 +931,17 @@ static int files_ref_iterator_peel(struct ref_iterator *ref_iterator,
> > >  	return ref_iterator_peel(iter->iter0, peeled);
> > >  }
> > >  
> > > -int ref_iterator_abort(struct ref_iterator *ref_iterator)
> > > +void ref_iterator_free(struct ref_iterator *ref_iterator)
> > >  {
> > > -	return ref_iterator->vtable->abort(ref_iterator);
> > > +	if (ref_iterator) {
> > > +		ref_iterator->vtable->release(ref_iterator);
> > > +		/* Help make use-after-free bugs fail quickly: */
> > > +		ref_iterator->vtable = NULL;
> > > +		free(ref_iterator);
> > > +	}
> > >  }
> > >  
> > 
> > So, when calling `ref_iterator_free`, we call the corresponding
> > "release" method to release the associated resources and then free
> > the iterator. Would this be too complicated? From my view, we could
> > just leave the `ref_iterator_abort` name unchanged but add a new
> > variable such as "unsigned int free_iterator", and change each "abort"
> > callback to avoid freeing the iterator. This would be much simpler.
> 
> I think adding a separate variable to track whether or not things should
> be freed would make things way more complicated. I would claim the
> opposite: the fact that the patch removes 100 lines of code demonstrates
> quite neatly that the new design is way simpler and needs less logic.
> 

I agree with you: with the current implementation, we don't need to worry
about calling the "abort" callback in "advance", and the logic is simpler.
When writing this comment, I thought there might be a situation where we
just want to call the "release" callback. That's the reason why I asked
why not just add a new variable.

However, we never call the "release" callback except in `ref_iterator_free`.

[snip]

> > 2. I don't think we need to make things complicated. From my
> > understanding, the motivation here is that we don't want the `advance`
> > callback to call the `abort` callback. I want to ask an _important_
> > question here: what is the motivation for renaming `abort` to `release`
> > in the first place? As far as I know, we only call this callback in the
> > newly created "ref_iterator_free". Although "release" may be more
> > accurate, this rename adds a lot of churn to this patch.
> 
> We have to touch up all of these functions anyway, so renaming them at
> the same point doesn't feel like it adds any more complexity.
> 

I will explain this later.

[snip]

> > While writing this, I kept thinking there might be a situation where
> > we just want to release the resources but not free the iterator
> > itself. That's why I wondered about just adding a new variable.
> > However, if we only want to separate the lifecycle, we could delete
> > the code in each "abort" callback that frees the iterator and instead
> > free the iterator in "ref_iterator_abort". Wouldn't this be enough?
> 
> As mentioned, I don't think a new variable would lead to a simplified
> architecture. With re-seekable iterators the caller is the one who needs
> to control whether the iterator should or should not be free'd, as they
> are the only one who knows whether they want to reuse it. So making it
> the responsibility of the callers to release the iterator is the proper
> way to do it.
> 

Yes, actually I think you have misunderstood my meaning here. I had
already abandoned the idea of adding a new variable by the time I wrote
this. After reading through the patch, I understand your motivation. My
point is the "However, ..." part.

> I also don't quite see the complexity argument. The patch rather clearly
> shows that we're _reducing_ complexity as it allows us to drop around a
> 100 lines of code. We can stop worrying about whether or not we have to
> call `ref_iterator_abort()` and don't have to worry about any conditions
> that leak from the iterator subsystem to the callers. We just free the
> iterator once we're done with it and call it a day.
> 

Yes, I agree with you that we truly reduce complexity here. And as you
have said, the caller allocates the iterator and frees the iterator.
With this, the call sequence becomes clearer.

But there is one thing I want to push back on: I don't think we need to
rename the "abort" callback to "release" and "ref_iterator_abort" to
"ref_iterator_free", for the following reasons:

1. We never call "release" except in the "ref_iterator_free" function.
The other exposed functions "ref_iterator_advance", "ref_iterator_peel"
and the original "ref_iterator_abort" just call the registered callback
"advance", "peel" or "abort" via the virtual table. I somehow think we
should follow this pattern, but I am not sure.
2. When I read the patch yesterday, I really wondered what the
difference is between "release" and "free". Why do we only change
"ref_iterator_abort" to "ref_iterator_free", but rename the callback
from "abort" to "release"? I know that you want to emphasize that we
won't free the iterator but only release its resources. But couldn't
"abort" also mean this?

Thanks,
Jialuo


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH 07/14] refs/iterator: separate lifecycle from iteration
  2025-02-19 12:41       ` shejialuo
@ 2025-02-19 12:59         ` Patrick Steinhardt
  2025-02-19 13:06           ` shejialuo
  0 siblings, 1 reply; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-19 12:59 UTC (permalink / raw)
  To: shejialuo
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

On Wed, Feb 19, 2025 at 08:41:03PM +0800, shejialuo wrote:
> But there is one thing I want to push back on: I don't think we need to
> rename the "abort" callback to "release" and "ref_iterator_abort" to
> "ref_iterator_free", for the following reasons:
> 
> 1. We never call "release" except in the "ref_iterator_free" function.
> The other exposed functions "ref_iterator_advance", "ref_iterator_peel"
> and the original "ref_iterator_abort" just call the registered callback
> "advance", "peel" or "abort" via the virtual table. I somehow think we
> should follow this pattern, but I am not sure.
> 2. When I read the patch yesterday, I really wondered what the
> difference is between "release" and "free". Why do we only change
> "ref_iterator_abort" to "ref_iterator_free", but rename the callback
> from "abort" to "release"? I know that you want to emphasize that we
> won't free the iterator but only release its resources. But couldn't
> "abort" also mean this?

The difference between "release" and "free" is explicitly documented in
our CodingGuidelines. Quoting the relevant parts:

    - `S_release()` releases a structure's contents without freeing the
      structure.

    - `S_free()` releases a structure's contents and frees the
      structure.

So following these coding guidelines, we have to call the underlying
implementations that are specific to the iterators `release()` because
they don't free the iterator itself. And because the generic part _does_
free the iterator itself in addition to releasing its state, it has to
be called `free()`.

Regarding the question why to even rename `ref_iterator_abort()` itself:
this is done to avoid confusion going forward. Previously it really only
had to be called when you actually wanted to abort an ongoing iteration
over its yielded references. This is not the case anymore, and now you
have to call it unconditionally after you're done with the iterator. So
while the naming previously made sense, now it doesn't anymore.

Patrick


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH 07/14] refs/iterator: separate lifecycle from iteration
  2025-02-19 12:59         ` Patrick Steinhardt
@ 2025-02-19 13:06           ` shejialuo
  2025-02-19 13:17             ` Patrick Steinhardt
  0 siblings, 1 reply; 163+ messages in thread
From: shejialuo @ 2025-02-19 13:06 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

On Wed, Feb 19, 2025 at 01:59:13PM +0100, Patrick Steinhardt wrote:
> On Wed, Feb 19, 2025 at 08:41:03PM +0800, shejialuo wrote:
> > But there is one thing I want to push back on: I don't think we need to
> > rename the "abort" callback to "release" and "ref_iterator_abort" to
> > "ref_iterator_free", for the following reasons:
> > 
> > 1. We never call "release" except in the "ref_iterator_free" function.
> > The other exposed functions "ref_iterator_advance", "ref_iterator_peel"
> > and the original "ref_iterator_abort" just call the registered callback
> > "advance", "peel" or "abort" via the virtual table. I somehow think we
> > should follow this pattern, but I am not sure.
> > 2. When I read the patch yesterday, I really wondered what the
> > difference is between "release" and "free". Why do we only change
> > "ref_iterator_abort" to "ref_iterator_free", but rename the callback
> > from "abort" to "release"? I know that you want to emphasize that we
> > won't free the iterator but only release its resources. But couldn't
> > "abort" also mean this?
> 
> The difference between "release" and "free" is explicitly documented in
> our CodingGuidelines. Quoting the relevant parts:
> 
>     - `S_release()` releases a structure's contents without freeing the
>       structure.
> 
>     - `S_free()` releases a structure's contents and frees the
>       structure.
> 
> So following these coding guidelines, we have to call the underlying
> implementations that are specific to the iterators `release()` because
> they don't free the iterator itself. And because the generic part _does_
> free the iterator itself in addition to releasing its state, it has to
> be called `free()`.
> 

Makes sense.

> Regarding the question why to even rename `ref_iterator_abort()` itself:
> this is done to avoid confusion going forward. Previously it really only
> had to be called when you actually wanted to abort an ongoing iteration
> over its yielded references. This is not the case anymore, and now you
> have to call it unconditionally after you're done with the iterator. So
> while the naming previously made sense, now it doesn't anymore.
> 

Good point, I didn't realise this part. Thanks for the detailed
explanation. I will continue to review the later patches. However, I
won't touch the oid part, because I am not familiar with it. By the
way, I think we missed one thing in this patch:

We forgot to free the dir iterator defined in
"files-backend.c::files_fsck_refs_dir". I have just remembered that I
use a dir iterator when checking the ref consistency.

Thanks,
Jialuo

> Patrick


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH 07/14] refs/iterator: separate lifecycle from iteration
  2025-02-19 13:06           ` shejialuo
@ 2025-02-19 13:17             ` Patrick Steinhardt
  2025-02-19 13:20               ` Patrick Steinhardt
  0 siblings, 1 reply; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-19 13:17 UTC (permalink / raw)
  To: shejialuo
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

On Wed, Feb 19, 2025 at 09:06:58PM +0800, shejialuo wrote:
> On Wed, Feb 19, 2025 at 01:59:13PM +0100, Patrick Steinhardt wrote:
> > Regarding the question why to even rename `ref_iterator_abort()` itself:
> > this is done to avoid confusion going forward. Previously it really only
> > had to be called when you actually wanted to abort an ongoing iteration
> > over its yielded references. This is not the case anymore, and now you
> > have to call it unconditionally after you're done with the iterator. So
> > while the naming previously made sense, now it doesn't anymore.
> > 
> 
> Good point, I didn't realise this part. Thanks for the detailed
> explanation. I will continue to review the later patches. However, I
> won't touch the oid part, because I am not familiar with it. By the
> way, I think we missed one thing in this patch:
> 
> We forgot to free the dir iterator defined in
> "files-backend.c::files_fsck_refs_dir". I have just remembered that I
> use a dir iterator when checking the ref consistency.

Hm, good point. Why doesn't CI complain about this leak...? I'll
investigate, thanks for the hint!

Patrick


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH 07/14] refs/iterator: separate lifecycle from iteration
  2025-02-19 13:17             ` Patrick Steinhardt
@ 2025-02-19 13:20               ` Patrick Steinhardt
  2025-02-19 13:23                 ` shejialuo
  0 siblings, 1 reply; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-19 13:20 UTC (permalink / raw)
  To: shejialuo
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

On Wed, Feb 19, 2025 at 02:17:07PM +0100, Patrick Steinhardt wrote:
> On Wed, Feb 19, 2025 at 09:06:58PM +0800, shejialuo wrote:
> > On Wed, Feb 19, 2025 at 01:59:13PM +0100, Patrick Steinhardt wrote:
> > > Regarding the question why to even rename `ref_iterator_abort()` itself:
> > > this is done to avoid confusion going forward. Previously it really only
> > > had to be called when you actually wanted to abort an ongoing iteration
> > > over its yielded references. This is not the case anymore, and now you
> > > have to call it unconditionally after you're done with the iterator. So
> > > while the naming previously made sense, now it doesn't anymore.
> > > 
> > 
> > Good point, I didn't realise this part. Thanks for the detailed
> > explanation. I will continue to review the later patches. However, I
> > won't touch the oid part, because I am not familiar with it. By the
> > way, I think we missed one thing in this patch:
> > 
> > We forgot to free the dir iterator defined in
> > "files-backend.c::files_fsck_refs_dir". I have just remembered that I
> > use a dir iterator when checking the ref consistency.
> 
> Hm, good point. Why doesn't CI complain about this leak...? I'll
> investigate, thanks for the hint!

Wait, no, I had been looking at the wrong branch. We do free the
iterator:

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 11a620ea11a..859f1c11941 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3837,6 +3820,7 @@ static int files_fsck_refs_dir(struct ref_store *ref_store,
                ret = error(_("failed to iterate over '%s'"), sb.buf);

 out:
+       dir_iterator_free(iter);
        strbuf_release(&sb);
        strbuf_release(&refname);
        return ret;

Patrick


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* Re: [PATCH 07/14] refs/iterator: separate lifecycle from iteration
  2025-02-19 13:20               ` Patrick Steinhardt
@ 2025-02-19 13:23                 ` shejialuo
  0 siblings, 0 replies; 163+ messages in thread
From: shejialuo @ 2025-02-19 13:23 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

On Wed, Feb 19, 2025 at 02:20:15PM +0100, Patrick Steinhardt wrote:
> On Wed, Feb 19, 2025 at 02:17:07PM +0100, Patrick Steinhardt wrote:
> > On Wed, Feb 19, 2025 at 09:06:58PM +0800, shejialuo wrote:
> > > On Wed, Feb 19, 2025 at 01:59:13PM +0100, Patrick Steinhardt wrote:
> > > > Regarding the question why to even rename `ref_iterator_abort()` itself:
> > > > this is done to avoid confusion going forward. Previously it really only
> > > > had to be called when you actually wanted to abort an ongoing iteration
> > > > over its yielded references. This is not the case anymore, and now you
> > > > have to call it unconditionally after you're done with the iterator. So
> > > > while the naming previously made sense, now it doesn't anymore.
> > > > 
> > > 
> > > Good point, I didn't realise this part. Thanks for the detailed
> > > explanation. I will continue to review the later patches. However, I
> > > won't touch the oid part, because I am not familiar with it. By the
> > > way, I think we missed one thing in this patch:
> > > 
> > > We forgot to free the dir iterator defined in
> > > "files-backend.c::files_fsck_refs_dir". I have just remembered that I
> > > use a dir iterator when checking the ref consistency.
> > 
> > Hm, good point. Why doesn't CI complain about this leak...? I'll
> > investigate, thanks for the hint!
> 
> Wait, no, I had been looking at the wrong branch. We do free the
> iterator:
> 
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 11a620ea11a..859f1c11941 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -3837,6 +3820,7 @@ static int files_fsck_refs_dir(struct ref_store *ref_store,
>                 ret = error(_("failed to iterate over '%s'"), sb.buf);
> 
>  out:
> +       dir_iterator_free(iter);
>         strbuf_release(&sb);
>         strbuf_release(&refname);
>         return ret;
> 
> Patrick

Oh, my mistake. I omitted that part during review... Sorry about that.



^ permalink raw reply	[flat|nested] 163+ messages in thread

* [PATCH v2 00/16] refs: batch refname availability checks
  2025-02-17 15:50 [PATCH 00/14] refs: batch refname availability checks Patrick Steinhardt
                   ` (14 preceding siblings ...)
  2025-02-18 17:10 ` [PATCH 00/14] refs: batch refname availability checks brian m. carlson
@ 2025-02-19 13:23 ` Patrick Steinhardt
  2025-02-19 13:23   ` [PATCH v2 01/16] object-name: introduce `repo_get_oid_with_flags()` Patrick Steinhardt
                     ` (16 more replies)
  2025-02-25  8:55 ` [PATCH v3 " Patrick Steinhardt
                   ` (3 subsequent siblings)
  19 siblings, 17 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-19 13:23 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Hi,

this patch series has been inspired by brian's report [1] that the reftable
backend is significantly slower when writing many references compared to
the files backend. As explained in that thread, the underlying issue is
the design of tombstone references: when we first delete all references
in a repository and then recreate them, we still have all the tombstones
and thus we need to churn through all of them to figure out that they
have been deleted in the first place. The files backend does not have
this issue.

I consider the benchmark itself to be kind of broken, as it stems from
us deleting all refs and then recreating them. And if you pack refs in
between then the "reftable" backend outperforms the "files" backend.

But there are a couple of opportunities here anyway. While we cannot
make the underlying issue of tombstones being less efficient go away,
this has prompted me to have a deeper look at where we spend all the
time. There are three ideas in this series:

  - git-update-ref(1) performs ambiguity checks for any full-size object
    ID, which triggers a lot of reads. This is somewhat pointless though
    given that the manpage explicitly points out that the command is
    about object IDs, even though it does know to parse refs. But being
    part of plumbing, emitting the warning here does not make a ton of
    sense, and favoring object IDs over references in these cases is the
    obvious thing to do anyway.

  - For each ref "refs/heads/bar", we need to verify that neither
    "refs/heads" nor "refs" exists. This was repeated for every refname,
    but because most refnames use common prefixes this made us re-check
    a lot of prefixes. This is addressed by using a `strset` of already
    checked prefixes.

  - For each ref "refs/heads/bar", we need to verify that no ref
    "refs/heads/bar/*" exists. We always created a new ref iterator for
    this check, which required us to discard all internal state and then
    recreate it. The reftable library has already been refactored to
    have reseekable iterators, so we backfill this functionality to
    all the other iterators and then reuse the iterator.

With the (somewhat broken) benchmark we see a small speedup with the
"files" backend:

    Benchmark 1: update-ref (refformat = files, revision = master)
      Time (mean ± σ):     234.4 ms ±   1.9 ms    [User: 75.6 ms, System: 157.2 ms]
      Range (min … max):   232.2 ms … 236.9 ms    10 runs

    Benchmark 2: update-ref (refformat = files, revision = HEAD)
      Time (mean ± σ):     184.2 ms ±   2.0 ms    [User: 62.8 ms, System: 119.9 ms]
      Range (min … max):   181.1 ms … 187.0 ms    10 runs

    Summary
      update-ref (refformat = files, revision = HEAD) ran
        1.27 ± 0.02 times faster than update-ref (refformat = files, revision = master)

And a huge speedup with the "reftable" backend:

    Benchmark 1: update-ref (refformat = reftable, revision = master)
      Time (mean ± σ):     16.852 s ±  0.061 s    [User: 16.754 s, System: 0.059 s]
      Range (min … max):   16.785 s … 16.982 s    10 runs

    Benchmark 2: update-ref (refformat = reftable, revision = HEAD)
      Time (mean ± σ):      2.230 s ±  0.009 s    [User: 2.192 s, System: 0.029 s]
      Range (min … max):    2.215 s …  2.244 s    10 runs

    Summary
      update-ref (refformat = reftable, revision = HEAD) ran
        7.56 ± 0.04 times faster than update-ref (refformat = reftable, revision = master)

We're still not up to speed with the "files" backend, but considerably
better. Given that this is an extreme edge case and not reflective of
the general case I'm okay with this result for now.

But more importantly, this refactoring also has a positive effect when
updating references in a repository with preexisting refs, which I
consider to be the more realistic scenario. The following benchmark
creates 10k refs with 100k preexisting refs.

With the "files" backend we see a modest improvement:

    Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = master)
      Time (mean ± σ):     478.4 ms ±  11.9 ms    [User: 96.7 ms, System: 379.6 ms]
      Range (min … max):   465.4 ms … 496.6 ms    10 runs

    Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):     388.5 ms ±  10.3 ms    [User: 52.0 ms, System: 333.8 ms]
      Range (min … max):   376.5 ms … 403.1 ms    10 runs

    Summary
      update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
        1.23 ± 0.04 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = master)

But with the "reftable" backend we see an almost 5x improvement, where
it's now ~15x faster than the "files" backend:

    Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = master)
      Time (mean ± σ):     153.9 ms ±   2.0 ms    [User: 96.5 ms, System: 56.6 ms]
      Range (min … max):   150.5 ms … 158.4 ms    18 runs

    Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):      32.2 ms ±   1.2 ms    [User: 27.6 ms, System: 4.3 ms]
      Range (min … max):    29.8 ms …  38.6 ms    71 runs

    Summary
      update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
        4.78 ± 0.19 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = master)

The series is structured as follows:

  - Patches 1 to 3 implement the logic to skip ambiguity checks in
    git-update-ref(1).

  - Patches 4 to 7 introduce batched refname availability checks.

  - Patch 8 deduplicates the ref prefix checks.

  - Patches 9 to 15 implement the infrastructure to reseek iterators.

  - Patch 16 starts to reuse iterators for nested ref checks.

Changes in v2:
  - Point out why we also have to touch up the `dir_iterator`.
  - Fix up the comment explaining `ITER_DONE`.
  - Fix up comments that show usage patterns of the ref and dir iterator
    interfaces.
  - Start batching availability checks in the "files" backend, as well.
  - Improve the commit message that drops the ambiguity check so that we
    also point to 25fba78d36b (cat-file: disable object/refname
    ambiguity check for batch mode, 2013-07-12).
  - Link to v1: https://lore.kernel.org/r/20250217-pks-update-ref-optimization-v1-0-a2b6d87a24af@pks.im

Thanks!

Patrick

[1]: <Z602dzQggtDdcgCX@tapette.crustytoothpaste.net>

---
Patrick Steinhardt (16):
      object-name: introduce `repo_get_oid_with_flags()`
      object-name: allow skipping ambiguity checks in `get_oid()` family
      builtin/update-ref: skip ambiguity checks when parsing object IDs
      refs: introduce function to batch refname availability checks
      refs/reftable: batch refname availability checks
      refs/files: batch refname availability checks for normal transactions
      refs/files: batch refname availability checks for initial transactions
      refs: stop re-verifying common prefixes for availability
      refs/iterator: separate lifecycle from iteration
      refs/iterator: provide infrastructure to re-seek iterators
      refs/iterator: implement seeking for merged iterators
      refs/iterator: implement seeking for reftable iterators
      refs/iterator: implement seeking for ref-cache iterators
      refs/iterator: implement seeking for `packed-ref` iterators
      refs/iterator: implement seeking for "files" iterators
      refs: reuse iterators when determining refname availability

 builtin/clone.c              |   2 +
 builtin/update-ref.c         |  12 ++-
 dir-iterator.c               |  24 +++---
 dir-iterator.h               |  11 +--
 hash.h                       |   1 +
 iterator.h                   |   2 +-
 object-name.c                |  18 +++--
 object-name.h                |   6 ++
 refs.c                       | 186 ++++++++++++++++++++++++++-----------------
 refs.h                       |  12 +++
 refs/debug.c                 |  20 +++--
 refs/files-backend.c         | 117 +++++++++++++++++----------
 refs/iterator.c              | 145 +++++++++++++++++----------------
 refs/packed-backend.c        |  89 ++++++++++++---------
 refs/ref-cache.c             |  83 +++++++++++--------
 refs/refs-internal.h         |  52 +++++++-----
 refs/reftable-backend.c      |  85 +++++++++++---------
 t/helper/test-dir-iterator.c |   1 +
 18 files changed, 519 insertions(+), 347 deletions(-)

Range-diff versus v1:

 1:  313d86f4274 =  1:  22de0f21a9f object-name: introduce `repo_get_oid_with_flags()`
 2:  a0ca8e62a81 !  2:  4aa573a50d4 object-name: allow skipping ambiguity checks in `get_oid()` family
    @@ hash.h: struct object_id {
      #define GET_OID_ONLY_TO_DIE    04000
      #define GET_OID_REQUIRE_PATH  010000
      #define GET_OID_HASH_ANY      020000
    -+#define GET_OID_HASH_SKIP_AMBIGUITY_CHECK 040000
    ++#define GET_OID_SKIP_AMBIGUITY_CHECK 040000
      
      #define GET_OID_DISAMBIGUATORS \
      	(GET_OID_COMMIT | GET_OID_COMMITTISH | \
    @@ object-name.c: static int get_oid_basic(struct repository *r, const char *str, i
      
      	if (len == r->hash_algo->hexsz && !get_oid_hex(str, oid)) {
     -		if (repo_settings_get_warn_ambiguous_refs(r) && warn_on_object_refname_ambiguity) {
    -+		if (!(flags & GET_OID_HASH_SKIP_AMBIGUITY_CHECK) &&
    ++		if (!(flags & GET_OID_SKIP_AMBIGUITY_CHECK) &&
     +		    repo_settings_get_warn_ambiguous_refs(r) &&
     +		    warn_on_object_refname_ambiguity) {
      			refs_found = repo_dwim_ref(r, str, len, &tmp_oid, &real_ref, 0);
 3:  7029057b07f !  3:  073392c4371 builtin/update-ref: skip ambiguity checks when parsing object IDs
    @@ Commit message
         object ID that we parse according to the DWIM rules. This effect can be
         seen both with the "files" and "reftable" backend.
     
    +    The issue is not unique to git-update-ref(1), but was also an issue in
    +    git-cat-file(1), where it was addressed by disabling the ambiguity check
    +    in 25fba78d36b (cat-file: disable object/refname ambiguity check for
    +    batch mode, 2013-07-12).
    +
         Disable the warning in git-update-ref(1), which provides a significant
         speedup with both backends. The following benchmark creates 10000 new
         references with a 100000 preexisting refs with the "files" backend:
    @@ builtin/update-ref.c: static int parse_next_oid(const char **next, const char *e
      		if (arg.len) {
     -			if (repo_get_oid(the_repository, arg.buf, oid))
     +			if (repo_get_oid_with_flags(the_repository, arg.buf, oid,
    -+						    GET_OID_HASH_SKIP_AMBIGUITY_CHECK))
    ++						    GET_OID_SKIP_AMBIGUITY_CHECK))
      				goto invalid;
      		} else {
      			/* Without -z, an empty value means all zeros: */
    @@ builtin/update-ref.c: static int parse_next_oid(const char **next, const char *e
      		if (arg.len) {
     -			if (repo_get_oid(the_repository, arg.buf, oid))
     +			if (repo_get_oid_with_flags(the_repository, arg.buf, oid,
    -+						    GET_OID_HASH_SKIP_AMBIGUITY_CHECK))
    ++						    GET_OID_SKIP_AMBIGUITY_CHECK))
      				goto invalid;
      		} else if (flags & PARSE_SHA1_ALLOW_EMPTY) {
      			/* With -z, treat an empty value as all zeros: */
    @@ builtin/update-ref.c: int cmd_update_ref(int argc,
      		oldval = argv[2];
     -		if (repo_get_oid(the_repository, value, &oid))
     +		if (repo_get_oid_with_flags(the_repository, value, &oid,
    -+					    GET_OID_HASH_SKIP_AMBIGUITY_CHECK))
    ++					    GET_OID_SKIP_AMBIGUITY_CHECK))
      			die("%s: not a valid SHA1", value);
      	}
      
    @@ builtin/update-ref.c: int cmd_update_ref(int argc,
      			oidclr(&oldoid, the_repository->hash_algo);
     -		else if (repo_get_oid(the_repository, oldval, &oldoid))
     +		else if (repo_get_oid_with_flags(the_repository, oldval, &oldoid,
    -+						 GET_OID_HASH_SKIP_AMBIGUITY_CHECK))
    ++						 GET_OID_SKIP_AMBIGUITY_CHECK))
      			die("%s: not a valid old SHA1", oldval);
      	}
      
 4:  768cb058d6e =  4:  e61aab15188 refs: introduce function to batch refname availability checks
 5:  6eed3559d68 !  5:  ef3df1044b4 refs/reftable: start using `refs_verify_refnames_available()`
    @@ Metadata
     Author: Patrick Steinhardt <ps@pks.im>
     
      ## Commit message ##
    -    refs/reftable: start using `refs_verify_refnames_available()`
    +    refs/reftable: batch refname availability checks
     
         Refactor the "reftable" backend to batch the availability check for
         refnames. This does not yet have an effect on performance as we
 -:  ----------- >  6:  822bfc7bdee refs/files: batch refname availability checks for normal transactions
 -:  ----------- >  7:  f5dc7eaa97e refs/files: batch refname availability checks for initial transactions
 6:  5d78f40c460 !  8:  20a74a89045 refs: stop re-verifying common prefixes for availability
    @@ Commit message
     
         Optimize this pattern by storing prefixes in a `strset` so that we can
         trivially track those prefixes that we have already checked. This leads
    -    to a significant speedup when creating many references that all share a
    -    common prefix:
    +    to a significant speedup with the "reftable" backend when creating many
    +    references that all share a common prefix:
     
             Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
               Time (mean ± σ):      63.1 ms ±   1.8 ms    [User: 41.0 ms, System: 21.6 ms]
    @@ Commit message
               update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
                 1.58 ± 0.07 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
     
    -    Note that the same speedup cannot be observed for the "files" backend
    -    because it still performs availability check per reference.
    +    For the "files" backend we see an improvement, but a much smaller one:
    +
    +        Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
    +          Time (mean ± σ):     395.8 ms ±   5.3 ms    [User: 63.6 ms, System: 330.5 ms]
    +          Range (min … max):   387.0 ms … 404.6 ms    10 runs
    +
    +        Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
    +          Time (mean ± σ):     386.0 ms ±   4.0 ms    [User: 51.5 ms, System: 332.8 ms]
    +          Range (min … max):   380.8 ms … 392.6 ms    10 runs
    +
    +        Summary
    +          update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
    +            1.03 ± 0.02 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
    +
    +    This change also leads to a modest improvement when writing references
    +    with "initial" semantics, for example when migrating references. The
    +    following benchmarks are migrating 1m references from the "reftable" to
    +    the "files" backend:
    +
    +        Benchmark 1: migrate reftable:files (refcount = 1000000, revision = HEAD~)
    +          Time (mean ± σ):     836.6 ms ±   5.6 ms    [User: 645.2 ms, System: 185.2 ms]
    +          Range (min … max):   829.6 ms … 845.9 ms    10 runs
    +
    +        Benchmark 2: migrate reftable:files (refcount = 1000000, revision = HEAD)
    +          Time (mean ± σ):     759.8 ms ±   5.1 ms    [User: 574.9 ms, System: 178.9 ms]
    +          Range (min … max):   753.1 ms … 768.8 ms    10 runs
    +
    +        Summary
    +          migrate reftable:files (refcount = 1000000, revision = HEAD) ran
    +            1.10 ± 0.01 times faster than migrate reftable:files (refcount = 1000000, revision = HEAD~)
    +
    +    And vice versa:
    +
    +        Benchmark 1: migrate files:reftable (refcount = 1000000, revision = HEAD~)
    +          Time (mean ± σ):     870.7 ms ±   5.7 ms    [User: 735.2 ms, System: 127.4 ms]
    +          Range (min … max):   861.6 ms … 883.2 ms    10 runs
    +
    +        Benchmark 2: migrate files:reftable (refcount = 1000000, revision = HEAD)
    +          Time (mean ± σ):     799.1 ms ±   8.5 ms    [User: 661.1 ms, System: 130.2 ms]
    +          Range (min … max):   787.5 ms … 812.6 ms    10 runs
    +
    +        Summary
    +          migrate files:reftable (refcount = 1000000, revision = HEAD) ran
    +            1.09 ± 0.01 times faster than migrate files:reftable (refcount = 1000000, revision = HEAD~)
    +
    +    The impact here is significantly smaller given that we don't perform any
    +    reference reads with "initial" semantics, so the speedup only comes from
    +    us doing less string list lookups.
     
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
 7:  a1da57bf37d !  9:  b6524ecfe0c refs/iterator: separate lifecycle from iteration
    @@ Commit message
         to always call a newly introduce `ref_iterator_free()` function that
         deallocates the iterator and its internal state.
     
    +    Note that the `dir_iterator` is somewhat special because it does not
    +    implement the `ref_iterator` interface, but is only used to implement
    +    other iterators. Consequently, we have to provide `dir_iterator_free()`
    +    instead of `dir_iterator_release()` as the allocated structure itself is
    +    managed by the `dir_iterator` interfaces, as well, and not freed by
    +    `ref_iterator_free()` like in all the other cases.
    +
         While at it, drop the return value of `ref_iterator_abort()`, which
         wasn't really required by any of the iterator implementations anyway.
         Furthermore, stop calling `base_ref_iterator_free()` in any of the
    @@ dir-iterator.c: struct dir_iterator *dir_iterator_begin(const char *path, unsign
     
      ## dir-iterator.h ##
     @@
    -  *             goto error_handler;
       *
       *     while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
    -- *             if (want_to_stop_iteration()) {
    +  *             if (want_to_stop_iteration()) {
     - *                     ok = dir_iterator_abort(iter);
    -+ *             if (want_to_stop_iteration())
    ++ *                     ok = ITER_DONE;
       *                     break;
    -- *             }
    +  *             }
       *
    -  *             // Access information about the current path:
    -  *             if (S_ISDIR(iter->st.st_mode))
     @@
       *
       *     if (ok != ITER_DONE)
       *             handle_error();
    -+ *     dir_iterator_release(iter);
    ++ *     dir_iterator_free(iter);
       *
       * Callers are allowed to modify iter->path while they are working,
       * but they must restore it to its original contents before calling
    @@ dir-iterator.h: struct dir_iterator *dir_iterator_begin(const char *path, unsign
      
      #endif
     
    + ## iterator.h ##
    +@@
    + #define ITER_OK 0
    + 
    + /*
    +- * The iterator is exhausted and has been freed.
    ++ * The iterator is exhausted.
    +  */
    + #define ITER_DONE -1
    + 
    +
      ## refs.c ##
     @@ refs.c: int refs_verify_refnames_available(struct ref_store *refs,
      {
    @@ refs/refs-internal.h: enum do_for_each_ref_flags {
       * to the next entry, ref_iterator_advance() aborts the iteration,
       * frees the ref_iterator, and returns ITER_ERROR.
     @@ refs/refs-internal.h: enum do_for_each_ref_flags {
    -  *     struct ref_iterator *iter = ...;
       *
       *     while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
    -- *             if (want_to_stop_iteration()) {
    +  *             if (want_to_stop_iteration()) {
     - *                     ok = ref_iterator_abort(iter);
    -+ *             if (want_to_stop_iteration())
    ++ *                     ok = ITER_DONE;
       *                     break;
    -- *             }
    +  *             }
       *
    -  *             // Access information about the current reference:
    -  *             if (!(iter->flags & REF_ISSYMREF))
     @@ refs/refs-internal.h: enum do_for_each_ref_flags {
       *
       *     if (ok != ITER_DONE)
 8:  4618e6d5959 = 10:  d2763859826 refs/iterator: provide infrastructure to re-seek iterators
 9:  527caf0bae2 = 11:  3a56f55e2c4 refs/iterator: implement seeking for merged iterators
10:  61eddf82887 = 12:  33a04cc0a80 refs/iterator: implement seeking for reftable iterators
11:  d6a9792cb4c = 13:  5380a6f57dc refs/iterator: implement seeking for ref-cache iterators
12:  72ac2f31c39 = 14:  7a26532dd1f refs/iterator: implement seeking for `packed-ref` iterators
13:  916ec77de21 = 15:  dad520cf933 refs/iterator: implement seeking for "files" iterators
14:  7d40945d157 ! 16:  c300e7f049e refs: reuse iterators when determining refname availability
    @@ Commit message
         single reference we're about to check. This keeps us from reusing state
         that the iterator may have and that may make it work more efficiently.
     
    -    Refactor the logic to reseek iterators. This leads to a speedup with the
    -    reftable backend, which is the only backend that knows to batch refname
    -    availability checks:
    +    Refactor the logic to reseek iterators. This leads to a sizeable speedup
    +    with the "reftable" backend:
     
             Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
               Time (mean ± σ):      39.8 ms ±   0.9 ms    [User: 29.7 ms, System: 9.8 ms]
    @@ Commit message
               update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
                 1.25 ± 0.05 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
     
    +    The "files" backend doesn't really show a huge impact:
    +
    +        Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
    +          Time (mean ± σ):     392.3 ms ±   7.1 ms    [User: 59.7 ms, System: 328.8 ms]
    +          Range (min … max):   384.6 ms … 404.5 ms    10 runs
    +
    +        Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
    +          Time (mean ± σ):     387.7 ms ±   7.4 ms    [User: 54.6 ms, System: 329.6 ms]
    +          Range (min … max):   377.0 ms … 397.7 ms    10 runs
    +
    +        Summary
    +          update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
    +            1.01 ± 0.03 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
    +
    +    This is mostly because it is way slower to begin with because it has to
    +    create a separate file for each new reference, so the milliseconds we
    +    shave off by reseeking the iterator doesn't really translate into a
    +    significant relative improvement.
    +
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
      ## refs.c ##

---
base-commit: e2067b49ecaef9b7f51a17ce251f9207f72ef52d
change-id: 20250217-pks-update-ref-optimization-15c795e66e2b



^ permalink raw reply	[flat|nested] 163+ messages in thread

* [PATCH v2 01/16] object-name: introduce `repo_get_oid_with_flags()`
  2025-02-19 13:23 ` [PATCH v2 00/16] " Patrick Steinhardt
@ 2025-02-19 13:23   ` Patrick Steinhardt
  2025-02-19 17:02     ` Justin Tobler
  2025-02-19 13:23   ` [PATCH v2 02/16] object-name: allow skipping ambiguity checks in `get_oid()` family Patrick Steinhardt
                     ` (15 subsequent siblings)
  16 siblings, 1 reply; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-19 13:23 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Introduce a new function `repo_get_oid_with_flags()`. This function
behaves the same as `repo_get_oid()`, except that it takes an extra
`flags` parameter that it ends up passing to `get_oid_with_context()`.

This function will be used in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 object-name.c | 14 ++++++++------
 object-name.h |  6 ++++++
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/object-name.c b/object-name.c
index 945d5bdef25..bc0265ad2a1 100644
--- a/object-name.c
+++ b/object-name.c
@@ -1794,18 +1794,20 @@ void object_context_release(struct object_context *ctx)
 	strbuf_release(&ctx->symlink_path);
 }
 
-/*
- * This is like "get_oid_basic()", except it allows "object ID expressions",
- * notably "xyz^" for "parent of xyz"
- */
-int repo_get_oid(struct repository *r, const char *name, struct object_id *oid)
+int repo_get_oid_with_flags(struct repository *r, const char *name, struct object_id *oid,
+			    unsigned flags)
 {
 	struct object_context unused;
-	int ret = get_oid_with_context(r, name, 0, oid, &unused);
+	int ret = get_oid_with_context(r, name, flags, oid, &unused);
 	object_context_release(&unused);
 	return ret;
 }
 
+int repo_get_oid(struct repository *r, const char *name, struct object_id *oid)
+{
+	return repo_get_oid_with_flags(r, name, oid, 0);
+}
+
 /*
  * This returns a non-zero value if the string (built using printf
  * format and the given arguments) is not a valid object.
diff --git a/object-name.h b/object-name.h
index 8dba4a47a47..fb5a97b2c8e 100644
--- a/object-name.h
+++ b/object-name.h
@@ -51,6 +51,12 @@ void strbuf_repo_add_unique_abbrev(struct strbuf *sb, struct repository *repo,
 void strbuf_add_unique_abbrev(struct strbuf *sb, const struct object_id *oid,
 			      int abbrev_len);
 
+/*
+ * This is like "get_oid_basic()", except it allows "object ID expressions",
+ * notably "xyz^" for "parent of xyz". Accepts GET_OID_* flags.
+ */
+int repo_get_oid_with_flags(struct repository *r, const char *str, struct object_id *oid,
+			    unsigned flags);
 int repo_get_oid(struct repository *r, const char *str, struct object_id *oid);
 __attribute__((format (printf, 2, 3)))
 int get_oidf(struct object_id *oid, const char *fmt, ...);

-- 
2.48.1.683.gf705b3209c.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v2 02/16] object-name: allow skipping ambiguity checks in `get_oid()` family
  2025-02-19 13:23 ` [PATCH v2 00/16] " Patrick Steinhardt
  2025-02-19 13:23   ` [PATCH v2 01/16] object-name: introduce `repo_get_oid_with_flags()` Patrick Steinhardt
@ 2025-02-19 13:23   ` Patrick Steinhardt
  2025-02-21  8:00     ` Jeff King
  2025-02-19 13:23   ` [PATCH v2 03/16] builtin/update-ref: skip ambiguity checks when parsing object IDs Patrick Steinhardt
                     ` (14 subsequent siblings)
  16 siblings, 1 reply; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-19 13:23 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

When reading an object ID via `get_oid_basic()` or any of its related
functions, we check whether the object ID is ambiguous, which can be the
case when a reference with the same name exists. While the
check is generally helpful, there are cases where it only adds to the
runtime overhead without providing much of a benefit.

Add a new flag that allows us to disable the check. The flag will be
used in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 hash.h        | 1 +
 object-name.c | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/hash.h b/hash.h
index 4367acfec50..79419016513 100644
--- a/hash.h
+++ b/hash.h
@@ -204,6 +204,7 @@ struct object_id {
 #define GET_OID_ONLY_TO_DIE    04000
 #define GET_OID_REQUIRE_PATH  010000
 #define GET_OID_HASH_ANY      020000
+#define GET_OID_SKIP_AMBIGUITY_CHECK 040000
 
 #define GET_OID_DISAMBIGUATORS \
 	(GET_OID_COMMIT | GET_OID_COMMITTISH | \
diff --git a/object-name.c b/object-name.c
index bc0265ad2a1..3e0b7edea11 100644
--- a/object-name.c
+++ b/object-name.c
@@ -961,7 +961,9 @@ static int get_oid_basic(struct repository *r, const char *str, int len,
 	int fatal = !(flags & GET_OID_QUIETLY);
 
 	if (len == r->hash_algo->hexsz && !get_oid_hex(str, oid)) {
-		if (repo_settings_get_warn_ambiguous_refs(r) && warn_on_object_refname_ambiguity) {
+		if (!(flags & GET_OID_SKIP_AMBIGUITY_CHECK) &&
+		    repo_settings_get_warn_ambiguous_refs(r) &&
+		    warn_on_object_refname_ambiguity) {
 			refs_found = repo_dwim_ref(r, str, len, &tmp_oid, &real_ref, 0);
 			if (refs_found > 0) {
 				warning(warn_msg, len, str);

-- 
2.48.1.683.gf705b3209c.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v2 03/16] builtin/update-ref: skip ambiguity checks when parsing object IDs
  2025-02-19 13:23 ` [PATCH v2 00/16] " Patrick Steinhardt
  2025-02-19 13:23   ` [PATCH v2 01/16] object-name: introduce `repo_get_oid_with_flags()` Patrick Steinhardt
  2025-02-19 13:23   ` [PATCH v2 02/16] object-name: allow skipping ambiguity checks in `get_oid()` family Patrick Steinhardt
@ 2025-02-19 13:23   ` Patrick Steinhardt
  2025-02-19 18:21     ` Justin Tobler
  2025-02-19 13:23   ` [PATCH v2 04/16] refs: introduce function to batch refname availability checks Patrick Steinhardt
                     ` (13 subsequent siblings)
  16 siblings, 1 reply; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-19 13:23 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Most of the commands in git-update-ref(1) accept an old and/or new
object ID to update a specific reference to. These object IDs get parsed
via `repo_get_oid()`, which not only handles plain object IDs, but also
those that have a suffix like "~" or "^2". More surprisingly though, it
even knows to resolve references, despite the fact that its manpage
never mentions this.

One consequence of this is that we also check for ambiguous references:
when parsing a full object ID where the DWIM mechanism would also cause
us to resolve it as a branch, we'd end up printing a warning. While this
check makes sense to have in general, it is arguably less useful in the
context of git-update-ref(1). This is for two reasons:

  - The manpage is explicitly structured around object IDs. So if we see
    a full-blown object ID, the intent should be quite clear in
    general.

  - The command is part of our plumbing layer and not a tool that users
    would generally use in interactive workflows. As such, the warning
    will likely not be visible to anybody in the first place.

Furthermore, this check can be quite expensive when updating lots of
references via `--stdin`, because we try to read multiple references per
object ID that we parse according to the DWIM rules. This effect can be
seen both with the "files" and "reftable" backend.

The issue is not unique to git-update-ref(1), but was also an issue in
git-cat-file(1), where it was addressed by disabling the ambiguity check
in 25fba78d36b (cat-file: disable object/refname ambiguity check for
batch mode, 2013-07-12).

Disable the warning in git-update-ref(1), which provides a significant
speedup with both backends. The following benchmark creates 10000 new
references with a 100000 preexisting refs with the "files" backend:

    Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
      Time (mean ± σ):     467.3 ms ±   5.1 ms    [User: 100.0 ms, System: 365.1 ms]
      Range (min … max):   461.9 ms … 479.3 ms    10 runs

    Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):     394.1 ms ±   5.8 ms    [User: 63.3 ms, System: 327.6 ms]
      Range (min … max):   384.9 ms … 405.7 ms    10 runs

    Summary
      update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
        1.19 ± 0.02 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)

And with the "reftable" backend:

    Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
      Time (mean ± σ):     146.9 ms ±   2.2 ms    [User: 90.4 ms, System: 56.0 ms]
      Range (min … max):   142.7 ms … 150.8 ms    19 runs

    Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):      63.2 ms ±   1.1 ms    [User: 41.0 ms, System: 21.8 ms]
      Range (min … max):    61.1 ms …  66.6 ms    41 runs

    Summary
      update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
        2.32 ± 0.05 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)

Note that the absolute improvement with both backends is roughly in the
same ballpark, but the relative improvement for the "reftable" backend
is more significant because writing the new table to disk is faster in
the first place.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/update-ref.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/builtin/update-ref.c b/builtin/update-ref.c
index 4d35bdc4b4b..d603f54b770 100644
--- a/builtin/update-ref.c
+++ b/builtin/update-ref.c
@@ -179,7 +179,8 @@ static int parse_next_oid(const char **next, const char *end,
 		(*next)++;
 		*next = parse_arg(*next, &arg);
 		if (arg.len) {
-			if (repo_get_oid(the_repository, arg.buf, oid))
+			if (repo_get_oid_with_flags(the_repository, arg.buf, oid,
+						    GET_OID_SKIP_AMBIGUITY_CHECK))
 				goto invalid;
 		} else {
 			/* Without -z, an empty value means all zeros: */
@@ -197,7 +198,8 @@ static int parse_next_oid(const char **next, const char *end,
 		*next += arg.len;
 
 		if (arg.len) {
-			if (repo_get_oid(the_repository, arg.buf, oid))
+			if (repo_get_oid_with_flags(the_repository, arg.buf, oid,
+						    GET_OID_SKIP_AMBIGUITY_CHECK))
 				goto invalid;
 		} else if (flags & PARSE_SHA1_ALLOW_EMPTY) {
 			/* With -z, treat an empty value as all zeros: */
@@ -772,7 +774,8 @@ int cmd_update_ref(int argc,
 		refname = argv[0];
 		value = argv[1];
 		oldval = argv[2];
-		if (repo_get_oid(the_repository, value, &oid))
+		if (repo_get_oid_with_flags(the_repository, value, &oid,
+					    GET_OID_SKIP_AMBIGUITY_CHECK))
 			die("%s: not a valid SHA1", value);
 	}
 
@@ -783,7 +786,8 @@ int cmd_update_ref(int argc,
 			 * must not already exist:
 			 */
 			oidclr(&oldoid, the_repository->hash_algo);
-		else if (repo_get_oid(the_repository, oldval, &oldoid))
+		else if (repo_get_oid_with_flags(the_repository, oldval, &oldoid,
+						 GET_OID_SKIP_AMBIGUITY_CHECK))
 			die("%s: not a valid old SHA1", oldval);
 	}
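The code path being optimized here is the batched `--stdin` mode. A minimal reproduction of the benchmark's workload (far smaller than its 10000 refs, and with made-up repo and ref names) looks like:

```shell
set -e
git init -q demo-repo
git -C demo-repo -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m init
oid=$(git -C demo-repo rev-parse HEAD)

# Feed many "create" directives through a single update-ref invocation;
# every object ID on stdin goes through the repo_get_oid() parsing that
# this patch switches over to GET_OID_SKIP_AMBIGUITY_CHECK.
for i in 1 2 3; do
	printf 'create refs/heads/batch-%d %s\n' "$i" "$oid"
done | git -C demo-repo update-ref --stdin

# List the newly created refs.
git -C demo-repo for-each-ref --format='%(refname)' 'refs/heads/batch-*'
```

Before this patch, each full object ID fed to `--stdin` triggered reference lookups for the ambiguity warning; with the flag set, those lookups are skipped entirely.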
 

-- 
2.48.1.683.gf705b3209c.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v2 04/16] refs: introduce function to batch refname availability checks
  2025-02-19 13:23 ` [PATCH v2 00/16] " Patrick Steinhardt
                     ` (2 preceding siblings ...)
  2025-02-19 13:23   ` [PATCH v2 03/16] builtin/update-ref: skip ambiguity checks when parsing object IDs Patrick Steinhardt
@ 2025-02-19 13:23   ` Patrick Steinhardt
  2025-02-19 13:23   ` [PATCH v2 05/16] refs/reftable: " Patrick Steinhardt
                     ` (12 subsequent siblings)
  16 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-19 13:23 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

The `refs_verify_refname_available()` function checks whether a
reference update can be committed or whether it would conflict with
either a prefix or suffix thereof. The function needs to be called once
per reference that one wants to check, which means that a couple of
checks get redone on every call.

Introduce a new function `refs_verify_refnames_available()` that does
the same, but for a list of references. For now, the new function uses
the exact same implementation, except that we loop through all refnames
provided by the caller. This will be tuned in subsequent commits.

The existing `refs_verify_refname_available()` function is reimplemented
on top of the new function. As such, the diff is best viewed with the
`--ignore-space-change` option.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs.c | 169 +++++++++++++++++++++++++++++++++++++----------------------------
 refs.h |  12 +++++
 2 files changed, 109 insertions(+), 72 deletions(-)

diff --git a/refs.c b/refs.c
index f4094a326a9..5a9b0f2fa1e 100644
--- a/refs.c
+++ b/refs.c
@@ -2467,19 +2467,15 @@ int ref_transaction_commit(struct ref_transaction *transaction,
 	return ret;
 }
 
-int refs_verify_refname_available(struct ref_store *refs,
-				  const char *refname,
-				  const struct string_list *extras,
-				  const struct string_list *skip,
-				  unsigned int initial_transaction,
-				  struct strbuf *err)
+int refs_verify_refnames_available(struct ref_store *refs,
+				   const struct string_list *refnames,
+				   const struct string_list *extras,
+				   const struct string_list *skip,
+				   unsigned int initial_transaction,
+				   struct strbuf *err)
 {
-	const char *slash;
-	const char *extra_refname;
 	struct strbuf dirname = STRBUF_INIT;
 	struct strbuf referent = STRBUF_INIT;
-	struct object_id oid;
-	unsigned int type;
 	int ret = -1;
 
 	/*
@@ -2489,79 +2485,91 @@ int refs_verify_refname_available(struct ref_store *refs,
 
 	assert(err);
 
-	strbuf_grow(&dirname, strlen(refname) + 1);
-	for (slash = strchr(refname, '/'); slash; slash = strchr(slash + 1, '/')) {
-		/*
-		 * Just saying "Is a directory" when we e.g. can't
-		 * lock some multi-level ref isn't very informative,
-		 * the user won't be told *what* is a directory, so
-		 * let's not use strerror() below.
-		 */
-		int ignore_errno;
-		/* Expand dirname to the new prefix, not including the trailing slash: */
-		strbuf_add(&dirname, refname + dirname.len, slash - refname - dirname.len);
+	for (size_t i = 0; i < refnames->nr; i++) {
+		const char *refname = refnames->items[i].string;
+		const char *extra_refname;
+		struct object_id oid;
+		unsigned int type;
+		const char *slash;
+
+		strbuf_reset(&dirname);
+
+		for (slash = strchr(refname, '/'); slash; slash = strchr(slash + 1, '/')) {
+			/*
+			 * Just saying "Is a directory" when we e.g. can't
+			 * lock some multi-level ref isn't very informative,
+			 * the user won't be told *what* is a directory, so
+			 * let's not use strerror() below.
+			 */
+			int ignore_errno;
+
+			/* Expand dirname to the new prefix, not including the trailing slash: */
+			strbuf_add(&dirname, refname + dirname.len, slash - refname - dirname.len);
+
+			/*
+			 * We are still at a leading dir of the refname (e.g.,
+			 * "refs/foo"; if there is a reference with that name,
+			 * it is a conflict, *unless* it is in skip.
+			 */
+			if (skip && string_list_has_string(skip, dirname.buf))
+				continue;
+
+			if (!initial_transaction &&
+			    !refs_read_raw_ref(refs, dirname.buf, &oid, &referent,
+					       &type, &ignore_errno)) {
+				strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
+					    dirname.buf, refname);
+				goto cleanup;
+			}
+
+			if (extras && string_list_has_string(extras, dirname.buf)) {
+				strbuf_addf(err, _("cannot process '%s' and '%s' at the same time"),
+					    refname, dirname.buf);
+				goto cleanup;
+			}
+		}
 
 		/*
-		 * We are still at a leading dir of the refname (e.g.,
-		 * "refs/foo"; if there is a reference with that name,
-		 * it is a conflict, *unless* it is in skip.
+		 * We are at the leaf of our refname (e.g., "refs/foo/bar").
+		 * There is no point in searching for a reference with that
+		 * name, because a refname isn't considered to conflict with
+		 * itself. But we still need to check for references whose
+		 * names are in the "refs/foo/bar/" namespace, because they
+		 * *do* conflict.
 		 */
-		if (skip && string_list_has_string(skip, dirname.buf))
-			continue;
+		strbuf_addstr(&dirname, refname + dirname.len);
+		strbuf_addch(&dirname, '/');
+
+		if (!initial_transaction) {
+			struct ref_iterator *iter;
+			int ok;
+
+			iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
+						       DO_FOR_EACH_INCLUDE_BROKEN);
+			while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
+				if (skip &&
+				    string_list_has_string(skip, iter->refname))
+					continue;
+
+				strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
+					    iter->refname, refname);
+				ref_iterator_abort(iter);
+				goto cleanup;
+			}
 
-		if (!initial_transaction &&
-		    !refs_read_raw_ref(refs, dirname.buf, &oid, &referent,
-				       &type, &ignore_errno)) {
-			strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
-				    dirname.buf, refname);
-			goto cleanup;
+			if (ok != ITER_DONE)
+				BUG("error while iterating over references");
 		}
 
-		if (extras && string_list_has_string(extras, dirname.buf)) {
+		extra_refname = find_descendant_ref(dirname.buf, extras, skip);
+		if (extra_refname) {
 			strbuf_addf(err, _("cannot process '%s' and '%s' at the same time"),
-				    refname, dirname.buf);
+				    refname, extra_refname);
 			goto cleanup;
 		}
 	}
 
-	/*
-	 * We are at the leaf of our refname (e.g., "refs/foo/bar").
-	 * There is no point in searching for a reference with that
-	 * name, because a refname isn't considered to conflict with
-	 * itself. But we still need to check for references whose
-	 * names are in the "refs/foo/bar/" namespace, because they
-	 * *do* conflict.
-	 */
-	strbuf_addstr(&dirname, refname + dirname.len);
-	strbuf_addch(&dirname, '/');
-
-	if (!initial_transaction) {
-		struct ref_iterator *iter;
-		int ok;
-
-		iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
-					       DO_FOR_EACH_INCLUDE_BROKEN);
-		while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
-			if (skip &&
-			    string_list_has_string(skip, iter->refname))
-				continue;
-
-			strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
-				    iter->refname, refname);
-			ref_iterator_abort(iter);
-			goto cleanup;
-		}
-
-		if (ok != ITER_DONE)
-			BUG("error while iterating over references");
-	}
-
-	extra_refname = find_descendant_ref(dirname.buf, extras, skip);
-	if (extra_refname)
-		strbuf_addf(err, _("cannot process '%s' and '%s' at the same time"),
-			    refname, extra_refname);
-	else
-		ret = 0;
+	ret = 0;
 
 cleanup:
 	strbuf_release(&referent);
@@ -2569,6 +2577,23 @@ int refs_verify_refname_available(struct ref_store *refs,
 	return ret;
 }
 
+int refs_verify_refname_available(struct ref_store *refs,
+				  const char *refname,
+				  const struct string_list *extras,
+				  const struct string_list *skip,
+				  unsigned int initial_transaction,
+				  struct strbuf *err)
+{
+	struct string_list_item item = { .string = (char *) refname };
+	struct string_list refnames = {
+		.items = &item,
+		.nr = 1,
+	};
+
+	return refs_verify_refnames_available(refs, &refnames, extras, skip,
+					      initial_transaction, err);
+}
+
 struct do_for_each_reflog_help {
 	each_reflog_fn *fn;
 	void *cb_data;
diff --git a/refs.h b/refs.h
index a0cdd99250e..185aed5a461 100644
--- a/refs.h
+++ b/refs.h
@@ -124,6 +124,18 @@ int refs_verify_refname_available(struct ref_store *refs,
 				  unsigned int initial_transaction,
 				  struct strbuf *err);
 
+/*
+ * Same as `refs_verify_refname_available()`, but checking for a list of
+ * refnames instead of only a single item. This is more efficient in the case
+ * where one needs to check multiple refnames.
+ */
+int refs_verify_refnames_available(struct ref_store *refs,
+				   const struct string_list *refnames,
+				   const struct string_list *extras,
+				   const struct string_list *skip,
+				   unsigned int initial_transaction,
+				   struct strbuf *err);
+
 int refs_ref_exists(struct ref_store *refs, const char *refname);
 
 int should_autocreate_reflog(enum log_refs_config log_all_ref_updates,

-- 
2.48.1.683.gf705b3209c.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v2 05/16] refs/reftable: batch refname availability checks
  2025-02-19 13:23 ` [PATCH v2 00/16] " Patrick Steinhardt
                     ` (3 preceding siblings ...)
  2025-02-19 13:23   ` [PATCH v2 04/16] refs: introduce function to batch refname availability checks Patrick Steinhardt
@ 2025-02-19 13:23   ` Patrick Steinhardt
  2025-02-19 13:23   ` [PATCH v2 06/16] refs/files: batch refname availability checks for normal transactions Patrick Steinhardt
                     ` (11 subsequent siblings)
  16 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-19 13:23 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Refactor the "reftable" backend to batch the availability check for
refnames. This does not yet have an effect on performance as we
essentially still call `refs_verify_refname_available()` in a loop, but
this will change in subsequent commits.
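
The pattern being introduced here boils down to "collect, sort, check
once". A minimal stand-alone sketch of that shape (plain C with made-up
helpers; this is not git's actual `string_list` or refs API):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int cmp_str(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/*
 * Batched check: sort the queued names once, then a single linear scan
 * can detect (here: duplicate-name) conflicts, instead of consulting
 * the store separately for every individual name.
 */
static int verify_names_available(const char **names, size_t nr)
{
	qsort(names, nr, sizeof(*names), cmp_str);
	for (size_t i = 1; i < nr; i++)
		if (!strcmp(names[i - 1], names[i]))
			return -1; /* conflict */
	return 0;
}
```

The real `refs_verify_refnames_available()` checks against the ref
store rather than only within the batch, but the call shape is the
same: one sorted list in, one verdict out.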

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/reftable-backend.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index d39a14c5a46..2a90e7cb391 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -1069,6 +1069,7 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
 		reftable_be_downcast(ref_store, REF_STORE_WRITE|REF_STORE_MAIN, "ref_transaction_prepare");
 	struct strbuf referent = STRBUF_INIT, head_referent = STRBUF_INIT;
 	struct string_list affected_refnames = STRING_LIST_INIT_NODUP;
+	struct string_list refnames_to_check = STRING_LIST_INIT_NODUP;
 	struct reftable_transaction_data *tx_data = NULL;
 	struct reftable_backend *be;
 	struct object_id head_oid;
@@ -1224,12 +1225,7 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
 			 * can output a proper error message instead of failing
 			 * at a later point.
 			 */
-			ret = refs_verify_refname_available(ref_store, u->refname,
-							    &affected_refnames, NULL,
-							    transaction->flags & REF_TRANSACTION_FLAG_INITIAL,
-							    err);
-			if (ret < 0)
-				goto done;
+			string_list_append(&refnames_to_check, u->refname);
 
 			/*
 			 * There is no need to write the reference deletion
@@ -1379,6 +1375,13 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
 		}
 	}
 
+	string_list_sort(&refnames_to_check);
+	ret = refs_verify_refnames_available(ref_store, &refnames_to_check, &affected_refnames, NULL,
+					     transaction->flags & REF_TRANSACTION_FLAG_INITIAL,
+					     err);
+	if (ret < 0)
+		goto done;
+
 	transaction->backend_data = tx_data;
 	transaction->state = REF_TRANSACTION_PREPARED;
 
@@ -1394,6 +1397,7 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
 	string_list_clear(&affected_refnames, 0);
 	strbuf_release(&referent);
 	strbuf_release(&head_referent);
+	string_list_clear(&refnames_to_check, 0);
 
 	return ret;
 }

-- 
2.48.1.683.gf705b3209c.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v2 06/16] refs/files: batch refname availability checks for normal transactions
  2025-02-19 13:23 ` [PATCH v2 00/16] " Patrick Steinhardt
                     ` (4 preceding siblings ...)
  2025-02-19 13:23   ` [PATCH v2 05/16] refs/reftable: " Patrick Steinhardt
@ 2025-02-19 13:23   ` Patrick Steinhardt
  2025-02-19 13:23   ` [PATCH v2 07/16] refs/files: batch refname availability checks for initial transactions Patrick Steinhardt
                     ` (10 subsequent siblings)
  16 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-19 13:23 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Similar to the "reftable" backend, which we adapted in the preceding
commit to use batched refname availability checks, we can do the same for
the "files" backend. Things are a bit more intricate here though, as we
call `refs_verify_refname_available()` in a set of different contexts:

  1. `lock_raw_ref()` when it hits either EEXIST or EISDIR when creating
     a new reference, mostly to create a nice, user-readable error
     message. This is nothing we have to care about too much, as we only
     hit this code path at most once when we hit a conflict.

  2. `lock_raw_ref()` when it _could_ create the lockfile to check
     whether it is conflicting with any packed refs. In the general case,
     this code path will be hit once for every (successful) reference
     update.

  3. `lock_ref_oid_basic()`, but it is only executed when copying or
     renaming references or when expiring reflogs. It will thus not be
     called in contexts where we have many references queued up.

  4. `refs_refname_ref_available()`, but again only when copying or
     renaming references. It is thus not interesting for the same
     reason as the previous case.

  5. `files_transaction_finish_initial()`, which is only executed when
     creating a new repository or migrating references.

So out of these, only (2) and (5) are viable candidates to use the
batched checks.

Adapt `lock_raw_ref()` accordingly by queueing up reference names that
need to be checked for availability and then checking them after we have
processed all updates. This check is done before we (optionally) lock
the `packed-refs` file, which is somewhat flawed because it means that
the `packed-refs` could still change after the availability check and
thus create an undetected conflict. But unconditionally locking the file
would change semantics that users are likely to rely on, so we keep the
current locking sequence intact, even if it's suboptimal.
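
For intuition, the "availability" in question is essentially the
directory/file conflict of the loose-ref layout: "refs/heads/foo" and
"refs/heads/foo/bar" cannot coexist, because the first needs a file
exactly where the second needs a directory. A self-contained sketch of
that predicate (illustrative only, not the function git actually uses):

```c
#include <assert.h>
#include <string.h>

/*
 * Two refnames conflict when they are equal, or when one is a proper
 * prefix of the other followed by '/': the shorter name would need a
 * file where the longer one needs a directory.
 */
static int refnames_conflict(const char *a, const char *b)
{
	size_t la = strlen(a), lb = strlen(b);

	if (la == lb)
		return !strcmp(a, b);
	if (la > lb) {
		const char *tmp = a;
		a = b;
		b = tmp;
		la = lb;
	}
	return !strncmp(a, b, la) && b[la] == '/';
}
```

This is the per-pair version; the point of this series is to answer the
same question for many queued names with as few store lookups as
possible.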

The refactoring of `files_transaction_finish_initial()` will be done in
the next commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/files-backend.c | 42 +++++++++++++++++++++++++++++++-----------
 1 file changed, 31 insertions(+), 11 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 29f08dced40..6ce79cf0791 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -678,6 +678,7 @@ static void unlock_ref(struct ref_lock *lock)
  */
 static int lock_raw_ref(struct files_ref_store *refs,
 			const char *refname, int mustexist,
+			struct string_list *refnames_to_check,
 			const struct string_list *extras,
 			struct ref_lock **lock_p,
 			struct strbuf *referent,
@@ -855,16 +856,11 @@ static int lock_raw_ref(struct files_ref_store *refs,
 		}
 
 		/*
-		 * If the ref did not exist and we are creating it,
-		 * make sure there is no existing packed ref that
-		 * conflicts with refname:
+		 * If the ref did not exist and we are creating it, we have to
+		 * make sure there is no existing packed ref that conflicts
+		 * with refname. This check is deferred so that we can batch it.
 		 */
-		if (refs_verify_refname_available(
-				    refs->packed_ref_store, refname,
-				    extras, NULL, 0, err)) {
-			ret = TRANSACTION_NAME_CONFLICT;
-			goto error_return;
-		}
+		string_list_insert(refnames_to_check, refname);
 	}
 
 	ret = 0;
@@ -2569,6 +2565,7 @@ static int lock_ref_for_update(struct files_ref_store *refs,
 			       struct ref_update *update,
 			       struct ref_transaction *transaction,
 			       const char *head_ref,
+			       struct string_list *refnames_to_check,
 			       struct string_list *affected_refnames,
 			       struct strbuf *err)
 {
@@ -2597,7 +2594,7 @@ static int lock_ref_for_update(struct files_ref_store *refs,
 		lock->count++;
 	} else {
 		ret = lock_raw_ref(refs, update->refname, mustexist,
-				   affected_refnames,
+				   refnames_to_check, affected_refnames,
 				   &lock, &referent,
 				   &update->type, err);
 		if (ret) {
@@ -2811,6 +2808,7 @@ static int files_transaction_prepare(struct ref_store *ref_store,
 	size_t i;
 	int ret = 0;
 	struct string_list affected_refnames = STRING_LIST_INIT_NODUP;
+	struct string_list refnames_to_check = STRING_LIST_INIT_NODUP;
 	char *head_ref = NULL;
 	int head_type;
 	struct files_transaction_backend_data *backend_data;
@@ -2898,7 +2896,8 @@ static int files_transaction_prepare(struct ref_store *ref_store,
 		struct ref_update *update = transaction->updates[i];
 
 		ret = lock_ref_for_update(refs, update, transaction,
-					  head_ref, &affected_refnames, err);
+					  head_ref, &refnames_to_check,
+					  &affected_refnames, err);
 		if (ret)
 			goto cleanup;
 
@@ -2930,6 +2929,26 @@ static int files_transaction_prepare(struct ref_store *ref_store,
 		}
 	}
 
+	/*
+	 * Verify that none of the loose references that we're about to write
+	 * conflict with any existing packed references. Ideally, we'd do this
+	 * check after the packed-refs are locked so that the file cannot
+	 * change underneath our feet. But introducing such a lock now would
+	 * probably do more harm than good as users rely on there not being a
+	 * global lock with the "files" backend.
+	 *
+	 * Another alternative would be to do the check after the (optional)
+	 * lock, but that would extend the time we spend in the globally-locked
+	 * state.
+	 *
+	 * So instead, we accept the race for now.
+	 */
+	if (refs_verify_refnames_available(refs->packed_ref_store, &refnames_to_check,
+					   &affected_refnames, NULL, 0, err)) {
+		ret = TRANSACTION_NAME_CONFLICT;
+		goto cleanup;
+	}
+
 	if (packed_transaction) {
 		if (packed_refs_lock(refs->packed_ref_store, 0, err)) {
 			ret = TRANSACTION_GENERIC_ERROR;
@@ -2972,6 +2991,7 @@ static int files_transaction_prepare(struct ref_store *ref_store,
 cleanup:
 	free(head_ref);
 	string_list_clear(&affected_refnames, 0);
+	string_list_clear(&refnames_to_check, 0);
 
 	if (ret)
 		files_transaction_cleanup(refs, transaction);

-- 
2.48.1.683.gf705b3209c.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v2 07/16] refs/files: batch refname availability checks for initial transactions
  2025-02-19 13:23 ` [PATCH v2 00/16] " Patrick Steinhardt
                     ` (5 preceding siblings ...)
  2025-02-19 13:23   ` [PATCH v2 06/16] refs/files: batch refname availability checks for normal transactions Patrick Steinhardt
@ 2025-02-19 13:23   ` Patrick Steinhardt
  2025-02-19 13:23   ` [PATCH v2 08/16] refs: stop re-verifying common prefixes for availability Patrick Steinhardt
                     ` (9 subsequent siblings)
  16 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-19 13:23 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

The "files" backend explicitly carves out special logic for its initial
transaction so that it can avoid writing out every single reference as
a loose reference. While the assumption is that there shouldn't be any
preexisting references, we still have to verify that none of the newly
written references will conflict with any other new reference in the
same transaction.

Refactor the initial transaction to use batched refname availability
checks. This does not yet have an effect on performance as we still call
`refs_verify_refname_available()` in a loop. But this will change in
subsequent commits and then impact performance when cloning a repository
with many references or when migrating references to the "files" format.
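
The reordering can be summarized as: take the packed-refs lock first,
then run one batched availability check, then commit. A toy trace of
that sequence (hypothetical step names, merely illustrating the new
ordering, not git's actual functions):

```c
#include <assert.h>
#include <string.h>

static char trace[128];

static void step(const char *what)
{
	strcat(trace, what);
	strcat(trace, ";");
}

/* After this patch: lock, then one batched check, then commit. */
static void finish_initial_transaction(void)
{
	step("lock-packed-refs");
	step("verify-refnames-available"); /* single batched call */
	step("commit-packed-transaction");
}
```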

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/files-backend.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 6ce79cf0791..11a620ea11a 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3056,6 +3056,7 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
 	size_t i;
 	int ret = 0;
 	struct string_list affected_refnames = STRING_LIST_INIT_NODUP;
+	struct string_list refnames_to_check = STRING_LIST_INIT_NODUP;
 	struct ref_transaction *packed_transaction = NULL;
 	struct ref_transaction *loose_transaction = NULL;
 
@@ -3105,11 +3106,7 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
 		    !is_null_oid(&update->old_oid))
 			BUG("initial ref transaction with old_sha1 set");
 
-		if (refs_verify_refname_available(&refs->base, update->refname,
-						  &affected_refnames, NULL, 1, err)) {
-			ret = TRANSACTION_NAME_CONFLICT;
-			goto cleanup;
-		}
+		string_list_append(&refnames_to_check, update->refname);
 
 		/*
 		 * packed-refs don't support symbolic refs, root refs and reflogs,
@@ -3145,8 +3142,19 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
 		}
 	}
 
-	if (packed_refs_lock(refs->packed_ref_store, 0, err) ||
-	    ref_transaction_commit(packed_transaction, err)) {
+	if (packed_refs_lock(refs->packed_ref_store, 0, err)) {
+		ret = TRANSACTION_GENERIC_ERROR;
+		goto cleanup;
+	}
+
+	if (refs_verify_refnames_available(&refs->base, &refnames_to_check,
+					   &affected_refnames, NULL, 1, err)) {
+		packed_refs_unlock(refs->packed_ref_store);
+		ret = TRANSACTION_NAME_CONFLICT;
+		goto cleanup;
+	}
+
+	if (ref_transaction_commit(packed_transaction, err)) {
 		ret = TRANSACTION_GENERIC_ERROR;
 		goto cleanup;
 	}
@@ -3167,6 +3175,7 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
 		ref_transaction_free(packed_transaction);
 	transaction->state = REF_TRANSACTION_CLOSED;
 	string_list_clear(&affected_refnames, 0);
+	string_list_clear(&refnames_to_check, 0);
 	return ret;
 }
 

-- 
2.48.1.683.gf705b3209c.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v2 08/16] refs: stop re-verifying common prefixes for availability
  2025-02-19 13:23 ` [PATCH v2 00/16] " Patrick Steinhardt
                     ` (6 preceding siblings ...)
  2025-02-19 13:23   ` [PATCH v2 07/16] refs/files: batch refname availability checks for initial transactions Patrick Steinhardt
@ 2025-02-19 13:23   ` Patrick Steinhardt
  2025-02-19 13:23   ` [PATCH v2 09/16] refs/iterator: separate lifecycle from iteration Patrick Steinhardt
                     ` (8 subsequent siblings)
  16 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-19 13:23 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

One of the checks done by `refs_verify_refnames_available()` is whether
any of the prefixes of a reference already exists. For example, given a
reference "refs/heads/main", we'd check whether "refs/heads" or "refs"
already exist, and if so we'd abort the transaction.

When updating multiple references at once, this check is performed for
each of the references individually. Consequently, because references
tend to have common prefixes like "refs/heads/" or "refs/tags/", we
evaluate the availability of these prefixes repeatedly. Naturally this
is a waste of compute, as the availability of those prefixes should in
general not change in the middle of a transaction. And if it would,
backends would notice at a later point in time.

Optimize this pattern by storing prefixes in a `strset` so that we can
trivially track those prefixes that we have already checked. This leads
to a significant speedup with the "reftable" backend when creating many
references that all share a common prefix:

    Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
      Time (mean ± σ):      63.1 ms ±   1.8 ms    [User: 41.0 ms, System: 21.6 ms]
      Range (min … max):    60.6 ms …  69.5 ms    38 runs

    Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):      40.0 ms ±   1.3 ms    [User: 29.3 ms, System: 10.3 ms]
      Range (min … max):    38.1 ms …  47.3 ms    61 runs

    Summary
      update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
        1.58 ± 0.07 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)

For the "files" backend we see an improvement, but a much smaller one:

    Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
      Time (mean ± σ):     395.8 ms ±   5.3 ms    [User: 63.6 ms, System: 330.5 ms]
      Range (min … max):   387.0 ms … 404.6 ms    10 runs

    Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):     386.0 ms ±   4.0 ms    [User: 51.5 ms, System: 332.8 ms]
      Range (min … max):   380.8 ms … 392.6 ms    10 runs

    Summary
      update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
        1.03 ± 0.02 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)

This change also leads to a modest improvement when writing references
with "initial" semantics, for example when migrating references. The
following benchmarks are migrating 1m references from the "reftable" to
the "files" backend:

    Benchmark 1: migrate reftable:files (refcount = 1000000, revision = HEAD~)
      Time (mean ± σ):     836.6 ms ±   5.6 ms    [User: 645.2 ms, System: 185.2 ms]
      Range (min … max):   829.6 ms … 845.9 ms    10 runs

    Benchmark 2: migrate reftable:files (refcount = 1000000, revision = HEAD)
      Time (mean ± σ):     759.8 ms ±   5.1 ms    [User: 574.9 ms, System: 178.9 ms]
      Range (min … max):   753.1 ms … 768.8 ms    10 runs

    Summary
      migrate reftable:files (refcount = 1000000, revision = HEAD) ran
        1.10 ± 0.01 times faster than migrate reftable:files (refcount = 1000000, revision = HEAD~)

And vice versa:

    Benchmark 1: migrate files:reftable (refcount = 1000000, revision = HEAD~)
      Time (mean ± σ):     870.7 ms ±   5.7 ms    [User: 735.2 ms, System: 127.4 ms]
      Range (min … max):   861.6 ms … 883.2 ms    10 runs

    Benchmark 2: migrate files:reftable (refcount = 1000000, revision = HEAD)
      Time (mean ± σ):     799.1 ms ±   8.5 ms    [User: 661.1 ms, System: 130.2 ms]
      Range (min … max):   787.5 ms … 812.6 ms    10 runs

    Summary
      migrate files:reftable (refcount = 1000000, revision = HEAD) ran
        1.09 ± 0.01 times faster than migrate files:reftable (refcount = 1000000, revision = HEAD~)

The impact here is significantly smaller given that we don't perform any
reference reads with "initial" semantics, so the speedup only comes from
us doing fewer string list lookups.
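
The deduplication itself is tiny: walk each refname's '/'-separated
prefixes and consult a set before doing any real work. A stand-alone
sketch, with a linear "set" standing in for git's `strset`:

```c
#include <assert.h>
#include <string.h>

static char seen[64][64];
static size_t seen_nr;

/* Returns 1 if `s` was newly added, 0 if it was already present. */
static int seen_add(const char *s)
{
	for (size_t i = 0; i < seen_nr; i++)
		if (!strcmp(seen[i], s))
			return 0;
	strcpy(seen[seen_nr++], s);
	return 1;
}

/* Returns how many prefixes of `refname` still needed checking. */
static size_t check_prefixes(const char *refname)
{
	char buf[64];
	size_t checked = 0;

	for (const char *p = strchr(refname, '/'); p; p = strchr(p + 1, '/')) {
		size_t len = (size_t)(p - refname);

		memcpy(buf, refname, len);
		buf[len] = '\0';
		if (seen_add(buf))
			checked++; /* only here would we hit the ref store */
	}
	return checked;
}
```

With 10000 refs under "refs/heads/", the two shared prefixes get looked
up once instead of 20000 times.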

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/refs.c b/refs.c
index 5a9b0f2fa1e..eaf41421f50 100644
--- a/refs.c
+++ b/refs.c
@@ -2476,6 +2476,7 @@ int refs_verify_refnames_available(struct ref_store *refs,
 {
 	struct strbuf dirname = STRBUF_INIT;
 	struct strbuf referent = STRBUF_INIT;
+	struct strset dirnames;
 	int ret = -1;
 
 	/*
@@ -2485,6 +2486,8 @@ int refs_verify_refnames_available(struct ref_store *refs,
 
 	assert(err);
 
+	strset_init(&dirnames);
+
 	for (size_t i = 0; i < refnames->nr; i++) {
 		const char *refname = refnames->items[i].string;
 		const char *extra_refname;
@@ -2514,6 +2517,14 @@ int refs_verify_refnames_available(struct ref_store *refs,
 			if (skip && string_list_has_string(skip, dirname.buf))
 				continue;
 
+			/*
+			 * If we've already seen the directory we don't need to
+			 * process it again. Skip it to avoid checking
+			 * common prefixes like "refs/heads/" repeatedly.
+			 */
+			if (!strset_add(&dirnames, dirname.buf))
+				continue;
+
 			if (!initial_transaction &&
 			    !refs_read_raw_ref(refs, dirname.buf, &oid, &referent,
 					       &type, &ignore_errno)) {
@@ -2574,6 +2585,7 @@ int refs_verify_refnames_available(struct ref_store *refs,
 cleanup:
 	strbuf_release(&referent);
 	strbuf_release(&dirname);
+	strset_clear(&dirnames);
 	return ret;
 }
 

-- 
2.48.1.683.gf705b3209c.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v2 09/16] refs/iterator: separate lifecycle from iteration
  2025-02-19 13:23 ` [PATCH v2 00/16] " Patrick Steinhardt
                     ` (7 preceding siblings ...)
  2025-02-19 13:23   ` [PATCH v2 08/16] refs: stop re-verifying common prefixes for availability Patrick Steinhardt
@ 2025-02-19 13:23   ` Patrick Steinhardt
  2025-02-19 13:23   ` [PATCH v2 10/16] refs/iterator: provide infrastructure to re-seek iterators Patrick Steinhardt
                     ` (7 subsequent siblings)
  16 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-19 13:23 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

The ref and reflog iterators have their lifecycle attached to iteration:
once the iterator reaches its end, it is automatically released and the
caller doesn't have to care about that anymore. When the iterator should
be released before it has been exhausted, callers must explicitly abort
the iterator via `ref_iterator_abort()`.

This lifecycle is somewhat unusual in the Git codebase and creates two
problems:

  - Callsites need to be very careful about when exactly they call
    `ref_iterator_abort()`, as calling the function is only valid when
    the iterator itself still is. This leads to somewhat awkward calling
    patterns in some situations.

  - It is impossible to reuse iterators and re-seek them to a different
    prefix. This feature isn't supported by any iterator implementation
    except for the reftable iterators anyway, but if it was implemented
    it would allow us to optimize cases where we need to search for
    specific references repeatedly by reusing internal state.

Detangle the lifecycle from iteration so that we don't deallocate the
iterator anymore once it is exhausted. Instead, callers are now expected
to always call the newly introduced `ref_iterator_free()` function that
deallocates the iterator and its internal state.

Note that the `dir_iterator` is somewhat special because it does not
implement the `ref_iterator` interface, but is only used to implement
other iterators. Consequently, we have to provide `dir_iterator_free()`
instead of `dir_iterator_release()`: the allocated structure itself is
managed by the `dir_iterator` interface as well and is thus not freed by
`ref_iterator_free()` like in all the other cases.

While at it, drop the return value of `ref_iterator_abort()`, which
wasn't really required by any of the iterator implementations anyway.
Furthermore, stop calling `base_ref_iterator_free()` in any of the
backends, but instead call it in `ref_iterator_free()`.
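
The resulting calling convention is the conventional one: iterate until
done (or break early), then free unconditionally. A toy model of the
decoupled lifecycle (illustrative only, not the actual refs code):

```c
#include <assert.h>
#include <stdlib.h>

#define ITER_OK    0
#define ITER_DONE -1

struct toy_iter {
	int pos, end;
};

static struct toy_iter *toy_iter_begin(int end)
{
	struct toy_iter *it = malloc(sizeof(*it));
	it->pos = 0;
	it->end = end;
	return it;
}

static int toy_iter_advance(struct toy_iter *it)
{
	if (it->pos >= it->end)
		return ITER_DONE; /* exhausted, but NOT freed */
	it->pos++;
	return ITER_OK;
}

/* Like ref_iterator_free(), accepts NULL as a no-op. */
static void toy_iter_free(struct toy_iter *it)
{
	free(it);
}

static int count_items(int end)
{
	struct toy_iter *it = toy_iter_begin(end);
	int n = 0;

	while (toy_iter_advance(it) == ITER_OK)
		n++;
	toy_iter_free(it); /* unconditional, even after ITER_DONE */
	return n;
}
```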

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/clone.c              |  2 +
 dir-iterator.c               | 24 +++++------
 dir-iterator.h               | 11 ++---
 iterator.h                   |  2 +-
 refs.c                       |  7 +++-
 refs/debug.c                 |  9 ++---
 refs/files-backend.c         | 36 +++++------------
 refs/iterator.c              | 95 ++++++++++++++------------------------------
 refs/packed-backend.c        | 27 ++++++-------
 refs/ref-cache.c             |  9 ++---
 refs/refs-internal.h         | 29 +++++---------
 refs/reftable-backend.c      | 34 ++++------------
 t/helper/test-dir-iterator.c |  1 +
 13 files changed, 100 insertions(+), 186 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index fd001d800c6..ac3e84b2b18 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -426,6 +426,8 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 		strbuf_setlen(src, src_len);
 		die(_("failed to iterate over '%s'"), src->buf);
 	}
+
+	dir_iterator_free(iter);
 }
 
 static void clone_local(const char *src_repo, const char *dest_repo)
diff --git a/dir-iterator.c b/dir-iterator.c
index de619846f29..857e1d9bdaf 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -193,9 +193,9 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 
 	if (S_ISDIR(iter->base.st.st_mode) && push_level(iter)) {
 		if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
-			goto error_out;
+			return ITER_ERROR;
 		if (iter->levels_nr == 0)
-			goto error_out;
+			return ITER_ERROR;
 	}
 
 	/* Loop until we find an entry that we can give back to the caller. */
@@ -211,11 +211,11 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			int ret = next_directory_entry(level->dir, iter->base.path.buf, &de);
 			if (ret < 0) {
 				if (iter->flags & DIR_ITERATOR_PEDANTIC)
-					goto error_out;
+					return ITER_ERROR;
 				continue;
 			} else if (ret > 0) {
 				if (pop_level(iter) == 0)
-					return dir_iterator_abort(dir_iterator);
+					return ITER_DONE;
 				continue;
 			}
 
@@ -223,7 +223,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 		} else {
 			if (level->entries_idx >= level->entries.nr) {
 				if (pop_level(iter) == 0)
-					return dir_iterator_abort(dir_iterator);
+					return ITER_DONE;
 				continue;
 			}
 
@@ -232,22 +232,21 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 
 		if (prepare_next_entry_data(iter, name)) {
 			if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
-				goto error_out;
+				return ITER_ERROR;
 			continue;
 		}
 
 		return ITER_OK;
 	}
-
-error_out:
-	dir_iterator_abort(dir_iterator);
-	return ITER_ERROR;
 }
 
-int dir_iterator_abort(struct dir_iterator *dir_iterator)
+void dir_iterator_free(struct dir_iterator *dir_iterator)
 {
 	struct dir_iterator_int *iter = (struct dir_iterator_int *)dir_iterator;
 
+	if (!iter)
+		return;
+
 	for (; iter->levels_nr; iter->levels_nr--) {
 		struct dir_iterator_level *level =
 			&iter->levels[iter->levels_nr - 1];
@@ -266,7 +265,6 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
 	free(iter->levels);
 	strbuf_release(&iter->base.path);
 	free(iter);
-	return ITER_DONE;
 }
 
 struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags)
@@ -301,7 +299,7 @@ struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags)
 	return dir_iterator;
 
 error_out:
-	dir_iterator_abort(dir_iterator);
+	dir_iterator_free(dir_iterator);
 	errno = saved_errno;
 	return NULL;
 }
diff --git a/dir-iterator.h b/dir-iterator.h
index 6d438809b6e..ccd6a197343 100644
--- a/dir-iterator.h
+++ b/dir-iterator.h
@@ -28,7 +28,7 @@
  *
  *     while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
  *             if (want_to_stop_iteration()) {
- *                     ok = dir_iterator_abort(iter);
+ *                     ok = ITER_DONE;
  *                     break;
  *             }
  *
@@ -39,6 +39,7 @@
  *
  *     if (ok != ITER_DONE)
  *             handle_error();
+ *     dir_iterator_free(iter);
  *
  * Callers are allowed to modify iter->path while they are working,
  * but they must restore it to its original contents before calling
@@ -107,11 +108,7 @@ struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags);
  */
 int dir_iterator_advance(struct dir_iterator *iterator);
 
-/*
- * End the iteration before it has been exhausted. Free the
- * dir_iterator and any associated resources and return ITER_DONE. On
- * error, free the dir_iterator and return ITER_ERROR.
- */
-int dir_iterator_abort(struct dir_iterator *iterator);
+/* Free the dir_iterator and any associated resources. */
+void dir_iterator_free(struct dir_iterator *iterator);
 
 #endif
diff --git a/iterator.h b/iterator.h
index 0f6900e43ad..6b77dcc2626 100644
--- a/iterator.h
+++ b/iterator.h
@@ -12,7 +12,7 @@
 #define ITER_OK 0
 
 /*
- * The iterator is exhausted and has been freed.
+ * The iterator is exhausted.
  */
 #define ITER_DONE -1
 
diff --git a/refs.c b/refs.c
index eaf41421f50..8eff60a2186 100644
--- a/refs.c
+++ b/refs.c
@@ -2476,6 +2476,7 @@ int refs_verify_refnames_available(struct ref_store *refs,
 {
 	struct strbuf dirname = STRBUF_INIT;
 	struct strbuf referent = STRBUF_INIT;
+	struct ref_iterator *iter = NULL;
 	struct strset dirnames;
 	int ret = -1;
 
@@ -2552,7 +2553,6 @@ int refs_verify_refnames_available(struct ref_store *refs,
 		strbuf_addch(&dirname, '/');
 
 		if (!initial_transaction) {
-			struct ref_iterator *iter;
 			int ok;
 
 			iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
@@ -2564,12 +2564,14 @@ int refs_verify_refnames_available(struct ref_store *refs,
 
 				strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
 					    iter->refname, refname);
-				ref_iterator_abort(iter);
 				goto cleanup;
 			}
 
 			if (ok != ITER_DONE)
 				BUG("error while iterating over references");
+
+			ref_iterator_free(iter);
+			iter = NULL;
 		}
 
 		extra_refname = find_descendant_ref(dirname.buf, extras, skip);
@@ -2586,6 +2588,7 @@ int refs_verify_refnames_available(struct ref_store *refs,
 	strbuf_release(&referent);
 	strbuf_release(&dirname);
 	strset_clear(&dirnames);
+	ref_iterator_free(iter);
 	return ret;
 }
 
diff --git a/refs/debug.c b/refs/debug.c
index fbc4df08b43..a9786da4ba1 100644
--- a/refs/debug.c
+++ b/refs/debug.c
@@ -179,19 +179,18 @@ static int debug_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return res;
 }
 
-static int debug_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void debug_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct debug_ref_iterator *diter =
 		(struct debug_ref_iterator *)ref_iterator;
-	int res = diter->iter->vtable->abort(diter->iter);
-	trace_printf_key(&trace_refs, "iterator_abort: %d\n", res);
-	return res;
+	diter->iter->vtable->release(diter->iter);
+	trace_printf_key(&trace_refs, "iterator_release\n");
 }
 
 static struct ref_iterator_vtable debug_ref_iterator_vtable = {
 	.advance = debug_ref_iterator_advance,
 	.peel = debug_ref_iterator_peel,
-	.abort = debug_ref_iterator_abort,
+	.release = debug_ref_iterator_release,
 };
 
 static struct ref_iterator *
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 11a620ea11a..859f1c11941 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -915,10 +915,6 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		return ITER_OK;
 	}
 
-	iter->iter0 = NULL;
-	if (ref_iterator_abort(ref_iterator) != ITER_DONE)
-		ok = ITER_ERROR;
-
 	return ok;
 }
 
@@ -931,23 +927,17 @@ static int files_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return ref_iterator_peel(iter->iter0, peeled);
 }
 
-static int files_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void files_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct files_ref_iterator *iter =
 		(struct files_ref_iterator *)ref_iterator;
-	int ok = ITER_DONE;
-
-	if (iter->iter0)
-		ok = ref_iterator_abort(iter->iter0);
-
-	base_ref_iterator_free(ref_iterator);
-	return ok;
+	ref_iterator_free(iter->iter0);
 }
 
 static struct ref_iterator_vtable files_ref_iterator_vtable = {
 	.advance = files_ref_iterator_advance,
 	.peel = files_ref_iterator_peel,
-	.abort = files_ref_iterator_abort,
+	.release = files_ref_iterator_release,
 };
 
 static struct ref_iterator *files_ref_iterator_begin(
@@ -1378,7 +1368,7 @@ static int should_pack_refs(struct files_ref_store *refs,
 				    iter->flags, opts))
 			refcount++;
 		if (refcount >= limit) {
-			ref_iterator_abort(iter);
+			ref_iterator_free(iter);
 			return 1;
 		}
 	}
@@ -1386,6 +1376,7 @@ static int should_pack_refs(struct files_ref_store *refs,
 	if (ret != ITER_DONE)
 		die("error while iterating over references");
 
+	ref_iterator_free(iter);
 	return 0;
 }
 
@@ -1452,6 +1443,7 @@ static int files_pack_refs(struct ref_store *ref_store,
 	packed_refs_unlock(refs->packed_ref_store);
 
 	prune_refs(refs, &refs_to_prune);
+	ref_iterator_free(iter);
 	strbuf_release(&err);
 	return 0;
 }
@@ -2299,9 +2291,6 @@ static int files_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 		return ITER_OK;
 	}
 
-	iter->dir_iterator = NULL;
-	if (ref_iterator_abort(ref_iterator) == ITER_ERROR)
-		ok = ITER_ERROR;
 	return ok;
 }
 
@@ -2311,23 +2300,17 @@ static int files_reflog_iterator_peel(struct ref_iterator *ref_iterator UNUSED,
 	BUG("ref_iterator_peel() called for reflog_iterator");
 }
 
-static int files_reflog_iterator_abort(struct ref_iterator *ref_iterator)
+static void files_reflog_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct files_reflog_iterator *iter =
 		(struct files_reflog_iterator *)ref_iterator;
-	int ok = ITER_DONE;
-
-	if (iter->dir_iterator)
-		ok = dir_iterator_abort(iter->dir_iterator);
-
-	base_ref_iterator_free(ref_iterator);
-	return ok;
+	dir_iterator_free(iter->dir_iterator);
 }
 
 static struct ref_iterator_vtable files_reflog_iterator_vtable = {
 	.advance = files_reflog_iterator_advance,
 	.peel = files_reflog_iterator_peel,
-	.abort = files_reflog_iterator_abort,
+	.release = files_reflog_iterator_release,
 };
 
 static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,
@@ -3837,6 +3820,7 @@ static int files_fsck_refs_dir(struct ref_store *ref_store,
 		ret = error(_("failed to iterate over '%s'"), sb.buf);
 
 out:
+	dir_iterator_free(iter);
 	strbuf_release(&sb);
 	strbuf_release(&refname);
 	return ret;
diff --git a/refs/iterator.c b/refs/iterator.c
index d25e568bf0b..aaeff270437 100644
--- a/refs/iterator.c
+++ b/refs/iterator.c
@@ -21,9 +21,14 @@ int ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return ref_iterator->vtable->peel(ref_iterator, peeled);
 }
 
-int ref_iterator_abort(struct ref_iterator *ref_iterator)
+void ref_iterator_free(struct ref_iterator *ref_iterator)
 {
-	return ref_iterator->vtable->abort(ref_iterator);
+	if (ref_iterator) {
+		ref_iterator->vtable->release(ref_iterator);
+		/* Help make use-after-free bugs fail quickly: */
+		ref_iterator->vtable = NULL;
+		free(ref_iterator);
+	}
 }
 
 void base_ref_iterator_init(struct ref_iterator *iter,
@@ -36,20 +41,13 @@ void base_ref_iterator_init(struct ref_iterator *iter,
 	iter->flags = 0;
 }
 
-void base_ref_iterator_free(struct ref_iterator *iter)
-{
-	/* Help make use-after-free bugs fail quickly: */
-	iter->vtable = NULL;
-	free(iter);
-}
-
 struct empty_ref_iterator {
 	struct ref_iterator base;
 };
 
-static int empty_ref_iterator_advance(struct ref_iterator *ref_iterator)
+static int empty_ref_iterator_advance(struct ref_iterator *ref_iterator UNUSED)
 {
-	return ref_iterator_abort(ref_iterator);
+	return ITER_DONE;
 }
 
 static int empty_ref_iterator_peel(struct ref_iterator *ref_iterator UNUSED,
@@ -58,16 +56,14 @@ static int empty_ref_iterator_peel(struct ref_iterator *ref_iterator UNUSED,
 	BUG("peel called for empty iterator");
 }
 
-static int empty_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void empty_ref_iterator_release(struct ref_iterator *ref_iterator UNUSED)
 {
-	base_ref_iterator_free(ref_iterator);
-	return ITER_DONE;
 }
 
 static struct ref_iterator_vtable empty_ref_iterator_vtable = {
 	.advance = empty_ref_iterator_advance,
 	.peel = empty_ref_iterator_peel,
-	.abort = empty_ref_iterator_abort,
+	.release = empty_ref_iterator_release,
 };
 
 struct ref_iterator *empty_ref_iterator_begin(void)
@@ -151,11 +147,13 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	if (!iter->current) {
 		/* Initialize: advance both iterators to their first entries */
 		if ((ok = ref_iterator_advance(iter->iter0)) != ITER_OK) {
+			ref_iterator_free(iter->iter0);
 			iter->iter0 = NULL;
 			if (ok == ITER_ERROR)
 				goto error;
 		}
 		if ((ok = ref_iterator_advance(iter->iter1)) != ITER_OK) {
+			ref_iterator_free(iter->iter1);
 			iter->iter1 = NULL;
 			if (ok == ITER_ERROR)
 				goto error;
@@ -166,6 +164,7 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		 * entry:
 		 */
 		if ((ok = ref_iterator_advance(*iter->current)) != ITER_OK) {
+			ref_iterator_free(*iter->current);
 			*iter->current = NULL;
 			if (ok == ITER_ERROR)
 				goto error;
@@ -179,9 +178,8 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 			iter->select(iter->iter0, iter->iter1, iter->cb_data);
 
 		if (selection == ITER_SELECT_DONE) {
-			return ref_iterator_abort(ref_iterator);
+			return ITER_DONE;
 		} else if (selection == ITER_SELECT_ERROR) {
-			ref_iterator_abort(ref_iterator);
 			return ITER_ERROR;
 		}
 
@@ -195,6 +193,7 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 
 		if (selection & ITER_SKIP_SECONDARY) {
 			if ((ok = ref_iterator_advance(*secondary)) != ITER_OK) {
+				ref_iterator_free(*secondary);
 				*secondary = NULL;
 				if (ok == ITER_ERROR)
 					goto error;
@@ -211,7 +210,6 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	}
 
 error:
-	ref_iterator_abort(ref_iterator);
 	return ITER_ERROR;
 }
 
@@ -227,28 +225,18 @@ static int merge_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return ref_iterator_peel(*iter->current, peeled);
 }
 
-static int merge_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void merge_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct merge_ref_iterator *iter =
 		(struct merge_ref_iterator *)ref_iterator;
-	int ok = ITER_DONE;
-
-	if (iter->iter0) {
-		if (ref_iterator_abort(iter->iter0) != ITER_DONE)
-			ok = ITER_ERROR;
-	}
-	if (iter->iter1) {
-		if (ref_iterator_abort(iter->iter1) != ITER_DONE)
-			ok = ITER_ERROR;
-	}
-	base_ref_iterator_free(ref_iterator);
-	return ok;
+	ref_iterator_free(iter->iter0);
+	ref_iterator_free(iter->iter1);
 }
 
 static struct ref_iterator_vtable merge_ref_iterator_vtable = {
 	.advance = merge_ref_iterator_advance,
 	.peel = merge_ref_iterator_peel,
-	.abort = merge_ref_iterator_abort,
+	.release = merge_ref_iterator_release,
 };
 
 struct ref_iterator *merge_ref_iterator_begin(
@@ -310,10 +298,10 @@ struct ref_iterator *overlay_ref_iterator_begin(
 	 * them.
 	 */
 	if (is_empty_ref_iterator(front)) {
-		ref_iterator_abort(front);
+		ref_iterator_free(front);
 		return back;
 	} else if (is_empty_ref_iterator(back)) {
-		ref_iterator_abort(back);
+		ref_iterator_free(back);
 		return front;
 	}
 
@@ -350,19 +338,10 @@ static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator)
 
 	while ((ok = ref_iterator_advance(iter->iter0)) == ITER_OK) {
 		int cmp = compare_prefix(iter->iter0->refname, iter->prefix);
-
 		if (cmp < 0)
 			continue;
-
-		if (cmp > 0) {
-			/*
-			 * As the source iterator is ordered, we
-			 * can stop the iteration as soon as we see a
-			 * refname that comes after the prefix:
-			 */
-			ok = ref_iterator_abort(iter->iter0);
-			break;
-		}
+		if (cmp > 0)
+			return ITER_DONE;
 
 		if (iter->trim) {
 			/*
@@ -386,9 +365,6 @@ static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		return ITER_OK;
 	}
 
-	iter->iter0 = NULL;
-	if (ref_iterator_abort(ref_iterator) != ITER_DONE)
-		return ITER_ERROR;
 	return ok;
 }
 
@@ -401,23 +377,18 @@ static int prefix_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return ref_iterator_peel(iter->iter0, peeled);
 }
 
-static int prefix_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void prefix_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct prefix_ref_iterator *iter =
 		(struct prefix_ref_iterator *)ref_iterator;
-	int ok = ITER_DONE;
-
-	if (iter->iter0)
-		ok = ref_iterator_abort(iter->iter0);
+	ref_iterator_free(iter->iter0);
 	free(iter->prefix);
-	base_ref_iterator_free(ref_iterator);
-	return ok;
 }
 
 static struct ref_iterator_vtable prefix_ref_iterator_vtable = {
 	.advance = prefix_ref_iterator_advance,
 	.peel = prefix_ref_iterator_peel,
-	.abort = prefix_ref_iterator_abort,
+	.release = prefix_ref_iterator_release,
 };
 
 struct ref_iterator *prefix_ref_iterator_begin(struct ref_iterator *iter0,
@@ -453,20 +424,14 @@ int do_for_each_ref_iterator(struct ref_iterator *iter,
 	current_ref_iter = iter;
 	while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
 		retval = fn(iter->refname, iter->referent, iter->oid, iter->flags, cb_data);
-		if (retval) {
-			/*
-			 * If ref_iterator_abort() returns ITER_ERROR,
-			 * we ignore that error in deference to the
-			 * callback function's return value.
-			 */
-			ref_iterator_abort(iter);
+		if (retval)
 			goto out;
-		}
 	}
 
 out:
 	current_ref_iter = old_ref_iter;
 	if (ok == ITER_ERROR)
-		return -1;
+		retval = -1;
+	ref_iterator_free(iter);
 	return retval;
 }
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index a7b6f74b6e3..38a1956d1a8 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -954,9 +954,6 @@ static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		return ITER_OK;
 	}
 
-	if (ref_iterator_abort(ref_iterator) != ITER_DONE)
-		ok = ITER_ERROR;
-
 	return ok;
 }
 
@@ -976,23 +973,19 @@ static int packed_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	}
 }
 
-static int packed_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void packed_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct packed_ref_iterator *iter =
 		(struct packed_ref_iterator *)ref_iterator;
-	int ok = ITER_DONE;
-
 	strbuf_release(&iter->refname_buf);
 	free(iter->jump);
 	release_snapshot(iter->snapshot);
-	base_ref_iterator_free(ref_iterator);
-	return ok;
 }
 
 static struct ref_iterator_vtable packed_ref_iterator_vtable = {
 	.advance = packed_ref_iterator_advance,
 	.peel = packed_ref_iterator_peel,
-	.abort = packed_ref_iterator_abort
+	.release = packed_ref_iterator_release,
 };
 
 static int jump_list_entry_cmp(const void *va, const void *vb)
@@ -1362,8 +1355,10 @@ static int write_with_updates(struct packed_ref_store *refs,
 	 */
 	iter = packed_ref_iterator_begin(&refs->base, "", NULL,
 					 DO_FOR_EACH_INCLUDE_BROKEN);
-	if ((ok = ref_iterator_advance(iter)) != ITER_OK)
+	if ((ok = ref_iterator_advance(iter)) != ITER_OK) {
+		ref_iterator_free(iter);
 		iter = NULL;
+	}
 
 	i = 0;
 
@@ -1411,8 +1406,10 @@ static int write_with_updates(struct packed_ref_store *refs,
 				 * the iterator over the unneeded
 				 * value.
 				 */
-				if ((ok = ref_iterator_advance(iter)) != ITER_OK)
+				if ((ok = ref_iterator_advance(iter)) != ITER_OK) {
+					ref_iterator_free(iter);
 					iter = NULL;
+				}
 				cmp = +1;
 			} else {
 				/*
@@ -1449,8 +1446,10 @@ static int write_with_updates(struct packed_ref_store *refs,
 					       peel_error ? NULL : &peeled))
 				goto write_error;
 
-			if ((ok = ref_iterator_advance(iter)) != ITER_OK)
+			if ((ok = ref_iterator_advance(iter)) != ITER_OK) {
+				ref_iterator_free(iter);
 				iter = NULL;
+			}
 		} else if (is_null_oid(&update->new_oid)) {
 			/*
 			 * The update wants to delete the reference,
@@ -1499,9 +1498,7 @@ static int write_with_updates(struct packed_ref_store *refs,
 		    get_tempfile_path(refs->tempfile), strerror(errno));
 
 error:
-	if (iter)
-		ref_iterator_abort(iter);
-
+	ref_iterator_free(iter);
 	delete_tempfile(&refs->tempfile);
 	return -1;
 }
diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index 02f09e4df88..6457e02c1ea 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -409,7 +409,7 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		if (++level->index == level->dir->nr) {
 			/* This level is exhausted; pop up a level */
 			if (--iter->levels_nr == 0)
-				return ref_iterator_abort(ref_iterator);
+				return ITER_DONE;
 
 			continue;
 		}
@@ -452,21 +452,18 @@ static int cache_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return peel_object(iter->repo, ref_iterator->oid, peeled) ? -1 : 0;
 }
 
-static int cache_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void cache_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct cache_ref_iterator *iter =
 		(struct cache_ref_iterator *)ref_iterator;
-
 	free((char *)iter->prefix);
 	free(iter->levels);
-	base_ref_iterator_free(ref_iterator);
-	return ITER_DONE;
 }
 
 static struct ref_iterator_vtable cache_ref_iterator_vtable = {
 	.advance = cache_ref_iterator_advance,
 	.peel = cache_ref_iterator_peel,
-	.abort = cache_ref_iterator_abort
+	.release = cache_ref_iterator_release,
 };
 
 struct ref_iterator *cache_ref_iterator_begin(struct ref_cache *cache,
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index aaab711bb96..74e2c03cef1 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -273,11 +273,11 @@ enum do_for_each_ref_flags {
  * the next reference and returns ITER_OK. The data pointed at by
  * refname and oid belong to the iterator; if you want to retain them
  * after calling ref_iterator_advance() again or calling
- * ref_iterator_abort(), you must make a copy. When the iteration has
+ * ref_iterator_free(), you must make a copy. When the iteration has
  * been exhausted, ref_iterator_advance() releases any resources
  * associated with the iteration, frees the ref_iterator object, and
  * returns ITER_DONE. If you want to abort the iteration early, call
- * ref_iterator_abort(), which also frees the ref_iterator object and
+ * ref_iterator_free(), which also frees the ref_iterator object and
  * any associated resources. If there was an internal error advancing
  * to the next entry, ref_iterator_advance() aborts the iteration,
  * frees the ref_iterator, and returns ITER_ERROR.
@@ -293,7 +293,7 @@ enum do_for_each_ref_flags {
  *
  *     while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
  *             if (want_to_stop_iteration()) {
- *                     ok = ref_iterator_abort(iter);
+ *                     ok = ITER_DONE;
  *                     break;
  *             }
  *
@@ -307,6 +307,7 @@ enum do_for_each_ref_flags {
  *
  *     if (ok != ITER_DONE)
  *             handle_error();
+ *     ref_iterator_free(iter);
  */
 struct ref_iterator {
 	struct ref_iterator_vtable *vtable;
@@ -333,12 +334,8 @@ int ref_iterator_advance(struct ref_iterator *ref_iterator);
 int ref_iterator_peel(struct ref_iterator *ref_iterator,
 		      struct object_id *peeled);
 
-/*
- * End the iteration before it has been exhausted, freeing the
- * reference iterator and any associated resources and returning
- * ITER_DONE. If the abort itself failed, return ITER_ERROR.
- */
-int ref_iterator_abort(struct ref_iterator *ref_iterator);
+/* Free the reference iterator and any associated resources. */
+void ref_iterator_free(struct ref_iterator *ref_iterator);
 
 /*
  * An iterator over nothing (its first ref_iterator_advance() call
@@ -438,13 +435,6 @@ struct ref_iterator *prefix_ref_iterator_begin(struct ref_iterator *iter0,
 void base_ref_iterator_init(struct ref_iterator *iter,
 			    struct ref_iterator_vtable *vtable);
 
-/*
- * Base class destructor for ref_iterators. Destroy the ref_iterator
- * part of iter and shallow-free the object. This is meant to be
- * called only by the destructors of derived classes.
- */
-void base_ref_iterator_free(struct ref_iterator *iter);
-
 /* Virtual function declarations for ref_iterators: */
 
 /*
@@ -463,15 +453,14 @@ typedef int ref_iterator_peel_fn(struct ref_iterator *ref_iterator,
 
 /*
  * Implementations of this function should free any resources specific
- * to the derived class, then call base_ref_iterator_free() to clean
- * up and free the ref_iterator object.
+ * to the derived class.
  */
-typedef int ref_iterator_abort_fn(struct ref_iterator *ref_iterator);
+typedef void ref_iterator_release_fn(struct ref_iterator *ref_iterator);
 
 struct ref_iterator_vtable {
 	ref_iterator_advance_fn *advance;
 	ref_iterator_peel_fn *peel;
-	ref_iterator_abort_fn *abort;
+	ref_iterator_release_fn *release;
 };
 
 /*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 2a90e7cb391..06543f79c64 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -711,17 +711,10 @@ static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		break;
 	}
 
-	if (iter->err > 0) {
-		if (ref_iterator_abort(ref_iterator) != ITER_DONE)
-			return ITER_ERROR;
+	if (iter->err > 0)
 		return ITER_DONE;
-	}
-
-	if (iter->err < 0) {
-		ref_iterator_abort(ref_iterator);
+	if (iter->err < 0)
 		return ITER_ERROR;
-	}
-
 	return ITER_OK;
 }
 
@@ -740,7 +733,7 @@ static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return -1;
 }
 
-static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void reftable_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct reftable_ref_iterator *iter =
 		(struct reftable_ref_iterator *)ref_iterator;
@@ -751,14 +744,12 @@ static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
 			free(iter->exclude_patterns[i]);
 		free(iter->exclude_patterns);
 	}
-	free(iter);
-	return ITER_DONE;
 }
 
 static struct ref_iterator_vtable reftable_ref_iterator_vtable = {
 	.advance = reftable_ref_iterator_advance,
 	.peel = reftable_ref_iterator_peel,
-	.abort = reftable_ref_iterator_abort
+	.release = reftable_ref_iterator_release,
 };
 
 static int qsort_strcmp(const void *va, const void *vb)
@@ -2017,17 +2008,10 @@ static int reftable_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 		break;
 	}
 
-	if (iter->err > 0) {
-		if (ref_iterator_abort(ref_iterator) != ITER_DONE)
-			return ITER_ERROR;
+	if (iter->err > 0)
 		return ITER_DONE;
-	}
-
-	if (iter->err < 0) {
-		ref_iterator_abort(ref_iterator);
+	if (iter->err < 0)
 		return ITER_ERROR;
-	}
-
 	return ITER_OK;
 }
 
@@ -2038,21 +2022,19 @@ static int reftable_reflog_iterator_peel(struct ref_iterator *ref_iterator UNUSE
 	return -1;
 }
 
-static int reftable_reflog_iterator_abort(struct ref_iterator *ref_iterator)
+static void reftable_reflog_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct reftable_reflog_iterator *iter =
 		(struct reftable_reflog_iterator *)ref_iterator;
 	reftable_log_record_release(&iter->log);
 	reftable_iterator_destroy(&iter->iter);
 	strbuf_release(&iter->last_name);
-	free(iter);
-	return ITER_DONE;
 }
 
 static struct ref_iterator_vtable reftable_reflog_iterator_vtable = {
 	.advance = reftable_reflog_iterator_advance,
 	.peel = reftable_reflog_iterator_peel,
-	.abort = reftable_reflog_iterator_abort
+	.release = reftable_reflog_iterator_release,
 };
 
 static struct reftable_reflog_iterator *reflog_iterator_for_stack(struct reftable_ref_store *refs,
diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
index 6b297bd7536..8d46e8ba409 100644
--- a/t/helper/test-dir-iterator.c
+++ b/t/helper/test-dir-iterator.c
@@ -53,6 +53,7 @@ int cmd__dir_iterator(int argc, const char **argv)
 		printf("(%s) [%s] %s\n", diter->relative_path, diter->basename,
 		       diter->path.buf);
 	}
+	dir_iterator_free(diter);
 
 	if (iter_status != ITER_DONE) {
 		printf("dir_iterator_advance failure\n");

-- 
2.48.1.683.gf705b3209c.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread
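The patch above converts the self-destructing `abort` callback into a `release` callback that frees only backend-specific state, while a common `ref_iterator_free()` owns the base object's lifetime. The shape of that refactoring can be sketched outside of Git with a minimal toy iterator; all names here are illustrative models, not Git's actual API.

```c
#include <assert.h>
#include <stdlib.h>

struct iter;

struct iter_vtable {
	int (*advance)(struct iter *it);  /* 1 = "ITER_OK", 0 = "ITER_DONE" */
	void (*release)(struct iter *it); /* free derived resources only */
};

struct iter {
	struct iter_vtable *vtable;
	int current;
};

/* Generic destructor: safe to call at any point, exhausted or not. */
void iter_free(struct iter *it)
{
	if (!it)
		return;
	it->vtable->release(it);
	it->vtable = NULL; /* help use-after-free bugs fail quickly */
	free(it);
}

/* A toy backend that counts from 1 to `limit`. */
struct count_iter {
	struct iter base;
	int limit;
};

static int count_advance(struct iter *it)
{
	struct count_iter *c = (struct count_iter *)it;
	if (it->current >= c->limit)
		return 0; /* exhausted; note: no self-destruction here */
	it->current++;
	return 1;
}

static void count_release(struct iter *it)
{
	(void)it; /* this toy backend has no derived resources */
}

static struct iter_vtable count_vtable = { count_advance, count_release };

struct iter *count_iter_new(int limit)
{
	struct count_iter *c = calloc(1, sizeof(*c));
	c->base.vtable = &count_vtable;
	c->limit = limit;
	return &c->base;
}

int sum_all(int limit)
{
	struct iter *it = count_iter_new(limit);
	int sum = 0;
	while (it->vtable->advance(it))
		sum += it->current;
	iter_free(it); /* caller frees explicitly, even after exhaustion */
	return sum;
}
```

The key difference from the old `abort` scheme is that exhaustion no longer frees the iterator implicitly, so ownership stays with the caller in every code path.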

* [PATCH v2 10/16] refs/iterator: provide infrastructure to re-seek iterators
  2025-02-19 13:23 ` [PATCH v2 00/16] " Patrick Steinhardt
                     ` (8 preceding siblings ...)
  2025-02-19 13:23   ` [PATCH v2 09/16] refs/iterator: separate lifecycle from iteration Patrick Steinhardt
@ 2025-02-19 13:23   ` Patrick Steinhardt
  2025-02-24 13:08     ` shejialuo
  2025-02-19 13:23   ` [PATCH v2 11/16] refs/iterator: implement seeking for merged iterators Patrick Steinhardt
                     ` (6 subsequent siblings)
  16 siblings, 1 reply; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-19 13:23 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Reftable iterators need to be scrapped after they have either been
exhausted or are no longer useful to the caller, and it is explicitly
not possible to reuse them for subsequent iterations. But enabling reuse
of iterators may allow us to tune them by reusing an iterator's internal
state. The reftable iterators, for example, can already be reused
internally, but we are not able to expose this to any users outside of
the reftable backend.

Introduce a new `.seek` function in the ref iterator vtable that allows
callers to re-seek an iterator. It is expected to be functionally the
same as calling `refs_ref_iterator_begin()` with a different (or the
same) prefix.

Implement the callback for trivial cases. The other iterators will be
implemented in subsequent commits.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/debug.c         | 11 +++++++++++
 refs/iterator.c      | 24 ++++++++++++++++++++++++
 refs/refs-internal.h | 23 +++++++++++++++++++++++
 3 files changed, 58 insertions(+)

diff --git a/refs/debug.c b/refs/debug.c
index a9786da4ba1..5390fa9c187 100644
--- a/refs/debug.c
+++ b/refs/debug.c
@@ -169,6 +169,16 @@ static int debug_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	return res;
 }
 
+static int debug_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				   const char *prefix)
+{
+	struct debug_ref_iterator *diter =
+		(struct debug_ref_iterator *)ref_iterator;
+	int res = diter->iter->vtable->seek(diter->iter, prefix);
+	trace_printf_key(&trace_refs, "iterator_seek: %s: %d\n", prefix ? prefix : "", res);
+	return res;
+}
+
 static int debug_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				   struct object_id *peeled)
 {
@@ -189,6 +199,7 @@ static void debug_ref_iterator_release(struct ref_iterator *ref_iterator)
 
 static struct ref_iterator_vtable debug_ref_iterator_vtable = {
 	.advance = debug_ref_iterator_advance,
+	.seek = debug_ref_iterator_seek,
 	.peel = debug_ref_iterator_peel,
 	.release = debug_ref_iterator_release,
 };
diff --git a/refs/iterator.c b/refs/iterator.c
index aaeff270437..757b105261a 100644
--- a/refs/iterator.c
+++ b/refs/iterator.c
@@ -15,6 +15,12 @@ int ref_iterator_advance(struct ref_iterator *ref_iterator)
 	return ref_iterator->vtable->advance(ref_iterator);
 }
 
+int ref_iterator_seek(struct ref_iterator *ref_iterator,
+		      const char *prefix)
+{
+	return ref_iterator->vtable->seek(ref_iterator, prefix);
+}
+
 int ref_iterator_peel(struct ref_iterator *ref_iterator,
 		      struct object_id *peeled)
 {
@@ -50,6 +56,12 @@ static int empty_ref_iterator_advance(struct ref_iterator *ref_iterator UNUSED)
 	return ITER_DONE;
 }
 
+static int empty_ref_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
+				   const char *prefix UNUSED)
+{
+	return 0;
+}
+
 static int empty_ref_iterator_peel(struct ref_iterator *ref_iterator UNUSED,
 				   struct object_id *peeled UNUSED)
 {
@@ -62,6 +74,7 @@ static void empty_ref_iterator_release(struct ref_iterator *ref_iterator UNUSED)
 
 static struct ref_iterator_vtable empty_ref_iterator_vtable = {
 	.advance = empty_ref_iterator_advance,
+	.seek = empty_ref_iterator_seek,
 	.peel = empty_ref_iterator_peel,
 	.release = empty_ref_iterator_release,
 };
@@ -368,6 +381,16 @@ static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	return ok;
 }
 
+static int prefix_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				    const char *prefix)
+{
+	struct prefix_ref_iterator *iter =
+		(struct prefix_ref_iterator *)ref_iterator;
+	free(iter->prefix);
+	iter->prefix = xstrdup_or_null(prefix);
+	return ref_iterator_seek(iter->iter0, prefix);
+}
+
 static int prefix_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				    struct object_id *peeled)
 {
@@ -387,6 +410,7 @@ static void prefix_ref_iterator_release(struct ref_iterator *ref_iterator)
 
 static struct ref_iterator_vtable prefix_ref_iterator_vtable = {
 	.advance = prefix_ref_iterator_advance,
+	.seek = prefix_ref_iterator_seek,
 	.peel = prefix_ref_iterator_peel,
 	.release = prefix_ref_iterator_release,
 };
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 74e2c03cef1..3f6d43110b7 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -327,6 +327,21 @@ struct ref_iterator {
  */
 int ref_iterator_advance(struct ref_iterator *ref_iterator);
 
+/*
+ * Seek the iterator to the first reference with the given prefix.
+ * The prefix is matched as a literal string, without regard for path
+ * separators. If prefix is NULL or the empty string, seek the iterator to the
+ * first reference again.
+ *
+ * This function is expected to behave as if a new ref iterator with the same
+ * prefix had been created, but allows reuse of iterators and thus may allow
+ * the backend to optimize.
+ *
+ * Returns 0 on success, a negative error code otherwise.
+ */
+int ref_iterator_seek(struct ref_iterator *ref_iterator,
+		      const char *prefix);
+
 /*
  * If possible, peel the reference currently being viewed by the
  * iterator. Return 0 on success.
@@ -445,6 +460,13 @@ void base_ref_iterator_init(struct ref_iterator *iter,
  */
 typedef int ref_iterator_advance_fn(struct ref_iterator *ref_iterator);
 
+/*
+ * Seek the iterator to the first reference matching the given prefix. Should
+ * behave the same as if a new iterator was created with the same prefix.
+ */
+typedef int ref_iterator_seek_fn(struct ref_iterator *ref_iterator,
+				 const char *prefix);
+
 /*
  * Peels the current ref, returning 0 for success or -1 for failure.
  */
@@ -459,6 +481,7 @@ typedef void ref_iterator_release_fn(struct ref_iterator *ref_iterator);
 
 struct ref_iterator_vtable {
 	ref_iterator_advance_fn *advance;
+	ref_iterator_seek_fn *seek;
 	ref_iterator_peel_fn *peel;
 	ref_iterator_release_fn *release;
 };

-- 
2.48.1.683.gf705b3209c.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread
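The seek contract described above ("behave as if a new iterator with the same prefix had been created") can be modeled on a plain sorted list of refnames. This is an illustrative sketch under that contract, not Git's implementation; all names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy iterator over a sorted list of refnames, filtered by prefix. */
struct name_iter {
	const char **names;
	size_t nr, pos;
	const char *prefix;
};

/*
 * Re-seek: position before the first name carrying the given prefix.
 * A NULL or empty prefix restarts iteration over all names. Returns 0;
 * this toy cannot fail (the real API reserves negative values for
 * errors).
 */
int name_iter_seek(struct name_iter *it, const char *prefix)
{
	it->prefix = (prefix && *prefix) ? prefix : NULL;
	it->pos = 0;
	if (it->prefix)
		while (it->pos < it->nr &&
		       strncmp(it->names[it->pos], it->prefix,
			       strlen(it->prefix)) < 0)
			it->pos++;
	return 0;
}

/* Return the next matching name, or NULL once the prefix range ends. */
const char *name_iter_next(struct name_iter *it)
{
	if (it->pos >= it->nr)
		return NULL;
	if (it->prefix &&
	    strncmp(it->names[it->pos], it->prefix, strlen(it->prefix)))
		return NULL; /* sorted input: past the prefix block means done */
	return it->names[it->pos++];
}
```

Because the input is sorted, all names sharing a prefix form a contiguous block, so seeking is a skip-forward and exhaustion is the first non-matching entry; the same iterator object can then be re-seeked to a different prefix without reallocating anything.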

* [PATCH v2 11/16] refs/iterator: implement seeking for merged iterators
  2025-02-19 13:23 ` [PATCH v2 00/16] " Patrick Steinhardt
                     ` (9 preceding siblings ...)
  2025-02-19 13:23   ` [PATCH v2 10/16] refs/iterator: provide infrastructure to re-seek iterators Patrick Steinhardt
@ 2025-02-19 13:23   ` Patrick Steinhardt
  2025-02-24 13:37     ` shejialuo
  2025-02-19 13:23   ` [PATCH v2 12/16] refs/iterator: implement seeking for reftable iterators Patrick Steinhardt
                     ` (5 subsequent siblings)
  16 siblings, 1 reply; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-19 13:23 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Implement seeking on merged iterators. The implementation is rather
straightforward, with the one caveat that we must not deallocate the
underlying iterators once they have been exhausted.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/iterator.c | 38 +++++++++++++++++++++++++++++---------
 1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/refs/iterator.c b/refs/iterator.c
index 757b105261a..63608ef9907 100644
--- a/refs/iterator.c
+++ b/refs/iterator.c
@@ -96,7 +96,8 @@ int is_empty_ref_iterator(struct ref_iterator *ref_iterator)
 struct merge_ref_iterator {
 	struct ref_iterator base;
 
-	struct ref_iterator *iter0, *iter1;
+	struct ref_iterator *iter0, *iter0_owned;
+	struct ref_iterator *iter1, *iter1_owned;
 
 	ref_iterator_select_fn *select;
 	void *cb_data;
@@ -160,13 +161,11 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	if (!iter->current) {
 		/* Initialize: advance both iterators to their first entries */
 		if ((ok = ref_iterator_advance(iter->iter0)) != ITER_OK) {
-			ref_iterator_free(iter->iter0);
 			iter->iter0 = NULL;
 			if (ok == ITER_ERROR)
 				goto error;
 		}
 		if ((ok = ref_iterator_advance(iter->iter1)) != ITER_OK) {
-			ref_iterator_free(iter->iter1);
 			iter->iter1 = NULL;
 			if (ok == ITER_ERROR)
 				goto error;
@@ -177,7 +176,6 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		 * entry:
 		 */
 		if ((ok = ref_iterator_advance(*iter->current)) != ITER_OK) {
-			ref_iterator_free(*iter->current);
 			*iter->current = NULL;
 			if (ok == ITER_ERROR)
 				goto error;
@@ -206,7 +204,6 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 
 		if (selection & ITER_SKIP_SECONDARY) {
 			if ((ok = ref_iterator_advance(*secondary)) != ITER_OK) {
-				ref_iterator_free(*secondary);
 				*secondary = NULL;
 				if (ok == ITER_ERROR)
 					goto error;
@@ -226,6 +223,28 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	return ITER_ERROR;
 }
 
+static int merge_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				   const char *prefix)
+{
+	struct merge_ref_iterator *iter =
+		(struct merge_ref_iterator *)ref_iterator;
+	int ret;
+
+	iter->current = NULL;
+	iter->iter0 = iter->iter0_owned;
+	iter->iter1 = iter->iter1_owned;
+
+	ret = ref_iterator_seek(iter->iter0, prefix);
+	if (ret < 0)
+		return ret;
+
+	ret = ref_iterator_seek(iter->iter1, prefix);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
 static int merge_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				   struct object_id *peeled)
 {
@@ -242,12 +261,13 @@ static void merge_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct merge_ref_iterator *iter =
 		(struct merge_ref_iterator *)ref_iterator;
-	ref_iterator_free(iter->iter0);
-	ref_iterator_free(iter->iter1);
+	ref_iterator_free(iter->iter0_owned);
+	ref_iterator_free(iter->iter1_owned);
 }
 
 static struct ref_iterator_vtable merge_ref_iterator_vtable = {
 	.advance = merge_ref_iterator_advance,
+	.seek = merge_ref_iterator_seek,
 	.peel = merge_ref_iterator_peel,
 	.release = merge_ref_iterator_release,
 };
@@ -268,8 +288,8 @@ struct ref_iterator *merge_ref_iterator_begin(
 	 */
 
 	base_ref_iterator_init(ref_iterator, &merge_ref_iterator_vtable);
-	iter->iter0 = iter0;
-	iter->iter1 = iter1;
+	iter->iter0 = iter->iter0_owned = iter0;
+	iter->iter1 = iter->iter1_owned = iter1;
 	iter->select = select;
 	iter->cb_data = cb_data;
 	iter->current = NULL;

-- 
2.48.1.683.gf705b3209c.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread
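The owned-pointer trick in the patch above (keeping `iter0_owned`/`iter1_owned` alive while only the active pointers are NULLed on exhaustion) can be sketched with a toy two-way merge. This is a hypothetical model of the pattern, not Git's code; all names are invented.

```c
#include <assert.h>
#include <stddef.h>

/* Toy sub-iterator over a sorted int array. */
struct sub {
	const int *vals;
	size_t nr, pos;
};

static int sub_next(struct sub *s, int *out)
{
	if (s->pos >= s->nr)
		return 0;
	*out = s->vals[s->pos++];
	return 1;
}

/*
 * Merge iterator: the "active" pointers are NULLed when a sub-iterator
 * is exhausted, but the "owned" pointers keep the objects alive so that
 * a later seek can restore them.
 */
struct merge {
	struct sub *a, *a_owned;
	struct sub *b, *b_owned;
};

static void merge_seek_start(struct merge *m)
{
	m->a = m->a_owned; /* resurrect exhausted sub-iterators */
	m->b = m->b_owned;
	m->a->pos = 0;
	m->b->pos = 0;
}

static int merge_next(struct merge *m, int *out)
{
	struct sub *a = m->a, *b = m->b;
	if (a && a->pos >= a->nr) { m->a = NULL; a = NULL; } /* drop active only */
	if (b && b->pos >= b->nr) { m->b = NULL; b = NULL; }
	if (a && (!b || a->vals[a->pos] <= b->vals[b->pos]))
		return sub_next(a, out);
	if (b)
		return sub_next(b, out);
	return 0; /* both exhausted */
}

int merge_drain_sum(struct merge *m)
{
	int v, sum = 0;
	while (merge_next(m, &v))
		sum += v;
	return sum;
}
```

Had the exhausted sub-iterators been freed (as the pre-seek code did), `merge_seek_start()` would dereference freed memory; splitting active from owned pointers is what makes re-seeking after exhaustion safe.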

* [PATCH v2 12/16] refs/iterator: implement seeking for reftable iterators
  2025-02-19 13:23 ` [PATCH v2 00/16] " Patrick Steinhardt
                     ` (10 preceding siblings ...)
  2025-02-19 13:23   ` [PATCH v2 11/16] refs/iterator: implement seeking for merged iterators Patrick Steinhardt
@ 2025-02-19 13:23   ` Patrick Steinhardt
  2025-02-24 14:00     ` shejialuo
  2025-02-19 13:23   ` [PATCH v2 13/16] refs/iterator: implement seeking for ref-cache iterators Patrick Steinhardt
                     ` (4 subsequent siblings)
  16 siblings, 1 reply; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-19 13:23 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Implement seeking of reftable iterators. As the low-level reftable
iterators already support seeking, this change is straightforward. Two
notes though:

  - We do not support seeking on reflog iterators.

  - We start to check whether `reftable_stack_init_ref_iterator()` is
    successful.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/reftable-backend.c | 35 ++++++++++++++++++++++++++++++-----
 1 file changed, 30 insertions(+), 5 deletions(-)

diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 06543f79c64..b0c09f34433 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -547,7 +547,7 @@ struct reftable_ref_iterator {
 	struct reftable_ref_record ref;
 	struct object_id oid;
 
-	const char *prefix;
+	char *prefix;
 	size_t prefix_len;
 	char **exclude_patterns;
 	size_t exclude_patterns_index;
@@ -718,6 +718,20 @@ static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	return ITER_OK;
 }
 
+static int reftable_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				      const char *prefix)
+{
+	struct reftable_ref_iterator *iter =
+		(struct reftable_ref_iterator *)ref_iterator;
+
+	free(iter->prefix);
+	iter->prefix = xstrdup_or_null(prefix);
+	iter->prefix_len = prefix ? strlen(prefix) : 0;
+	iter->err = reftable_iterator_seek_ref(&iter->iter, prefix);
+
+	return iter->err;
+}
+
 static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				      struct object_id *peeled)
 {
@@ -744,10 +758,12 @@ static void reftable_ref_iterator_release(struct ref_iterator *ref_iterator)
 			free(iter->exclude_patterns[i]);
 		free(iter->exclude_patterns);
 	}
+	free(iter->prefix);
 }
 
 static struct ref_iterator_vtable reftable_ref_iterator_vtable = {
 	.advance = reftable_ref_iterator_advance,
+	.seek = reftable_ref_iterator_seek,
 	.peel = reftable_ref_iterator_peel,
 	.release = reftable_ref_iterator_release,
 };
@@ -806,8 +822,6 @@ static struct reftable_ref_iterator *ref_iterator_for_stack(struct reftable_ref_
 
 	iter = xcalloc(1, sizeof(*iter));
 	base_ref_iterator_init(&iter->base, &reftable_ref_iterator_vtable);
-	iter->prefix = prefix;
-	iter->prefix_len = prefix ? strlen(prefix) : 0;
 	iter->base.oid = &iter->oid;
 	iter->flags = flags;
 	iter->refs = refs;
@@ -821,8 +835,11 @@ static struct reftable_ref_iterator *ref_iterator_for_stack(struct reftable_ref_
 	if (ret)
 		goto done;
 
-	reftable_stack_init_ref_iterator(stack, &iter->iter);
-	ret = reftable_iterator_seek_ref(&iter->iter, prefix);
+	ret = reftable_stack_init_ref_iterator(stack, &iter->iter);
+	if (ret)
+		goto done;
+
+	ret = reftable_ref_iterator_seek(&iter->base, prefix);
 	if (ret)
 		goto done;
 
@@ -2015,6 +2032,13 @@ static int reftable_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 	return ITER_OK;
 }
 
+static int reftable_reflog_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
+					 const char *prefix UNUSED)
+{
+	BUG("reftable reflog iterator cannot be seeked");
+	return -1;
+}
+
 static int reftable_reflog_iterator_peel(struct ref_iterator *ref_iterator UNUSED,
 					 struct object_id *peeled UNUSED)
 {
@@ -2033,6 +2057,7 @@ static void reftable_reflog_iterator_release(struct ref_iterator *ref_iterator)
 
 static struct ref_iterator_vtable reftable_reflog_iterator_vtable = {
 	.advance = reftable_reflog_iterator_advance,
+	.seek = reftable_reflog_iterator_seek,
 	.peel = reftable_reflog_iterator_peel,
 	.release = reftable_reflog_iterator_release,
 };

-- 
2.48.1.683.gf705b3209c.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v2 13/16] refs/iterator: implement seeking for ref-cache iterators
  2025-02-19 13:23 ` [PATCH v2 00/16] " Patrick Steinhardt
                     ` (11 preceding siblings ...)
  2025-02-19 13:23   ` [PATCH v2 12/16] refs/iterator: implement seeking for reftable iterators Patrick Steinhardt
@ 2025-02-19 13:23   ` Patrick Steinhardt
  2025-02-24 14:49     ` shejialuo
  2025-02-19 13:23   ` [PATCH v2 14/16] refs/iterator: implement seeking for `packed-ref` iterators Patrick Steinhardt
                     ` (3 subsequent siblings)
  16 siblings, 1 reply; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-19 13:23 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Implement seeking of ref-cache iterators. This is done by splitting most
of the logic to seek iterators out of `cache_ref_iterator_begin()` and
putting it into `cache_ref_iterator_seek()` so that we can reuse the
logic.

Note that we cannot use the optimization anymore where we return an
empty ref iterator when there aren't any references, as otherwise it
wouldn't be possible to reseek the iterator to a different prefix that
may exist. This shouldn't be much of a performance concern though, as we
now bail out early in case `advance()` sees that there are no more
directories to be searched.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/ref-cache.c | 74 ++++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 48 insertions(+), 26 deletions(-)

diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index 6457e02c1ea..b54547d71ee 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -362,9 +362,7 @@ struct cache_ref_iterator {
 	struct ref_iterator base;
 
 	/*
-	 * The number of levels currently on the stack. This is always
-	 * at least 1, because when it becomes zero the iteration is
-	 * ended and this struct is freed.
+	 * The number of levels currently on the stack.
 	 */
 	size_t levels_nr;
 
@@ -389,6 +387,9 @@ struct cache_ref_iterator {
 	struct cache_ref_iterator_level *levels;
 
 	struct repository *repo;
+	struct ref_cache *cache;
+
+	int prime_dir;
 };
 
 static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
@@ -396,6 +397,9 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	struct cache_ref_iterator *iter =
 		(struct cache_ref_iterator *)ref_iterator;
 
+	if (!iter->levels_nr)
+		return ITER_DONE;
+
 	while (1) {
 		struct cache_ref_iterator_level *level =
 			&iter->levels[iter->levels_nr - 1];
@@ -444,6 +448,40 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	}
 }
 
+static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				   const char *prefix)
+{
+	struct cache_ref_iterator *iter =
+		(struct cache_ref_iterator *)ref_iterator;
+	struct ref_dir *dir;
+
+	dir = get_ref_dir(iter->cache->root);
+	if (prefix && *prefix)
+		dir = find_containing_dir(dir, prefix);
+
+	if (dir) {
+		struct cache_ref_iterator_level *level;
+
+		if (iter->prime_dir)
+			prime_ref_dir(dir, prefix);
+		iter->levels_nr = 1;
+		level = &iter->levels[0];
+		level->index = -1;
+		level->dir = dir;
+
+		if (prefix && *prefix) {
+			iter->prefix = xstrdup(prefix);
+			level->prefix_state = PREFIX_WITHIN_DIR;
+		} else {
+			level->prefix_state = PREFIX_CONTAINS_DIR;
+		}
+	} else {
+		iter->levels_nr = 0;
+	}
+
+	return 0;
+}
+
 static int cache_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				   struct object_id *peeled)
 {
@@ -462,6 +500,7 @@ static void cache_ref_iterator_release(struct ref_iterator *ref_iterator)
 
 static struct ref_iterator_vtable cache_ref_iterator_vtable = {
 	.advance = cache_ref_iterator_advance,
+	.seek = cache_ref_iterator_seek,
 	.peel = cache_ref_iterator_peel,
 	.release = cache_ref_iterator_release,
 };
@@ -471,39 +510,22 @@ struct ref_iterator *cache_ref_iterator_begin(struct ref_cache *cache,
 					      struct repository *repo,
 					      int prime_dir)
 {
-	struct ref_dir *dir;
 	struct cache_ref_iterator *iter;
 	struct ref_iterator *ref_iterator;
-	struct cache_ref_iterator_level *level;
-
-	dir = get_ref_dir(cache->root);
-	if (prefix && *prefix)
-		dir = find_containing_dir(dir, prefix);
-	if (!dir)
-		/* There's nothing to iterate over. */
-		return empty_ref_iterator_begin();
-
-	if (prime_dir)
-		prime_ref_dir(dir, prefix);
 
 	CALLOC_ARRAY(iter, 1);
 	ref_iterator = &iter->base;
 	base_ref_iterator_init(ref_iterator, &cache_ref_iterator_vtable);
 	ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
 
-	iter->levels_nr = 1;
-	level = &iter->levels[0];
-	level->index = -1;
-	level->dir = dir;
+	iter->repo = repo;
+	iter->cache = cache;
+	iter->prime_dir = prime_dir;
 
-	if (prefix && *prefix) {
-		iter->prefix = xstrdup(prefix);
-		level->prefix_state = PREFIX_WITHIN_DIR;
-	} else {
-		level->prefix_state = PREFIX_CONTAINS_DIR;
+	if (cache_ref_iterator_seek(&iter->base, prefix) < 0) {
+		ref_iterator_free(&iter->base);
+		return NULL;
 	}
 
-	iter->repo = repo;
-
 	return ref_iterator;
 }

-- 
2.48.1.683.gf705b3209c.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v2 14/16] refs/iterator: implement seeking for `packed-ref` iterators
  2025-02-19 13:23 ` [PATCH v2 00/16] " Patrick Steinhardt
                     ` (12 preceding siblings ...)
  2025-02-19 13:23   ` [PATCH v2 13/16] refs/iterator: implement seeking for ref-cache iterators Patrick Steinhardt
@ 2025-02-19 13:23   ` Patrick Steinhardt
  2025-02-24 15:09     ` shejialuo
  2025-02-19 13:23   ` [PATCH v2 15/16] refs/iterator: implement seeking for "files" iterators Patrick Steinhardt
                     ` (2 subsequent siblings)
  16 siblings, 1 reply; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-19 13:23 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Implement seeking of `packed-ref` iterators. The implementation is again
straightforward, except that we cannot continue to use the prefix
iterator, as we would otherwise not be able to reseek the iterator
anymore in case one first asks for an empty and then for a non-empty
prefix. Instead, we open-code the logic in `advance()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/packed-backend.c | 62 +++++++++++++++++++++++++++++++++------------------
 1 file changed, 40 insertions(+), 22 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 38a1956d1a8..71a38acfedc 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -819,6 +819,8 @@ struct packed_ref_iterator {
 
 	struct snapshot *snapshot;
 
+	char *prefix;
+
 	/* The current position in the snapshot's buffer: */
 	const char *pos;
 
@@ -841,11 +843,9 @@ struct packed_ref_iterator {
 };
 
 /*
- * Move the iterator to the next record in the snapshot, without
- * respect for whether the record is actually required by the current
- * iteration. Adjust the fields in `iter` and return `ITER_OK` or
- * `ITER_DONE`. This function does not free the iterator in the case
- * of `ITER_DONE`.
+ * Move the iterator to the next record in the snapshot. Adjust the fields in
+ * `iter` and return `ITER_OK` or `ITER_DONE`. This function does not free the
+ * iterator in the case of `ITER_DONE`.
  */
 static int next_record(struct packed_ref_iterator *iter)
 {
@@ -942,6 +942,9 @@ static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	int ok;
 
 	while ((ok = next_record(iter)) == ITER_OK) {
+		const char *refname = iter->base.refname;
+		const char *prefix = iter->prefix;
+
 		if (iter->flags & DO_FOR_EACH_PER_WORKTREE_ONLY &&
 		    !is_per_worktree_ref(iter->base.refname))
 			continue;
@@ -951,12 +954,41 @@ static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator)
 					    &iter->oid, iter->flags))
 			continue;
 
+		while (prefix && *prefix) {
+			if (*refname < *prefix)
+				BUG("packed-refs backend yielded reference preceding its prefix");
+			else if (*refname > *prefix)
+				return ITER_DONE;
+			prefix++;
+			refname++;
+		}
+
 		return ITER_OK;
 	}
 
 	return ok;
 }
 
+static int packed_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				    const char *prefix)
+{
+	struct packed_ref_iterator *iter =
+		(struct packed_ref_iterator *)ref_iterator;
+	const char *start;
+
+	if (prefix && *prefix)
+		start = find_reference_location(iter->snapshot, prefix, 0);
+	else
+		start = iter->snapshot->start;
+
+	free(iter->prefix);
+	iter->prefix = xstrdup_or_null(prefix);
+	iter->pos = start;
+	iter->eof = iter->snapshot->eof;
+
+	return 0;
+}
+
 static int packed_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				   struct object_id *peeled)
 {
@@ -979,11 +1011,13 @@ static void packed_ref_iterator_release(struct ref_iterator *ref_iterator)
 		(struct packed_ref_iterator *)ref_iterator;
 	strbuf_release(&iter->refname_buf);
 	free(iter->jump);
+	free(iter->prefix);
 	release_snapshot(iter->snapshot);
 }
 
 static struct ref_iterator_vtable packed_ref_iterator_vtable = {
 	.advance = packed_ref_iterator_advance,
+	.seek = packed_ref_iterator_seek,
 	.peel = packed_ref_iterator_peel,
 	.release = packed_ref_iterator_release,
 };
@@ -1097,7 +1131,6 @@ static struct ref_iterator *packed_ref_iterator_begin(
 {
 	struct packed_ref_store *refs;
 	struct snapshot *snapshot;
-	const char *start;
 	struct packed_ref_iterator *iter;
 	struct ref_iterator *ref_iterator;
 	unsigned int required_flags = REF_STORE_READ;
@@ -1113,14 +1146,6 @@ static struct ref_iterator *packed_ref_iterator_begin(
 	 */
 	snapshot = get_snapshot(refs);
 
-	if (prefix && *prefix)
-		start = find_reference_location(snapshot, prefix, 0);
-	else
-		start = snapshot->start;
-
-	if (start == snapshot->eof)
-		return empty_ref_iterator_begin();
-
 	CALLOC_ARRAY(iter, 1);
 	ref_iterator = &iter->base;
 	base_ref_iterator_init(ref_iterator, &packed_ref_iterator_vtable);
@@ -1130,19 +1155,12 @@ static struct ref_iterator *packed_ref_iterator_begin(
 
 	iter->snapshot = snapshot;
 	acquire_snapshot(snapshot);
-
-	iter->pos = start;
-	iter->eof = snapshot->eof;
 	strbuf_init(&iter->refname_buf, 0);
-
 	iter->base.oid = &iter->oid;
-
 	iter->repo = ref_store->repo;
 	iter->flags = flags;
 
-	if (prefix && *prefix)
-		/* Stop iteration after we've gone *past* prefix: */
-		ref_iterator = prefix_ref_iterator_begin(ref_iterator, prefix, 0);
+	packed_ref_iterator_seek(&iter->base, prefix);
 
 	return ref_iterator;
 }

-- 
2.48.1.683.gf705b3209c.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v2 15/16] refs/iterator: implement seeking for "files" iterators
  2025-02-19 13:23 ` [PATCH v2 00/16] " Patrick Steinhardt
                     ` (13 preceding siblings ...)
  2025-02-19 13:23   ` [PATCH v2 14/16] refs/iterator: implement seeking for `packed-ref` iterators Patrick Steinhardt
@ 2025-02-19 13:23   ` Patrick Steinhardt
  2025-02-19 13:23   ` [PATCH v2 16/16] refs: reuse iterators when determining refname availability Patrick Steinhardt
  2025-02-24 15:18   ` [PATCH v2 00/16] refs: batch refname availability checks shejialuo
  16 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-19 13:23 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Implement seeking for "files" iterators. As we simply use a ref-cache
iterator under the hood, the implementation is straightforward. Note
that we do not implement seeking on reflog iterators, same as with the
"reftable" backend.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/files-backend.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 859f1c11941..4e1c50fead3 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -918,6 +918,14 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	return ok;
 }
 
+static int files_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				   const char *prefix)
+{
+	struct files_ref_iterator *iter =
+		(struct files_ref_iterator *)ref_iterator;
+	return ref_iterator_seek(iter->iter0, prefix);
+}
+
 static int files_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				   struct object_id *peeled)
 {
@@ -936,6 +944,7 @@ static void files_ref_iterator_release(struct ref_iterator *ref_iterator)
 
 static struct ref_iterator_vtable files_ref_iterator_vtable = {
 	.advance = files_ref_iterator_advance,
+	.seek = files_ref_iterator_seek,
 	.peel = files_ref_iterator_peel,
 	.release = files_ref_iterator_release,
 };
@@ -2294,6 +2303,12 @@ static int files_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 	return ok;
 }
 
+static int files_reflog_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
+				      const char *prefix UNUSED)
+{
+	BUG("ref_iterator_seek() called for reflog_iterator");
+}
+
 static int files_reflog_iterator_peel(struct ref_iterator *ref_iterator UNUSED,
 				      struct object_id *peeled UNUSED)
 {
@@ -2309,6 +2324,7 @@ static void files_reflog_iterator_release(struct ref_iterator *ref_iterator)
 
 static struct ref_iterator_vtable files_reflog_iterator_vtable = {
 	.advance = files_reflog_iterator_advance,
+	.seek = files_reflog_iterator_seek,
 	.peel = files_reflog_iterator_peel,
 	.release = files_reflog_iterator_release,
 };

-- 
2.48.1.683.gf705b3209c.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v2 16/16] refs: reuse iterators when determining refname availability
  2025-02-19 13:23 ` [PATCH v2 00/16] " Patrick Steinhardt
                     ` (14 preceding siblings ...)
  2025-02-19 13:23   ` [PATCH v2 15/16] refs/iterator: implement seeking for "files" iterators Patrick Steinhardt
@ 2025-02-19 13:23   ` Patrick Steinhardt
  2025-02-24 15:14     ` shejialuo
  2025-02-24 15:18   ` [PATCH v2 00/16] refs: batch refname availability checks shejialuo
  16 siblings, 1 reply; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-19 13:23 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

When verifying whether refnames are available we have to verify whether
any reference exists that is nested under the current reference. E.g.
given a reference "refs/heads/foo", we must make sure that there is no
other reference "refs/heads/foo/*".

This check is performed using a ref iterator with the prefix set to the
nested reference namespace. Until now it was not possible to reseek
iterators, so we always had to reallocate the iterator for every single
reference we were about to check. This keeps us from reusing state that
the iterator may have and that may make it work more efficiently.

Refactor the logic to reseek iterators. This leads to a sizeable speedup
with the "reftable" backend:

    Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
      Time (mean ± σ):      39.8 ms ±   0.9 ms    [User: 29.7 ms, System: 9.8 ms]
      Range (min … max):    38.4 ms …  42.0 ms    62 runs

    Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):      31.9 ms ±   1.1 ms    [User: 27.0 ms, System: 4.5 ms]
      Range (min … max):    29.8 ms …  34.3 ms    74 runs

    Summary
      update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
        1.25 ± 0.05 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)

The "files" backend doesn't really show a huge impact:

    Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
      Time (mean ± σ):     392.3 ms ±   7.1 ms    [User: 59.7 ms, System: 328.8 ms]
      Range (min … max):   384.6 ms … 404.5 ms    10 runs

    Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):     387.7 ms ±   7.4 ms    [User: 54.6 ms, System: 329.6 ms]
      Range (min … max):   377.0 ms … 397.7 ms    10 runs

    Summary
      update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
        1.01 ± 0.03 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)

This is mostly because the "files" backend is way slower to begin with,
as it has to create a separate file for each new reference, so the
milliseconds we shave off by reseeking the iterator don't really
translate into a significant relative improvement.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/refs.c b/refs.c
index 8eff60a2186..6cbb9decdb0 100644
--- a/refs.c
+++ b/refs.c
@@ -2555,8 +2555,13 @@ int refs_verify_refnames_available(struct ref_store *refs,
 		if (!initial_transaction) {
 			int ok;
 
-			iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
-						       DO_FOR_EACH_INCLUDE_BROKEN);
+			if (!iter) {
+				iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
+							       DO_FOR_EACH_INCLUDE_BROKEN);
+			} else if (ref_iterator_seek(iter, dirname.buf) < 0) {
+				goto cleanup;
+			}
+
 			while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
 				if (skip &&
 				    string_list_has_string(skip, iter->refname))
@@ -2569,9 +2574,6 @@ int refs_verify_refnames_available(struct ref_store *refs,
 
 			if (ok != ITER_DONE)
 				BUG("error while iterating over references");
-
-			ref_iterator_free(iter);
-			iter = NULL;
 		}
 
 		extra_refname = find_descendant_ref(dirname.buf, extras, skip);

-- 
2.48.1.683.gf705b3209c.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* Re: [PATCH v2 01/16] object-name: introduce `repo_get_oid_with_flags()`
  2025-02-19 13:23   ` [PATCH v2 01/16] object-name: introduce `repo_get_oid_with_flags()` Patrick Steinhardt
@ 2025-02-19 17:02     ` Justin Tobler
  0 siblings, 0 replies; 163+ messages in thread
From: Justin Tobler @ 2025-02-19 17:02 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

On 25/02/19 02:23PM, Patrick Steinhardt wrote:
[snip]
> diff --git a/object-name.c b/object-name.c
> index 945d5bdef25..bc0265ad2a1 100644
> --- a/object-name.c
> +++ b/object-name.c
> @@ -1794,18 +1794,20 @@ void object_context_release(struct object_context *ctx)
>  	strbuf_release(&ctx->symlink_path);
>  }
>  
> -/*
> - * This is like "get_oid_basic()", except it allows "object ID expressions",
> - * notably "xyz^" for "parent of xyz"
> - */
> -int repo_get_oid(struct repository *r, const char *name, struct object_id *oid)
> +int repo_get_oid_with_flags(struct repository *r, const char *name, struct object_id *oid,
> +			    unsigned flags)

style: The function signature runs a bit long on the first line. Not a
big deal, but we could reformat it.

>  {
>  	struct object_context unused;
> -	int ret = get_oid_with_context(r, name, 0, oid, &unused);
> +	int ret = get_oid_with_context(r, name, flags, oid, &unused);
>  	object_context_release(&unused);
>  	return ret;
>  }
>  
> +int repo_get_oid(struct repository *r, const char *name, struct object_id *oid)
> +{
> +	return repo_get_oid_with_flags(r, name, oid, 0);
> +}
> +
>  /*
>   * This returns a non-zero value if the string (built using printf
>   * format and the given arguments) is not a valid object.
> diff --git a/object-name.h b/object-name.h
> index 8dba4a47a47..fb5a97b2c8e 100644
> --- a/object-name.h
> +++ b/object-name.h
> @@ -51,6 +51,12 @@ void strbuf_repo_add_unique_abbrev(struct strbuf *sb, struct repository *repo,
>  void strbuf_add_unique_abbrev(struct strbuf *sb, const struct object_id *oid,
>  			      int abbrev_len);
>  
> +/*
> + * This is like "get_oid_basic()", except it allows "object ID expressions",
> + * notably "xyz^" for "parent of xyz". Accepts GET_OID_* flags.
> + */
> +int repo_get_oid_with_flags(struct repository *r, const char *str, struct object_id *oid,
> +			    unsigned flags);

Same here.

>  int repo_get_oid(struct repository *r, const char *str, struct object_id *oid);
>  __attribute__((format (printf, 2, 3)))
>  int get_oidf(struct object_id *oid, const char *fmt, ...);
> 
> -- 
> 2.48.1.683.gf705b3209c.dirty
> 
> 


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v2 03/16] builtin/update-ref: skip ambiguity checks when parsing object IDs
  2025-02-19 13:23   ` [PATCH v2 03/16] builtin/update-ref: skip ambiguity checks when parsing object IDs Patrick Steinhardt
@ 2025-02-19 18:21     ` Justin Tobler
  2025-02-20  8:05       ` Patrick Steinhardt
  0 siblings, 1 reply; 163+ messages in thread
From: Justin Tobler @ 2025-02-19 18:21 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

On 25/02/19 02:23PM, Patrick Steinhardt wrote:
> Most of the commands in git-update-ref(1) accept an old and/or new
> object ID to update a specific reference to. These object IDs get parsed
> via `repo_get_oid()`, which not only handles plain object IDs, but also
> those that have a suffix like "~" or "^2". More surprisingly though, it
> even knows to resolve references, despite the fact that its manpage does
> not mention this fact even once.
> 
> One consequence of this is that we also check for ambiguous references:
> when parsing a full object ID where the DWIM mechanism would also cause
> us to resolve it as a branch, we'd end up printing a warning. While this
> check makes sense to have in general, it is arguably less useful in the
> context of git-update-ref(1). This is out of two reasons:
> 
>   - The manpage is explicitly structured around object IDs. So if we see
>     a fully blown object ID, the intent should be quite clear in
>     general.

Makes sense.

>   - The command is part of our plumbing layer and not a tool that users
>     would generally use in interactive workflows. As such, the warning
>     will likely not be visible to anybody in the first place.

Ok, so in many cases the warning is not propagated anyway, which makes
its computation wasteful to begin with.

> Furthermore, this check can be quite expensive when updating lots of
> references via `--stdin`, because we try to read multiple references per
> object ID that we parse according to the DWIM rules. This effect can be
> seen both with the "files" and "reftable" backend.
> 
> The issue is not unique to git-update-ref(1), but was also an issue in
> git-cat-file(1), where it was addressed by disabling the ambiguity check
> in 25fba78d36b (cat-file: disable object/refname ambiguity check for
> batch mode, 2013-07-12).
> 
> Disable the warning in git-update-ref(1), which provides a significant
> speedup with both backends. The following benchmark creates 10000 new
> references with a 100000 preexisting refs with the "files" backend:
> 
>     Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
>       Time (mean ± σ):     467.3 ms ±   5.1 ms    [User: 100.0 ms, System: 365.1 ms]
>       Range (min … max):   461.9 ms … 479.3 ms    10 runs
> 
>     Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
>       Time (mean ± σ):     394.1 ms ±   5.8 ms    [User: 63.3 ms, System: 327.6 ms]
>       Range (min … max):   384.9 ms … 405.7 ms    10 runs
> 
>     Summary
>       update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
>         1.19 ± 0.02 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
> 
> And with the "reftable" backend:
> 
>     Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
>       Time (mean ± σ):     146.9 ms ±   2.2 ms    [User: 90.4 ms, System: 56.0 ms]
>       Range (min … max):   142.7 ms … 150.8 ms    19 runs
> 
>     Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
>       Time (mean ± σ):      63.2 ms ±   1.1 ms    [User: 41.0 ms, System: 21.8 ms]
>       Range (min … max):    61.1 ms …  66.6 ms    41 runs
> 
>     Summary
>       update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
>         2.32 ± 0.05 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
> 
> Note that the absolute improvement with both backends is roughly in the
> same ballpark, but the relative improvement for the "reftable" backend
> is more significant because writing the new table to disk is faster in
> the first place.
> 
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  builtin/update-ref.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/builtin/update-ref.c b/builtin/update-ref.c
> index 4d35bdc4b4b..d603f54b770 100644
> --- a/builtin/update-ref.c
> +++ b/builtin/update-ref.c
> @@ -179,7 +179,8 @@ static int parse_next_oid(const char **next, const char *end,
>  		(*next)++;
>  		*next = parse_arg(*next, &arg);
>  		if (arg.len) {
> -			if (repo_get_oid(the_repository, arg.buf, oid))
> +			if (repo_get_oid_with_flags(the_repository, arg.buf, oid,
> +						    GET_OID_SKIP_AMBIGUITY_CHECK))
>  				goto invalid;
>  		} else {
>  			/* Without -z, an empty value means all zeros: */
> @@ -197,7 +198,8 @@ static int parse_next_oid(const char **next, const char *end,
>  		*next += arg.len;
>  
>  		if (arg.len) {
> -			if (repo_get_oid(the_repository, arg.buf, oid))
> +			if (repo_get_oid_with_flags(the_repository, arg.buf, oid,
> +						    GET_OID_SKIP_AMBIGUITY_CHECK))
>  				goto invalid;
>  		} else if (flags & PARSE_SHA1_ALLOW_EMPTY) {
>  			/* With -z, treat an empty value as all zeros: */
> @@ -772,7 +774,8 @@ int cmd_update_ref(int argc,
>  		refname = argv[0];
>  		value = argv[1];
>  		oldval = argv[2];
> -		if (repo_get_oid(the_repository, value, &oid))
> +		if (repo_get_oid_with_flags(the_repository, value, &oid,
> +					    GET_OID_SKIP_AMBIGUITY_CHECK))
>  			die("%s: not a valid SHA1", value);
>  	}
>  
> @@ -783,7 +786,8 @@ int cmd_update_ref(int argc,
>  			 * must not already exist:
>  			 */
>  			oidclr(&oldoid, the_repository->hash_algo);
> -		else if (repo_get_oid(the_repository, oldval, &oldoid))
> +		else if (repo_get_oid_with_flags(the_repository, oldval, &oldoid,
> +						 GET_OID_SKIP_AMBIGUITY_CHECK))
>  			die("%s: not a valid old SHA1", oldval);
>  	}

In builtin/update-ref.c all uses of repo_get_oid() have been converted
to repo_get_oid_with_flags() with the GET_OID_SKIP_AMBIGUITY_CHECK flag
except for one in parse_cmd_symref_update(). Is there reason to leave
that one untouched?

-Justin


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH 09/14] refs/iterator: implement seeking for merged iterators
  2025-02-17 15:50 ` [PATCH 09/14] refs/iterator: implement seeking for merged iterators Patrick Steinhardt
@ 2025-02-19 20:10   ` Karthik Nayak
  0 siblings, 0 replies; 163+ messages in thread
From: Karthik Nayak @ 2025-02-19 20:10 UTC (permalink / raw)
  To: Patrick Steinhardt, git
  Cc: brian m. carlson, Jeff King, Junio C Hamano, Christian Couder

[-- Attachment #1: Type: text/plain, Size: 1504 bytes --]

Patrick Steinhardt <ps@pks.im> writes:

> Implement seeking on merged iterators. The implementation is rather
> straight forward, with the only exception that we must not deallocate
> the underlying iterators once they have been exhausted.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  refs/iterator.c | 38 +++++++++++++++++++++++++++++---------
>  1 file changed, 29 insertions(+), 9 deletions(-)
>
> diff --git a/refs/iterator.c b/refs/iterator.c
> index 757b105261a..63608ef9907 100644
> --- a/refs/iterator.c
> +++ b/refs/iterator.c
> @@ -96,7 +96,8 @@ int is_empty_ref_iterator(struct ref_iterator *ref_iterator)
>  struct merge_ref_iterator {
>  	struct ref_iterator base;
>
> -	struct ref_iterator *iter0, *iter1;
> +	struct ref_iterator *iter0, *iter0_owned;
> +	struct ref_iterator *iter1, *iter1_owned;
>
>  	ref_iterator_select_fn *select;
>  	void *cb_data;
> @@ -160,13 +161,11 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
>  	if (!iter->current) {
>  		/* Initialize: advance both iterators to their first entries */
>  		if ((ok = ref_iterator_advance(iter->iter0)) != ITER_OK) {
> -			ref_iterator_free(iter->iter0);
>  			iter->iter0 = NULL;

Okay, so if advancing the iterator fails, we set the current iterator to
NULL, but the underlying pointer `iter0_owned` still holds. Makes sense.

Now it's just a matter of ensuring that we reuse the original iterator
when needed, and it seems like that's what we do in this patch. Looks good!

[snip]

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH 10/14] refs/iterator: implement seeking for reftable iterators
  2025-02-17 15:50 ` [PATCH 10/14] refs/iterator: implement seeking for reftable iterators Patrick Steinhardt
@ 2025-02-19 20:13   ` Karthik Nayak
  0 siblings, 0 replies; 163+ messages in thread
From: Karthik Nayak @ 2025-02-19 20:13 UTC (permalink / raw)
  To: Patrick Steinhardt, git
  Cc: brian m. carlson, Jeff King, Junio C Hamano, Christian Couder

[-- Attachment #1: Type: text/plain, Size: 491 bytes --]

Patrick Steinhardt <ps@pks.im> writes:

> Implement seeking of reftable iterators. As the low-level reftable
> iterators already support seeking this change is straight-forward. Two
> notes though:
>
>   - We do not support seeking on reflog iterators.
>

Nit: this doesn't explain the reason; it would be nice to state here why
we do not support seeking on reflog iterators.

>   - We start to check whether `reftable_stack_init_ref_iterator()` is
>     successful.
>

The patch looks good!

[snip]

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v2 03/16] builtin/update-ref: skip ambiguity checks when parsing object IDs
  2025-02-19 18:21     ` Justin Tobler
@ 2025-02-20  8:05       ` Patrick Steinhardt
  0 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-20  8:05 UTC (permalink / raw)
  To: Justin Tobler
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

On Wed, Feb 19, 2025 at 12:21:44PM -0600, Justin Tobler wrote:
> > diff --git a/builtin/update-ref.c b/builtin/update-ref.c
> > index 4d35bdc4b4b..d603f54b770 100644
> > --- a/builtin/update-ref.c
> > +++ b/builtin/update-ref.c
> > @@ -179,7 +179,8 @@ static int parse_next_oid(const char **next, const char *end,
> > @@ -783,7 +786,8 @@ int cmd_update_ref(int argc,
> >  			 * must not already exist:
> >  			 */
> >  			oidclr(&oldoid, the_repository->hash_algo);
> > -		else if (repo_get_oid(the_repository, oldval, &oldoid))
> > +		else if (repo_get_oid_with_flags(the_repository, oldval, &oldoid,
> > +						 GET_OID_SKIP_AMBIGUITY_CHECK))
> >  			die("%s: not a valid old SHA1", oldval);
> >  	}
> 
> In builtin/update-ref.c all uses of repo_get_oid() have been converted
> to repo_get_oid_with_flags() with the GET_OID_SKIP_AMBIGUITY_CHECK flag
> except for one in parse_cmd_symref_update(). Is there reason to leave
> that one untouched?

Ah, no, this was a mere oversight. Good catch, fixed.

Patrick


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v2 02/16] object-name: allow skipping ambiguity checks in `get_oid()` family
  2025-02-19 13:23   ` [PATCH v2 02/16] object-name: allow skipping ambiguity checks in `get_oid()` family Patrick Steinhardt
@ 2025-02-21  8:00     ` Jeff King
  2025-02-21  8:36       ` Patrick Steinhardt
  0 siblings, 1 reply; 163+ messages in thread
From: Jeff King @ 2025-02-21  8:00 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Karthik Nayak, brian m. carlson, Junio C Hamano, shejialuo,
	Christian Couder

On Wed, Feb 19, 2025 at 02:23:29PM +0100, Patrick Steinhardt wrote:

> When reading an object ID via `get_oid_basic()` or any of its related
> functions we perform a check whether the object ID is ambiguous, which
> can be the case when a reference with the same name exists. While the
> check is generally helpful, there are cases where it only adds to the
> runtime overhead without providing much of a benefit.
> 
> Add a new flag that allows us to disable the check. The flag will be
> used in a subsequent commit.

If we are going to switch to this and get rid of the global
warn_on_object_refname_ambiguity flag, I could see it being worth it.

But when I looked into doing that, it did not make much sense (there are
too many code paths that share the same get_oid calls, and you'd have to
plumb the flags through the stack).

So if we are going to leave the global flag anyway, and if your patch 3
is just changing all of update-ref to pass the per-call flag in every
call, why don't we just skip this new mechanism and have update-ref
unset the warn_on_object_refname_ambiguity flag?

That makes patch 3 a one-liner, and patches 1 and 2 can go away.

-Peff

PS Sorry, I haven't looked carefully at the rest of the series. I've
   been moving houses and am way back-logged on Git stuff, so don't
   count on me reviewing it anytime soon.


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v2 02/16] object-name: allow skipping ambiguity checks in `get_oid()` family
  2025-02-21  8:00     ` Jeff King
@ 2025-02-21  8:36       ` Patrick Steinhardt
  2025-02-21  9:06         ` Jeff King
  0 siblings, 1 reply; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-21  8:36 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Karthik Nayak, brian m. carlson, Junio C Hamano, shejialuo,
	Christian Couder

On Fri, Feb 21, 2025 at 03:00:03AM -0500, Jeff King wrote:
> On Wed, Feb 19, 2025 at 02:23:29PM +0100, Patrick Steinhardt wrote:
> 
> > When reading an object ID via `get_oid_basic()` or any of its related
> > functions we perform a check whether the object ID is ambiguous, which
> > can be the case when a reference with the same name exists. While the
> > check is generally helpful, there are cases where it only adds to the
> > runtime overhead without providing much of a benefit.
> > 
> > Add a new flag that allows us to disable the check. The flag will be
> > used in a subsequent commit.
> 
> If we are going to switch to this and get rid of the global
> warn_on_object_refname_ambiguity flag, I could see it being worth it.
> 
> But when I looked into doing that, it did not make much sense (there are
> too many code paths that share the same get_oid calls, and you'd have to
> plumb the flags through the stack).
> 
> So if we are going to leave the global flag anyway, and if your patch 3
> is just changing all of update-ref to pass the per-call flag in every
> call, why don't we just skip this new mechanism and have update-ref
> unset the warn_on_object_refname_ambiguity flag?
> 
> That makes patch 3 a one-liner, and patches 1 and 2 can go away.
> 
> -Peff
> 
> PS Sorry, I haven't looked carefully at the rest of the series. I've
>    been moving houses and am way back-logged on Git stuff, so don't
>    count on me reviewing it anytime soon.

Spoiler alert: I do have a patch series locally that gets rid of the
global variable completely, and that series builds on top of the new
flag I'm introducing here. So I'd prefer to keep it so that we can
eventually have fewer call sites that rely on global state.

Patrick


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v2 02/16] object-name: allow skipping ambiguity checks in `get_oid()` family
  2025-02-21  8:36       ` Patrick Steinhardt
@ 2025-02-21  9:06         ` Jeff King
  0 siblings, 0 replies; 163+ messages in thread
From: Jeff King @ 2025-02-21  9:06 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Karthik Nayak, brian m. carlson, Junio C Hamano, shejialuo,
	Christian Couder

On Fri, Feb 21, 2025 at 09:36:04AM +0100, Patrick Steinhardt wrote:

> Spoiler alert: I do have a patch series locally that gets rid of the
> global variable completely, and that series builds on top of the new
> flag I'm introducing here. So I'd prefer to keep it so that we can
> eventually have less callsites that rely on global state.

OK. I do agree that getting rid of the global would be nice. I just
wasn't sure how ugly it would be to pass the flags through the stack
into handle_revision().

-Peff


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v2 10/16] refs/iterator: provide infrastructure to re-seek iterators
  2025-02-19 13:23   ` [PATCH v2 10/16] refs/iterator: provide infrastructure to re-seek iterators Patrick Steinhardt
@ 2025-02-24 13:08     ` shejialuo
  2025-02-25  7:39       ` Patrick Steinhardt
  0 siblings, 1 reply; 163+ messages in thread
From: shejialuo @ 2025-02-24 13:08 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

On Wed, Feb 19, 2025 at 02:23:37PM +0100, Patrick Steinhardt wrote:
> Reftable iterators need to be scrapped after they have either been
> exhausted or aren't useful to the caller anymore, and it is explicitly
> not possible to reuse them for iterations. But enabling for reuse of
> iterators may allow us to tune them by reusing internal state of an
> iterator. The reftable iterators for example can already be reused
> internally, but we're not able to expose this to any users outside of
> the reftable backend.
> 

Out of curiosity, are there any benefits to reusing iterators for the
files backend?

> Introduce a new `.seek` function in the ref iterator vtable that allows
> callers to re-seek an iterator. It is expected to be functionally the

It's a bit strange that we use "re-seek"; I think we just want to seek
the iterator, don't we? Not worth a reroll, though.

> same as calling `refs_ref_iterator_begin()` with a different (or the
> same) prefix.
> 
> Implement the callback for trivial cases. The other iterators will be
> implemented in subsequent commits.
> 

[snip]

> @@ -368,6 +381,16 @@ static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator)
>  	return ok;
>  }
>  
> +static int prefix_ref_iterator_seek(struct ref_iterator *ref_iterator,
> +				    const char *prefix)
> +{
> +	struct prefix_ref_iterator *iter =
> +		(struct prefix_ref_iterator *)ref_iterator;
> +	free(iter->prefix);

Here we need to free "iter->prefix" because we don't know whether the
caller will call `prefix_ref_iterator_seek` multiple times on the same
ref iterator, so we need to reset the state/context each time.

I want to ask a question here: why don't we care about the "trim"
parameter that is declared in `prefix_ref_iterator_begin()`? From my
understanding, we want to keep the original "trim" state. So when we
reuse an iterator, we only reconsider the "prefix" but leave the other
state the same. Is my understanding correct?

> +	iter->prefix = xstrdup_or_null(prefix);
> +	return ref_iterator_seek(iter->iter0, prefix);
> +}
> +

> diff --git a/refs/refs-internal.h b/refs/refs-internal.h
> index 74e2c03cef1..3f6d43110b7 100644
> --- a/refs/refs-internal.h
> +++ b/refs/refs-internal.h
> @@ -327,6 +327,21 @@ struct ref_iterator {
>   */
>  int ref_iterator_advance(struct ref_iterator *ref_iterator);
>  
> +/*
> + * Seek the iterator to the first reference with the given prefix.
> + * The prefix is matched as a literal string, without regard for path
> + * separators. If prefix is NULL or the empty string, seek the iterator to the
> + * first reference again.
> + *
> + * This function is expected to behave as if a new ref iterator with the same
> + * prefix had been created, but allows reuse of iterators and thus may allow
> + * the backend to optimize.

I somehow think we should emphasize that we want to reuse the internal
state of the ref iterator, everything except the prefix. However, I am
not sure; just something to think about.

> + *
> + * Returns 0 on success, a negative error code otherwise.
> + */
> +int ref_iterator_seek(struct ref_iterator *ref_iterator,
> +		      const char *prefix);
> +
>  /*
>   * If possible, peel the reference currently being viewed by the
>   * iterator. Return 0 on success.
> @@ -445,6 +460,13 @@ void base_ref_iterator_init(struct ref_iterator *iter,
>   */
>  typedef int ref_iterator_advance_fn(struct ref_iterator *ref_iterator);
>  
> +/*
> + * Seek the iterator to the first reference matching the given prefix. Should

Maybe "which should"?

> + * behave the same as if a new iterator was created with the same prefix.
> + */

This statement makes me a little confused. I think there is some
difference between `seek` and `begin`: for the prefix ref iterator we
pass "trim" when calling `begin`, but for seeking we don't care about
"trim". So although the prefix may be the same, the internal state may
be different.

Thanks,
Jialuo


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v2 11/16] refs/iterator: implement seeking for merged iterators
  2025-02-19 13:23   ` [PATCH v2 11/16] refs/iterator: implement seeking for merged iterators Patrick Steinhardt
@ 2025-02-24 13:37     ` shejialuo
  2025-02-25  7:39       ` Patrick Steinhardt
  0 siblings, 1 reply; 163+ messages in thread
From: shejialuo @ 2025-02-24 13:37 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

On Wed, Feb 19, 2025 at 02:23:38PM +0100, Patrick Steinhardt wrote:
> Implement seeking on merged iterators. The implementation is rather
> straight forward, with the only exception that we must not deallocate
> the underlying iterators once they have been exhausted.
> 
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  refs/iterator.c | 38 +++++++++++++++++++++++++++++---------
>  1 file changed, 29 insertions(+), 9 deletions(-)
> 
> diff --git a/refs/iterator.c b/refs/iterator.c
> index 757b105261a..63608ef9907 100644
> --- a/refs/iterator.c
> +++ b/refs/iterator.c
> @@ -96,7 +96,8 @@ int is_empty_ref_iterator(struct ref_iterator *ref_iterator)
>  struct merge_ref_iterator {
>  	struct ref_iterator base;
>  
> -	struct ref_iterator *iter0, *iter1;
> +	struct ref_iterator *iter0, *iter0_owned;
> +	struct ref_iterator *iter1, *iter1_owned;

We now always free `iter0_owned` and `iter1_owned`. That's the reason
why, in the code below, we can drop `ref_iterator_free`. Makes sense.

[snip]

> +static int merge_ref_iterator_seek(struct ref_iterator *ref_iterator,
> +				   const char *prefix)
> +{
> +	struct merge_ref_iterator *iter =
> +		(struct merge_ref_iterator *)ref_iterator;
> +	int ret;
> +
> +	iter->current = NULL;
> +	iter->iter0 = iter->iter0_owned;
> +	iter->iter1 = iter->iter1_owned;
> +
> +	ret = ref_iterator_seek(iter->iter0, prefix);
> +	if (ret < 0)
> +		return ret;
> +
> +	ret = ref_iterator_seek(iter->iter1, prefix);
> +	if (ret < 0)
> +		return ret;

We could simply use a single `if` statement to handle this. Is the
reason for this design that we want to return the exact error code for
each case?

> +
> +	return 0;
> +}
> +
>  static int merge_ref_iterator_peel(struct ref_iterator *ref_iterator,
>  				   struct object_id *peeled)
>  {
> @@ -242,12 +261,13 @@ static void merge_ref_iterator_release(struct ref_iterator *ref_iterator)
>  {
>  	struct merge_ref_iterator *iter =
>  		(struct merge_ref_iterator *)ref_iterator;
> -	ref_iterator_free(iter->iter0);
> -	ref_iterator_free(iter->iter1);
> +	ref_iterator_free(iter->iter0_owned);
> +	ref_iterator_free(iter->iter1_owned);

We free the internal pointers but not the pointers exposed to the
caller. Makes sense.

>  }
>  
>  static struct ref_iterator_vtable merge_ref_iterator_vtable = {
>  	.advance = merge_ref_iterator_advance,
> +	.seek = merge_ref_iterator_seek,
>  	.peel = merge_ref_iterator_peel,
>  	.release = merge_ref_iterator_release,
>  };
> @@ -268,8 +288,8 @@ struct ref_iterator *merge_ref_iterator_begin(
>  	 */
>  
>  	base_ref_iterator_init(ref_iterator, &merge_ref_iterator_vtable);
> -	iter->iter0 = iter0;
> -	iter->iter1 = iter1;
> +	iter->iter0 = iter->iter0_owned = iter0;
> +	iter->iter1 = iter->iter1_owned = iter1;

OK, we assign `iter0` to `iter0_owned` and `iter1` to `iter1_owned`.

Thanks,
Jialuo


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v2 12/16] refs/iterator: implement seeking for reftable iterators
  2025-02-19 13:23   ` [PATCH v2 12/16] refs/iterator: implement seeking for reftable iterators Patrick Steinhardt
@ 2025-02-24 14:00     ` shejialuo
  2025-02-25  7:39       ` Patrick Steinhardt
  0 siblings, 1 reply; 163+ messages in thread
From: shejialuo @ 2025-02-24 14:00 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

On Wed, Feb 19, 2025 at 02:23:39PM +0100, Patrick Steinhardt wrote:
> Implement seeking of reftable iterators. As the low-level reftable
> iterators already support seeking this change is straight-forward. Two
> notes though:
> 
>   - We do not support seeking on reflog iterators.
> 
>   - We start to check whether `reftable_stack_init_ref_iterator()` is
>     successful.
> 
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  refs/reftable-backend.c | 35 ++++++++++++++++++++++++++++++-----
>  1 file changed, 30 insertions(+), 5 deletions(-)
> 
> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
> index 06543f79c64..b0c09f34433 100644
> --- a/refs/reftable-backend.c
> +++ b/refs/reftable-backend.c
> @@ -547,7 +547,7 @@ struct reftable_ref_iterator {
>  	struct reftable_ref_record ref;
>  	struct object_id oid;
>  
> -	const char *prefix;
> +	char *prefix;
>  	size_t prefix_len;
>  	char **exclude_patterns;
>  	size_t exclude_patterns_index;
> @@ -718,6 +718,20 @@ static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
>  	return ITER_OK;
>  }
>  
> +static int reftable_ref_iterator_seek(struct ref_iterator *ref_iterator,
> +				      const char *prefix)
> +{
> +	struct reftable_ref_iterator *iter =
> +		(struct reftable_ref_iterator *)ref_iterator;
> +
> +	free(iter->prefix);
> +	iter->prefix = xstrdup_or_null(prefix);
> +	iter->prefix_len = prefix ? strlen(prefix) : 0;
> +	iter->err = reftable_iterator_seek_ref(&iter->iter, prefix);

Should we rename the function `reftable_iterator_seek_ref` while at it?
It is a little strange that we have two functions with such similar
names:

1. reftable_ref_iterator_seek
2. reftable_iterator_seek_ref

However, not worth a reroll.

> +
> +	return iter->err;
> +}
> +
>  static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
>  				      struct object_id *peeled)
>  {
> @@ -744,10 +758,12 @@ static void reftable_ref_iterator_release(struct ref_iterator *ref_iterator)
>  			free(iter->exclude_patterns[i]);
>  		free(iter->exclude_patterns);
>  	}
> +	free(iter->prefix);
>  }
>  
>  static struct ref_iterator_vtable reftable_ref_iterator_vtable = {
>  	.advance = reftable_ref_iterator_advance,
> +	.seek = reftable_ref_iterator_seek,
>  	.peel = reftable_ref_iterator_peel,
>  	.release = reftable_ref_iterator_release,
>  };
> @@ -806,8 +822,6 @@ static struct reftable_ref_iterator *ref_iterator_for_stack(struct reftable_ref_
>  
>  	iter = xcalloc(1, sizeof(*iter));
>  	base_ref_iterator_init(&iter->base, &reftable_ref_iterator_vtable);
> -	iter->prefix = prefix;
> -	iter->prefix_len = prefix ? strlen(prefix) : 0;

We don't assign `iter->prefix` and `iter->prefix_len` here anymore
because we want to use the newly defined function
`reftable_ref_iterator_seek`. At first glance I was worried that
"iter->prefix" might not be `NULL` at this point, but because we use
`xcalloc`, "iter->prefix" is `NULL` by default.

Thanks,
Jialuo


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v2 13/16] refs/iterator: implement seeking for ref-cache iterators
  2025-02-19 13:23   ` [PATCH v2 13/16] refs/iterator: implement seeking for ref-cache iterators Patrick Steinhardt
@ 2025-02-24 14:49     ` shejialuo
  2025-02-25  7:39       ` Patrick Steinhardt
  0 siblings, 1 reply; 163+ messages in thread
From: shejialuo @ 2025-02-24 14:49 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

On Wed, Feb 19, 2025 at 02:23:40PM +0100, Patrick Steinhardt wrote:
> Implement seeking of ref-cache iterators. This is done by splitting most
> of the logic to seek iterators out of `cache_ref_iterator_begin()` and
> putting it into `cache_ref_iterator_seek()` so that we can reuse the
> logic.
> 
> Note that we cannot use the optimization anymore where we return an
> empty ref iterator when there aren't any references, as otherwise it
> wouldn't be possible to reseek the iterator to a different prefix that
> may exist. This shouldn't be much of a performance corncern though as we
> now start to bail out early in case `advance()` sees that there are no
> more directories to be searched.
> 

Nit: corncern/concern. Not worth a reroll.

> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  refs/ref-cache.c | 74 ++++++++++++++++++++++++++++++++++++--------------------
>  1 file changed, 48 insertions(+), 26 deletions(-)
> 
> diff --git a/refs/ref-cache.c b/refs/ref-cache.c
> index 6457e02c1ea..b54547d71ee 100644
> --- a/refs/ref-cache.c
> +++ b/refs/ref-cache.c
> @@ -362,9 +362,7 @@ struct cache_ref_iterator {
>  	struct ref_iterator base;
>  
>  	/*
> -	 * The number of levels currently on the stack. This is always
> -	 * at least 1, because when it becomes zero the iteration is
> -	 * ended and this struct is freed.
> +	 * The number of levels currently on the stack.
>  	 */

So, this value can now be zero? We rely on this because we no longer
return an empty ref iterator.

>  	size_t levels_nr;
>  
> @@ -389,6 +387,9 @@ struct cache_ref_iterator {
>  	struct cache_ref_iterator_level *levels;
>  
>  	struct repository *repo;
> +	struct ref_cache *cache;
> +
> +	int prime_dir;

The reason why we need to add these two fields is that
`cache_ref_iterator_begin` needs `ref_cache` and `prime_dir`, so we
must store this state in order to reuse the ref iterator later.

>  };
>  
>  static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
> @@ -396,6 +397,9 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
>  	struct cache_ref_iterator *iter =
>  		(struct cache_ref_iterator *)ref_iterator;
>  
> +	if (!iter->levels_nr)
> +		return ITER_DONE;
> +

Ok, we will check whether the cache ref iterator is exhausted.

>  	while (1) {
>  		struct cache_ref_iterator_level *level =
>  			&iter->levels[iter->levels_nr - 1];
> @@ -444,6 +448,40 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
>  	}
>  }
>  
> +static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator,
> +				   const char *prefix)
> +{
> +	struct cache_ref_iterator *iter =
> +		(struct cache_ref_iterator *)ref_iterator;
> +	struct ref_dir *dir;
> +
> +	dir = get_ref_dir(iter->cache->root);
> +	if (prefix && *prefix)
> +		dir = find_containing_dir(dir, prefix);
> +
> +	if (dir) {
> +		struct cache_ref_iterator_level *level;
> +
> +		if (iter->prime_dir)
> +			prime_ref_dir(dir, prefix);
> +		iter->levels_nr = 1;
> +		level = &iter->levels[0];
> +		level->index = -1;
> +		level->dir = dir;
> +
> +		if (prefix && *prefix) {
> +			iter->prefix = xstrdup(prefix);

Should we free the original `iter->prefix` before we assign the new
`prefix`? I have seen this pattern in a previous patch. If the caller
calls this function multiple times, there would be a memory leak.

> +			level->prefix_state = PREFIX_WITHIN_DIR;
> +		} else {
> +			level->prefix_state = PREFIX_CONTAINS_DIR;
> +		}
> +	} else {
> +		iter->levels_nr = 0;
> +	}

When we cannot find the dir, we set `iter->levels_nr = 0`. Could we
check this first:

    if (!dir) {
	iter->levels_nr = 0;
	return 0;
    }

and thus avoid the extra indentation? However, it seems that we always
return 0 either way, so maybe it is fine as is.

> +
> +	return 0;

I see your motivation: you want to always return a real ref iterator so
that it can be reused later. The original behavior was to return an
empty ref iterator, but an empty ref iterator cannot be reseeked. So we
now always get a cache ref iterator, and even when the number of levels
is 0 it is still a valid one. Makes sense.

> +}
> +

Thanks,
Jialuo


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v2 14/16] refs/iterator: implement seeking for `packed-ref` iterators
  2025-02-19 13:23   ` [PATCH v2 14/16] refs/iterator: implement seeking for `packed-ref` iterators Patrick Steinhardt
@ 2025-02-24 15:09     ` shejialuo
  2025-02-25  7:39       ` Patrick Steinhardt
  0 siblings, 1 reply; 163+ messages in thread
From: shejialuo @ 2025-02-24 15:09 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

On Wed, Feb 19, 2025 at 02:23:41PM +0100, Patrick Steinhardt wrote:
> Implement seeking of `packed-ref` iterators. The implementation is again
> straight forward, except that we cannot continue to use the prefix
> iterator as we would otherwise not be able to reseek the iterator
> anymore in case one first asks for an empty and then for a non-empty
> prefix. Instead, we open-code the logic to in `advance()`.
> 
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  refs/packed-backend.c | 62 +++++++++++++++++++++++++++++++++------------------
>  1 file changed, 40 insertions(+), 22 deletions(-)
> 
> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> index 38a1956d1a8..71a38acfedc 100644
> --- a/refs/packed-backend.c
> +++ b/refs/packed-backend.c
> @@ -819,6 +819,8 @@ struct packed_ref_iterator {
>  
>  	struct snapshot *snapshot;
>  
> +	char *prefix;
> +
>  	/* The current position in the snapshot's buffer: */
>  	const char *pos;
>  
> @@ -841,11 +843,9 @@ struct packed_ref_iterator {
>  };
>  
>  /*
> - * Move the iterator to the next record in the snapshot, without
> - * respect for whether the record is actually required by the current
> - * iteration. Adjust the fields in `iter` and return `ITER_OK` or
> - * `ITER_DONE`. This function does not free the iterator in the case
> - * of `ITER_DONE`.
> + * Move the iterator to the next record in the snapshot. Adjust the fields in
> + * `iter` and return `ITER_OK` or `ITER_DONE`. This function does not free the
> + * iterator in the case of `ITER_DONE`.
>   */
>  static int next_record(struct packed_ref_iterator *iter)
>  {
> @@ -942,6 +942,9 @@ static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator)
>  	int ok;
>  
>  	while ((ok = next_record(iter)) == ITER_OK) {
> +		const char *refname = iter->base.refname;
> +		const char *prefix = iter->prefix;
> +
>  		if (iter->flags & DO_FOR_EACH_PER_WORKTREE_ONLY &&
>  		    !is_per_worktree_ref(iter->base.refname))
>  			continue;
> @@ -951,12 +954,41 @@ static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator)
>  					    &iter->oid, iter->flags))
>  			continue;
>  
> +		while (prefix && *prefix) {
> +			if (*refname < *prefix)
> +				BUG("packed-refs backend yielded reference preceding its prefix");
> +			else if (*refname > *prefix)
> +				return ITER_DONE;
> +			prefix++;
> +			refname++;
> +		}

Although I don't fully understand the code, I want to ask a question
here: why do we need to do this in `advance()`? Should we check this in
`packed_ref_iterator_seek()` or in the `next_record()` function instead?

Before we introduced `seek`, we didn't need this logic, so I somehow
think it belongs in `packed_ref_iterator_seek()`.

> +
>  		return ITER_OK;
>  	}
>  
>  	return ok;
>  }
>  
> +static int packed_ref_iterator_seek(struct ref_iterator *ref_iterator,
> +				    const char *prefix)
> +{
> +	struct packed_ref_iterator *iter =
> +		(struct packed_ref_iterator *)ref_iterator;
> +	const char *start;
> +
> +	if (prefix && *prefix)
> +		start = find_reference_location(iter->snapshot, prefix, 0);
> +	else
> +		start = iter->snapshot->start;
> +
> +	free(iter->prefix);
> +	iter->prefix = xstrdup_or_null(prefix);
> +	iter->pos = start;
> +	iter->eof = iter->snapshot->eof;
> +
> +	return 0;
> +}
> +
>  static int packed_ref_iterator_peel(struct ref_iterator *ref_iterator,
>  				   struct object_id *peeled)
>  {
> @@ -979,11 +1011,13 @@ static void packed_ref_iterator_release(struct ref_iterator *ref_iterator)
>  		(struct packed_ref_iterator *)ref_iterator;
>  	strbuf_release(&iter->refname_buf);
>  	free(iter->jump);
> +	free(iter->prefix);
>  	release_snapshot(iter->snapshot);
>  }
>  
>  static struct ref_iterator_vtable packed_ref_iterator_vtable = {
>  	.advance = packed_ref_iterator_advance,
> +	.seek = packed_ref_iterator_seek,
>  	.peel = packed_ref_iterator_peel,
>  	.release = packed_ref_iterator_release,
>  };
> @@ -1097,7 +1131,6 @@ static struct ref_iterator *packed_ref_iterator_begin(
>  {
>  	struct packed_ref_store *refs;
>  	struct snapshot *snapshot;
> -	const char *start;
>  	struct packed_ref_iterator *iter;
>  	struct ref_iterator *ref_iterator;
>  	unsigned int required_flags = REF_STORE_READ;
> @@ -1113,14 +1146,6 @@ static struct ref_iterator *packed_ref_iterator_begin(
>  	 */
>  	snapshot = get_snapshot(refs);
>  
> -	if (prefix && *prefix)
> -		start = find_reference_location(snapshot, prefix, 0);
> -	else
> -		start = snapshot->start;
> -
> -	if (start == snapshot->eof)
> -		return empty_ref_iterator_begin();
> -

So, we don't return an empty ref iterator anymore. This has the same
motivation as the previous patch.

>  	CALLOC_ARRAY(iter, 1);
>  	ref_iterator = &iter->base;
>  	base_ref_iterator_init(ref_iterator, &packed_ref_iterator_vtable);
> @@ -1130,19 +1155,12 @@ static struct ref_iterator *packed_ref_iterator_begin(
>  
>  	iter->snapshot = snapshot;
>  	acquire_snapshot(snapshot);
> -
> -	iter->pos = start;
> -	iter->eof = snapshot->eof;
>  	strbuf_init(&iter->refname_buf, 0);
> -
>  	iter->base.oid = &iter->oid;
> -
>  	iter->repo = ref_store->repo;
>  	iter->flags = flags;
>  
> -	if (prefix && *prefix)
> -		/* Stop iteration after we've gone *past* prefix: */
> -		ref_iterator = prefix_ref_iterator_begin(ref_iterator, prefix, 0);
> +	packed_ref_iterator_seek(&iter->base, prefix);

Why don't we check the return value here? In the previous patch,
`cache_ref_iterator_seek` always returns 0, but you still checked it
there; I had assumed you just wanted to be more defensive.

Thanks,
Jialuo


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v2 16/16] refs: reuse iterators when determining refname availability
  2025-02-19 13:23   ` [PATCH v2 16/16] refs: reuse iterators when determining refname availability Patrick Steinhardt
@ 2025-02-24 15:14     ` shejialuo
  0 siblings, 0 replies; 163+ messages in thread
From: shejialuo @ 2025-02-24 15:14 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

On Wed, Feb 19, 2025 at 02:23:43PM +0100, Patrick Steinhardt wrote:
> When verifying whether refnames are available we have to verify whether
> any reference exists that is nested under the current reference. E.g.
> given a reference "refs/heads/foo", we must make sure that there is no
> other reference "refs/heads/foo/*".
> 
> This check is performed using a ref iterator with the prefix set to the
> nested reference namespace. Until now it used to not be possible to
> reseek iterators, so we always had to reallocate the iterator for every
> single reference we're about to check. This keeps us from reusing state
> that the iterator may have and that may make it work more efficiently.
> 
> Refactor the logic to reseek iterators. This leads to a sizeable speedup
> with the "reftable" backend:
> 
>     Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
>       Time (mean ± σ):      39.8 ms ±   0.9 ms    [User: 29.7 ms, System: 9.8 ms]
>       Range (min … max):    38.4 ms …  42.0 ms    62 runs
> 
>     Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
>       Time (mean ± σ):      31.9 ms ±   1.1 ms    [User: 27.0 ms, System: 4.5 ms]
>       Range (min … max):    29.8 ms …  34.3 ms    74 runs
> 
>     Summary
>       update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
>         1.25 ± 0.05 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
> 
> The "files" backend doesn't really show a huge impact:
> 
>     Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
>       Time (mean ± σ):     392.3 ms ±   7.1 ms    [User: 59.7 ms, System: 328.8 ms]
>       Range (min … max):   384.6 ms … 404.5 ms    10 runs
> 
>     Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
>       Time (mean ± σ):     387.7 ms ±   7.4 ms    [User: 54.6 ms, System: 329.6 ms]
>       Range (min … max):   377.0 ms … 397.7 ms    10 runs
> 
>     Summary
>       update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
>         1.01 ± 0.03 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
> 
> This is mostly because the files backend is way slower to begin with,
> as it has to create a separate file for each new reference, so the
> milliseconds we shave off by reseeking the iterator don't really
> translate into a significant relative improvement.

Interesting. There are many I/O operations, which hide the compute
latency: even though we improve the compute speed, the I/O operations
still delay the process.

Thanks,
Jialuo


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v2 00/16] refs: batch refname availability checks
  2025-02-19 13:23 ` [PATCH v2 00/16] " Patrick Steinhardt
                     ` (15 preceding siblings ...)
  2025-02-19 13:23   ` [PATCH v2 16/16] refs: reuse iterators when determining refname availability Patrick Steinhardt
@ 2025-02-24 15:18   ` shejialuo
  2025-02-25  7:39     ` Patrick Steinhardt
  16 siblings, 1 reply; 163+ messages in thread
From: shejialuo @ 2025-02-24 15:18 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

On Wed, Feb 19, 2025 at 02:23:27PM +0100, Patrick Steinhardt wrote:
> Hi,
> 
> this patch series has been inspired by brian's report that the reftable
> backend is significantly slower when writing many references compared to
> the files backend. As explained in that thread, the underlying issue is
> the design of tombstone references: when we first delete all references
> in a repository and then recreate them, we still have all the tombstones
> and thus we need to churn through all of them to figure out that they
> have been deleted in the first place. The files backend does not have
> this issue.
> 
> I consider the benchmark itself to be kind of broken, as it stems from
> us deleting all refs and then recreating them. And if you pack refs in
> between then the "reftable" backend outperforms the "files" backend.
> 
> But there are a couple of opportunities here anyway. While we cannot
> make the underlying issue of tombstones being less efficient go away,
> this has prompted me to have a deeper look at where we spend all the
> time. There are three ideas in this series:
> 
>   - git-update-ref(1) performs ambiguity checks for any full-size object
>     ID, which triggers a lot of reads. This is somewhat pointless though
>     given that the manpage explicitly points out that the command is
>     about object IDs, even though it does know to parse refs. But being
>     part of plumbing, emitting the warning here does not make a ton of
>     sense, and favoring object IDs over references in these cases is the
>     obvious thing to do anyway.
> 
>   - For each ref "refs/heads/bar", we need to verify that neither
>     "refs/heads" nor "refs" exists. This was repeated for every refname,
>     but because most refnames use common prefixes this made us re-check
>     a lot of prefixes. This is addressed by using a `strset` of already
>     checked prefixes.
> 
>   - For each ref "refs/heads/bar", we need to verify that no ref
>     "refs/heads/bar/*" exists. We always created a new ref iterator for
>     this check, which requires us to discard all internal state and then
>     recreate it. The reftable library has already been refactored though
>     to have reseekable iterators, so we backfill this functionality to
>     all the other iterators and then reuse the iterator.
> 
> With the (somewhat broken) benchmark we see a small speedup with the
> "files" backend:
> 
>     Benchmark 1: update-ref (refformat = files, revision = master)
>       Time (mean ± σ):     234.4 ms ±   1.9 ms    [User: 75.6 ms, System: 157.2 ms]
>       Range (min … max):   232.2 ms … 236.9 ms    10 runs
> 
>     Benchmark 2: update-ref (refformat = files, revision = HEAD)
>       Time (mean ± σ):     184.2 ms ±   2.0 ms    [User: 62.8 ms, System: 119.9 ms]
>       Range (min … max):   181.1 ms … 187.0 ms    10 runs
> 
>     Summary
>       update-ref (refformat = files, revision = HEAD) ran
>         1.27 ± 0.02 times faster than update-ref (refformat = files, revision = master)
> 
> And a huge speedup with the "reftable" backend:
> 
>     Benchmark 1: update-ref (refformat = reftable, revision = master)
>       Time (mean ± σ):     16.852 s ±  0.061 s    [User: 16.754 s, System: 0.059 s]
>       Range (min … max):   16.785 s … 16.982 s    10 runs
> 
>     Benchmark 2: update-ref (refformat = reftable, revision = HEAD)
>       Time (mean ± σ):      2.230 s ±  0.009 s    [User: 2.192 s, System: 0.029 s]
>       Range (min … max):    2.215 s …  2.244 s    10 runs
> 
>     Summary
>       update-ref (refformat = reftable, revision = HEAD) ran
>         7.56 ± 0.04 times faster than update-ref (refformat = reftable, revision = master)
> 
> We're still not up to speed with the "files" backend, but considerably
> better. Given that this is an extreme edge case and not reflective of
> the general case I'm okay with this result for now.
> 
> But more importantly, this refactoring also has a positive effect when
> updating references in a repository with preexisting refs, which I
> consider to be the more realistic scenario. The following benchmark
> creates 10k refs with 100k preexisting refs.
> 
> With the "files" backend we see a modest improvement:
> 
>     Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = master)
>       Time (mean ± σ):     478.4 ms ±  11.9 ms    [User: 96.7 ms, System: 379.6 ms]
>       Range (min … max):   465.4 ms … 496.6 ms    10 runs
> 
>     Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
>       Time (mean ± σ):     388.5 ms ±  10.3 ms    [User: 52.0 ms, System: 333.8 ms]
>       Range (min … max):   376.5 ms … 403.1 ms    10 runs
> 
>     Summary
>       update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
>         1.23 ± 0.04 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = master)
> 
> But with the "reftable" backend we see an almost 5x improvement, where
> it's now ~15x faster than the "files" backend:
> 
>     Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = master)
>       Time (mean ± σ):     153.9 ms ±   2.0 ms    [User: 96.5 ms, System: 56.6 ms]
>       Range (min … max):   150.5 ms … 158.4 ms    18 runs
> 
>     Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
>       Time (mean ± σ):      32.2 ms ±   1.2 ms    [User: 27.6 ms, System: 4.3 ms]
>       Range (min … max):    29.8 ms …  38.6 ms    71 runs
> 
>     Summary
>       update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
>         4.78 ± 0.19 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = master)
> 

This is an amazing improvement.

> The series is structured as follows:
> 
>   - Patches 1 to 3 implement the logic to skip ambiguity checks in
>     git-update-ref(1).
> 
>   - Patches 4 to 7 introduce batched checks.
> 
>   - Patch 8 deduplicates the ref prefix checks.
> 
>   - Patches 9 to 15 implement the infrastructure to reseek iterators.
> 
>   - Patch 16 starts to reuse iterators for nested ref checks.
> 
> Changes in v2:
>   - Point out why we also have to touch up the `dir_iterator`.
>   - Fix up the comment explaining `ITER_DONE`.
>   - Fix up comments that show usage patterns of the ref and dir iterator
>     interfaces.
>   - Start batching availability checks in the "files" backend, as well.
>   - Improve the commit message that drops the ambiguity check so that we
>     also point to 25fba78d36b (cat-file: disable object/refname
>     ambiguity check for batch mode, 2013-07-12).
>   - Link to v1: https://lore.kernel.org/r/20250217-pks-update-ref-optimization-v1-0-a2b6d87a24af@pks.im
> 
> Thanks!
> 
> Patrick
> 
> [1]: <Z602dzQggtDdcgCX@tapette.crustytoothpaste.net>

I have reviewed [PATCH v2 09/16] - [PATCH v2 16/16] and left some
comments. I don't have the energy to review the other patches, so maybe
others could help.

Thanks,
Jialuo



* Re: [PATCH v2 10/16] refs/iterator: provide infrastructure to re-seek iterators
  2025-02-24 13:08     ` shejialuo
@ 2025-02-25  7:39       ` Patrick Steinhardt
  0 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-25  7:39 UTC (permalink / raw)
  To: shejialuo
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

On Mon, Feb 24, 2025 at 09:08:40PM +0800, shejialuo wrote:
> On Wed, Feb 19, 2025 at 02:23:37PM +0100, Patrick Steinhardt wrote:
> > Reftable iterators need to be scrapped after they have either been
> > exhausted or aren't useful to the caller anymore, and it is explicitly
> > not possible to reuse them for iterations. But enabling for reuse of
> > iterators may allow us to tune them by reusing internal state of an
> > iterator. The reftable iterators for example can already be reused
> > internally, but we're not able to expose this to any users outside of
> > the reftable backend.
> > 
> 
> Out of curiosity, are there any benefits to reusing iterators for the
> files backend?

We see a small improvement there due to reduced allocation churn, but
overall the benefit is more limited there, as shown by the last commit
in this series where we start to reuse iterators. The refactorings may
unlock further optimization potential by reusing more of the iterators'
state, but I haven't checked for the "files" backend.

> > Introduce a new `.seek` function in the ref iterator vtable that allows
> > callers to re-seek an iterator. It is expected to be functionally the
> 
> It's a bit strange that we use "re-seek". I think we just want to seek
> an iterator, don't we? Not worth a reroll.

Well, it's really only relevant in the case where we want to seek on an
iterator multiple times because we already seek when creating the
iterator. I'll reformulate this slightly.

> > @@ -368,6 +381,16 @@ static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator)
> >  	return ok;
> >  }
> >  
> > +static int prefix_ref_iterator_seek(struct ref_iterator *ref_iterator,
> > +				    const char *prefix)
> > +{
> > +	struct prefix_ref_iterator *iter =
> > +		(struct prefix_ref_iterator *)ref_iterator;
> > +	free(iter->prefix);
> 
> Here, we need to free "iter->prefix". We don't know whether the
> caller will call `prefix_ref_iterator_seek` many times for the same ref
> iterator, so we need to reset the state/context.
> 
> I want to ask a question here: why don't we care about the "trim"
> parameter which is declared in `prefix_ref_iterator_begin`? From my
> understanding, we want to keep the original "trim" state. So, when we
> reuse the iterator, we only consider the "prefix" but leave the other
> state the same. Is my understanding correct?

Yeah, it's correct. I couldn't find a use case for also adjusting "trim"
when seeking multiple times, so I didn't introduce this functionality.
Let me mention this in the commit message.

> > +	iter->prefix = xstrdup_or_null(prefix);
> > +	return ref_iterator_seek(iter->iter0, prefix);
> > +}
> > +
> 
> > diff --git a/refs/refs-internal.h b/refs/refs-internal.h
> > index 74e2c03cef1..3f6d43110b7 100644
> > --- a/refs/refs-internal.h
> > +++ b/refs/refs-internal.h
> > @@ -327,6 +327,21 @@ struct ref_iterator {
> >   */
> >  int ref_iterator_advance(struct ref_iterator *ref_iterator);
> >  
> > +/*
> > + * Seek the iterator to the first reference with the given prefix.
> > + * The prefix is matched as a literal string, without regard for path
> > + * separators. If prefix is NULL or the empty string, seek the iterator to the
> > + * first reference again.
> > + *
> > + * This function is expected to behave as if a new ref iterator with the same
> > + * prefix had been created, but allows reuse of iterators and thus may allow
> > + * the backend to optimize.
> 
> I somehow think we may want to emphasize that we reuse some internal
> state of the ref iterator except for the prefix. However, I am not
> sure; just think about this.

Yup, I'll improve the comment.

> > @@ -445,6 +460,13 @@ void base_ref_iterator_init(struct ref_iterator *iter,
> >   */
> >  typedef int ref_iterator_advance_fn(struct ref_iterator *ref_iterator);
> >  
> > +/*
> > + * Seek the iterator to the first reference matching the given prefix. Should
> 
> Maybe "which should"?
> 
> > + * behave the same as if a new iterator was created with the same prefix.
> > + */
> 
> This statement makes me a little confused. I think there is some
> difference between `seek` and `begin`? For the prefix ref iterator, we
> pass "trim" when calling `begin`, but for seeking, we don't care about
> "trim". Although the prefix may be the same, the internal state may
> be different.

Here, as well.

Patrick



* Re: [PATCH v2 11/16] refs/iterator: implement seeking for merged iterators
  2025-02-24 13:37     ` shejialuo
@ 2025-02-25  7:39       ` Patrick Steinhardt
  0 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-25  7:39 UTC (permalink / raw)
  To: shejialuo
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

On Mon, Feb 24, 2025 at 09:37:36PM +0800, shejialuo wrote:
> On Wed, Feb 19, 2025 at 02:23:38PM +0100, Patrick Steinhardt wrote:
> > diff --git a/refs/iterator.c b/refs/iterator.c
> > index 757b105261a..63608ef9907 100644
> > --- a/refs/iterator.c
> > +++ b/refs/iterator.c
> > @@ -96,7 +96,8 @@ int is_empty_ref_iterator(struct ref_iterator *ref_iterator)
> > +static int merge_ref_iterator_seek(struct ref_iterator *ref_iterator,
> > +				   const char *prefix)
> > +{
> > +	struct merge_ref_iterator *iter =
> > +		(struct merge_ref_iterator *)ref_iterator;
> > +	int ret;
> > +
> > +	iter->current = NULL;
> > +	iter->iter0 = iter->iter0_owned;
> > +	iter->iter1 = iter->iter1_owned;
> > +
> > +	ret = ref_iterator_seek(iter->iter0, prefix);
> > +	if (ret < 0)
> > +		return ret;
> > +
> > +	ret = ref_iterator_seek(iter->iter1, prefix);
> > +	if (ret < 0)
> > +		return ret;
> 
> We could simply use a single `if` statement to handle this. Is the
> reason we designed it this way that we want to return the exact error
> code for each case?

Yup, I don't want to lose the error code. We could write this as:

    if ((ret = ref_iterator_seek(iter->iter0, prefix)) < 0 ||
        (ret = ref_iterator_seek(iter->iter1, prefix)) < 0)
            return ret;

But assigning to variables in conditions is not something we typically
do in the Git codebase.
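As a standalone illustration of that error-propagating sequential style, here is a toy version with made-up names (not Git's actual API): each sub-iterator is sought in turn, and the first failing error code is returned unchanged.

```c
#include <assert.h>

/* Toy stand-in for a sub-iterator; err is what its seek will return. */
struct toy_sub {
	int err;
};

static int toy_sub_seek(struct toy_sub *sub, const char *prefix)
{
	(void)prefix;
	return sub->err;
}

/* Seek both sub-iterators, preserving the exact error code of whichever
 * fails first -- the pattern from the merged-iterator patch. */
static int toy_merge_seek(struct toy_sub *iter0, struct toy_sub *iter1,
			  const char *prefix)
{
	int ret;

	ret = toy_sub_seek(iter0, prefix);
	if (ret < 0)
		return ret;

	ret = toy_sub_seek(iter1, prefix);
	if (ret < 0)
		return ret;

	return 0;
}
```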

Patrick



* Re: [PATCH v2 12/16] refs/iterator: implement seeking for reftable iterators
  2025-02-24 14:00     ` shejialuo
@ 2025-02-25  7:39       ` Patrick Steinhardt
  0 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-25  7:39 UTC (permalink / raw)
  To: shejialuo
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

On Mon, Feb 24, 2025 at 10:00:28PM +0800, shejialuo wrote:
> On Wed, Feb 19, 2025 at 02:23:39PM +0100, Patrick Steinhardt wrote:
> > diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
> > index 06543f79c64..b0c09f34433 100644
> > --- a/refs/reftable-backend.c
> > +++ b/refs/reftable-backend.c
> > @@ -718,6 +718,20 @@ static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
> >  	return ITER_OK;
> >  }
> >  
> > +static int reftable_ref_iterator_seek(struct ref_iterator *ref_iterator,
> > +				      const char *prefix)
> > +{
> > +	struct reftable_ref_iterator *iter =
> > +		(struct reftable_ref_iterator *)ref_iterator;
> > +
> > +	free(iter->prefix);
> > +	iter->prefix = xstrdup_or_null(prefix);
> > +	iter->prefix_len = prefix ? strlen(prefix) : 0;
> > +	iter->err = reftable_iterator_seek_ref(&iter->iter, prefix);
> 
> Should we rename the function `reftable_iterator_seek_ref`, by the way?
> It is a little strange that we have two functions which are so similar:
> 
> 1. reftable_ref_iterator_seek
> 2. reftable_iterator_seek_ref
> 
> Not worth a reroll, though.

Well, they do similar things, but at different levels:

  - `reftable_ref_iterator_seek()` operates on the high-level generic
    `struct ref_iterator`.

  - `reftable_iterator_seek_ref()` operates on the low-level `struct
    reftable_iterator` provided by the reftable library.

As such I think that they are named appropriately as their prefixes tell
us which structure they operate on.

Patrick


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v2 13/16] refs/iterator: implement seeking for ref-cache iterators
  2025-02-24 14:49     ` shejialuo
@ 2025-02-25  7:39       ` Patrick Steinhardt
  0 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-25  7:39 UTC (permalink / raw)
  To: shejialuo
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

On Mon, Feb 24, 2025 at 10:49:14PM +0800, shejialuo wrote:
> On Wed, Feb 19, 2025 at 02:23:40PM +0100, Patrick Steinhardt wrote:
> > diff --git a/refs/ref-cache.c b/refs/ref-cache.c
> > index 6457e02c1ea..b54547d71ee 100644
> > --- a/refs/ref-cache.c
> > +++ b/refs/ref-cache.c
> > @@ -362,9 +362,7 @@ struct cache_ref_iterator {
> >  	struct ref_iterator base;
> >  
> >  	/*
> > -	 * The number of levels currently on the stack. This is always
> > -	 * at least 1, because when it becomes zero the iteration is
> > -	 * ended and this struct is freed.
> > +	 * The number of levels currently on the stack.
> >  	 */
> 
> So, this value can be zero now? We rely on this to optimize, because
> we don't return the empty ref iterator anymore.

Now it can, yes. Before, it couldn't, as we returned an empty iterator
in that case. But we cannot do that anymore because you cannot re-seek
an empty iterator.

> > @@ -444,6 +448,40 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
> >  	}
> >  }
> >  
> > +static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator,
> > +				   const char *prefix)
> > +{
> > +	struct cache_ref_iterator *iter =
> > +		(struct cache_ref_iterator *)ref_iterator;
> > +	struct ref_dir *dir;
> > +
> > +	dir = get_ref_dir(iter->cache->root);
> > +	if (prefix && *prefix)
> > +		dir = find_containing_dir(dir, prefix);
> > +
> > +	if (dir) {
> > +		struct cache_ref_iterator_level *level;
> > +
> > +		if (iter->prime_dir)
> > +			prime_ref_dir(dir, prefix);
> > +		iter->levels_nr = 1;
> > +		level = &iter->levels[0];
> > +		level->index = -1;
> > +		level->dir = dir;
> > +
> > +		if (prefix && *prefix) {
> > +			iter->prefix = xstrdup(prefix);
> 
> Should we free the original `iter->prefix` before we assign the new
> `prefix`? I have seen this pattern in a previous patch. If the caller
> calls this function multiple times, there would be a memory leak.

Oh, good catch, yes.

> > +			level->prefix_state = PREFIX_WITHIN_DIR;
> > +		} else {
> > +			level->prefix_state = PREFIX_CONTAINS_DIR;
> > +		}
> > +	} else {
> > +		iter->levels_nr = 0;
> > +	}
> 
> When we cannot find the dir, we set `iter->levels_nr = 0`. Could we
> first check
> 
>     if (!dir) {
> 	iter->levels_nr = 0;
> 	return 0;
>     }
> 
> and thus avoid the indentation? However, it seems that we always
> return 0, so maybe we should not change it.

Good idea.

Patrick



* Re: [PATCH v2 14/16] refs/iterator: implement seeking for `packed-ref` iterators
  2025-02-24 15:09     ` shejialuo
@ 2025-02-25  7:39       ` Patrick Steinhardt
  2025-02-25 12:07         ` shejialuo
  0 siblings, 1 reply; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-25  7:39 UTC (permalink / raw)
  To: shejialuo
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

On Mon, Feb 24, 2025 at 11:09:32PM +0800, shejialuo wrote:
> On Wed, Feb 19, 2025 at 02:23:41PM +0100, Patrick Steinhardt wrote:
> > Implement seeking of `packed-ref` iterators. The implementation is again
> > straightforward, except that we cannot continue to use the prefix
> > iterator, as we would otherwise not be able to reseek the iterator
> > anymore in case one first asks for an empty and then for a non-empty
> > prefix. Instead, we open-code the logic in `advance()`.
> > 
> > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> > ---
> >  refs/packed-backend.c | 62 +++++++++++++++++++++++++++++++++------------------
> >  1 file changed, 40 insertions(+), 22 deletions(-)
> > 
> > diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> > index 38a1956d1a8..71a38acfedc 100644
> > --- a/refs/packed-backend.c
> > +++ b/refs/packed-backend.c
> > @@ -951,12 +954,41 @@ static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator)
> >  					    &iter->oid, iter->flags))
> >  			continue;
> >  
> > +		while (prefix && *prefix) {
> > +			if (*refname < *prefix)
> > +				BUG("packed-refs backend yielded reference preceding its prefix");
> > +			else if (*refname > *prefix)
> > +				return ITER_DONE;
> > +			prefix++;
> > +			refname++;
> > +		}
> 
> Although I cannot fully understand the code, I want to ask a question
> here: why do we need to do this in `advance()`? Should we check this in
> `packed_ref_iterator_seek()` or in the `next_record()` function?
> 
> Before we introduced `seek`, we didn't need this logic. I somehow think
> we should do this in `packed_ref_iterator_seek()`.

We cannot do this in `packed_ref_iterator_seek()` because we need to do
it for every single record that we yield from the iterator. We _could_
do it in `next_record()`, but that function is rather complex already
and really only cares about yielding the next record. On the other hand,
`advance()` already knows to skip certain entries, so putting the logic
in there to also handle termination feels like a natural fit to me.
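For what it's worth, the per-record termination check quoted above can be illustrated standalone. This is a sketch, not the actual packed-backend code: it walks refname and prefix in lockstep, relying on the backend yielding refnames in sorted order.

```c
#include <assert.h>

/*
 * Sketch of the termination check: returns 1 if refname falls under
 * prefix (yield it), 0 if iteration has moved past the prefix (stop),
 * and -1 for the "record precedes its prefix" case that would trigger
 * the BUG() -- impossible when records are yielded in sorted order
 * after seeking to the prefix.
 */
static int check_prefix(const char *refname, const char *prefix)
{
	while (prefix && *prefix) {
		if (*refname < *prefix)
			return -1; /* sorted order violated */
		else if (*refname > *prefix)
			return 0;  /* past the prefix: terminate */
		prefix++;
		refname++;
	}
	return 1; /* prefix exhausted: refname starts with prefix */
}
```

Because the check is cheap and must run for every yielded record, placing it next to the other per-record skipping logic in `advance()` keeps `next_record()` focused on decoding.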

> > @@ -1130,19 +1155,12 @@ static struct ref_iterator *packed_ref_iterator_begin(
> >  
> >  	iter->snapshot = snapshot;
> >  	acquire_snapshot(snapshot);
> > -
> > -	iter->pos = start;
> > -	iter->eof = snapshot->eof;
> >  	strbuf_init(&iter->refname_buf, 0);
> > -
> >  	iter->base.oid = &iter->oid;
> > -
> >  	iter->repo = ref_store->repo;
> >  	iter->flags = flags;
> >  
> > -	if (prefix && *prefix)
> > -		/* Stop iteration after we've gone *past* prefix: */
> > -		ref_iterator = prefix_ref_iterator_begin(ref_iterator, prefix, 0);
> > +	packed_ref_iterator_seek(&iter->base, prefix);
> 
> Why don't we check the return value here? Actually, in the previous
> patch, `cache_ref_iterator_seek()` always returns 0, but you still
> check it. I had thought that you just want to be more defensive.

Yeah, let's have the check over here, as well.

Patrick



* Re: [PATCH v2 00/16] refs: batch refname availability checks
  2025-02-24 15:18   ` [PATCH v2 00/16] refs: batch refname availability checks shejialuo
@ 2025-02-25  7:39     ` Patrick Steinhardt
  0 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-25  7:39 UTC (permalink / raw)
  To: shejialuo
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

On Mon, Feb 24, 2025 at 11:18:10PM +0800, shejialuo wrote:
> I have reviewed [PATCH v2 09/16] - [PATCH v2 16/16] and left some
> comments. I don't have the energy to review the other patches, so maybe
> others could help.

Thanks a lot for your review!

Patrick



* [PATCH v3 00/16] refs: batch refname availability checks
  2025-02-17 15:50 [PATCH 00/14] refs: batch refname availability checks Patrick Steinhardt
                   ` (15 preceding siblings ...)
  2025-02-19 13:23 ` [PATCH v2 00/16] " Patrick Steinhardt
@ 2025-02-25  8:55 ` Patrick Steinhardt
  2025-02-25  8:55   ` [PATCH v3 01/16] object-name: introduce `repo_get_oid_with_flags()` Patrick Steinhardt
                     ` (15 more replies)
  2025-02-28  9:26 ` [PATCH v4 00/16] refs: batch refname availability checks Patrick Steinhardt
                   ` (2 subsequent siblings)
  19 siblings, 16 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-25  8:55 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Hi,

this patch series has been inspired by brian's report that the reftable
backend is significantly slower when writing many references compared to
the files backend. As explained in that thread, the underlying issue is
the design of tombstone references: when we first delete all references
in a repository and then recreate them, we still have all the tombstones
and thus we need to churn through all of them to figure out that they
have been deleted in the first place. The files backend does not have
this issue.

I consider the benchmark itself to be kind of broken, as it stems from
us deleting all refs and then recreating them. And if you pack refs in
between then the "reftable" backend outperforms the "files" backend.

But there are a couple of opportunities here anyway. While we cannot
make the underlying issue of tombstones being less efficient go away,
this has prompted me to have a deeper look at where we spend all the
time. There are three ideas in this series:

  - git-update-ref(1) performs ambiguity checks for any full-size object
    ID, which triggers a lot of reads. This is somewhat pointless though
    given that the manpage explicitly points out that the command is
    about object IDs, even though it does know to parse refs. But being
    part of plumbing, emitting the warning here does not make a ton of
    sense, and favoring object IDs over references in these cases is the
    obvious thing to do anyway.

  - For each ref "refs/heads/bar", we need to verify that neither
    "refs/heads" nor "refs" exists. This was repeated for every refname,
    but because most refnames use common prefixes this made us re-check
    a lot of prefixes. This is addressed by using a `strset` of already
    checked prefixes.

  - For each ref "refs/heads/bar", we need to verify that no ref
    "refs/heads/bar/*" exists. We always created a new ref iterator for
    this check, which requires us to discard all internal state and then
    recreate it. The reftable library has already been refactored though
    to have reseekable iterators, so we backfill this functionality to
    all the other iterators and then reuse the iterator.
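The second idea, deduplicating the common-prefix checks, can be sketched as follows. This is a toy stand-in: Git uses its `strset` here, while the sketch uses a trivial fixed-capacity linear-scan set to stay self-contained, and all names are illustrative.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Trivial "seen" set; fixed capacity is enough for the sketch. */
struct seen_set {
	char *items[64];
	size_t nr;
};

/* Returns 1 if the string was newly added, 0 if already present. */
static int seen_add(struct seen_set *set, const char *s, size_t len)
{
	for (size_t i = 0; i < set->nr; i++)
		if (strlen(set->items[i]) == len &&
		    !memcmp(set->items[i], s, len))
			return 0;
	set->items[set->nr] = malloc(len + 1);
	memcpy(set->items[set->nr], s, len);
	set->items[set->nr][len] = '\0';
	set->nr++;
	return 1;
}

/*
 * For a ref like "refs/heads/foo" the prefixes "refs" and "refs/heads"
 * must not exist as refs themselves. Count how many of those checks we
 * actually have to perform: prefixes already seen are skipped.
 */
static int count_prefix_checks(struct seen_set *set, const char *refname)
{
	int checks = 0;
	for (const char *slash = strchr(refname, '/'); slash;
	     slash = strchr(slash + 1, '/'))
		if (seen_add(set, refname, slash - refname))
			checks++; /* first time we see this prefix */
	return checks;
}
```

With many refnames sharing "refs/heads/..." this turns the per-ref prefix work from O(depth) existence checks into nearly zero after the first ref, which is where the batched-check speedup comes from.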

With the (somewhat broken) benchmark we see a small speedup with the
"files" backend:

    Benchmark 1: update-ref (refformat = files, revision = master)
      Time (mean ± σ):     234.4 ms ±   1.9 ms    [User: 75.6 ms, System: 157.2 ms]
      Range (min … max):   232.2 ms … 236.9 ms    10 runs

    Benchmark 2: update-ref (refformat = files, revision = HEAD)
      Time (mean ± σ):     184.2 ms ±   2.0 ms    [User: 62.8 ms, System: 119.9 ms]
      Range (min … max):   181.1 ms … 187.0 ms    10 runs

    Summary
      update-ref (refformat = files, revision = HEAD) ran
        1.27 ± 0.02 times faster than update-ref (refformat = files, revision = master)

And a huge speedup with the "reftable" backend:

    Benchmark 1: update-ref (refformat = reftable, revision = master)
      Time (mean ± σ):     16.852 s ±  0.061 s    [User: 16.754 s, System: 0.059 s]
      Range (min … max):   16.785 s … 16.982 s    10 runs

    Benchmark 2: update-ref (refformat = reftable, revision = HEAD)
      Time (mean ± σ):      2.230 s ±  0.009 s    [User: 2.192 s, System: 0.029 s]
      Range (min … max):    2.215 s …  2.244 s    10 runs

    Summary
      update-ref (refformat = reftable, revision = HEAD) ran
        7.56 ± 0.04 times faster than update-ref (refformat = reftable, revision = master)

We're still not up to speed with the "files" backend, but considerably
better. Given that this is an extreme edge case and not reflective of
the general case I'm okay with this result for now.

But more importantly, this refactoring also has a positive effect when
updating references in a repository with preexisting refs, which I
consider to be the more realistic scenario. The following benchmark
creates 10k refs with 100k preexisting refs.

With the "files" backend we see a modest improvement:

    Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = master)
      Time (mean ± σ):     478.4 ms ±  11.9 ms    [User: 96.7 ms, System: 379.6 ms]
      Range (min … max):   465.4 ms … 496.6 ms    10 runs

    Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):     388.5 ms ±  10.3 ms    [User: 52.0 ms, System: 333.8 ms]
      Range (min … max):   376.5 ms … 403.1 ms    10 runs

    Summary
      update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
        1.23 ± 0.04 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = master)

But with the "reftable" backend we see an almost 5x improvement, where
it's now ~15x faster than the "files" backend:

    Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = master)
      Time (mean ± σ):     153.9 ms ±   2.0 ms    [User: 96.5 ms, System: 56.6 ms]
      Range (min … max):   150.5 ms … 158.4 ms    18 runs

    Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):      32.2 ms ±   1.2 ms    [User: 27.6 ms, System: 4.3 ms]
      Range (min … max):    29.8 ms …  38.6 ms    71 runs

    Summary
      update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
        4.78 ± 0.19 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = master)

The series is structured as follows:

  - Patches 1 to 3 implement the logic to skip ambiguity checks in
    git-update-ref(1).

  - Patches 4 to 7 introduce batched checks.

  - Patch 8 deduplicates the ref prefix checks.

  - Patches 9 to 15 implement the infrastructure to reseek iterators.

  - Patch 16 starts to reuse iterators for nested ref checks.

Changes in v2:
  - Point out why we also have to touch up the `dir_iterator`.
  - Fix up the comment explaining `ITER_DONE`.
  - Fix up comments that show usage patterns of the ref and dir iterator
    interfaces.
  - Start batching availability checks in the "files" backend, as well.
  - Improve the commit message that drops the ambiguity check so that we
    also point to 25fba78d36b (cat-file: disable object/refname
    ambiguity check for batch mode, 2013-07-12).
  - Link to v1: https://lore.kernel.org/r/20250217-pks-update-ref-optimization-v1-0-a2b6d87a24af@pks.im

Changes in v3:
  - Fix one case where we didn't skip ambiguity checks in
    git-update-ref(1).
  - Document better that only the prefix can change on reseeking
    iterators. Other internal state will remain the same.
  - Fix a memory leak in the ref-cache iterator.
  - Don't ignore errors returned by `packed_ref_iterator_seek()`.
  - Link to v2: https://lore.kernel.org/r/20250219-pks-update-ref-optimization-v2-0-e696e7220b22@pks.im

Thanks!

Patrick

[1]: <Z602dzQggtDdcgCX@tapette.crustytoothpaste.net>

---
Patrick Steinhardt (16):
      object-name: introduce `repo_get_oid_with_flags()`
      object-name: allow skipping ambiguity checks in `get_oid()` family
      builtin/update-ref: skip ambiguity checks when parsing object IDs
      refs: introduce function to batch refname availability checks
      refs/reftable: batch refname availability checks
      refs/files: batch refname availability checks for normal transactions
      refs/files: batch refname availability checks for initial transactions
      refs: stop re-verifying common prefixes for availability
      refs/iterator: separate lifecycle from iteration
      refs/iterator: provide infrastructure to re-seek iterators
      refs/iterator: implement seeking for merged iterators
      refs/iterator: implement seeking for reftable iterators
      refs/iterator: implement seeking for ref-cache iterators
      refs/iterator: implement seeking for packed-ref iterators
      refs/iterator: implement seeking for files iterators
      refs: reuse iterators when determining refname availability

 builtin/clone.c              |   2 +
 builtin/update-ref.c         |  15 ++--
 dir-iterator.c               |  24 +++---
 dir-iterator.h               |  11 +--
 hash.h                       |   1 +
 iterator.h                   |   2 +-
 object-name.c                |  18 +++--
 object-name.h                |   6 ++
 refs.c                       | 186 ++++++++++++++++++++++++++-----------------
 refs.h                       |  12 +++
 refs/debug.c                 |  20 +++--
 refs/files-backend.c         | 117 +++++++++++++++++----------
 refs/iterator.c              | 145 +++++++++++++++++----------------
 refs/packed-backend.c        |  92 ++++++++++++---------
 refs/ref-cache.c             |  88 ++++++++++++--------
 refs/refs-internal.h         |  53 +++++++-----
 refs/reftable-backend.c      |  85 +++++++++++---------
 t/helper/test-dir-iterator.c |   1 +
 18 files changed, 528 insertions(+), 350 deletions(-)

Range-diff versus v2:

 1:  b7b3e057628 !  1:  34198fbc1c0 object-name: introduce `repo_get_oid_with_flags()`
    @@ object-name.c: void object_context_release(struct object_context *ctx)
     - * notably "xyz^" for "parent of xyz"
     - */
     -int repo_get_oid(struct repository *r, const char *name, struct object_id *oid)
    -+int repo_get_oid_with_flags(struct repository *r, const char *name, struct object_id *oid,
    -+			    unsigned flags)
    ++int repo_get_oid_with_flags(struct repository *r, const char *name,
    ++			    struct object_id *oid, unsigned flags)
      {
      	struct object_context unused;
     -	int ret = get_oid_with_context(r, name, 0, oid, &unused);
    @@ object-name.h: void strbuf_repo_add_unique_abbrev(struct strbuf *sb, struct repo
     + * This is like "get_oid_basic()", except it allows "object ID expressions",
     + * notably "xyz^" for "parent of xyz". Accepts GET_OID_* flags.
     + */
    -+int repo_get_oid_with_flags(struct repository *r, const char *str, struct object_id *oid,
    -+			    unsigned flags);
    ++int repo_get_oid_with_flags(struct repository *r, const char *str,
    ++			    struct object_id *oid, unsigned flags);
      int repo_get_oid(struct repository *r, const char *str, struct object_id *oid);
      __attribute__((format (printf, 2, 3)))
      int get_oidf(struct object_id *oid, const char *fmt, ...);
 2:  3cba1ffa8aa =  2:  f5d4d2a67ce object-name: allow skipping ambiguity checks in `get_oid()` family
 3:  80dfc2ee6b7 !  3:  0512c256641 builtin/update-ref: skip ambiguity checks when parsing object IDs
    @@ builtin/update-ref.c: static int parse_next_oid(const char **next, const char *e
      				goto invalid;
      		} else if (flags & PARSE_SHA1_ALLOW_EMPTY) {
      			/* With -z, treat an empty value as all zeros: */
    +@@ builtin/update-ref.c: static void parse_cmd_symref_update(struct ref_transaction *transaction,
    + 			die("symref-update %s: expected old value", refname);
    + 
    + 		if (!strcmp(old_arg, "oid")) {
    +-			if (repo_get_oid(the_repository, old_target, &old_oid))
    ++			if (repo_get_oid_with_flags(the_repository, old_target, &old_oid,
    ++						    GET_OID_SKIP_AMBIGUITY_CHECK))
    + 				die("symref-update %s: invalid oid: %s", refname, old_target);
    + 
    + 			have_old_oid = 1;
     @@ builtin/update-ref.c: int cmd_update_ref(int argc,
      		refname = argv[0];
      		value = argv[1];
 4:  9bd05801ac0 =  4:  ad7bf4bed31 refs: introduce function to batch refname availability checks
 5:  0104c9759aa =  5:  132f09c4584 refs/reftable: batch refname availability checks
 6:  fba06e5fcb7 =  6:  bd7e5fa7bf1 refs/files: batch refname availability checks for normal transactions
 7:  dfb4be26147 =  7:  a907f62d2c2 refs/files: batch refname availability checks for initial transactions
 8:  e7abe4bae25 =  8:  866e5f4b4cc refs: stop re-verifying common prefixes for availability
 9:  1deae95c53a =  9:  736f1bd9afd refs/iterator: separate lifecycle from iteration
10:  8b942563e65 ! 10:  71d4c4c4655 refs/iterator: provide infrastructure to re-seek iterators
    @@ Commit message
         the reftable backend.
     
         Introduce a new `.seek` function in the ref iterator vtable that allows
    -    callers to re-seek an iterator. It is expected to be functionally the
    -    same as calling `refs_ref_iterator_begin()` with a different (or the
    -    same) prefix.
    +    callers to seek an iterator multiple times. It is expected to be
    +    functionally the same as calling `refs_ref_iterator_begin()` with a
    +    different (or the same) prefix.
    +
    +    Note that it is not possible to adjust parameters other than the seeked
    +    prefix for now, so exclude patterns, trimmed prefixes and flags will
    +    remain unchanged. We do not have a usecase for changing these parameters
    +    right now, but if we ever find one we can adapt accordingly.
     
         Implement the callback for trivial cases. The other iterators will be
         implemented in subsequent commits.
    @@ refs/refs-internal.h: struct ref_iterator {
     + *
     + * This function is expected to behave as if a new ref iterator with the same
     + * prefix had been created, but allows reuse of iterators and thus may allow
    -+ * the backend to optimize.
    ++ * the backend to optimize. Parameters other than the prefix that have been
    ++ * passed when creating the iterator will remain unchanged.
     + *
     + * Returns 0 on success, a negative error code otherwise.
     + */
11:  ad4f063ef06 = 11:  5a0412d754b refs/iterator: implement seeking for merged iterators
12:  ddac957862f ! 12:  ece7e500ecd refs/iterator: implement seeking for reftable iterators
    @@ Commit message
         iterators already support seeking this change is straight-forward. Two
         notes though:
     
    -      - We do not support seeking on reflog iterators.
    +      - We do not support seeking on reflog iterators. It is unclear what
    +        seeking would even look like in this context, as you typically would
    +        want to seek to a specific entry in the reflog for a specific ref.
    +        There is not currently a usecase for this, but if there ever is we
    +        can implement seeking in the future.
     
           - We start to check whether `reftable_stack_init_ref_iterator()` is
             successful.
13:  87b81552acf ! 13:  f693de656b5 refs/iterator: implement seeking for ref-cache iterators
    @@ Commit message
         Note that we cannot use the optimization anymore where we return an
         empty ref iterator when there aren't any references, as otherwise it
         wouldn't be possible to reseek the iterator to a different prefix that
    -    may exist. This shouldn't be much of a performance corncern though as we
    +    may exist. This shouldn't be much of a performance concern though as we
         now start to bail out early in case `advance()` sees that there are no
         more directories to be searched.
     
    @@ refs/ref-cache.c: struct cache_ref_iterator {
      	 */
      	size_t levels_nr;
      
    +@@ refs/ref-cache.c: struct cache_ref_iterator {
    + 	 * The prefix is matched textually, without regard for path
    + 	 * component boundaries.
    + 	 */
    +-	const char *prefix;
    ++	char *prefix;
    + 
    + 	/*
    + 	 * A stack of levels. levels[0] is the uppermost level that is
     @@ refs/ref-cache.c: struct cache_ref_iterator {
      	struct cache_ref_iterator_level *levels;
      
    @@ refs/ref-cache.c: static int cache_ref_iterator_advance(struct ref_iterator *ref
     +{
     +	struct cache_ref_iterator *iter =
     +		(struct cache_ref_iterator *)ref_iterator;
    ++	struct cache_ref_iterator_level *level;
     +	struct ref_dir *dir;
     +
     +	dir = get_ref_dir(iter->cache->root);
     +	if (prefix && *prefix)
     +		dir = find_containing_dir(dir, prefix);
    ++	if (!dir) {
    ++		iter->levels_nr = 0;
    ++		return 0;
    ++	}
     +
    -+	if (dir) {
    -+		struct cache_ref_iterator_level *level;
    -+
    -+		if (iter->prime_dir)
    -+			prime_ref_dir(dir, prefix);
    -+		iter->levels_nr = 1;
    -+		level = &iter->levels[0];
    -+		level->index = -1;
    -+		level->dir = dir;
    ++	if (iter->prime_dir)
    ++		prime_ref_dir(dir, prefix);
    ++	iter->levels_nr = 1;
    ++	level = &iter->levels[0];
    ++	level->index = -1;
    ++	level->dir = dir;
     +
    -+		if (prefix && *prefix) {
    -+			iter->prefix = xstrdup(prefix);
    -+			level->prefix_state = PREFIX_WITHIN_DIR;
    -+		} else {
    -+			level->prefix_state = PREFIX_CONTAINS_DIR;
    -+		}
    ++	if (prefix && *prefix) {
    ++		free(iter->prefix);
    ++		iter->prefix = xstrdup(prefix);
    ++		level->prefix_state = PREFIX_WITHIN_DIR;
     +	} else {
    -+		iter->levels_nr = 0;
    ++		FREE_AND_NULL(iter->prefix);
    ++		level->prefix_state = PREFIX_CONTAINS_DIR;
     +	}
     +
     +	return 0;
    @@ refs/ref-cache.c: static int cache_ref_iterator_advance(struct ref_iterator *ref
      				   struct object_id *peeled)
      {
     @@ refs/ref-cache.c: static void cache_ref_iterator_release(struct ref_iterator *ref_iterator)
    + {
    + 	struct cache_ref_iterator *iter =
    + 		(struct cache_ref_iterator *)ref_iterator;
    +-	free((char *)iter->prefix);
    ++	free(iter->prefix);
    + 	free(iter->levels);
    + }
      
      static struct ref_iterator_vtable cache_ref_iterator_vtable = {
      	.advance = cache_ref_iterator_advance,
14:  2619de30fe1 ! 14:  ac71647ee94 refs/iterator: implement seeking for `packed-ref` iterators
    @@ Metadata
     Author: Patrick Steinhardt <ps@pks.im>
     
      ## Commit message ##
    -    refs/iterator: implement seeking for `packed-ref` iterators
    +    refs/iterator: implement seeking for packed-ref iterators
     
         Implement seeking of `packed-ref` iterators. The implementation is again
         straight forward, except that we cannot continue to use the prefix
    @@ refs/packed-backend.c: static struct ref_iterator *packed_ref_iterator_begin(
     -	if (prefix && *prefix)
     -		/* Stop iteration after we've gone *past* prefix: */
     -		ref_iterator = prefix_ref_iterator_begin(ref_iterator, prefix, 0);
    -+	packed_ref_iterator_seek(&iter->base, prefix);
    ++	if (packed_ref_iterator_seek(&iter->base, prefix) < 0) {
    ++		ref_iterator_free(&iter->base);
    ++		return NULL;
    ++	}
      
      	return ref_iterator;
      }
15:  d4f76e6480b ! 15:  02cafca513c refs/iterator: implement seeking for "files" iterators
    @@ Metadata
     Author: Patrick Steinhardt <ps@pks.im>
     
      ## Commit message ##
    -    refs/iterator: implement seeking for "files" iterators
    +    refs/iterator: implement seeking for files iterators
     
         Implement seeking for "files" iterators. As we simply use a ref-cache
         iterator under the hood the implementation is straight-forward. Note
16:  49017050289 = 16:  baed7615a97 refs: reuse iterators when determining refname availability

---
base-commit: e2067b49ecaef9b7f51a17ce251f9207f72ef52d
change-id: 20250217-pks-update-ref-optimization-15c795e66e2b



^ permalink raw reply	[flat|nested] 163+ messages in thread

* [PATCH v3 01/16] object-name: introduce `repo_get_oid_with_flags()`
  2025-02-25  8:55 ` [PATCH v3 " Patrick Steinhardt
@ 2025-02-25  8:55   ` Patrick Steinhardt
  2025-02-25  8:55   ` [PATCH v3 02/16] object-name: allow skipping ambiguity checks in `get_oid()` family Patrick Steinhardt
                     ` (14 subsequent siblings)
  15 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-25  8:55 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Introduce a new function `repo_get_oid_with_flags()`. This function
behaves the same as `repo_get_oid()`, except that it takes an extra
`flags` parameter that it ends up passing to `get_oid_with_context()`.

This function will be used in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 object-name.c | 14 ++++++++------
 object-name.h |  6 ++++++
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/object-name.c b/object-name.c
index 945d5bdef25..233f3f861e3 100644
--- a/object-name.c
+++ b/object-name.c
@@ -1794,18 +1794,20 @@ void object_context_release(struct object_context *ctx)
 	strbuf_release(&ctx->symlink_path);
 }
 
-/*
- * This is like "get_oid_basic()", except it allows "object ID expressions",
- * notably "xyz^" for "parent of xyz"
- */
-int repo_get_oid(struct repository *r, const char *name, struct object_id *oid)
+int repo_get_oid_with_flags(struct repository *r, const char *name,
+			    struct object_id *oid, unsigned flags)
 {
 	struct object_context unused;
-	int ret = get_oid_with_context(r, name, 0, oid, &unused);
+	int ret = get_oid_with_context(r, name, flags, oid, &unused);
 	object_context_release(&unused);
 	return ret;
 }
 
+int repo_get_oid(struct repository *r, const char *name, struct object_id *oid)
+{
+	return repo_get_oid_with_flags(r, name, oid, 0);
+}
+
 /*
  * This returns a non-zero value if the string (built using printf
  * format and the given arguments) is not a valid object.
diff --git a/object-name.h b/object-name.h
index 8dba4a47a47..cda4934cd5f 100644
--- a/object-name.h
+++ b/object-name.h
@@ -51,6 +51,12 @@ void strbuf_repo_add_unique_abbrev(struct strbuf *sb, struct repository *repo,
 void strbuf_add_unique_abbrev(struct strbuf *sb, const struct object_id *oid,
 			      int abbrev_len);
 
+/*
+ * This is like "get_oid_basic()", except it allows "object ID expressions",
+ * notably "xyz^" for "parent of xyz". Accepts GET_OID_* flags.
+ */
+int repo_get_oid_with_flags(struct repository *r, const char *str,
+			    struct object_id *oid, unsigned flags);
 int repo_get_oid(struct repository *r, const char *str, struct object_id *oid);
 __attribute__((format (printf, 2, 3)))
 int get_oidf(struct object_id *oid, const char *fmt, ...);

-- 
2.48.1.683.gf705b3209c.dirty




* [PATCH v3 02/16] object-name: allow skipping ambiguity checks in `get_oid()` family
  2025-02-25  8:55 ` [PATCH v3 " Patrick Steinhardt
  2025-02-25  8:55   ` [PATCH v3 01/16] object-name: introduce `repo_get_oid_with_flags()` Patrick Steinhardt
@ 2025-02-25  8:55   ` Patrick Steinhardt
  2025-02-25  8:55   ` [PATCH v3 03/16] builtin/update-ref: skip ambiguity checks when parsing object IDs Patrick Steinhardt
                     ` (13 subsequent siblings)
  15 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-25  8:55 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

When reading an object ID via `get_oid_basic()` or any of its related
functions we perform a check whether the object ID is ambiguous, which
can be the case when a reference with the same name exists. While the
check is generally helpful, there are cases where it only adds to the
runtime overhead without providing much of a benefit.

Add a new flag that allows us to disable the check. The flag will be
used in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 hash.h        | 1 +
 object-name.c | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/hash.h b/hash.h
index 4367acfec50..79419016513 100644
--- a/hash.h
+++ b/hash.h
@@ -204,6 +204,7 @@ struct object_id {
 #define GET_OID_ONLY_TO_DIE    04000
 #define GET_OID_REQUIRE_PATH  010000
 #define GET_OID_HASH_ANY      020000
+#define GET_OID_SKIP_AMBIGUITY_CHECK 040000
 
 #define GET_OID_DISAMBIGUATORS \
 	(GET_OID_COMMIT | GET_OID_COMMITTISH | \
diff --git a/object-name.c b/object-name.c
index 233f3f861e3..85444dbb15b 100644
--- a/object-name.c
+++ b/object-name.c
@@ -961,7 +961,9 @@ static int get_oid_basic(struct repository *r, const char *str, int len,
 	int fatal = !(flags & GET_OID_QUIETLY);
 
 	if (len == r->hash_algo->hexsz && !get_oid_hex(str, oid)) {
-		if (repo_settings_get_warn_ambiguous_refs(r) && warn_on_object_refname_ambiguity) {
+		if (!(flags & GET_OID_SKIP_AMBIGUITY_CHECK) &&
+		    repo_settings_get_warn_ambiguous_refs(r) &&
+		    warn_on_object_refname_ambiguity) {
 			refs_found = repo_dwim_ref(r, str, len, &tmp_oid, &real_ref, 0);
 			if (refs_found > 0) {
 				warning(warn_msg, len, str);

-- 
2.48.1.683.gf705b3209c.dirty




* [PATCH v3 03/16] builtin/update-ref: skip ambiguity checks when parsing object IDs
  2025-02-25  8:55 ` [PATCH v3 " Patrick Steinhardt
  2025-02-25  8:55   ` [PATCH v3 01/16] object-name: introduce `repo_get_oid_with_flags()` Patrick Steinhardt
  2025-02-25  8:55   ` [PATCH v3 02/16] object-name: allow skipping ambiguity checks in `get_oid()` family Patrick Steinhardt
@ 2025-02-25  8:55   ` Patrick Steinhardt
  2025-02-26 22:26     ` Junio C Hamano
  2025-02-25  8:55   ` [PATCH v3 04/16] refs: introduce function to batch refname availability checks Patrick Steinhardt
                     ` (12 subsequent siblings)
  15 siblings, 1 reply; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-25  8:55 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Most of the commands in git-update-ref(1) accept an old and/or new
object ID for the reference that is being updated. These object IDs get
parsed via `repo_get_oid()`, which not only handles plain object IDs, but also
those that have a suffix like "~" or "^2". More surprisingly though, it
even knows to resolve references, even though its manpage does not
mention this at all.

One consequence of this is that we also check for ambiguous references:
when parsing a full object ID where the DWIM mechanism would also cause
us to resolve it as a branch, we'd end up printing a warning. While this
check makes sense to have in general, it is arguably less useful in the
context of git-update-ref(1). This is out of two reasons:

  - The manpage is explicitly structured around object IDs. So if we see
    a fully blown object ID, the intent should be quite clear in
    general.

  - The command is part of our plumbing layer and not a tool that users
    would generally use in interactive workflows. As such, the warning
    will likely not be visible to anybody in the first place.

Furthermore, this check can be quite expensive when updating lots of
references via `--stdin`, because we try to read multiple references per
object ID that we parse according to the DWIM rules. This effect can be
seen both with the "files" and "reftable" backend.

The issue is not unique to git-update-ref(1), but was also an issue in
git-cat-file(1), where it was addressed by disabling the ambiguity check
in 25fba78d36b (cat-file: disable object/refname ambiguity check for
batch mode, 2013-07-12).

Disable the warning in git-update-ref(1), which provides a significant
speedup with both backends. The following benchmark creates 10000 new
references with a 100000 preexisting refs with the "files" backend:

    Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
      Time (mean ± σ):     467.3 ms ±   5.1 ms    [User: 100.0 ms, System: 365.1 ms]
      Range (min … max):   461.9 ms … 479.3 ms    10 runs

    Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):     394.1 ms ±   5.8 ms    [User: 63.3 ms, System: 327.6 ms]
      Range (min … max):   384.9 ms … 405.7 ms    10 runs

    Summary
      update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
        1.19 ± 0.02 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)

And with the "reftable" backend:

    Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
      Time (mean ± σ):     146.9 ms ±   2.2 ms    [User: 90.4 ms, System: 56.0 ms]
      Range (min … max):   142.7 ms … 150.8 ms    19 runs

    Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):      63.2 ms ±   1.1 ms    [User: 41.0 ms, System: 21.8 ms]
      Range (min … max):    61.1 ms …  66.6 ms    41 runs

    Summary
      update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
        2.32 ± 0.05 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)

Note that the absolute improvement with both backends is roughly in the
same ballpark, but the relative improvement for the "reftable" backend
is more significant because writing the new table to disk is faster in
the first place.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/update-ref.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/builtin/update-ref.c b/builtin/update-ref.c
index 4d35bdc4b4b..1d541e13ade 100644
--- a/builtin/update-ref.c
+++ b/builtin/update-ref.c
@@ -179,7 +179,8 @@ static int parse_next_oid(const char **next, const char *end,
 		(*next)++;
 		*next = parse_arg(*next, &arg);
 		if (arg.len) {
-			if (repo_get_oid(the_repository, arg.buf, oid))
+			if (repo_get_oid_with_flags(the_repository, arg.buf, oid,
+						    GET_OID_SKIP_AMBIGUITY_CHECK))
 				goto invalid;
 		} else {
 			/* Without -z, an empty value means all zeros: */
@@ -197,7 +198,8 @@ static int parse_next_oid(const char **next, const char *end,
 		*next += arg.len;
 
 		if (arg.len) {
-			if (repo_get_oid(the_repository, arg.buf, oid))
+			if (repo_get_oid_with_flags(the_repository, arg.buf, oid,
+						    GET_OID_SKIP_AMBIGUITY_CHECK))
 				goto invalid;
 		} else if (flags & PARSE_SHA1_ALLOW_EMPTY) {
 			/* With -z, treat an empty value as all zeros: */
@@ -299,7 +301,8 @@ static void parse_cmd_symref_update(struct ref_transaction *transaction,
 			die("symref-update %s: expected old value", refname);
 
 		if (!strcmp(old_arg, "oid")) {
-			if (repo_get_oid(the_repository, old_target, &old_oid))
+			if (repo_get_oid_with_flags(the_repository, old_target, &old_oid,
+						    GET_OID_SKIP_AMBIGUITY_CHECK))
 				die("symref-update %s: invalid oid: %s", refname, old_target);
 
 			have_old_oid = 1;
@@ -772,7 +775,8 @@ int cmd_update_ref(int argc,
 		refname = argv[0];
 		value = argv[1];
 		oldval = argv[2];
-		if (repo_get_oid(the_repository, value, &oid))
+		if (repo_get_oid_with_flags(the_repository, value, &oid,
+					    GET_OID_SKIP_AMBIGUITY_CHECK))
 			die("%s: not a valid SHA1", value);
 	}
 
@@ -783,7 +787,8 @@ int cmd_update_ref(int argc,
 			 * must not already exist:
 			 */
 			oidclr(&oldoid, the_repository->hash_algo);
-		else if (repo_get_oid(the_repository, oldval, &oldoid))
+		else if (repo_get_oid_with_flags(the_repository, oldval, &oldoid,
+						 GET_OID_SKIP_AMBIGUITY_CHECK))
 			die("%s: not a valid old SHA1", oldval);
 	}
 

-- 
2.48.1.683.gf705b3209c.dirty




* [PATCH v3 04/16] refs: introduce function to batch refname availability checks
  2025-02-25  8:55 ` [PATCH v3 " Patrick Steinhardt
                     ` (2 preceding siblings ...)
  2025-02-25  8:55   ` [PATCH v3 03/16] builtin/update-ref: skip ambiguity checks when parsing object IDs Patrick Steinhardt
@ 2025-02-25  8:55   ` Patrick Steinhardt
  2025-02-25  8:55   ` [PATCH v3 05/16] refs/reftable: " Patrick Steinhardt
                     ` (11 subsequent siblings)
  15 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-25  8:55 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

The `refs_verify_refname_available()` function checks whether a
reference update can be committed or whether it would conflict with
either a prefix or suffix thereof. This function needs to be called once
per reference that one wants to check, which requires us to redo a
couple of checks every time the function is called.

Introduce a new function `refs_verify_refnames_available()` that does
the same, but for a list of references. For now, the new function uses
the exact same implementation, except that we loop through all refnames
provided by the caller. This will be tuned in subsequent commits.

The existing `refs_verify_refname_available()` function is reimplemented
on top of the new function. As such, the diff is best viewed with the
`--ignore-space-change` option.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs.c | 169 +++++++++++++++++++++++++++++++++++++----------------------------
 refs.h |  12 +++++
 2 files changed, 109 insertions(+), 72 deletions(-)

diff --git a/refs.c b/refs.c
index f4094a326a9..5a9b0f2fa1e 100644
--- a/refs.c
+++ b/refs.c
@@ -2467,19 +2467,15 @@ int ref_transaction_commit(struct ref_transaction *transaction,
 	return ret;
 }
 
-int refs_verify_refname_available(struct ref_store *refs,
-				  const char *refname,
-				  const struct string_list *extras,
-				  const struct string_list *skip,
-				  unsigned int initial_transaction,
-				  struct strbuf *err)
+int refs_verify_refnames_available(struct ref_store *refs,
+				   const struct string_list *refnames,
+				   const struct string_list *extras,
+				   const struct string_list *skip,
+				   unsigned int initial_transaction,
+				   struct strbuf *err)
 {
-	const char *slash;
-	const char *extra_refname;
 	struct strbuf dirname = STRBUF_INIT;
 	struct strbuf referent = STRBUF_INIT;
-	struct object_id oid;
-	unsigned int type;
 	int ret = -1;
 
 	/*
@@ -2489,79 +2485,91 @@ int refs_verify_refname_available(struct ref_store *refs,
 
 	assert(err);
 
-	strbuf_grow(&dirname, strlen(refname) + 1);
-	for (slash = strchr(refname, '/'); slash; slash = strchr(slash + 1, '/')) {
-		/*
-		 * Just saying "Is a directory" when we e.g. can't
-		 * lock some multi-level ref isn't very informative,
-		 * the user won't be told *what* is a directory, so
-		 * let's not use strerror() below.
-		 */
-		int ignore_errno;
-		/* Expand dirname to the new prefix, not including the trailing slash: */
-		strbuf_add(&dirname, refname + dirname.len, slash - refname - dirname.len);
+	for (size_t i = 0; i < refnames->nr; i++) {
+		const char *refname = refnames->items[i].string;
+		const char *extra_refname;
+		struct object_id oid;
+		unsigned int type;
+		const char *slash;
+
+		strbuf_reset(&dirname);
+
+		for (slash = strchr(refname, '/'); slash; slash = strchr(slash + 1, '/')) {
+			/*
+			 * Just saying "Is a directory" when we e.g. can't
+			 * lock some multi-level ref isn't very informative,
+			 * the user won't be told *what* is a directory, so
+			 * let's not use strerror() below.
+			 */
+			int ignore_errno;
+
+			/* Expand dirname to the new prefix, not including the trailing slash: */
+			strbuf_add(&dirname, refname + dirname.len, slash - refname - dirname.len);
+
+			/*
+			 * We are still at a leading dir of the refname (e.g.,
+			 * "refs/foo"; if there is a reference with that name,
+			 * it is a conflict, *unless* it is in skip.
+			 */
+			if (skip && string_list_has_string(skip, dirname.buf))
+				continue;
+
+			if (!initial_transaction &&
+			    !refs_read_raw_ref(refs, dirname.buf, &oid, &referent,
+					       &type, &ignore_errno)) {
+				strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
+					    dirname.buf, refname);
+				goto cleanup;
+			}
+
+			if (extras && string_list_has_string(extras, dirname.buf)) {
+				strbuf_addf(err, _("cannot process '%s' and '%s' at the same time"),
+					    refname, dirname.buf);
+				goto cleanup;
+			}
+		}
 
 		/*
-		 * We are still at a leading dir of the refname (e.g.,
-		 * "refs/foo"; if there is a reference with that name,
-		 * it is a conflict, *unless* it is in skip.
+		 * We are at the leaf of our refname (e.g., "refs/foo/bar").
+		 * There is no point in searching for a reference with that
+		 * name, because a refname isn't considered to conflict with
+		 * itself. But we still need to check for references whose
+		 * names are in the "refs/foo/bar/" namespace, because they
+		 * *do* conflict.
 		 */
-		if (skip && string_list_has_string(skip, dirname.buf))
-			continue;
+		strbuf_addstr(&dirname, refname + dirname.len);
+		strbuf_addch(&dirname, '/');
+
+		if (!initial_transaction) {
+			struct ref_iterator *iter;
+			int ok;
+
+			iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
+						       DO_FOR_EACH_INCLUDE_BROKEN);
+			while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
+				if (skip &&
+				    string_list_has_string(skip, iter->refname))
+					continue;
+
+				strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
+					    iter->refname, refname);
+				ref_iterator_abort(iter);
+				goto cleanup;
+			}
 
-		if (!initial_transaction &&
-		    !refs_read_raw_ref(refs, dirname.buf, &oid, &referent,
-				       &type, &ignore_errno)) {
-			strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
-				    dirname.buf, refname);
-			goto cleanup;
+			if (ok != ITER_DONE)
+				BUG("error while iterating over references");
 		}
 
-		if (extras && string_list_has_string(extras, dirname.buf)) {
+		extra_refname = find_descendant_ref(dirname.buf, extras, skip);
+		if (extra_refname) {
 			strbuf_addf(err, _("cannot process '%s' and '%s' at the same time"),
-				    refname, dirname.buf);
+				    refname, extra_refname);
 			goto cleanup;
 		}
 	}
 
-	/*
-	 * We are at the leaf of our refname (e.g., "refs/foo/bar").
-	 * There is no point in searching for a reference with that
-	 * name, because a refname isn't considered to conflict with
-	 * itself. But we still need to check for references whose
-	 * names are in the "refs/foo/bar/" namespace, because they
-	 * *do* conflict.
-	 */
-	strbuf_addstr(&dirname, refname + dirname.len);
-	strbuf_addch(&dirname, '/');
-
-	if (!initial_transaction) {
-		struct ref_iterator *iter;
-		int ok;
-
-		iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
-					       DO_FOR_EACH_INCLUDE_BROKEN);
-		while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
-			if (skip &&
-			    string_list_has_string(skip, iter->refname))
-				continue;
-
-			strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
-				    iter->refname, refname);
-			ref_iterator_abort(iter);
-			goto cleanup;
-		}
-
-		if (ok != ITER_DONE)
-			BUG("error while iterating over references");
-	}
-
-	extra_refname = find_descendant_ref(dirname.buf, extras, skip);
-	if (extra_refname)
-		strbuf_addf(err, _("cannot process '%s' and '%s' at the same time"),
-			    refname, extra_refname);
-	else
-		ret = 0;
+	ret = 0;
 
 cleanup:
 	strbuf_release(&referent);
@@ -2569,6 +2577,23 @@ int refs_verify_refname_available(struct ref_store *refs,
 	return ret;
 }
 
+int refs_verify_refname_available(struct ref_store *refs,
+				  const char *refname,
+				  const struct string_list *extras,
+				  const struct string_list *skip,
+				  unsigned int initial_transaction,
+				  struct strbuf *err)
+{
+	struct string_list_item item = { .string = (char *) refname };
+	struct string_list refnames = {
+		.items = &item,
+		.nr = 1,
+	};
+
+	return refs_verify_refnames_available(refs, &refnames, extras, skip,
+					      initial_transaction, err);
+}
+
 struct do_for_each_reflog_help {
 	each_reflog_fn *fn;
 	void *cb_data;
diff --git a/refs.h b/refs.h
index a0cdd99250e..185aed5a461 100644
--- a/refs.h
+++ b/refs.h
@@ -124,6 +124,18 @@ int refs_verify_refname_available(struct ref_store *refs,
 				  unsigned int initial_transaction,
 				  struct strbuf *err);
 
+/*
+ * Same as `refs_verify_refname_available()`, but checking for a list of
+ * refnames instead of only a single item. This is more efficient in the case
+ * where one needs to check multiple refnames.
+ */
+int refs_verify_refnames_available(struct ref_store *refs,
+				   const struct string_list *refnames,
+				   const struct string_list *extras,
+				   const struct string_list *skip,
+				   unsigned int initial_transaction,
+				   struct strbuf *err);
+
 int refs_ref_exists(struct ref_store *refs, const char *refname);
 
 int should_autocreate_reflog(enum log_refs_config log_all_ref_updates,

-- 
2.48.1.683.gf705b3209c.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v3 05/16] refs/reftable: batch refname availability checks
  2025-02-25  8:55 ` [PATCH v3 " Patrick Steinhardt
                     ` (3 preceding siblings ...)
  2025-02-25  8:55   ` [PATCH v3 04/16] refs: introduce function to batch refname availability checks Patrick Steinhardt
@ 2025-02-25  8:55   ` Patrick Steinhardt
  2025-02-25  8:55   ` [PATCH v3 06/16] refs/files: batch refname availability checks for normal transactions Patrick Steinhardt
                     ` (10 subsequent siblings)
  15 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-25  8:55 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Refactor the "reftable" backend to batch the availability check for
refnames. This does not yet have an effect on performance as we
essentially still call `refs_verify_refname_available()` in a loop, but
this will change in subsequent commits.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/reftable-backend.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index d39a14c5a46..2a90e7cb391 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -1069,6 +1069,7 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
 		reftable_be_downcast(ref_store, REF_STORE_WRITE|REF_STORE_MAIN, "ref_transaction_prepare");
 	struct strbuf referent = STRBUF_INIT, head_referent = STRBUF_INIT;
 	struct string_list affected_refnames = STRING_LIST_INIT_NODUP;
+	struct string_list refnames_to_check = STRING_LIST_INIT_NODUP;
 	struct reftable_transaction_data *tx_data = NULL;
 	struct reftable_backend *be;
 	struct object_id head_oid;
@@ -1224,12 +1225,7 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
 			 * can output a proper error message instead of failing
 			 * at a later point.
 			 */
-			ret = refs_verify_refname_available(ref_store, u->refname,
-							    &affected_refnames, NULL,
-							    transaction->flags & REF_TRANSACTION_FLAG_INITIAL,
-							    err);
-			if (ret < 0)
-				goto done;
+			string_list_append(&refnames_to_check, u->refname);
 
 			/*
 			 * There is no need to write the reference deletion
@@ -1379,6 +1375,13 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
 		}
 	}
 
+	string_list_sort(&refnames_to_check);
+	ret = refs_verify_refnames_available(ref_store, &refnames_to_check, &affected_refnames, NULL,
+					     transaction->flags & REF_TRANSACTION_FLAG_INITIAL,
+					     err);
+	if (ret < 0)
+		goto done;
+
 	transaction->backend_data = tx_data;
 	transaction->state = REF_TRANSACTION_PREPARED;
 
@@ -1394,6 +1397,7 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
 	string_list_clear(&affected_refnames, 0);
 	strbuf_release(&referent);
 	strbuf_release(&head_referent);
+	string_list_clear(&refnames_to_check, 0);
 
 	return ret;
 }

-- 
2.48.1.683.gf705b3209c.dirty




* [PATCH v3 06/16] refs/files: batch refname availability checks for normal transactions
  2025-02-25  8:55 ` [PATCH v3 " Patrick Steinhardt
                     ` (4 preceding siblings ...)
  2025-02-25  8:55   ` [PATCH v3 05/16] refs/reftable: " Patrick Steinhardt
@ 2025-02-25  8:55   ` Patrick Steinhardt
  2025-02-25  8:55   ` [PATCH v3 07/16] refs/files: batch refname availability checks for initial transactions Patrick Steinhardt
                     ` (9 subsequent siblings)
  15 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-25  8:55 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Just like the "reftable" backend, which we adapted in the preceding
commit to use batched refname availability checks, we can do the same
for the "files" backend. Things are a bit more intricate here though, as
we call `refs_verify_refname_available()` in a set of different contexts:

  1. `lock_raw_ref()` when it hits either EEXIST or EISDIR when creating
     a new reference, mostly to create a nice, user-readable error
     message. This is nothing we have to care about too much, as we only
     hit this code path at most once when we hit a conflict.

  2. `lock_raw_ref()` when it _could_ create the lockfile to check
     whether it is conflicting with any packed refs. In the general case,
     this code path will be hit once for every (successful) reference
     update.

  3. `lock_ref_oid_basic()`, but it is only executed when copying or
     renaming references or when expiring reflogs. It will thus not be
     called in contexts where we have many references queued up.

  4. `refs_refname_ref_available()`, but again only when copying or
     renaming references. It is thus not interesting due to the same
     reason as the previous case.

  5. `files_transaction_finish_initial()`, which is only executed when
     creating a new repository or migrating references.

So out of these, only (2) and (5) are viable candidates to use the
batched checks.

Adapt `lock_raw_ref()` accordingly by queueing up reference names that
need to be checked for availability and then checking them after we have
processed all updates. This check is done before we (optionally) lock
the `packed-refs` file, which is somewhat flawed because it means that
the `packed-refs` file could still change after the availability check
and thus create an undetected conflict. But unconditionally locking the
file would change semantics that users are likely to rely on, so we keep
the current locking sequence intact, even if it's suboptimal.

The refactoring of `files_transaction_finish_initial()` will be done in
the next commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/files-backend.c | 42 +++++++++++++++++++++++++++++++-----------
 1 file changed, 31 insertions(+), 11 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 29f08dced40..6ce79cf0791 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -678,6 +678,7 @@ static void unlock_ref(struct ref_lock *lock)
  */
 static int lock_raw_ref(struct files_ref_store *refs,
 			const char *refname, int mustexist,
+			struct string_list *refnames_to_check,
 			const struct string_list *extras,
 			struct ref_lock **lock_p,
 			struct strbuf *referent,
@@ -855,16 +856,11 @@ static int lock_raw_ref(struct files_ref_store *refs,
 		}
 
 		/*
-		 * If the ref did not exist and we are creating it,
-		 * make sure there is no existing packed ref that
-		 * conflicts with refname:
+		 * If the ref did not exist and we are creating it, we have to
+		 * make sure there is no existing packed ref that conflicts
+		 * with refname. This check is deferred so that we can batch it.
 		 */
-		if (refs_verify_refname_available(
-				    refs->packed_ref_store, refname,
-				    extras, NULL, 0, err)) {
-			ret = TRANSACTION_NAME_CONFLICT;
-			goto error_return;
-		}
+		string_list_insert(refnames_to_check, refname);
 	}
 
 	ret = 0;
@@ -2569,6 +2565,7 @@ static int lock_ref_for_update(struct files_ref_store *refs,
 			       struct ref_update *update,
 			       struct ref_transaction *transaction,
 			       const char *head_ref,
+			       struct string_list *refnames_to_check,
 			       struct string_list *affected_refnames,
 			       struct strbuf *err)
 {
@@ -2597,7 +2594,7 @@ static int lock_ref_for_update(struct files_ref_store *refs,
 		lock->count++;
 	} else {
 		ret = lock_raw_ref(refs, update->refname, mustexist,
-				   affected_refnames,
+				   refnames_to_check, affected_refnames,
 				   &lock, &referent,
 				   &update->type, err);
 		if (ret) {
@@ -2811,6 +2808,7 @@ static int files_transaction_prepare(struct ref_store *ref_store,
 	size_t i;
 	int ret = 0;
 	struct string_list affected_refnames = STRING_LIST_INIT_NODUP;
+	struct string_list refnames_to_check = STRING_LIST_INIT_NODUP;
 	char *head_ref = NULL;
 	int head_type;
 	struct files_transaction_backend_data *backend_data;
@@ -2898,7 +2896,8 @@ static int files_transaction_prepare(struct ref_store *ref_store,
 		struct ref_update *update = transaction->updates[i];
 
 		ret = lock_ref_for_update(refs, update, transaction,
-					  head_ref, &affected_refnames, err);
+					  head_ref, &refnames_to_check,
+					  &affected_refnames, err);
 		if (ret)
 			goto cleanup;
 
@@ -2930,6 +2929,26 @@ static int files_transaction_prepare(struct ref_store *ref_store,
 		}
 	}
 
+	/*
+	 * Verify that none of the loose references that we're about to write
+	 * conflict with any existing packed references. Ideally, we'd do this
+	 * check after the packed-refs are locked so that the file cannot
+	 * change underneath our feet. But introducing such a lock now would
+	 * probably do more harm than good as users rely on there not being a
+	 * global lock with the "files" backend.
+	 *
+	 * Another alternative would be to do the check after the (optional)
+	 * lock, but that would extend the time we spend in the globally-locked
+	 * state.
+	 *
+	 * So instead, we accept the race for now.
+	 */
+	if (refs_verify_refnames_available(refs->packed_ref_store, &refnames_to_check,
+					   &affected_refnames, NULL, 0, err)) {
+		ret = TRANSACTION_NAME_CONFLICT;
+		goto cleanup;
+	}
+
 	if (packed_transaction) {
 		if (packed_refs_lock(refs->packed_ref_store, 0, err)) {
 			ret = TRANSACTION_GENERIC_ERROR;
@@ -2972,6 +2991,7 @@ static int files_transaction_prepare(struct ref_store *ref_store,
 cleanup:
 	free(head_ref);
 	string_list_clear(&affected_refnames, 0);
+	string_list_clear(&refnames_to_check, 0);
 
 	if (ret)
 		files_transaction_cleanup(refs, transaction);

-- 
2.48.1.683.gf705b3209c.dirty




* [PATCH v3 07/16] refs/files: batch refname availability checks for initial transactions
  2025-02-25  8:55 ` [PATCH v3 " Patrick Steinhardt
                     ` (5 preceding siblings ...)
  2025-02-25  8:55   ` [PATCH v3 06/16] refs/files: batch refname availability checks for normal transactions Patrick Steinhardt
@ 2025-02-25  8:55   ` Patrick Steinhardt
  2025-02-25  8:55   ` [PATCH v3 08/16] refs: stop re-verifying common prefixes for availability Patrick Steinhardt
                     ` (8 subsequent siblings)
  15 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-25  8:55 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

The "files" backend explicitly carves out special logic for its initial
transaction so that it can avoid writing out every single reference as
a loose reference. While the assumption is that there shouldn't be any
preexisting references, we still have to verify that none of the newly
written references will conflict with any other new reference in the
same transaction.

Refactor the initial transaction to use batched refname availability
checks. This does not yet have an effect on performance as we still call
`refs_verify_refname_available()` in a loop. But this will change in
subsequent commits and then impact performance when cloning a repository
with many references or when migrating references to the "files" format.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/files-backend.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 6ce79cf0791..11a620ea11a 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3056,6 +3056,7 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
 	size_t i;
 	int ret = 0;
 	struct string_list affected_refnames = STRING_LIST_INIT_NODUP;
+	struct string_list refnames_to_check = STRING_LIST_INIT_NODUP;
 	struct ref_transaction *packed_transaction = NULL;
 	struct ref_transaction *loose_transaction = NULL;
 
@@ -3105,11 +3106,7 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
 		    !is_null_oid(&update->old_oid))
 			BUG("initial ref transaction with old_sha1 set");
 
-		if (refs_verify_refname_available(&refs->base, update->refname,
-						  &affected_refnames, NULL, 1, err)) {
-			ret = TRANSACTION_NAME_CONFLICT;
-			goto cleanup;
-		}
+		string_list_append(&refnames_to_check, update->refname);
 
 		/*
 		 * packed-refs don't support symbolic refs, root refs and reflogs,
@@ -3145,8 +3142,19 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
 		}
 	}
 
-	if (packed_refs_lock(refs->packed_ref_store, 0, err) ||
-	    ref_transaction_commit(packed_transaction, err)) {
+	if (packed_refs_lock(refs->packed_ref_store, 0, err)) {
+		ret = TRANSACTION_GENERIC_ERROR;
+		goto cleanup;
+	}
+
+	if (refs_verify_refnames_available(&refs->base, &refnames_to_check,
+					   &affected_refnames, NULL, 1, err)) {
+		packed_refs_unlock(refs->packed_ref_store);
+		ret = TRANSACTION_NAME_CONFLICT;
+		goto cleanup;
+	}
+
+	if (ref_transaction_commit(packed_transaction, err)) {
 		ret = TRANSACTION_GENERIC_ERROR;
 		goto cleanup;
 	}
@@ -3167,6 +3175,7 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
 		ref_transaction_free(packed_transaction);
 	transaction->state = REF_TRANSACTION_CLOSED;
 	string_list_clear(&affected_refnames, 0);
+	string_list_clear(&refnames_to_check, 0);
 	return ret;
 }
 

-- 
2.48.1.683.gf705b3209c.dirty




* [PATCH v3 08/16] refs: stop re-verifying common prefixes for availability
  2025-02-25  8:55 ` [PATCH v3 " Patrick Steinhardt
                     ` (6 preceding siblings ...)
  2025-02-25  8:55   ` [PATCH v3 07/16] refs/files: batch refname availability checks for initial transactions Patrick Steinhardt
@ 2025-02-25  8:55   ` Patrick Steinhardt
  2025-02-25  8:55   ` [PATCH v3 09/16] refs/iterator: separate lifecycle from iteration Patrick Steinhardt
                     ` (7 subsequent siblings)
  15 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-25  8:55 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

One of the checks done by `refs_verify_refnames_available()` is whether
any of the prefixes of a reference already exists. For example, given a
reference "refs/heads/main", we'd check whether "refs/heads" or "refs"
already exist, and if so we'd abort the transaction.

When updating multiple references at once, this check is performed for
each of the references individually. Consequently, because references
tend to have common prefixes like "refs/heads/" or "refs/tags/", we
evaluate the availability of these prefixes repeatedly. Naturally this
is a waste of compute, as the availability of those prefixes should in
general not change in the middle of a transaction. And if it would,
backends would notice at a later point in time.

Optimize this pattern by storing prefixes in a `strset` so that we can
trivially track those prefixes that we have already checked. This leads
to a significant speedup with the "reftable" backend when creating many
references that all share a common prefix:

    Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
      Time (mean ± σ):      63.1 ms ±   1.8 ms    [User: 41.0 ms, System: 21.6 ms]
      Range (min … max):    60.6 ms …  69.5 ms    38 runs

    Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):      40.0 ms ±   1.3 ms    [User: 29.3 ms, System: 10.3 ms]
      Range (min … max):    38.1 ms …  47.3 ms    61 runs

    Summary
      update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
        1.58 ± 0.07 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)

For the "files" backend we see an improvement, but a much smaller one:

    Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
      Time (mean ± σ):     395.8 ms ±   5.3 ms    [User: 63.6 ms, System: 330.5 ms]
      Range (min … max):   387.0 ms … 404.6 ms    10 runs

    Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):     386.0 ms ±   4.0 ms    [User: 51.5 ms, System: 332.8 ms]
      Range (min … max):   380.8 ms … 392.6 ms    10 runs

    Summary
      update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
        1.03 ± 0.02 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)

This change also leads to a modest improvement when writing references
with "initial" semantics, for example when migrating references. The
following benchmarks are migrating 1m references from the "reftable" to
the "files" backend:

    Benchmark 1: migrate reftable:files (refcount = 1000000, revision = HEAD~)
      Time (mean ± σ):     836.6 ms ±   5.6 ms    [User: 645.2 ms, System: 185.2 ms]
      Range (min … max):   829.6 ms … 845.9 ms    10 runs

    Benchmark 2: migrate reftable:files (refcount = 1000000, revision = HEAD)
      Time (mean ± σ):     759.8 ms ±   5.1 ms    [User: 574.9 ms, System: 178.9 ms]
      Range (min … max):   753.1 ms … 768.8 ms    10 runs

    Summary
      migrate reftable:files (refcount = 1000000, revision = HEAD) ran
        1.10 ± 0.01 times faster than migrate reftable:files (refcount = 1000000, revision = HEAD~)

And vice versa:

    Benchmark 1: migrate files:reftable (refcount = 1000000, revision = HEAD~)
      Time (mean ± σ):     870.7 ms ±   5.7 ms    [User: 735.2 ms, System: 127.4 ms]
      Range (min … max):   861.6 ms … 883.2 ms    10 runs

    Benchmark 2: migrate files:reftable (refcount = 1000000, revision = HEAD)
      Time (mean ± σ):     799.1 ms ±   8.5 ms    [User: 661.1 ms, System: 130.2 ms]
      Range (min … max):   787.5 ms … 812.6 ms    10 runs

    Summary
      migrate files:reftable (refcount = 1000000, revision = HEAD) ran
        1.09 ± 0.01 times faster than migrate files:reftable (refcount = 1000000, revision = HEAD~)

The impact here is significantly smaller given that we don't perform any
reference reads with "initial" semantics, so the speedup only comes from
us doing fewer string list lookups.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/refs.c b/refs.c
index 5a9b0f2fa1e..eaf41421f50 100644
--- a/refs.c
+++ b/refs.c
@@ -2476,6 +2476,7 @@ int refs_verify_refnames_available(struct ref_store *refs,
 {
 	struct strbuf dirname = STRBUF_INIT;
 	struct strbuf referent = STRBUF_INIT;
+	struct strset dirnames;
 	int ret = -1;
 
 	/*
@@ -2485,6 +2486,8 @@ int refs_verify_refnames_available(struct ref_store *refs,
 
 	assert(err);
 
+	strset_init(&dirnames);
+
 	for (size_t i = 0; i < refnames->nr; i++) {
 		const char *refname = refnames->items[i].string;
 		const char *extra_refname;
@@ -2514,6 +2517,14 @@ int refs_verify_refnames_available(struct ref_store *refs,
 			if (skip && string_list_has_string(skip, dirname.buf))
 				continue;
 
+			/*
+			 * If we've already seen the directory we don't need to
+			 * process it again. Skip it to avoid checking
+			 * common prefixes like "refs/heads/" repeatedly.
+			 */
+			if (!strset_add(&dirnames, dirname.buf))
+				continue;
+
 			if (!initial_transaction &&
 			    !refs_read_raw_ref(refs, dirname.buf, &oid, &referent,
 					       &type, &ignore_errno)) {
@@ -2574,6 +2585,7 @@ int refs_verify_refnames_available(struct ref_store *refs,
 cleanup:
 	strbuf_release(&referent);
 	strbuf_release(&dirname);
+	strset_clear(&dirnames);
 	return ret;
 }
 

-- 
2.48.1.683.gf705b3209c.dirty




* [PATCH v3 09/16] refs/iterator: separate lifecycle from iteration
  2025-02-25  8:55 ` [PATCH v3 " Patrick Steinhardt
                     ` (7 preceding siblings ...)
  2025-02-25  8:55   ` [PATCH v3 08/16] refs: stop re-verifying common prefixes for availability Patrick Steinhardt
@ 2025-02-25  8:55   ` Patrick Steinhardt
  2025-02-25  8:55   ` [PATCH v3 10/16] refs/iterator: provide infrastructure to re-seek iterators Patrick Steinhardt
                     ` (6 subsequent siblings)
  15 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-25  8:55 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

The ref and reflog iterators have their lifecycle attached to iteration:
once the iterator reaches its end, it is automatically released and the
caller doesn't have to care about that anymore. When the iterator should
be released before it has been exhausted, callers must explicitly abort
the iterator via `ref_iterator_abort()`.

This lifecycle is somewhat unusual in the Git codebase and creates two
problems:

  - Callsites need to be very careful about when exactly they call
    `ref_iterator_abort()`, as calling the function is only valid as
    long as the iterator itself still exists. This leads to somewhat
    awkward calling patterns in some situations.

  - It is impossible to reuse iterators and re-seek them to a different
    prefix. This feature isn't supported by any iterator implementation
    except for the reftable iterators anyway, but if it were implemented
    it would allow us to optimize cases where we need to search for
    specific references repeatedly by reusing internal state.

Detangle the lifecycle from iteration so that we don't deallocate the
iterator anymore once it is exhausted. Instead, callers are now expected
to always call the newly introduced `ref_iterator_free()` function that
deallocates the iterator and its internal state.

Note that the `dir_iterator` is somewhat special because it does not
implement the `ref_iterator` interface, but is only used to implement
other iterators. Consequently, we have to provide `dir_iterator_free()`
instead of `dir_iterator_release()`, as the allocated structure itself
is also managed by the `dir_iterator` interfaces and is not freed by
`ref_iterator_free()` like in all the other cases.

While at it, drop the return value of `ref_iterator_abort()`, which
wasn't really required by any of the iterator implementations anyway.
Furthermore, stop calling `base_ref_iterator_free()` in any of the
backends, but instead call it in `ref_iterator_free()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/clone.c              |  2 +
 dir-iterator.c               | 24 +++++------
 dir-iterator.h               | 11 ++---
 iterator.h                   |  2 +-
 refs.c                       |  7 +++-
 refs/debug.c                 |  9 ++---
 refs/files-backend.c         | 36 +++++------------
 refs/iterator.c              | 95 ++++++++++++++------------------------------
 refs/packed-backend.c        | 27 ++++++-------
 refs/ref-cache.c             |  9 ++---
 refs/refs-internal.h         | 29 +++++---------
 refs/reftable-backend.c      | 34 ++++------------
 t/helper/test-dir-iterator.c |  1 +
 13 files changed, 100 insertions(+), 186 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index fd001d800c6..ac3e84b2b18 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -426,6 +426,8 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 		strbuf_setlen(src, src_len);
 		die(_("failed to iterate over '%s'"), src->buf);
 	}
+
+	dir_iterator_free(iter);
 }
 
 static void clone_local(const char *src_repo, const char *dest_repo)
diff --git a/dir-iterator.c b/dir-iterator.c
index de619846f29..857e1d9bdaf 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -193,9 +193,9 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 
 	if (S_ISDIR(iter->base.st.st_mode) && push_level(iter)) {
 		if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
-			goto error_out;
+			return ITER_ERROR;
 		if (iter->levels_nr == 0)
-			goto error_out;
+			return ITER_ERROR;
 	}
 
 	/* Loop until we find an entry that we can give back to the caller. */
@@ -211,11 +211,11 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			int ret = next_directory_entry(level->dir, iter->base.path.buf, &de);
 			if (ret < 0) {
 				if (iter->flags & DIR_ITERATOR_PEDANTIC)
-					goto error_out;
+					return ITER_ERROR;
 				continue;
 			} else if (ret > 0) {
 				if (pop_level(iter) == 0)
-					return dir_iterator_abort(dir_iterator);
+					return ITER_DONE;
 				continue;
 			}
 
@@ -223,7 +223,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 		} else {
 			if (level->entries_idx >= level->entries.nr) {
 				if (pop_level(iter) == 0)
-					return dir_iterator_abort(dir_iterator);
+					return ITER_DONE;
 				continue;
 			}
 
@@ -232,22 +232,21 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 
 		if (prepare_next_entry_data(iter, name)) {
 			if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
-				goto error_out;
+				return ITER_ERROR;
 			continue;
 		}
 
 		return ITER_OK;
 	}
-
-error_out:
-	dir_iterator_abort(dir_iterator);
-	return ITER_ERROR;
 }
 
-int dir_iterator_abort(struct dir_iterator *dir_iterator)
+void dir_iterator_free(struct dir_iterator *dir_iterator)
 {
 	struct dir_iterator_int *iter = (struct dir_iterator_int *)dir_iterator;
 
+	if (!iter)
+		return;
+
 	for (; iter->levels_nr; iter->levels_nr--) {
 		struct dir_iterator_level *level =
 			&iter->levels[iter->levels_nr - 1];
@@ -266,7 +265,6 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
 	free(iter->levels);
 	strbuf_release(&iter->base.path);
 	free(iter);
-	return ITER_DONE;
 }
 
 struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags)
@@ -301,7 +299,7 @@ struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags)
 	return dir_iterator;
 
 error_out:
-	dir_iterator_abort(dir_iterator);
+	dir_iterator_free(dir_iterator);
 	errno = saved_errno;
 	return NULL;
 }
diff --git a/dir-iterator.h b/dir-iterator.h
index 6d438809b6e..ccd6a197343 100644
--- a/dir-iterator.h
+++ b/dir-iterator.h
@@ -28,7 +28,7 @@
  *
  *     while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
  *             if (want_to_stop_iteration()) {
- *                     ok = dir_iterator_abort(iter);
+ *                     ok = ITER_DONE;
  *                     break;
  *             }
  *
@@ -39,6 +39,7 @@
  *
  *     if (ok != ITER_DONE)
  *             handle_error();
+ *     dir_iterator_free(iter);
  *
  * Callers are allowed to modify iter->path while they are working,
  * but they must restore it to its original contents before calling
@@ -107,11 +108,7 @@ struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags);
  */
 int dir_iterator_advance(struct dir_iterator *iterator);
 
-/*
- * End the iteration before it has been exhausted. Free the
- * dir_iterator and any associated resources and return ITER_DONE. On
- * error, free the dir_iterator and return ITER_ERROR.
- */
-int dir_iterator_abort(struct dir_iterator *iterator);
+/* Free the dir_iterator and any associated resources. */
+void dir_iterator_free(struct dir_iterator *iterator);
 
 #endif
diff --git a/iterator.h b/iterator.h
index 0f6900e43ad..6b77dcc2626 100644
--- a/iterator.h
+++ b/iterator.h
@@ -12,7 +12,7 @@
 #define ITER_OK 0
 
 /*
- * The iterator is exhausted and has been freed.
+ * The iterator is exhausted.
  */
 #define ITER_DONE -1
 
diff --git a/refs.c b/refs.c
index eaf41421f50..8eff60a2186 100644
--- a/refs.c
+++ b/refs.c
@@ -2476,6 +2476,7 @@ int refs_verify_refnames_available(struct ref_store *refs,
 {
 	struct strbuf dirname = STRBUF_INIT;
 	struct strbuf referent = STRBUF_INIT;
+	struct ref_iterator *iter = NULL;
 	struct strset dirnames;
 	int ret = -1;
 
@@ -2552,7 +2553,6 @@ int refs_verify_refnames_available(struct ref_store *refs,
 		strbuf_addch(&dirname, '/');
 
 		if (!initial_transaction) {
-			struct ref_iterator *iter;
 			int ok;
 
 			iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
@@ -2564,12 +2564,14 @@ int refs_verify_refnames_available(struct ref_store *refs,
 
 				strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
 					    iter->refname, refname);
-				ref_iterator_abort(iter);
 				goto cleanup;
 			}
 
 			if (ok != ITER_DONE)
 				BUG("error while iterating over references");
+
+			ref_iterator_free(iter);
+			iter = NULL;
 		}
 
 		extra_refname = find_descendant_ref(dirname.buf, extras, skip);
@@ -2586,6 +2588,7 @@ int refs_verify_refnames_available(struct ref_store *refs,
 	strbuf_release(&referent);
 	strbuf_release(&dirname);
 	strset_clear(&dirnames);
+	ref_iterator_free(iter);
 	return ret;
 }
 
diff --git a/refs/debug.c b/refs/debug.c
index fbc4df08b43..a9786da4ba1 100644
--- a/refs/debug.c
+++ b/refs/debug.c
@@ -179,19 +179,18 @@ static int debug_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return res;
 }
 
-static int debug_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void debug_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct debug_ref_iterator *diter =
 		(struct debug_ref_iterator *)ref_iterator;
-	int res = diter->iter->vtable->abort(diter->iter);
-	trace_printf_key(&trace_refs, "iterator_abort: %d\n", res);
-	return res;
+	diter->iter->vtable->release(diter->iter);
+	trace_printf_key(&trace_refs, "iterator_abort\n");
 }
 
 static struct ref_iterator_vtable debug_ref_iterator_vtable = {
 	.advance = debug_ref_iterator_advance,
 	.peel = debug_ref_iterator_peel,
-	.abort = debug_ref_iterator_abort,
+	.release = debug_ref_iterator_release,
 };
 
 static struct ref_iterator *
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 11a620ea11a..859f1c11941 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -915,10 +915,6 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		return ITER_OK;
 	}
 
-	iter->iter0 = NULL;
-	if (ref_iterator_abort(ref_iterator) != ITER_DONE)
-		ok = ITER_ERROR;
-
 	return ok;
 }
 
@@ -931,23 +927,17 @@ static int files_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return ref_iterator_peel(iter->iter0, peeled);
 }
 
-static int files_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void files_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct files_ref_iterator *iter =
 		(struct files_ref_iterator *)ref_iterator;
-	int ok = ITER_DONE;
-
-	if (iter->iter0)
-		ok = ref_iterator_abort(iter->iter0);
-
-	base_ref_iterator_free(ref_iterator);
-	return ok;
+	ref_iterator_free(iter->iter0);
 }
 
 static struct ref_iterator_vtable files_ref_iterator_vtable = {
 	.advance = files_ref_iterator_advance,
 	.peel = files_ref_iterator_peel,
-	.abort = files_ref_iterator_abort,
+	.release = files_ref_iterator_release,
 };
 
 static struct ref_iterator *files_ref_iterator_begin(
@@ -1378,7 +1368,7 @@ static int should_pack_refs(struct files_ref_store *refs,
 				    iter->flags, opts))
 			refcount++;
 		if (refcount >= limit) {
-			ref_iterator_abort(iter);
+			ref_iterator_free(iter);
 			return 1;
 		}
 	}
@@ -1386,6 +1376,7 @@ static int should_pack_refs(struct files_ref_store *refs,
 	if (ret != ITER_DONE)
 		die("error while iterating over references");
 
+	ref_iterator_free(iter);
 	return 0;
 }
 
@@ -1452,6 +1443,7 @@ static int files_pack_refs(struct ref_store *ref_store,
 	packed_refs_unlock(refs->packed_ref_store);
 
 	prune_refs(refs, &refs_to_prune);
+	ref_iterator_free(iter);
 	strbuf_release(&err);
 	return 0;
 }
@@ -2299,9 +2291,6 @@ static int files_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 		return ITER_OK;
 	}
 
-	iter->dir_iterator = NULL;
-	if (ref_iterator_abort(ref_iterator) == ITER_ERROR)
-		ok = ITER_ERROR;
 	return ok;
 }
 
@@ -2311,23 +2300,17 @@ static int files_reflog_iterator_peel(struct ref_iterator *ref_iterator UNUSED,
 	BUG("ref_iterator_peel() called for reflog_iterator");
 }
 
-static int files_reflog_iterator_abort(struct ref_iterator *ref_iterator)
+static void files_reflog_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct files_reflog_iterator *iter =
 		(struct files_reflog_iterator *)ref_iterator;
-	int ok = ITER_DONE;
-
-	if (iter->dir_iterator)
-		ok = dir_iterator_abort(iter->dir_iterator);
-
-	base_ref_iterator_free(ref_iterator);
-	return ok;
+	dir_iterator_free(iter->dir_iterator);
 }
 
 static struct ref_iterator_vtable files_reflog_iterator_vtable = {
 	.advance = files_reflog_iterator_advance,
 	.peel = files_reflog_iterator_peel,
-	.abort = files_reflog_iterator_abort,
+	.release = files_reflog_iterator_release,
 };
 
 static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,
@@ -3837,6 +3820,7 @@ static int files_fsck_refs_dir(struct ref_store *ref_store,
 		ret = error(_("failed to iterate over '%s'"), sb.buf);
 
 out:
+	dir_iterator_free(iter);
 	strbuf_release(&sb);
 	strbuf_release(&refname);
 	return ret;
diff --git a/refs/iterator.c b/refs/iterator.c
index d25e568bf0b..aaeff270437 100644
--- a/refs/iterator.c
+++ b/refs/iterator.c
@@ -21,9 +21,14 @@ int ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return ref_iterator->vtable->peel(ref_iterator, peeled);
 }
 
-int ref_iterator_abort(struct ref_iterator *ref_iterator)
+void ref_iterator_free(struct ref_iterator *ref_iterator)
 {
-	return ref_iterator->vtable->abort(ref_iterator);
+	if (ref_iterator) {
+		ref_iterator->vtable->release(ref_iterator);
+		/* Help make use-after-free bugs fail quickly: */
+		ref_iterator->vtable = NULL;
+		free(ref_iterator);
+	}
 }
 
 void base_ref_iterator_init(struct ref_iterator *iter,
@@ -36,20 +41,13 @@ void base_ref_iterator_init(struct ref_iterator *iter,
 	iter->flags = 0;
 }
 
-void base_ref_iterator_free(struct ref_iterator *iter)
-{
-	/* Help make use-after-free bugs fail quickly: */
-	iter->vtable = NULL;
-	free(iter);
-}
-
 struct empty_ref_iterator {
 	struct ref_iterator base;
 };
 
-static int empty_ref_iterator_advance(struct ref_iterator *ref_iterator)
+static int empty_ref_iterator_advance(struct ref_iterator *ref_iterator UNUSED)
 {
-	return ref_iterator_abort(ref_iterator);
+	return ITER_DONE;
 }
 
 static int empty_ref_iterator_peel(struct ref_iterator *ref_iterator UNUSED,
@@ -58,16 +56,14 @@ static int empty_ref_iterator_peel(struct ref_iterator *ref_iterator UNUSED,
 	BUG("peel called for empty iterator");
 }
 
-static int empty_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void empty_ref_iterator_release(struct ref_iterator *ref_iterator UNUSED)
 {
-	base_ref_iterator_free(ref_iterator);
-	return ITER_DONE;
 }
 
 static struct ref_iterator_vtable empty_ref_iterator_vtable = {
 	.advance = empty_ref_iterator_advance,
 	.peel = empty_ref_iterator_peel,
-	.abort = empty_ref_iterator_abort,
+	.release = empty_ref_iterator_release,
 };
 
 struct ref_iterator *empty_ref_iterator_begin(void)
@@ -151,11 +147,13 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	if (!iter->current) {
 		/* Initialize: advance both iterators to their first entries */
 		if ((ok = ref_iterator_advance(iter->iter0)) != ITER_OK) {
+			ref_iterator_free(iter->iter0);
 			iter->iter0 = NULL;
 			if (ok == ITER_ERROR)
 				goto error;
 		}
 		if ((ok = ref_iterator_advance(iter->iter1)) != ITER_OK) {
+			ref_iterator_free(iter->iter1);
 			iter->iter1 = NULL;
 			if (ok == ITER_ERROR)
 				goto error;
@@ -166,6 +164,7 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		 * entry:
 		 */
 		if ((ok = ref_iterator_advance(*iter->current)) != ITER_OK) {
+			ref_iterator_free(*iter->current);
 			*iter->current = NULL;
 			if (ok == ITER_ERROR)
 				goto error;
@@ -179,9 +178,8 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 			iter->select(iter->iter0, iter->iter1, iter->cb_data);
 
 		if (selection == ITER_SELECT_DONE) {
-			return ref_iterator_abort(ref_iterator);
+			return ITER_DONE;
 		} else if (selection == ITER_SELECT_ERROR) {
-			ref_iterator_abort(ref_iterator);
 			return ITER_ERROR;
 		}
 
@@ -195,6 +193,7 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 
 		if (selection & ITER_SKIP_SECONDARY) {
 			if ((ok = ref_iterator_advance(*secondary)) != ITER_OK) {
+				ref_iterator_free(*secondary);
 				*secondary = NULL;
 				if (ok == ITER_ERROR)
 					goto error;
@@ -211,7 +210,6 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	}
 
 error:
-	ref_iterator_abort(ref_iterator);
 	return ITER_ERROR;
 }
 
@@ -227,28 +225,18 @@ static int merge_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return ref_iterator_peel(*iter->current, peeled);
 }
 
-static int merge_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void merge_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct merge_ref_iterator *iter =
 		(struct merge_ref_iterator *)ref_iterator;
-	int ok = ITER_DONE;
-
-	if (iter->iter0) {
-		if (ref_iterator_abort(iter->iter0) != ITER_DONE)
-			ok = ITER_ERROR;
-	}
-	if (iter->iter1) {
-		if (ref_iterator_abort(iter->iter1) != ITER_DONE)
-			ok = ITER_ERROR;
-	}
-	base_ref_iterator_free(ref_iterator);
-	return ok;
+	ref_iterator_free(iter->iter0);
+	ref_iterator_free(iter->iter1);
 }
 
 static struct ref_iterator_vtable merge_ref_iterator_vtable = {
 	.advance = merge_ref_iterator_advance,
 	.peel = merge_ref_iterator_peel,
-	.abort = merge_ref_iterator_abort,
+	.release = merge_ref_iterator_release,
 };
 
 struct ref_iterator *merge_ref_iterator_begin(
@@ -310,10 +298,10 @@ struct ref_iterator *overlay_ref_iterator_begin(
 	 * them.
 	 */
 	if (is_empty_ref_iterator(front)) {
-		ref_iterator_abort(front);
+		ref_iterator_free(front);
 		return back;
 	} else if (is_empty_ref_iterator(back)) {
-		ref_iterator_abort(back);
+		ref_iterator_free(back);
 		return front;
 	}
 
@@ -350,19 +338,10 @@ static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator)
 
 	while ((ok = ref_iterator_advance(iter->iter0)) == ITER_OK) {
 		int cmp = compare_prefix(iter->iter0->refname, iter->prefix);
-
 		if (cmp < 0)
 			continue;
-
-		if (cmp > 0) {
-			/*
-			 * As the source iterator is ordered, we
-			 * can stop the iteration as soon as we see a
-			 * refname that comes after the prefix:
-			 */
-			ok = ref_iterator_abort(iter->iter0);
-			break;
-		}
+		if (cmp > 0)
+			return ITER_DONE;
 
 		if (iter->trim) {
 			/*
@@ -386,9 +365,6 @@ static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		return ITER_OK;
 	}
 
-	iter->iter0 = NULL;
-	if (ref_iterator_abort(ref_iterator) != ITER_DONE)
-		return ITER_ERROR;
 	return ok;
 }
 
@@ -401,23 +377,18 @@ static int prefix_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return ref_iterator_peel(iter->iter0, peeled);
 }
 
-static int prefix_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void prefix_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct prefix_ref_iterator *iter =
 		(struct prefix_ref_iterator *)ref_iterator;
-	int ok = ITER_DONE;
-
-	if (iter->iter0)
-		ok = ref_iterator_abort(iter->iter0);
+	ref_iterator_free(iter->iter0);
 	free(iter->prefix);
-	base_ref_iterator_free(ref_iterator);
-	return ok;
 }
 
 static struct ref_iterator_vtable prefix_ref_iterator_vtable = {
 	.advance = prefix_ref_iterator_advance,
 	.peel = prefix_ref_iterator_peel,
-	.abort = prefix_ref_iterator_abort,
+	.release = prefix_ref_iterator_release,
 };
 
 struct ref_iterator *prefix_ref_iterator_begin(struct ref_iterator *iter0,
@@ -453,20 +424,14 @@ int do_for_each_ref_iterator(struct ref_iterator *iter,
 	current_ref_iter = iter;
 	while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
 		retval = fn(iter->refname, iter->referent, iter->oid, iter->flags, cb_data);
-		if (retval) {
-			/*
-			 * If ref_iterator_abort() returns ITER_ERROR,
-			 * we ignore that error in deference to the
-			 * callback function's return value.
-			 */
-			ref_iterator_abort(iter);
+		if (retval)
 			goto out;
-		}
 	}
 
 out:
 	current_ref_iter = old_ref_iter;
 	if (ok == ITER_ERROR)
-		return -1;
+		retval = -1;
+	ref_iterator_free(iter);
 	return retval;
 }
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index a7b6f74b6e3..38a1956d1a8 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -954,9 +954,6 @@ static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		return ITER_OK;
 	}
 
-	if (ref_iterator_abort(ref_iterator) != ITER_DONE)
-		ok = ITER_ERROR;
-
 	return ok;
 }
 
@@ -976,23 +973,19 @@ static int packed_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	}
 }
 
-static int packed_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void packed_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct packed_ref_iterator *iter =
 		(struct packed_ref_iterator *)ref_iterator;
-	int ok = ITER_DONE;
-
 	strbuf_release(&iter->refname_buf);
 	free(iter->jump);
 	release_snapshot(iter->snapshot);
-	base_ref_iterator_free(ref_iterator);
-	return ok;
 }
 
 static struct ref_iterator_vtable packed_ref_iterator_vtable = {
 	.advance = packed_ref_iterator_advance,
 	.peel = packed_ref_iterator_peel,
-	.abort = packed_ref_iterator_abort
+	.release = packed_ref_iterator_release,
 };
 
 static int jump_list_entry_cmp(const void *va, const void *vb)
@@ -1362,8 +1355,10 @@ static int write_with_updates(struct packed_ref_store *refs,
 	 */
 	iter = packed_ref_iterator_begin(&refs->base, "", NULL,
 					 DO_FOR_EACH_INCLUDE_BROKEN);
-	if ((ok = ref_iterator_advance(iter)) != ITER_OK)
+	if ((ok = ref_iterator_advance(iter)) != ITER_OK) {
+		ref_iterator_free(iter);
 		iter = NULL;
+	}
 
 	i = 0;
 
@@ -1411,8 +1406,10 @@ static int write_with_updates(struct packed_ref_store *refs,
 				 * the iterator over the unneeded
 				 * value.
 				 */
-				if ((ok = ref_iterator_advance(iter)) != ITER_OK)
+				if ((ok = ref_iterator_advance(iter)) != ITER_OK) {
+					ref_iterator_free(iter);
 					iter = NULL;
+				}
 				cmp = +1;
 			} else {
 				/*
@@ -1449,8 +1446,10 @@ static int write_with_updates(struct packed_ref_store *refs,
 					       peel_error ? NULL : &peeled))
 				goto write_error;
 
-			if ((ok = ref_iterator_advance(iter)) != ITER_OK)
+			if ((ok = ref_iterator_advance(iter)) != ITER_OK) {
+				ref_iterator_free(iter);
 				iter = NULL;
+			}
 		} else if (is_null_oid(&update->new_oid)) {
 			/*
 			 * The update wants to delete the reference,
@@ -1499,9 +1498,7 @@ static int write_with_updates(struct packed_ref_store *refs,
 		    get_tempfile_path(refs->tempfile), strerror(errno));
 
 error:
-	if (iter)
-		ref_iterator_abort(iter);
-
+	ref_iterator_free(iter);
 	delete_tempfile(&refs->tempfile);
 	return -1;
 }
diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index 02f09e4df88..6457e02c1ea 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -409,7 +409,7 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		if (++level->index == level->dir->nr) {
 			/* This level is exhausted; pop up a level */
 			if (--iter->levels_nr == 0)
-				return ref_iterator_abort(ref_iterator);
+				return ITER_DONE;
 
 			continue;
 		}
@@ -452,21 +452,18 @@ static int cache_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return peel_object(iter->repo, ref_iterator->oid, peeled) ? -1 : 0;
 }
 
-static int cache_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void cache_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct cache_ref_iterator *iter =
 		(struct cache_ref_iterator *)ref_iterator;
-
 	free((char *)iter->prefix);
 	free(iter->levels);
-	base_ref_iterator_free(ref_iterator);
-	return ITER_DONE;
 }
 
 static struct ref_iterator_vtable cache_ref_iterator_vtable = {
 	.advance = cache_ref_iterator_advance,
 	.peel = cache_ref_iterator_peel,
-	.abort = cache_ref_iterator_abort
+	.release = cache_ref_iterator_release,
 };
 
 struct ref_iterator *cache_ref_iterator_begin(struct ref_cache *cache,
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index aaab711bb96..74e2c03cef1 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -273,11 +273,11 @@ enum do_for_each_ref_flags {
  * the next reference and returns ITER_OK. The data pointed at by
  * refname and oid belong to the iterator; if you want to retain them
  * after calling ref_iterator_advance() again or calling
- * ref_iterator_abort(), you must make a copy. When the iteration has
+ * ref_iterator_free(), you must make a copy. When the iteration has
  * been exhausted, ref_iterator_advance() releases any resources
  * associated with the iteration, frees the ref_iterator object, and
  * returns ITER_DONE. If you want to abort the iteration early, call
- * ref_iterator_abort(), which also frees the ref_iterator object and
+ * ref_iterator_free(), which also frees the ref_iterator object and
  * any associated resources. If there was an internal error advancing
  * to the next entry, ref_iterator_advance() aborts the iteration,
  * frees the ref_iterator, and returns ITER_ERROR.
@@ -293,7 +293,7 @@ enum do_for_each_ref_flags {
  *
  *     while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
  *             if (want_to_stop_iteration()) {
- *                     ok = ref_iterator_abort(iter);
+ *                     ok = ITER_DONE;
  *                     break;
  *             }
  *
@@ -307,6 +307,7 @@ enum do_for_each_ref_flags {
  *
  *     if (ok != ITER_DONE)
  *             handle_error();
+ *     ref_iterator_free(iter);
  */
 struct ref_iterator {
 	struct ref_iterator_vtable *vtable;
@@ -333,12 +334,8 @@ int ref_iterator_advance(struct ref_iterator *ref_iterator);
 int ref_iterator_peel(struct ref_iterator *ref_iterator,
 		      struct object_id *peeled);
 
-/*
- * End the iteration before it has been exhausted, freeing the
- * reference iterator and any associated resources and returning
- * ITER_DONE. If the abort itself failed, return ITER_ERROR.
- */
-int ref_iterator_abort(struct ref_iterator *ref_iterator);
+/* Free the reference iterator and any associated resources. */
+void ref_iterator_free(struct ref_iterator *ref_iterator);
 
 /*
  * An iterator over nothing (its first ref_iterator_advance() call
@@ -438,13 +435,6 @@ struct ref_iterator *prefix_ref_iterator_begin(struct ref_iterator *iter0,
 void base_ref_iterator_init(struct ref_iterator *iter,
 			    struct ref_iterator_vtable *vtable);
 
-/*
- * Base class destructor for ref_iterators. Destroy the ref_iterator
- * part of iter and shallow-free the object. This is meant to be
- * called only by the destructors of derived classes.
- */
-void base_ref_iterator_free(struct ref_iterator *iter);
-
 /* Virtual function declarations for ref_iterators: */
 
 /*
@@ -463,15 +453,14 @@ typedef int ref_iterator_peel_fn(struct ref_iterator *ref_iterator,
 
 /*
  * Implementations of this function should free any resources specific
- * to the derived class, then call base_ref_iterator_free() to clean
- * up and free the ref_iterator object.
+ * to the derived class.
  */
-typedef int ref_iterator_abort_fn(struct ref_iterator *ref_iterator);
+typedef void ref_iterator_release_fn(struct ref_iterator *ref_iterator);
 
 struct ref_iterator_vtable {
 	ref_iterator_advance_fn *advance;
 	ref_iterator_peel_fn *peel;
-	ref_iterator_abort_fn *abort;
+	ref_iterator_release_fn *release;
 };
 
 /*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 2a90e7cb391..06543f79c64 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -711,17 +711,10 @@ static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		break;
 	}
 
-	if (iter->err > 0) {
-		if (ref_iterator_abort(ref_iterator) != ITER_DONE)
-			return ITER_ERROR;
+	if (iter->err > 0)
 		return ITER_DONE;
-	}
-
-	if (iter->err < 0) {
-		ref_iterator_abort(ref_iterator);
+	if (iter->err < 0)
 		return ITER_ERROR;
-	}
-
 	return ITER_OK;
 }
 
@@ -740,7 +733,7 @@ static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return -1;
 }
 
-static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void reftable_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct reftable_ref_iterator *iter =
 		(struct reftable_ref_iterator *)ref_iterator;
@@ -751,14 +744,12 @@ static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
 			free(iter->exclude_patterns[i]);
 		free(iter->exclude_patterns);
 	}
-	free(iter);
-	return ITER_DONE;
 }
 
 static struct ref_iterator_vtable reftable_ref_iterator_vtable = {
 	.advance = reftable_ref_iterator_advance,
 	.peel = reftable_ref_iterator_peel,
-	.abort = reftable_ref_iterator_abort
+	.release = reftable_ref_iterator_release,
 };
 
 static int qsort_strcmp(const void *va, const void *vb)
@@ -2017,17 +2008,10 @@ static int reftable_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 		break;
 	}
 
-	if (iter->err > 0) {
-		if (ref_iterator_abort(ref_iterator) != ITER_DONE)
-			return ITER_ERROR;
+	if (iter->err > 0)
 		return ITER_DONE;
-	}
-
-	if (iter->err < 0) {
-		ref_iterator_abort(ref_iterator);
+	if (iter->err < 0)
 		return ITER_ERROR;
-	}
-
 	return ITER_OK;
 }
 
@@ -2038,21 +2022,19 @@ static int reftable_reflog_iterator_peel(struct ref_iterator *ref_iterator UNUSE
 	return -1;
 }
 
-static int reftable_reflog_iterator_abort(struct ref_iterator *ref_iterator)
+static void reftable_reflog_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct reftable_reflog_iterator *iter =
 		(struct reftable_reflog_iterator *)ref_iterator;
 	reftable_log_record_release(&iter->log);
 	reftable_iterator_destroy(&iter->iter);
 	strbuf_release(&iter->last_name);
-	free(iter);
-	return ITER_DONE;
 }
 
 static struct ref_iterator_vtable reftable_reflog_iterator_vtable = {
 	.advance = reftable_reflog_iterator_advance,
 	.peel = reftable_reflog_iterator_peel,
-	.abort = reftable_reflog_iterator_abort
+	.release = reftable_reflog_iterator_release,
 };
 
 static struct reftable_reflog_iterator *reflog_iterator_for_stack(struct reftable_ref_store *refs,
diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
index 6b297bd7536..8d46e8ba409 100644
--- a/t/helper/test-dir-iterator.c
+++ b/t/helper/test-dir-iterator.c
@@ -53,6 +53,7 @@ int cmd__dir_iterator(int argc, const char **argv)
 		printf("(%s) [%s] %s\n", diter->relative_path, diter->basename,
 		       diter->path.buf);
 	}
+	dir_iterator_free(diter);
 
 	if (iter_status != ITER_DONE) {
 		printf("dir_iterator_advance failure\n");

-- 
2.48.1.683.gf705b3209c.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v3 10/16] refs/iterator: provide infrastructure to re-seek iterators
  2025-02-25  8:55 ` [PATCH v3 " Patrick Steinhardt
                     ` (8 preceding siblings ...)
  2025-02-25  8:55   ` [PATCH v3 09/16] refs/iterator: separate lifecycle from iteration Patrick Steinhardt
@ 2025-02-25  8:55   ` Patrick Steinhardt
  2025-02-25  8:55   ` [PATCH v3 11/16] refs/iterator: implement seeking for merged iterators Patrick Steinhardt
                     ` (5 subsequent siblings)
  15 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-25  8:55 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Ref iterators need to be scrapped once they have been exhausted or are
no longer useful to the caller, and it is explicitly not possible to
reuse them for subsequent iterations. But allowing iterators to be
reused would let us tune them by reusing their internal state. The
reftable iterators, for example, can already be reused internally, but
we're not able to expose this to any users outside of the reftable
backend.

Introduce a new `.seek` function in the ref iterator vtable that allows
callers to seek an iterator multiple times. It is expected to be
functionally the same as calling `refs_ref_iterator_begin()` with a
different (or the same) prefix.

Note that it is not yet possible to adjust parameters other than the
prefix being sought, so exclude patterns, trimmed prefixes and flags
will remain unchanged. We do not have a use case for changing these
parameters right now, but if we ever find one we can adapt accordingly.

Implement the callback for trivial cases. The other iterators will be
implemented in subsequent commits.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/debug.c         | 11 +++++++++++
 refs/iterator.c      | 24 ++++++++++++++++++++++++
 refs/refs-internal.h | 24 ++++++++++++++++++++++++
 3 files changed, 59 insertions(+)

diff --git a/refs/debug.c b/refs/debug.c
index a9786da4ba1..5390fa9c187 100644
--- a/refs/debug.c
+++ b/refs/debug.c
@@ -169,6 +169,16 @@ static int debug_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	return res;
 }
 
+static int debug_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				   const char *prefix)
+{
+	struct debug_ref_iterator *diter =
+		(struct debug_ref_iterator *)ref_iterator;
+	int res = diter->iter->vtable->seek(diter->iter, prefix);
+	trace_printf_key(&trace_refs, "iterator_seek: %s: %d\n", prefix ? prefix : "", res);
+	return res;
+}
+
 static int debug_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				   struct object_id *peeled)
 {
@@ -189,6 +199,7 @@ static void debug_ref_iterator_release(struct ref_iterator *ref_iterator)
 
 static struct ref_iterator_vtable debug_ref_iterator_vtable = {
 	.advance = debug_ref_iterator_advance,
+	.seek = debug_ref_iterator_seek,
 	.peel = debug_ref_iterator_peel,
 	.release = debug_ref_iterator_release,
 };
diff --git a/refs/iterator.c b/refs/iterator.c
index aaeff270437..757b105261a 100644
--- a/refs/iterator.c
+++ b/refs/iterator.c
@@ -15,6 +15,12 @@ int ref_iterator_advance(struct ref_iterator *ref_iterator)
 	return ref_iterator->vtable->advance(ref_iterator);
 }
 
+int ref_iterator_seek(struct ref_iterator *ref_iterator,
+		      const char *prefix)
+{
+	return ref_iterator->vtable->seek(ref_iterator, prefix);
+}
+
 int ref_iterator_peel(struct ref_iterator *ref_iterator,
 		      struct object_id *peeled)
 {
@@ -50,6 +56,12 @@ static int empty_ref_iterator_advance(struct ref_iterator *ref_iterator UNUSED)
 	return ITER_DONE;
 }
 
+static int empty_ref_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
+				   const char *prefix UNUSED)
+{
+	return 0;
+}
+
 static int empty_ref_iterator_peel(struct ref_iterator *ref_iterator UNUSED,
 				   struct object_id *peeled UNUSED)
 {
@@ -62,6 +74,7 @@ static void empty_ref_iterator_release(struct ref_iterator *ref_iterator UNUSED)
 
 static struct ref_iterator_vtable empty_ref_iterator_vtable = {
 	.advance = empty_ref_iterator_advance,
+	.seek = empty_ref_iterator_seek,
 	.peel = empty_ref_iterator_peel,
 	.release = empty_ref_iterator_release,
 };
@@ -368,6 +381,16 @@ static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	return ok;
 }
 
+static int prefix_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				    const char *prefix)
+{
+	struct prefix_ref_iterator *iter =
+		(struct prefix_ref_iterator *)ref_iterator;
+	free(iter->prefix);
+	iter->prefix = xstrdup_or_null(prefix);
+	return ref_iterator_seek(iter->iter0, prefix);
+}
+
 static int prefix_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				    struct object_id *peeled)
 {
@@ -387,6 +410,7 @@ static void prefix_ref_iterator_release(struct ref_iterator *ref_iterator)
 
 static struct ref_iterator_vtable prefix_ref_iterator_vtable = {
 	.advance = prefix_ref_iterator_advance,
+	.seek = prefix_ref_iterator_seek,
 	.peel = prefix_ref_iterator_peel,
 	.release = prefix_ref_iterator_release,
 };
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 74e2c03cef1..8f18274a165 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -327,6 +327,22 @@ struct ref_iterator {
  */
 int ref_iterator_advance(struct ref_iterator *ref_iterator);
 
+/*
+ * Seek the iterator to the first reference with the given prefix.
+ * The prefix is matched as a literal string, without regard for path
+ * separators. If prefix is NULL or the empty string, seek the iterator to the
+ * first reference again.
+ *
+ * This function is expected to behave as if a new ref iterator with the same
+ * prefix had been created, but allows reuse of iterators and thus may allow
+ * the backend to optimize. Parameters other than the prefix that have been
+ * passed when creating the iterator will remain unchanged.
+ *
+ * Returns 0 on success, a negative error code otherwise.
+ */
+int ref_iterator_seek(struct ref_iterator *ref_iterator,
+		      const char *prefix);
+
 /*
  * If possible, peel the reference currently being viewed by the
  * iterator. Return 0 on success.
@@ -445,6 +461,13 @@ void base_ref_iterator_init(struct ref_iterator *iter,
  */
 typedef int ref_iterator_advance_fn(struct ref_iterator *ref_iterator);
 
+/*
+ * Seek the iterator to the first reference matching the given prefix. Should
+ * behave the same as if a new iterator was created with the same prefix.
+ */
+typedef int ref_iterator_seek_fn(struct ref_iterator *ref_iterator,
+				 const char *prefix);
+
 /*
  * Peels the current ref, returning 0 for success or -1 for failure.
  */
@@ -459,6 +482,7 @@ typedef void ref_iterator_release_fn(struct ref_iterator *ref_iterator);
 
 struct ref_iterator_vtable {
 	ref_iterator_advance_fn *advance;
+	ref_iterator_seek_fn *seek;
 	ref_iterator_peel_fn *peel;
 	ref_iterator_release_fn *release;
 };

-- 
2.48.1.683.gf705b3209c.dirty




* [PATCH v3 11/16] refs/iterator: implement seeking for merged iterators
  2025-02-25  8:55 ` [PATCH v3 " Patrick Steinhardt
                     ` (9 preceding siblings ...)
  2025-02-25  8:55   ` [PATCH v3 10/16] refs/iterator: provide infrastructure to re-seek iterators Patrick Steinhardt
@ 2025-02-25  8:55   ` Patrick Steinhardt
  2025-02-25  8:55   ` [PATCH v3 12/16] refs/iterator: implement seeking for reftable iterators Patrick Steinhardt
                     ` (4 subsequent siblings)
  15 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-25  8:55 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Implement seeking on merged iterators. The implementation is rather
straightforward, with the one exception that we must not deallocate the
underlying iterators once they have been exhausted.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/iterator.c | 38 +++++++++++++++++++++++++++++---------
 1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/refs/iterator.c b/refs/iterator.c
index 757b105261a..63608ef9907 100644
--- a/refs/iterator.c
+++ b/refs/iterator.c
@@ -96,7 +96,8 @@ int is_empty_ref_iterator(struct ref_iterator *ref_iterator)
 struct merge_ref_iterator {
 	struct ref_iterator base;
 
-	struct ref_iterator *iter0, *iter1;
+	struct ref_iterator *iter0, *iter0_owned;
+	struct ref_iterator *iter1, *iter1_owned;
 
 	ref_iterator_select_fn *select;
 	void *cb_data;
@@ -160,13 +161,11 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	if (!iter->current) {
 		/* Initialize: advance both iterators to their first entries */
 		if ((ok = ref_iterator_advance(iter->iter0)) != ITER_OK) {
-			ref_iterator_free(iter->iter0);
 			iter->iter0 = NULL;
 			if (ok == ITER_ERROR)
 				goto error;
 		}
 		if ((ok = ref_iterator_advance(iter->iter1)) != ITER_OK) {
-			ref_iterator_free(iter->iter1);
 			iter->iter1 = NULL;
 			if (ok == ITER_ERROR)
 				goto error;
@@ -177,7 +176,6 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		 * entry:
 		 */
 		if ((ok = ref_iterator_advance(*iter->current)) != ITER_OK) {
-			ref_iterator_free(*iter->current);
 			*iter->current = NULL;
 			if (ok == ITER_ERROR)
 				goto error;
@@ -206,7 +204,6 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 
 		if (selection & ITER_SKIP_SECONDARY) {
 			if ((ok = ref_iterator_advance(*secondary)) != ITER_OK) {
-				ref_iterator_free(*secondary);
 				*secondary = NULL;
 				if (ok == ITER_ERROR)
 					goto error;
@@ -226,6 +223,28 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	return ITER_ERROR;
 }
 
+static int merge_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				   const char *prefix)
+{
+	struct merge_ref_iterator *iter =
+		(struct merge_ref_iterator *)ref_iterator;
+	int ret;
+
+	iter->current = NULL;
+	iter->iter0 = iter->iter0_owned;
+	iter->iter1 = iter->iter1_owned;
+
+	ret = ref_iterator_seek(iter->iter0, prefix);
+	if (ret < 0)
+		return ret;
+
+	ret = ref_iterator_seek(iter->iter1, prefix);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
 static int merge_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				   struct object_id *peeled)
 {
@@ -242,12 +261,13 @@ static void merge_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct merge_ref_iterator *iter =
 		(struct merge_ref_iterator *)ref_iterator;
-	ref_iterator_free(iter->iter0);
-	ref_iterator_free(iter->iter1);
+	ref_iterator_free(iter->iter0_owned);
+	ref_iterator_free(iter->iter1_owned);
 }
 
 static struct ref_iterator_vtable merge_ref_iterator_vtable = {
 	.advance = merge_ref_iterator_advance,
+	.seek = merge_ref_iterator_seek,
 	.peel = merge_ref_iterator_peel,
 	.release = merge_ref_iterator_release,
 };
@@ -268,8 +288,8 @@ struct ref_iterator *merge_ref_iterator_begin(
 	 */
 
 	base_ref_iterator_init(ref_iterator, &merge_ref_iterator_vtable);
-	iter->iter0 = iter0;
-	iter->iter1 = iter1;
+	iter->iter0 = iter->iter0_owned = iter0;
+	iter->iter1 = iter->iter1_owned = iter1;
 	iter->select = select;
 	iter->cb_data = cb_data;
 	iter->current = NULL;

-- 
2.48.1.683.gf705b3209c.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v3 12/16] refs/iterator: implement seeking for reftable iterators
  2025-02-25  8:55 ` [PATCH v3 " Patrick Steinhardt
                     ` (10 preceding siblings ...)
  2025-02-25  8:55   ` [PATCH v3 11/16] refs/iterator: implement seeking for merged iterators Patrick Steinhardt
@ 2025-02-25  8:55   ` Patrick Steinhardt
  2025-02-25  8:55   ` [PATCH v3 13/16] refs/iterator: implement seeking for ref-cache iterators Patrick Steinhardt
                     ` (3 subsequent siblings)
  15 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-25  8:55 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Implement seeking of reftable iterators. As the low-level reftable
iterators already support seeking, this change is straightforward. Two
notes, though:

  - We do not support seeking on reflog iterators. It is unclear what
    seeking would even look like in this context, as one would typically
    want to seek to a specific entry in the reflog of a specific ref.
    There is currently no use case for this, but if one ever arises we
    can implement seeking in the future.

  - We now check whether `reftable_stack_init_ref_iterator()`
    succeeds.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/reftable-backend.c | 35 ++++++++++++++++++++++++++++++-----
 1 file changed, 30 insertions(+), 5 deletions(-)

diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 06543f79c64..b0c09f34433 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -547,7 +547,7 @@ struct reftable_ref_iterator {
 	struct reftable_ref_record ref;
 	struct object_id oid;
 
-	const char *prefix;
+	char *prefix;
 	size_t prefix_len;
 	char **exclude_patterns;
 	size_t exclude_patterns_index;
@@ -718,6 +718,20 @@ static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	return ITER_OK;
 }
 
+static int reftable_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				      const char *prefix)
+{
+	struct reftable_ref_iterator *iter =
+		(struct reftable_ref_iterator *)ref_iterator;
+
+	free(iter->prefix);
+	iter->prefix = xstrdup_or_null(prefix);
+	iter->prefix_len = prefix ? strlen(prefix) : 0;
+	iter->err = reftable_iterator_seek_ref(&iter->iter, prefix);
+
+	return iter->err;
+}
+
 static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				      struct object_id *peeled)
 {
@@ -744,10 +758,12 @@ static void reftable_ref_iterator_release(struct ref_iterator *ref_iterator)
 			free(iter->exclude_patterns[i]);
 		free(iter->exclude_patterns);
 	}
+	free(iter->prefix);
 }
 
 static struct ref_iterator_vtable reftable_ref_iterator_vtable = {
 	.advance = reftable_ref_iterator_advance,
+	.seek = reftable_ref_iterator_seek,
 	.peel = reftable_ref_iterator_peel,
 	.release = reftable_ref_iterator_release,
 };
@@ -806,8 +822,6 @@ static struct reftable_ref_iterator *ref_iterator_for_stack(struct reftable_ref_
 
 	iter = xcalloc(1, sizeof(*iter));
 	base_ref_iterator_init(&iter->base, &reftable_ref_iterator_vtable);
-	iter->prefix = prefix;
-	iter->prefix_len = prefix ? strlen(prefix) : 0;
 	iter->base.oid = &iter->oid;
 	iter->flags = flags;
 	iter->refs = refs;
@@ -821,8 +835,11 @@ static struct reftable_ref_iterator *ref_iterator_for_stack(struct reftable_ref_
 	if (ret)
 		goto done;
 
-	reftable_stack_init_ref_iterator(stack, &iter->iter);
-	ret = reftable_iterator_seek_ref(&iter->iter, prefix);
+	ret = reftable_stack_init_ref_iterator(stack, &iter->iter);
+	if (ret)
+		goto done;
+
+	ret = reftable_ref_iterator_seek(&iter->base, prefix);
 	if (ret)
 		goto done;
 
@@ -2015,6 +2032,13 @@ static int reftable_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 	return ITER_OK;
 }
 
+static int reftable_reflog_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
+					 const char *prefix UNUSED)
+{
+	BUG("reftable reflog iterator cannot be seeked");
+	return -1;
+}
+
 static int reftable_reflog_iterator_peel(struct ref_iterator *ref_iterator UNUSED,
 					 struct object_id *peeled UNUSED)
 {
@@ -2033,6 +2057,7 @@ static void reftable_reflog_iterator_release(struct ref_iterator *ref_iterator)
 
 static struct ref_iterator_vtable reftable_reflog_iterator_vtable = {
 	.advance = reftable_reflog_iterator_advance,
+	.seek = reftable_reflog_iterator_seek,
 	.peel = reftable_reflog_iterator_peel,
 	.release = reftable_reflog_iterator_release,
 };

-- 
2.48.1.683.gf705b3209c.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v3 13/16] refs/iterator: implement seeking for ref-cache iterators
  2025-02-25  8:55 ` [PATCH v3 " Patrick Steinhardt
                     ` (11 preceding siblings ...)
  2025-02-25  8:55   ` [PATCH v3 12/16] refs/iterator: implement seeking for reftable iterators Patrick Steinhardt
@ 2025-02-25  8:55   ` Patrick Steinhardt
  2025-02-25  8:56   ` [PATCH v3 14/16] refs/iterator: implement seeking for packed-ref iterators Patrick Steinhardt
                     ` (2 subsequent siblings)
  15 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-25  8:55 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Implement seeking of ref-cache iterators. This is done by splitting
most of the seeking logic out of `cache_ref_iterator_begin()` and
putting it into `cache_ref_iterator_seek()` so that it can be reused.

Note that we can no longer use the optimization where we return an
empty ref iterator when there aren't any references, as it would
otherwise be impossible to reseek the iterator to a different prefix
that may exist. This shouldn't be much of a performance concern though,
as `advance()` now bails out early when it sees that there are no more
directories to be searched.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/ref-cache.c | 79 ++++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 51 insertions(+), 28 deletions(-)

diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index 6457e02c1ea..c1f1bab1d50 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -362,9 +362,7 @@ struct cache_ref_iterator {
 	struct ref_iterator base;
 
 	/*
-	 * The number of levels currently on the stack. This is always
-	 * at least 1, because when it becomes zero the iteration is
-	 * ended and this struct is freed.
+	 * The number of levels currently on the stack.
 	 */
 	size_t levels_nr;
 
@@ -376,7 +374,7 @@ struct cache_ref_iterator {
 	 * The prefix is matched textually, without regard for path
 	 * component boundaries.
 	 */
-	const char *prefix;
+	char *prefix;
 
 	/*
 	 * A stack of levels. levels[0] is the uppermost level that is
@@ -389,6 +387,9 @@ struct cache_ref_iterator {
 	struct cache_ref_iterator_level *levels;
 
 	struct repository *repo;
+	struct ref_cache *cache;
+
+	int prime_dir;
 };
 
 static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
@@ -396,6 +397,9 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	struct cache_ref_iterator *iter =
 		(struct cache_ref_iterator *)ref_iterator;
 
+	if (!iter->levels_nr)
+		return ITER_DONE;
+
 	while (1) {
 		struct cache_ref_iterator_level *level =
 			&iter->levels[iter->levels_nr - 1];
@@ -444,6 +448,41 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	}
 }
 
+static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				   const char *prefix)
+{
+	struct cache_ref_iterator *iter =
+		(struct cache_ref_iterator *)ref_iterator;
+	struct cache_ref_iterator_level *level;
+	struct ref_dir *dir;
+
+	dir = get_ref_dir(iter->cache->root);
+	if (prefix && *prefix)
+		dir = find_containing_dir(dir, prefix);
+	if (!dir) {
+		iter->levels_nr = 0;
+		return 0;
+	}
+
+	if (iter->prime_dir)
+		prime_ref_dir(dir, prefix);
+	iter->levels_nr = 1;
+	level = &iter->levels[0];
+	level->index = -1;
+	level->dir = dir;
+
+	if (prefix && *prefix) {
+		free(iter->prefix);
+		iter->prefix = xstrdup(prefix);
+		level->prefix_state = PREFIX_WITHIN_DIR;
+	} else {
+		FREE_AND_NULL(iter->prefix);
+		level->prefix_state = PREFIX_CONTAINS_DIR;
+	}
+
+	return 0;
+}
+
 static int cache_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				   struct object_id *peeled)
 {
@@ -456,12 +495,13 @@ static void cache_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct cache_ref_iterator *iter =
 		(struct cache_ref_iterator *)ref_iterator;
-	free((char *)iter->prefix);
+	free(iter->prefix);
 	free(iter->levels);
 }
 
 static struct ref_iterator_vtable cache_ref_iterator_vtable = {
 	.advance = cache_ref_iterator_advance,
+	.seek = cache_ref_iterator_seek,
 	.peel = cache_ref_iterator_peel,
 	.release = cache_ref_iterator_release,
 };
@@ -471,39 +511,22 @@ struct ref_iterator *cache_ref_iterator_begin(struct ref_cache *cache,
 					      struct repository *repo,
 					      int prime_dir)
 {
-	struct ref_dir *dir;
 	struct cache_ref_iterator *iter;
 	struct ref_iterator *ref_iterator;
-	struct cache_ref_iterator_level *level;
-
-	dir = get_ref_dir(cache->root);
-	if (prefix && *prefix)
-		dir = find_containing_dir(dir, prefix);
-	if (!dir)
-		/* There's nothing to iterate over. */
-		return empty_ref_iterator_begin();
-
-	if (prime_dir)
-		prime_ref_dir(dir, prefix);
 
 	CALLOC_ARRAY(iter, 1);
 	ref_iterator = &iter->base;
 	base_ref_iterator_init(ref_iterator, &cache_ref_iterator_vtable);
 	ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
 
-	iter->levels_nr = 1;
-	level = &iter->levels[0];
-	level->index = -1;
-	level->dir = dir;
+	iter->repo = repo;
+	iter->cache = cache;
+	iter->prime_dir = prime_dir;
 
-	if (prefix && *prefix) {
-		iter->prefix = xstrdup(prefix);
-		level->prefix_state = PREFIX_WITHIN_DIR;
-	} else {
-		level->prefix_state = PREFIX_CONTAINS_DIR;
+	if (cache_ref_iterator_seek(&iter->base, prefix) < 0) {
+		ref_iterator_free(&iter->base);
+		return NULL;
 	}
 
-	iter->repo = repo;
-
 	return ref_iterator;
 }

-- 
2.48.1.683.gf705b3209c.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v3 14/16] refs/iterator: implement seeking for packed-ref iterators
  2025-02-25  8:55 ` [PATCH v3 " Patrick Steinhardt
                     ` (12 preceding siblings ...)
  2025-02-25  8:55   ` [PATCH v3 13/16] refs/iterator: implement seeking for ref-cache iterators Patrick Steinhardt
@ 2025-02-25  8:56   ` Patrick Steinhardt
  2025-02-25  8:56   ` [PATCH v3 15/16] refs/iterator: implement seeking for files iterators Patrick Steinhardt
  2025-02-25  8:56   ` [PATCH v3 16/16] refs: reuse iterators when determining refname availability Patrick Steinhardt
  15 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-25  8:56 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Implement seeking of `packed-ref` iterators. The implementation is
again straightforward, except that we cannot continue to use the prefix
iterator, as we would otherwise not be able to reseek the iterator in
case one first asks for an empty and then for a non-empty prefix.
Instead, we open-code the logic in `advance()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/packed-backend.c | 65 ++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 43 insertions(+), 22 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 38a1956d1a8..f4c82ba2c7d 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -819,6 +819,8 @@ struct packed_ref_iterator {
 
 	struct snapshot *snapshot;
 
+	char *prefix;
+
 	/* The current position in the snapshot's buffer: */
 	const char *pos;
 
@@ -841,11 +843,9 @@ struct packed_ref_iterator {
 };
 
 /*
- * Move the iterator to the next record in the snapshot, without
- * respect for whether the record is actually required by the current
- * iteration. Adjust the fields in `iter` and return `ITER_OK` or
- * `ITER_DONE`. This function does not free the iterator in the case
- * of `ITER_DONE`.
+ * Move the iterator to the next record in the snapshot. Adjust the fields in
+ * `iter` and return `ITER_OK` or `ITER_DONE`. This function does not free the
+ * iterator in the case of `ITER_DONE`.
  */
 static int next_record(struct packed_ref_iterator *iter)
 {
@@ -942,6 +942,9 @@ static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	int ok;
 
 	while ((ok = next_record(iter)) == ITER_OK) {
+		const char *refname = iter->base.refname;
+		const char *prefix = iter->prefix;
+
 		if (iter->flags & DO_FOR_EACH_PER_WORKTREE_ONLY &&
 		    !is_per_worktree_ref(iter->base.refname))
 			continue;
@@ -951,12 +954,41 @@ static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator)
 					    &iter->oid, iter->flags))
 			continue;
 
+		while (prefix && *prefix) {
+			if (*refname < *prefix)
+				BUG("packed-refs backend yielded reference preceding its prefix");
+			else if (*refname > *prefix)
+				return ITER_DONE;
+			prefix++;
+			refname++;
+		}
+
 		return ITER_OK;
 	}
 
 	return ok;
 }
 
+static int packed_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				    const char *prefix)
+{
+	struct packed_ref_iterator *iter =
+		(struct packed_ref_iterator *)ref_iterator;
+	const char *start;
+
+	if (prefix && *prefix)
+		start = find_reference_location(iter->snapshot, prefix, 0);
+	else
+		start = iter->snapshot->start;
+
+	free(iter->prefix);
+	iter->prefix = xstrdup_or_null(prefix);
+	iter->pos = start;
+	iter->eof = iter->snapshot->eof;
+
+	return 0;
+}
+
 static int packed_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				   struct object_id *peeled)
 {
@@ -979,11 +1011,13 @@ static void packed_ref_iterator_release(struct ref_iterator *ref_iterator)
 		(struct packed_ref_iterator *)ref_iterator;
 	strbuf_release(&iter->refname_buf);
 	free(iter->jump);
+	free(iter->prefix);
 	release_snapshot(iter->snapshot);
 }
 
 static struct ref_iterator_vtable packed_ref_iterator_vtable = {
 	.advance = packed_ref_iterator_advance,
+	.seek = packed_ref_iterator_seek,
 	.peel = packed_ref_iterator_peel,
 	.release = packed_ref_iterator_release,
 };
@@ -1097,7 +1131,6 @@ static struct ref_iterator *packed_ref_iterator_begin(
 {
 	struct packed_ref_store *refs;
 	struct snapshot *snapshot;
-	const char *start;
 	struct packed_ref_iterator *iter;
 	struct ref_iterator *ref_iterator;
 	unsigned int required_flags = REF_STORE_READ;
@@ -1113,14 +1146,6 @@ static struct ref_iterator *packed_ref_iterator_begin(
 	 */
 	snapshot = get_snapshot(refs);
 
-	if (prefix && *prefix)
-		start = find_reference_location(snapshot, prefix, 0);
-	else
-		start = snapshot->start;
-
-	if (start == snapshot->eof)
-		return empty_ref_iterator_begin();
-
 	CALLOC_ARRAY(iter, 1);
 	ref_iterator = &iter->base;
 	base_ref_iterator_init(ref_iterator, &packed_ref_iterator_vtable);
@@ -1130,19 +1155,15 @@ static struct ref_iterator *packed_ref_iterator_begin(
 
 	iter->snapshot = snapshot;
 	acquire_snapshot(snapshot);
-
-	iter->pos = start;
-	iter->eof = snapshot->eof;
 	strbuf_init(&iter->refname_buf, 0);
-
 	iter->base.oid = &iter->oid;
-
 	iter->repo = ref_store->repo;
 	iter->flags = flags;
 
-	if (prefix && *prefix)
-		/* Stop iteration after we've gone *past* prefix: */
-		ref_iterator = prefix_ref_iterator_begin(ref_iterator, prefix, 0);
+	if (packed_ref_iterator_seek(&iter->base, prefix) < 0) {
+		ref_iterator_free(&iter->base);
+		return NULL;
+	}
 
 	return ref_iterator;
 }

-- 
2.48.1.683.gf705b3209c.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v3 15/16] refs/iterator: implement seeking for files iterators
  2025-02-25  8:55 ` [PATCH v3 " Patrick Steinhardt
                     ` (13 preceding siblings ...)
  2025-02-25  8:56   ` [PATCH v3 14/16] refs/iterator: implement seeking for packed-ref iterators Patrick Steinhardt
@ 2025-02-25  8:56   ` Patrick Steinhardt
  2025-02-25  8:56   ` [PATCH v3 16/16] refs: reuse iterators when determining refname availability Patrick Steinhardt
  15 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-25  8:56 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Implement seeking for "files" iterators. As we simply use a ref-cache
iterator under the hood, the implementation is straightforward. Note
that we do not implement seeking on reflog iterators, same as with the
"reftable" backend.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/files-backend.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 859f1c11941..4e1c50fead3 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -918,6 +918,14 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	return ok;
 }
 
+static int files_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				   const char *prefix)
+{
+	struct files_ref_iterator *iter =
+		(struct files_ref_iterator *)ref_iterator;
+	return ref_iterator_seek(iter->iter0, prefix);
+}
+
 static int files_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				   struct object_id *peeled)
 {
@@ -936,6 +944,7 @@ static void files_ref_iterator_release(struct ref_iterator *ref_iterator)
 
 static struct ref_iterator_vtable files_ref_iterator_vtable = {
 	.advance = files_ref_iterator_advance,
+	.seek = files_ref_iterator_seek,
 	.peel = files_ref_iterator_peel,
 	.release = files_ref_iterator_release,
 };
@@ -2294,6 +2303,12 @@ static int files_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 	return ok;
 }
 
+static int files_reflog_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
+				      const char *prefix UNUSED)
+{
+	BUG("ref_iterator_seek() called for reflog_iterator");
+}
+
 static int files_reflog_iterator_peel(struct ref_iterator *ref_iterator UNUSED,
 				      struct object_id *peeled UNUSED)
 {
@@ -2309,6 +2324,7 @@ static void files_reflog_iterator_release(struct ref_iterator *ref_iterator)
 
 static struct ref_iterator_vtable files_reflog_iterator_vtable = {
 	.advance = files_reflog_iterator_advance,
+	.seek = files_reflog_iterator_seek,
 	.peel = files_reflog_iterator_peel,
 	.release = files_reflog_iterator_release,
 };

-- 
2.48.1.683.gf705b3209c.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v3 16/16] refs: reuse iterators when determining refname availability
  2025-02-25  8:55 ` [PATCH v3 " Patrick Steinhardt
                     ` (14 preceding siblings ...)
  2025-02-25  8:56   ` [PATCH v3 15/16] refs/iterator: implement seeking for files iterators Patrick Steinhardt
@ 2025-02-25  8:56   ` Patrick Steinhardt
  15 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-25  8:56 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

When verifying whether refnames are available we have to verify whether
any reference exists that is nested under the current reference. E.g.
given a reference "refs/heads/foo", we must make sure that there is no
other reference "refs/heads/foo/*".

This check is performed using a ref iterator with the prefix set to the
nested reference namespace. Until now it was not possible to reseek
iterators, so we always had to reallocate the iterator for every single
reference we were about to check. This kept us from reusing state that
the iterator may hold and that could make it work more efficiently.

Refactor the logic to reseek iterators. This leads to a sizeable speedup
with the "reftable" backend:

    Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
      Time (mean ± σ):      39.8 ms ±   0.9 ms    [User: 29.7 ms, System: 9.8 ms]
      Range (min … max):    38.4 ms …  42.0 ms    62 runs

    Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):      31.9 ms ±   1.1 ms    [User: 27.0 ms, System: 4.5 ms]
      Range (min … max):    29.8 ms …  34.3 ms    74 runs

    Summary
      update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
        1.25 ± 0.05 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)

The "files" backend doesn't really show a huge impact:

    Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
      Time (mean ± σ):     392.3 ms ±   7.1 ms    [User: 59.7 ms, System: 328.8 ms]
      Range (min … max):   384.6 ms … 404.5 ms    10 runs

    Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):     387.7 ms ±   7.4 ms    [User: 54.6 ms, System: 329.6 ms]
      Range (min … max):   377.0 ms … 397.7 ms    10 runs

    Summary
      update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
        1.01 ± 0.03 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)

This is mostly because the "files" backend is way slower to begin with,
as it has to create a separate file for each new reference, so the
milliseconds we shave off by reseeking the iterator don't translate
into a significant relative improvement.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/refs.c b/refs.c
index 8eff60a2186..6cbb9decdb0 100644
--- a/refs.c
+++ b/refs.c
@@ -2555,8 +2555,13 @@ int refs_verify_refnames_available(struct ref_store *refs,
 		if (!initial_transaction) {
 			int ok;
 
-			iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
-						       DO_FOR_EACH_INCLUDE_BROKEN);
+			if (!iter) {
+				iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
+							       DO_FOR_EACH_INCLUDE_BROKEN);
+			} else if (ref_iterator_seek(iter, dirname.buf) < 0) {
+				goto cleanup;
+			}
+
 			while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
 				if (skip &&
 				    string_list_has_string(skip, iter->refname))
@@ -2569,9 +2574,6 @@ int refs_verify_refnames_available(struct ref_store *refs,
 
 			if (ok != ITER_DONE)
 				BUG("error while iterating over references");
-
-			ref_iterator_free(iter);
-			iter = NULL;
 		}
 
 		extra_refname = find_descendant_ref(dirname.buf, extras, skip);

-- 
2.48.1.683.gf705b3209c.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* Re: [PATCH v2 14/16] refs/iterator: implement seeking for `packed-ref` iterators
  2025-02-25  7:39       ` Patrick Steinhardt
@ 2025-02-25 12:07         ` shejialuo
  0 siblings, 0 replies; 163+ messages in thread
From: shejialuo @ 2025-02-25 12:07 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

On Tue, Feb 25, 2025 at 08:39:54AM +0100, Patrick Steinhardt wrote:
> On Mon, Feb 24, 2025 at 11:09:32PM +0800, shejialuo wrote:
> > On Wed, Feb 19, 2025 at 02:23:41PM +0100, Patrick Steinhardt wrote:
> > > Implement seeking of `packed-ref` iterators. The implementation is again
> > > straight forward, except that we cannot continue to use the prefix
> > > iterator as we would otherwise not be able to reseek the iterator
> > > anymore in case one first asks for an empty and then for a non-empty
> > > prefix. Instead, we open-code the logic to in `advance()`.
> > > 
> > > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> > > ---
> > >  refs/packed-backend.c | 62 +++++++++++++++++++++++++++++++++------------------
> > >  1 file changed, 40 insertions(+), 22 deletions(-)
> > > 
> > > diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> > > index 38a1956d1a8..71a38acfedc 100644
> > > --- a/refs/packed-backend.c
> > > +++ b/refs/packed-backend.c
> > > @@ -951,12 +954,41 @@ static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator)
> > >  					    &iter->oid, iter->flags))
> > >  			continue;
> > >  
> > > +		while (prefix && *prefix) {
> > > +			if (*refname < *prefix)
> > > +				BUG("packed-refs backend yielded reference preceding its prefix");
> > > +			else if (*refname > *prefix)
> > > +				return ITER_DONE;
> > > +			prefix++;
> > > +			refname++;
> > > +		}
> > 
> > Although I cannot understand the code, I want to ask a question here: why
> > do we need to do this in `advance`? Should we check this for
> > `packed_ref_iterator_seek` or in the `next_record` function?
> > 
> > Before we introduce `seek`, we don't need this logic. I somehow think we
> > should do this in `packed_ref_iterator_seek`.
> 
> We cannot do this in `packed_ref_iterator_seek()` because we need to do
> it for every single record that we yield from the iterator. We _could_
> do it in `next_record()`, but that function is rather complex already
> and really only cares about yielding the next record. On the other hand,
> `advance()` already knows to skip certain entries, so putting the logic
> in there to also handle termination feels like a natural fit to me.
> 

Thanks for the detailed explanation.


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v3 03/16] builtin/update-ref: skip ambiguity checks when parsing object IDs
  2025-02-25  8:55   ` [PATCH v3 03/16] builtin/update-ref: skip ambiguity checks when parsing object IDs Patrick Steinhardt
@ 2025-02-26 22:26     ` Junio C Hamano
  2025-02-27 11:57       ` Patrick Steinhardt
  0 siblings, 1 reply; 163+ messages in thread
From: Junio C Hamano @ 2025-02-26 22:26 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, shejialuo,
	Christian Couder

Patrick Steinhardt <ps@pks.im> writes:

> Most of the commands in git-update-ref(1) accept an old and/or new
> object ID to update a specific reference to. These object IDs get parsed
> via `repo_get_oid()`, which not only handles plain object IDs, but also
> those that have a suffix like "~" or "^2". More surprisingly though, it
> even knows to resolve references, despite the fact that its manpage does
> not mention this fact even once.

Are you referring to <new-oid> and other placeholders with "oid" in
their names?  I do think "oid" in our documentation implies that
only full hexadecimal object names are allowed.  The glossary agrees
by saying that <object id> is a synonym for <object name> that is
usually 40-hex SHA-1.  However, that is not strictly enforced and we
say <object> (or its typed variants like <commit-ish>) even when a
command takes any extended SHA-1 expression, as described in
Documentation/revisions.{txt,adoc}, not limited to full hexadecimal
object name.

So, I am somewhat sympathetic to your confusion, but not that much.
When we wrote the command and documented it back in 2005, we did
mean to take any object name that is spelled in any way, not just
full hexadecimal.  You may want to update the manual to emphasize
that we encourage the use of full hexadecimal for this command and
elsewhere where it is more appropriate.

> One consequence of this is that we also check for ambiguous references:
> when parsing a full object ID where the DWIM mechanism would also cause
> us to resolve it as a branch, we'd end up printing a warning. While this
> check makes sense to have in general, it is arguably less useful in the
> context of git-update-ref(1).
>
>   - The manpage is explicitly structured around object IDs. So if we see
>     a fully blown object ID, the intent should be quite clear in
>     general.
>
>   - The command is part of our plumbing layer and not a tool that users
>     would generally use in interactive workflows. As such, the warning
>     will likely not be visible to anybody in the first place.

In addition, if the user meant to refer to a ref, it is possible to
disambiguate by prefixing refs/tags/ or whatever.  So squelching the
warning unconditionally might make sense.  We will yield the value
of the full hexadecimal object name, instead of the value of the ref
that is confusingly named, so there is no material change in the
behaviour here.

OK.



^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v3 03/16] builtin/update-ref: skip ambiguity checks when parsing object IDs
  2025-02-26 22:26     ` Junio C Hamano
@ 2025-02-27 11:57       ` Patrick Steinhardt
  0 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-27 11:57 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, shejialuo,
	Christian Couder

On Wed, Feb 26, 2025 at 02:26:59PM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > Most of the commands in git-update-ref(1) accept an old and/or new
> > object ID to update a specific reference to. These object IDs get parsed
> > via `repo_get_oid()`, which not only handles plain object IDs, but also
> > those that have a suffix like "~" or "^2". More surprisingly though, it
> > even knows to resolve references, despite the fact that its manpage does
> > not mention this fact even once.
> 
> Are you referring to <new-oid> and other placeholders with "oid" in
> their names?  I do think "oid" in our documentation implies that
> only full hexadecimal object names are allowed.  The glossary agrees
> by saying that <object id> is a synonym for <object name> that is
> usually 40-hex SHA-1.  However, that is not strictly enforced and we
> say <object> (or its typed variants like <commit-ish>) even when a
> command takes any extended SHA-1 expression, as described in
> Documentation/revisions.{txt,adoc}, not limited to full hexadecimal
> object name.
> 
> So, I am somewhat sympathetic to your confusion, but not that much.
> When we wrote the command and documented it back in 2005, we did
> mean to take any object name that is spelled in any way, not just
> full hexadecimal.  You may want to update the manual to emphasize
> that we encourage the use of full hexadecimal for this command and
> elsewhere where it is more appropriate.

Yeah. I have been aware of the behaviour beforehand, but an unsuspecting
user who reads through the manpage wouldn't be able to figure out at all
that this is the case. I guess this is something we should improve, but
I think it's outside the scope of this series. #leftoverbits

Patrick


^ permalink raw reply	[flat|nested] 163+ messages in thread

* [PATCH v4 00/16] refs: batch refname availability checks
  2025-02-17 15:50 [PATCH 00/14] refs: batch refname availability checks Patrick Steinhardt
                   ` (16 preceding siblings ...)
  2025-02-25  8:55 ` [PATCH v3 " Patrick Steinhardt
@ 2025-02-28  9:26 ` Patrick Steinhardt
  2025-02-28  9:26   ` [PATCH v4 01/16] object-name: introduce `repo_get_oid_with_flags()` Patrick Steinhardt
                     ` (16 more replies)
  2025-03-06 15:08 ` [PATCH v5 " Patrick Steinhardt
  2025-03-12 15:56 ` [PATCH v6 " Patrick Steinhardt
  19 siblings, 17 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-28  9:26 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Hi,

this patch series has been inspired by brian's report that the reftable
backend is significantly slower when writing many references compared to
the files backend. As explained in that thread, the underlying issue is
the design of tombstone references: when we first delete all references
in a repository and then recreate them, we still have all the tombstones
and thus we need to churn through all of them to figure out that they
have been deleted in the first place. The files backend does not have
this issue.

I consider the benchmark itself to be kind of broken, as it stems from
us deleting all refs and then recreating them. And if you pack refs in
between then the "reftable" backend outperforms the "files" backend.

But there are a couple of opportunities here anyway. While we cannot
make the underlying issue of tombstones being less efficient go away,
this has prompted me to have a deeper look at where we spend all the
time. There are three ideas in this series:

  - git-update-ref(1) performs ambiguity checks for any full-size object
    ID, which triggers a lot of reads. This is somewhat pointless though
    given that the manpage explicitly points out that the command is
    about object IDs, even though it does know to parse refs. But being
    part of plumbing, emitting the warning here does not make a ton of
    sense, and favoring object IDs over references in these cases is the
    obvious thing to do anyway.

  - For each ref "refs/heads/bar", we need to verify that neither
    "refs/heads" nor "refs" exists. This was repeated for every refname,
    but because most refnames use common prefixes this made us re-check
    a lot of prefixes. This is addressed by using a `strset` of already
    checked prefixes.

  - For each ref "refs/heads/bar", we need to verify that no ref
    "refs/heads/bar/*" exists. We always created a new ref iterator for
    this check, which requires us to discard all internal state and then
    recreate it. The reftable library has already been refactored though
    to have reseekable iterators, so we backfill this functionality to
    all the other iterators and then reuse the iterator.

With the (somewhat broken) benchmark we see a small speedup with the
"files" backend:

    Benchmark 1: update-ref (refformat = files, revision = master)
      Time (mean ± σ):     234.4 ms ±   1.9 ms    [User: 75.6 ms, System: 157.2 ms]
      Range (min … max):   232.2 ms … 236.9 ms    10 runs

    Benchmark 2: update-ref (refformat = files, revision = HEAD)
      Time (mean ± σ):     184.2 ms ±   2.0 ms    [User: 62.8 ms, System: 119.9 ms]
      Range (min … max):   181.1 ms … 187.0 ms    10 runs

    Summary
      update-ref (refformat = files, revision = HEAD) ran
        1.27 ± 0.02 times faster than update-ref (refformat = files, revision = master)

And a huge speedup with the "reftable" backend:

    Benchmark 1: update-ref (refformat = reftable, revision = master)
      Time (mean ± σ):     16.852 s ±  0.061 s    [User: 16.754 s, System: 0.059 s]
      Range (min … max):   16.785 s … 16.982 s    10 runs

    Benchmark 2: update-ref (refformat = reftable, revision = HEAD)
      Time (mean ± σ):      2.230 s ±  0.009 s    [User: 2.192 s, System: 0.029 s]
      Range (min … max):    2.215 s …  2.244 s    10 runs

    Summary
      update-ref (refformat = reftable, revision = HEAD) ran
        7.56 ± 0.04 times faster than update-ref (refformat = reftable, revision = master)

We're still not up to speed with the "files" backend, but considerably
better. Given that this is an extreme edge case and not reflective of
the general case I'm okay with this result for now.

But more importantly, this refactoring also has a positive effect when
updating references in a repository with preexisting refs, which I
consider to be the more realistic scenario. The following benchmark
creates 10k refs with 100k preexisting refs.

With the "files" backend we see a modest improvement:

    Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = master)
      Time (mean ± σ):     478.4 ms ±  11.9 ms    [User: 96.7 ms, System: 379.6 ms]
      Range (min … max):   465.4 ms … 496.6 ms    10 runs

    Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):     388.5 ms ±  10.3 ms    [User: 52.0 ms, System: 333.8 ms]
      Range (min … max):   376.5 ms … 403.1 ms    10 runs

    Summary
      update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
        1.23 ± 0.04 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = master)

But with the "reftable" backend we see an almost 5x improvement, where
it's now ~15x faster than the "files" backend:

    Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = master)
      Time (mean ± σ):     153.9 ms ±   2.0 ms    [User: 96.5 ms, System: 56.6 ms]
      Range (min … max):   150.5 ms … 158.4 ms    18 runs

    Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):      32.2 ms ±   1.2 ms    [User: 27.6 ms, System: 4.3 ms]
      Range (min … max):    29.8 ms …  38.6 ms    71 runs

    Summary
      update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
        4.78 ± 0.19 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = master)

The series is structured as follows:

  - Patches 1 to 3 implement the logic to skip ambiguity checks in
    git-update-ref(1).

  - Patches 4 to 7 introduce batched refname availability checks.

  - Patch 8 deduplicates the ref prefix checks.

  - Patches 9 to 15 implement the infrastructure to reseek iterators.

  - Patch 16 starts to reuse iterators for nested ref checks.

Changes in v2:
  - Point out why we also have to touch up the `dir_iterator`.
  - Fix up the comment explaining `ITER_DONE`.
  - Fix up comments that show usage patterns of the ref and dir iterator
    interfaces.
  - Start batching availability checks in the "files" backend, as well.
  - Improve the commit message that drops the ambiguity check so that we
    also point to 25fba78d36b (cat-file: disable object/refname
    ambiguity check for batch mode, 2013-07-12).
  - Link to v1: https://lore.kernel.org/r/20250217-pks-update-ref-optimization-v1-0-a2b6d87a24af@pks.im

Changes in v3:
  - Fix one case where we didn't skip ambiguity checks in
    git-update-ref(1).
  - Document better that only the prefix can change on reseeking
    iterators. Other internal state will remain the same.
  - Fix a memory leak in the ref-cache iterator.
  - Don't ignore errors returned by `packed_ref_iterator_seek()`.
  - Link to v2: https://lore.kernel.org/r/20250219-pks-update-ref-optimization-v2-0-e696e7220b22@pks.im

Changes in v4:
  - A couple of clarifications in the commit message that disabled
    ambiguity warnings.
  - Link to v3: https://lore.kernel.org/r/20250225-pks-update-ref-optimization-v3-0-77c3687cda75@pks.im

Thanks!

Patrick

[1]: <Z602dzQggtDdcgCX@tapette.crustytoothpaste.net>

---
Patrick Steinhardt (16):
      object-name: introduce `repo_get_oid_with_flags()`
      object-name: allow skipping ambiguity checks in `get_oid()` family
      builtin/update-ref: skip ambiguity checks when parsing object IDs
      refs: introduce function to batch refname availability checks
      refs/reftable: batch refname availability checks
      refs/files: batch refname availability checks for normal transactions
      refs/files: batch refname availability checks for initial transactions
      refs: stop re-verifying common prefixes for availability
      refs/iterator: separate lifecycle from iteration
      refs/iterator: provide infrastructure to re-seek iterators
      refs/iterator: implement seeking for merged iterators
      refs/iterator: implement seeking for reftable iterators
      refs/iterator: implement seeking for ref-cache iterators
      refs/iterator: implement seeking for packed-ref iterators
      refs/iterator: implement seeking for files iterators
      refs: reuse iterators when determining refname availability

 builtin/clone.c              |   2 +
 builtin/update-ref.c         |  15 ++--
 dir-iterator.c               |  24 +++---
 dir-iterator.h               |  11 +--
 hash.h                       |   1 +
 iterator.h                   |   2 +-
 object-name.c                |  18 +++--
 object-name.h                |   6 ++
 refs.c                       | 186 ++++++++++++++++++++++++++-----------------
 refs.h                       |  12 +++
 refs/debug.c                 |  20 +++--
 refs/files-backend.c         | 117 +++++++++++++++++----------
 refs/iterator.c              | 145 +++++++++++++++++----------------
 refs/packed-backend.c        |  92 ++++++++++++---------
 refs/ref-cache.c             |  88 ++++++++++++--------
 refs/refs-internal.h         |  53 +++++++-----
 refs/reftable-backend.c      |  85 +++++++++++---------
 t/helper/test-dir-iterator.c |   1 +
 18 files changed, 528 insertions(+), 350 deletions(-)

Range-diff versus v3:

 1:  9fa53bcfcc4 =  1:  9b4f39fb07f object-name: introduce `repo_get_oid_with_flags()`
 2:  5f5d7fe8f2f =  2:  dbd0e3d3da5 object-name: allow skipping ambiguity checks in `get_oid()` family
 3:  0feb7829db9 !  3:  63587d1c6ee builtin/update-ref: skip ambiguity checks when parsing object IDs
    @@ Commit message
         object ID to update a specific reference to. These object IDs get parsed
         via `repo_get_oid()`, which not only handles plain object IDs, but also
         those that have a suffix like "~" or "^2". More surprisingly though, it
    -    even knows to resolve references, despite the fact that its manpage does
    -    not mention this fact even once.
    +    even knows to resolve arbitrary revisions, despite the fact that its
    +    manpage does not mention this fact even once.
     
         One consequence of this is that we also check for ambiguous references:
         when parsing a full object ID where the DWIM mechanism would also cause
         us to resolve it as a branch, we'd end up printing a warning. While this
         check makes sense to have in general, it is arguably less useful in the
    -    context of git-update-ref(1). This is out of two reasons:
    +    context of git-update-ref(1). This is due to multiple reasons:
     
           - The manpage is explicitly structured around object IDs. So if we see
             a fully blown object ID, the intent should be quite clear in
    @@ Commit message
             would generally use in interactive workflows. As such, the warning
             will likely not be visible to anybody in the first place.
     
    +      - Users can and should use the fully-qualified refname in case there
    +        is any potential for ambiguity. And given that this command is part
    +        of our plumbing layer, one should always try to be as defensive as
    +        possible and use fully-qualified refnames.
    +
         Furthermore, this check can be quite expensive when updating lots of
         references via `--stdin`, because we try to read multiple references per
         object ID that we parse according to the DWIM rules. This effect can be
    @@ Commit message
         batch mode, 2013-07-12).
     
         Disable the warning in git-update-ref(1), which provides a significant
    -    speedup with both backends. The following benchmark creates 10000 new
    -    references with a 100000 preexisting refs with the "files" backend:
    +    speedup with both backends. The user-visible outcome is unchanged even
    +    when ambiguity exists, except that we don't show the warning anymore.
    +
    +    The following benchmark creates 10000 new references with a 100000
    +    preexisting refs with the "files" backend:
     
             Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
               Time (mean ± σ):     467.3 ms ±   5.1 ms    [User: 100.0 ms, System: 365.1 ms]
 4:  4202541a160 =  4:  0f964bfc3af refs: introduce function to batch refname availability checks
 5:  e1ff37d766b =  5:  59c5f21a17d refs/reftable: batch refname availability checks
 6:  90363250575 =  6:  395d008b3c8 refs/files: batch refname availability checks for normal transactions
 7:  a019888cd65 =  7:  a8e33c64ece refs/files: batch refname availability checks for initial transactions
 8:  dab106f4530 =  8:  15b3cdae4ee refs: stop re-verifying common prefixes for availability
 9:  dc03794e974 =  9:  0f6901fe637 refs/iterator: separate lifecycle from iteration
10:  c3eb4ecb35f = 10:  c4a9ae51590 refs/iterator: provide infrastructure to re-seek iterators
11:  a0b626e3fd6 = 11:  d9398eb9f48 refs/iterator: implement seeking for merged iterators
12:  e0fe9998d7f = 12:  b653eb7f663 refs/iterator: implement seeking for reftable iterators
13:  84f6d1fc512 = 13:  6a616d0e328 refs/iterator: implement seeking for ref-cache iterators
14:  4ccc5a7c6e3 = 14:  747188e9b81 refs/iterator: implement seeking for packed-ref iterators
15:  7b25a9033d9 = 15:  ebda415ae47 refs/iterator: implement seeking for files iterators
16:  9688d6bc5cf = 16:  80e50011c28 refs: reuse iterators when determining refname availability

---
base-commit: e2067b49ecaef9b7f51a17ce251f9207f72ef52d
change-id: 20250217-pks-update-ref-optimization-15c795e66e2b




* [PATCH v4 01/16] object-name: introduce `repo_get_oid_with_flags()`
  2025-02-28  9:26 ` [PATCH v4 00/16] refs: batch refname availability checks Patrick Steinhardt
@ 2025-02-28  9:26   ` Patrick Steinhardt
  2025-02-28  9:26   ` [PATCH v4 02/16] object-name: allow skipping ambiguity checks in `get_oid()` family Patrick Steinhardt
                     ` (15 subsequent siblings)
  16 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-28  9:26 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Introduce a new function `repo_get_oid_with_flags()`. This function
behaves the same as `repo_get_oid()`, except that it takes an extra
`flags` parameter that it ends up passing to `get_oid_with_context()`.

This function will be used in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 object-name.c | 14 ++++++++------
 object-name.h |  6 ++++++
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/object-name.c b/object-name.c
index 945d5bdef25..233f3f861e3 100644
--- a/object-name.c
+++ b/object-name.c
@@ -1794,18 +1794,20 @@ void object_context_release(struct object_context *ctx)
 	strbuf_release(&ctx->symlink_path);
 }
 
-/*
- * This is like "get_oid_basic()", except it allows "object ID expressions",
- * notably "xyz^" for "parent of xyz"
- */
-int repo_get_oid(struct repository *r, const char *name, struct object_id *oid)
+int repo_get_oid_with_flags(struct repository *r, const char *name,
+			    struct object_id *oid, unsigned flags)
 {
 	struct object_context unused;
-	int ret = get_oid_with_context(r, name, 0, oid, &unused);
+	int ret = get_oid_with_context(r, name, flags, oid, &unused);
 	object_context_release(&unused);
 	return ret;
 }
 
+int repo_get_oid(struct repository *r, const char *name, struct object_id *oid)
+{
+	return repo_get_oid_with_flags(r, name, oid, 0);
+}
+
 /*
  * This returns a non-zero value if the string (built using printf
  * format and the given arguments) is not a valid object.
diff --git a/object-name.h b/object-name.h
index 8dba4a47a47..cda4934cd5f 100644
--- a/object-name.h
+++ b/object-name.h
@@ -51,6 +51,12 @@ void strbuf_repo_add_unique_abbrev(struct strbuf *sb, struct repository *repo,
 void strbuf_add_unique_abbrev(struct strbuf *sb, const struct object_id *oid,
 			      int abbrev_len);
 
+/*
+ * This is like "get_oid_basic()", except it allows "object ID expressions",
+ * notably "xyz^" for "parent of xyz". Accepts GET_OID_* flags.
+ */
+int repo_get_oid_with_flags(struct repository *r, const char *str,
+			    struct object_id *oid, unsigned flags);
 int repo_get_oid(struct repository *r, const char *str, struct object_id *oid);
 __attribute__((format (printf, 2, 3)))
 int get_oidf(struct object_id *oid, const char *fmt, ...);

-- 
2.49.0.rc0.375.gae4b89d849.dirty




* [PATCH v4 02/16] object-name: allow skipping ambiguity checks in `get_oid()` family
  2025-02-28  9:26 ` [PATCH v4 00/16] refs: batch refname availability checks Patrick Steinhardt
  2025-02-28  9:26   ` [PATCH v4 01/16] object-name: introduce `repo_get_oid_with_flags()` Patrick Steinhardt
@ 2025-02-28  9:26   ` Patrick Steinhardt
  2025-03-06 13:21     ` Karthik Nayak
  2025-02-28  9:26   ` [PATCH v4 03/16] builtin/update-ref: skip ambiguity checks when parsing object IDs Patrick Steinhardt
                     ` (14 subsequent siblings)
  16 siblings, 1 reply; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-28  9:26 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

When reading an object ID via `get_oid_basic()` or any of its related
functions we perform a check whether the object ID is ambiguous, which
can be the case when a reference with the same name exists. While the
check is generally helpful, there are cases where it only adds to the
runtime overhead without providing much of a benefit.

Add a new flag that allows us to disable the check. The flag will be
used in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 hash.h        | 1 +
 object-name.c | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/hash.h b/hash.h
index 4367acfec50..79419016513 100644
--- a/hash.h
+++ b/hash.h
@@ -204,6 +204,7 @@ struct object_id {
 #define GET_OID_ONLY_TO_DIE    04000
 #define GET_OID_REQUIRE_PATH  010000
 #define GET_OID_HASH_ANY      020000
+#define GET_OID_SKIP_AMBIGUITY_CHECK 040000
 
 #define GET_OID_DISAMBIGUATORS \
 	(GET_OID_COMMIT | GET_OID_COMMITTISH | \
diff --git a/object-name.c b/object-name.c
index 233f3f861e3..85444dbb15b 100644
--- a/object-name.c
+++ b/object-name.c
@@ -961,7 +961,9 @@ static int get_oid_basic(struct repository *r, const char *str, int len,
 	int fatal = !(flags & GET_OID_QUIETLY);
 
 	if (len == r->hash_algo->hexsz && !get_oid_hex(str, oid)) {
-		if (repo_settings_get_warn_ambiguous_refs(r) && warn_on_object_refname_ambiguity) {
+		if (!(flags & GET_OID_SKIP_AMBIGUITY_CHECK) &&
+		    repo_settings_get_warn_ambiguous_refs(r) &&
+		    warn_on_object_refname_ambiguity) {
 			refs_found = repo_dwim_ref(r, str, len, &tmp_oid, &real_ref, 0);
 			if (refs_found > 0) {
 				warning(warn_msg, len, str);

-- 
2.49.0.rc0.375.gae4b89d849.dirty




* [PATCH v4 03/16] builtin/update-ref: skip ambiguity checks when parsing object IDs
  2025-02-28  9:26 ` [PATCH v4 00/16] refs: batch refname availability checks Patrick Steinhardt
  2025-02-28  9:26   ` [PATCH v4 01/16] object-name: introduce `repo_get_oid_with_flags()` Patrick Steinhardt
  2025-02-28  9:26   ` [PATCH v4 02/16] object-name: allow skipping ambiguity checks in `get_oid()` family Patrick Steinhardt
@ 2025-02-28  9:26   ` Patrick Steinhardt
  2025-02-28  9:26   ` [PATCH v4 04/16] refs: introduce function to batch refname availability checks Patrick Steinhardt
                     ` (13 subsequent siblings)
  16 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-28  9:26 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Most of the commands in git-update-ref(1) accept an old and/or new
object ID to update a specific reference to. These object IDs get parsed
via `repo_get_oid()`, which not only handles plain object IDs, but also
those that have a suffix like "~" or "^2". More surprisingly though, it
even knows to resolve arbitrary revisions, despite the fact that its
manpage does not mention this fact even once.

One consequence of this is that we also check for ambiguous references:
when parsing a full object ID where the DWIM mechanism would also cause
us to resolve it as a branch, we'd end up printing a warning. While this
check makes sense to have in general, it is arguably less useful in the
context of git-update-ref(1). This is due to multiple reasons:

  - The manpage is explicitly structured around object IDs. So if we see
    a fully blown object ID, the intent should be quite clear in
    general.

  - The command is part of our plumbing layer and not a tool that users
    would generally use in interactive workflows. As such, the warning
    will likely not be visible to anybody in the first place.

  - Users can and should use the fully-qualified refname in case there
    is any potential for ambiguity. And given that this command is part
    of our plumbing layer, one should always try to be as defensive as
    possible and use fully-qualified refnames.

Furthermore, this check can be quite expensive when updating lots of
references via `--stdin`, because we try to read multiple references per
object ID that we parse according to the DWIM rules. This effect can be
seen both with the "files" and "reftable" backend.

The issue is not unique to git-update-ref(1), but was also an issue in
git-cat-file(1), where it was addressed by disabling the ambiguity check
in 25fba78d36b (cat-file: disable object/refname ambiguity check for
batch mode, 2013-07-12).

Disable the warning in git-update-ref(1), which provides a significant
speedup with both backends. The user-visible outcome is unchanged even
when ambiguity exists, except that we don't show the warning anymore.

The following benchmark creates 10000 new references with a 100000
preexisting refs with the "files" backend:

    Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
      Time (mean ± σ):     467.3 ms ±   5.1 ms    [User: 100.0 ms, System: 365.1 ms]
      Range (min … max):   461.9 ms … 479.3 ms    10 runs

    Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):     394.1 ms ±   5.8 ms    [User: 63.3 ms, System: 327.6 ms]
      Range (min … max):   384.9 ms … 405.7 ms    10 runs

    Summary
      update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
        1.19 ± 0.02 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)

And with the "reftable" backend:

    Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
      Time (mean ± σ):     146.9 ms ±   2.2 ms    [User: 90.4 ms, System: 56.0 ms]
      Range (min … max):   142.7 ms … 150.8 ms    19 runs

    Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):      63.2 ms ±   1.1 ms    [User: 41.0 ms, System: 21.8 ms]
      Range (min … max):    61.1 ms …  66.6 ms    41 runs

    Summary
      update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
        2.32 ± 0.05 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)

Note that the absolute improvement with both backends is roughly in the
same ballpark, but the relative improvement for the "reftable" backend
is more significant because writing the new table to disk is faster in
the first place.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/update-ref.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/builtin/update-ref.c b/builtin/update-ref.c
index 4d35bdc4b4b..1d541e13ade 100644
--- a/builtin/update-ref.c
+++ b/builtin/update-ref.c
@@ -179,7 +179,8 @@ static int parse_next_oid(const char **next, const char *end,
 		(*next)++;
 		*next = parse_arg(*next, &arg);
 		if (arg.len) {
-			if (repo_get_oid(the_repository, arg.buf, oid))
+			if (repo_get_oid_with_flags(the_repository, arg.buf, oid,
+						    GET_OID_SKIP_AMBIGUITY_CHECK))
 				goto invalid;
 		} else {
 			/* Without -z, an empty value means all zeros: */
@@ -197,7 +198,8 @@ static int parse_next_oid(const char **next, const char *end,
 		*next += arg.len;
 
 		if (arg.len) {
-			if (repo_get_oid(the_repository, arg.buf, oid))
+			if (repo_get_oid_with_flags(the_repository, arg.buf, oid,
+						    GET_OID_SKIP_AMBIGUITY_CHECK))
 				goto invalid;
 		} else if (flags & PARSE_SHA1_ALLOW_EMPTY) {
 			/* With -z, treat an empty value as all zeros: */
@@ -299,7 +301,8 @@ static void parse_cmd_symref_update(struct ref_transaction *transaction,
 			die("symref-update %s: expected old value", refname);
 
 		if (!strcmp(old_arg, "oid")) {
-			if (repo_get_oid(the_repository, old_target, &old_oid))
+			if (repo_get_oid_with_flags(the_repository, old_target, &old_oid,
+						    GET_OID_SKIP_AMBIGUITY_CHECK))
 				die("symref-update %s: invalid oid: %s", refname, old_target);
 
 			have_old_oid = 1;
@@ -772,7 +775,8 @@ int cmd_update_ref(int argc,
 		refname = argv[0];
 		value = argv[1];
 		oldval = argv[2];
-		if (repo_get_oid(the_repository, value, &oid))
+		if (repo_get_oid_with_flags(the_repository, value, &oid,
+					    GET_OID_SKIP_AMBIGUITY_CHECK))
 			die("%s: not a valid SHA1", value);
 	}
 
@@ -783,7 +787,8 @@ int cmd_update_ref(int argc,
 			 * must not already exist:
 			 */
 			oidclr(&oldoid, the_repository->hash_algo);
-		else if (repo_get_oid(the_repository, oldval, &oldoid))
+		else if (repo_get_oid_with_flags(the_repository, oldval, &oldoid,
+						 GET_OID_SKIP_AMBIGUITY_CHECK))
 			die("%s: not a valid old SHA1", oldval);
 	}
 

-- 
2.49.0.rc0.375.gae4b89d849.dirty




* [PATCH v4 04/16] refs: introduce function to batch refname availability checks
  2025-02-28  9:26 ` [PATCH v4 00/16] refs: batch refname availability checks Patrick Steinhardt
                     ` (2 preceding siblings ...)
  2025-02-28  9:26   ` [PATCH v4 03/16] builtin/update-ref: skip ambiguity checks when parsing object IDs Patrick Steinhardt
@ 2025-02-28  9:26   ` Patrick Steinhardt
  2025-03-06 13:47     ` Karthik Nayak
  2025-02-28  9:26   ` [PATCH v4 05/16] refs/reftable: " Patrick Steinhardt
                     ` (12 subsequent siblings)
  16 siblings, 1 reply; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-28  9:26 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

The `refs_verify_refname_available()` function checks whether a
reference update can be committed or whether it would conflict with
either a prefix or suffix thereof. This function needs to be called once
per reference that one wants to check, which requires us to redo a
couple of checks every time the function is called.

Introduce a new function `refs_verify_refnames_available()` that does
the same, but for a list of references. For now, the new function uses
the exact same implementation, except that we loop through all refnames
provided by the caller. This will be tuned in subsequent commits.

The existing `refs_verify_refname_available()` function is reimplemented
on top of the new function. As such, the diff is best viewed with the
`--ignore-space-change` option.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs.c | 169 +++++++++++++++++++++++++++++++++++++----------------------------
 refs.h |  12 +++++
 2 files changed, 109 insertions(+), 72 deletions(-)

diff --git a/refs.c b/refs.c
index f4094a326a9..5a9b0f2fa1e 100644
--- a/refs.c
+++ b/refs.c
@@ -2467,19 +2467,15 @@ int ref_transaction_commit(struct ref_transaction *transaction,
 	return ret;
 }
 
-int refs_verify_refname_available(struct ref_store *refs,
-				  const char *refname,
-				  const struct string_list *extras,
-				  const struct string_list *skip,
-				  unsigned int initial_transaction,
-				  struct strbuf *err)
+int refs_verify_refnames_available(struct ref_store *refs,
+				   const struct string_list *refnames,
+				   const struct string_list *extras,
+				   const struct string_list *skip,
+				   unsigned int initial_transaction,
+				   struct strbuf *err)
 {
-	const char *slash;
-	const char *extra_refname;
 	struct strbuf dirname = STRBUF_INIT;
 	struct strbuf referent = STRBUF_INIT;
-	struct object_id oid;
-	unsigned int type;
 	int ret = -1;
 
 	/*
@@ -2489,79 +2485,91 @@ int refs_verify_refname_available(struct ref_store *refs,
 
 	assert(err);
 
-	strbuf_grow(&dirname, strlen(refname) + 1);
-	for (slash = strchr(refname, '/'); slash; slash = strchr(slash + 1, '/')) {
-		/*
-		 * Just saying "Is a directory" when we e.g. can't
-		 * lock some multi-level ref isn't very informative,
-		 * the user won't be told *what* is a directory, so
-		 * let's not use strerror() below.
-		 */
-		int ignore_errno;
-		/* Expand dirname to the new prefix, not including the trailing slash: */
-		strbuf_add(&dirname, refname + dirname.len, slash - refname - dirname.len);
+	for (size_t i = 0; i < refnames->nr; i++) {
+		const char *refname = refnames->items[i].string;
+		const char *extra_refname;
+		struct object_id oid;
+		unsigned int type;
+		const char *slash;
+
+		strbuf_reset(&dirname);
+
+		for (slash = strchr(refname, '/'); slash; slash = strchr(slash + 1, '/')) {
+			/*
+			 * Just saying "Is a directory" when we e.g. can't
+			 * lock some multi-level ref isn't very informative,
+			 * the user won't be told *what* is a directory, so
+			 * let's not use strerror() below.
+			 */
+			int ignore_errno;
+
+			/* Expand dirname to the new prefix, not including the trailing slash: */
+			strbuf_add(&dirname, refname + dirname.len, slash - refname - dirname.len);
+
+			/*
+			 * We are still at a leading dir of the refname (e.g.,
+			 * "refs/foo"; if there is a reference with that name,
+			 * it is a conflict, *unless* it is in skip.
+			 */
+			if (skip && string_list_has_string(skip, dirname.buf))
+				continue;
+
+			if (!initial_transaction &&
+			    !refs_read_raw_ref(refs, dirname.buf, &oid, &referent,
+					       &type, &ignore_errno)) {
+				strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
+					    dirname.buf, refname);
+				goto cleanup;
+			}
+
+			if (extras && string_list_has_string(extras, dirname.buf)) {
+				strbuf_addf(err, _("cannot process '%s' and '%s' at the same time"),
+					    refname, dirname.buf);
+				goto cleanup;
+			}
+		}
 
 		/*
-		 * We are still at a leading dir of the refname (e.g.,
-		 * "refs/foo"; if there is a reference with that name,
-		 * it is a conflict, *unless* it is in skip.
+		 * We are at the leaf of our refname (e.g., "refs/foo/bar").
+		 * There is no point in searching for a reference with that
+		 * name, because a refname isn't considered to conflict with
+		 * itself. But we still need to check for references whose
+		 * names are in the "refs/foo/bar/" namespace, because they
+		 * *do* conflict.
 		 */
-		if (skip && string_list_has_string(skip, dirname.buf))
-			continue;
+		strbuf_addstr(&dirname, refname + dirname.len);
+		strbuf_addch(&dirname, '/');
+
+		if (!initial_transaction) {
+			struct ref_iterator *iter;
+			int ok;
+
+			iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
+						       DO_FOR_EACH_INCLUDE_BROKEN);
+			while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
+				if (skip &&
+				    string_list_has_string(skip, iter->refname))
+					continue;
+
+				strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
+					    iter->refname, refname);
+				ref_iterator_abort(iter);
+				goto cleanup;
+			}
 
-		if (!initial_transaction &&
-		    !refs_read_raw_ref(refs, dirname.buf, &oid, &referent,
-				       &type, &ignore_errno)) {
-			strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
-				    dirname.buf, refname);
-			goto cleanup;
+			if (ok != ITER_DONE)
+				BUG("error while iterating over references");
 		}
 
-		if (extras && string_list_has_string(extras, dirname.buf)) {
+		extra_refname = find_descendant_ref(dirname.buf, extras, skip);
+		if (extra_refname) {
 			strbuf_addf(err, _("cannot process '%s' and '%s' at the same time"),
-				    refname, dirname.buf);
+				    refname, extra_refname);
 			goto cleanup;
 		}
 	}
 
-	/*
-	 * We are at the leaf of our refname (e.g., "refs/foo/bar").
-	 * There is no point in searching for a reference with that
-	 * name, because a refname isn't considered to conflict with
-	 * itself. But we still need to check for references whose
-	 * names are in the "refs/foo/bar/" namespace, because they
-	 * *do* conflict.
-	 */
-	strbuf_addstr(&dirname, refname + dirname.len);
-	strbuf_addch(&dirname, '/');
-
-	if (!initial_transaction) {
-		struct ref_iterator *iter;
-		int ok;
-
-		iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
-					       DO_FOR_EACH_INCLUDE_BROKEN);
-		while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
-			if (skip &&
-			    string_list_has_string(skip, iter->refname))
-				continue;
-
-			strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
-				    iter->refname, refname);
-			ref_iterator_abort(iter);
-			goto cleanup;
-		}
-
-		if (ok != ITER_DONE)
-			BUG("error while iterating over references");
-	}
-
-	extra_refname = find_descendant_ref(dirname.buf, extras, skip);
-	if (extra_refname)
-		strbuf_addf(err, _("cannot process '%s' and '%s' at the same time"),
-			    refname, extra_refname);
-	else
-		ret = 0;
+	ret = 0;
 
 cleanup:
 	strbuf_release(&referent);
@@ -2569,6 +2577,23 @@ int refs_verify_refname_available(struct ref_store *refs,
 	return ret;
 }
 
+int refs_verify_refname_available(struct ref_store *refs,
+				  const char *refname,
+				  const struct string_list *extras,
+				  const struct string_list *skip,
+				  unsigned int initial_transaction,
+				  struct strbuf *err)
+{
+	struct string_list_item item = { .string = (char *) refname };
+	struct string_list refnames = {
+		.items = &item,
+		.nr = 1,
+	};
+
+	return refs_verify_refnames_available(refs, &refnames, extras, skip,
+					      initial_transaction, err);
+}
+
 struct do_for_each_reflog_help {
 	each_reflog_fn *fn;
 	void *cb_data;
diff --git a/refs.h b/refs.h
index a0cdd99250e..185aed5a461 100644
--- a/refs.h
+++ b/refs.h
@@ -124,6 +124,18 @@ int refs_verify_refname_available(struct ref_store *refs,
 				  unsigned int initial_transaction,
 				  struct strbuf *err);
 
+/*
+ * Same as `refs_verify_refname_available()`, but checking for a list of
+ * refnames instead of only a single item. This is more efficient in the case
+ * where one needs to check multiple refnames.
+ */
+int refs_verify_refnames_available(struct ref_store *refs,
+				   const struct string_list *refnames,
+				   const struct string_list *extras,
+				   const struct string_list *skip,
+				   unsigned int initial_transaction,
+				   struct strbuf *err);
+
 int refs_ref_exists(struct ref_store *refs, const char *refname);
 
 int should_autocreate_reflog(enum log_refs_config log_all_ref_updates,

-- 
2.49.0.rc0.375.gae4b89d849.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v4 05/16] refs/reftable: batch refname availability checks
  2025-02-28  9:26 ` [PATCH v4 00/16] refs: batch refname availability checks Patrick Steinhardt
                     ` (3 preceding siblings ...)
  2025-02-28  9:26   ` [PATCH v4 04/16] refs: introduce function to batch refname availability checks Patrick Steinhardt
@ 2025-02-28  9:26   ` Patrick Steinhardt
  2025-03-06 14:00     ` Karthik Nayak
  2025-02-28  9:26   ` [PATCH v4 06/16] refs/files: batch refname availability checks for normal transactions Patrick Steinhardt
                     ` (11 subsequent siblings)
  16 siblings, 1 reply; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-28  9:26 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Refactor the "reftable" backend to batch the availability check for
refnames. This does not yet have an effect on performance as we
essentially still call `refs_verify_refname_available()` in a loop, but
this will change in subsequent commits.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/reftable-backend.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index d39a14c5a46..2a90e7cb391 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -1069,6 +1069,7 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
 		reftable_be_downcast(ref_store, REF_STORE_WRITE|REF_STORE_MAIN, "ref_transaction_prepare");
 	struct strbuf referent = STRBUF_INIT, head_referent = STRBUF_INIT;
 	struct string_list affected_refnames = STRING_LIST_INIT_NODUP;
+	struct string_list refnames_to_check = STRING_LIST_INIT_NODUP;
 	struct reftable_transaction_data *tx_data = NULL;
 	struct reftable_backend *be;
 	struct object_id head_oid;
@@ -1224,12 +1225,7 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
 			 * can output a proper error message instead of failing
 			 * at a later point.
 			 */
-			ret = refs_verify_refname_available(ref_store, u->refname,
-							    &affected_refnames, NULL,
-							    transaction->flags & REF_TRANSACTION_FLAG_INITIAL,
-							    err);
-			if (ret < 0)
-				goto done;
+			string_list_append(&refnames_to_check, u->refname);
 
 			/*
 			 * There is no need to write the reference deletion
@@ -1379,6 +1375,13 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
 		}
 	}
 
+	string_list_sort(&refnames_to_check);
+	ret = refs_verify_refnames_available(ref_store, &refnames_to_check, &affected_refnames, NULL,
+					     transaction->flags & REF_TRANSACTION_FLAG_INITIAL,
+					     err);
+	if (ret < 0)
+		goto done;
+
 	transaction->backend_data = tx_data;
 	transaction->state = REF_TRANSACTION_PREPARED;
 
@@ -1394,6 +1397,7 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
 	string_list_clear(&affected_refnames, 0);
 	strbuf_release(&referent);
 	strbuf_release(&head_referent);
+	string_list_clear(&refnames_to_check, 0);
 
 	return ret;
 }

-- 
2.49.0.rc0.375.gae4b89d849.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v4 06/16] refs/files: batch refname availability checks for normal transactions
  2025-02-28  9:26 ` [PATCH v4 00/16] refs: batch refname availability checks Patrick Steinhardt
                     ` (4 preceding siblings ...)
  2025-02-28  9:26   ` [PATCH v4 05/16] refs/reftable: " Patrick Steinhardt
@ 2025-02-28  9:26   ` Patrick Steinhardt
  2025-02-28  9:26   ` [PATCH v4 07/16] refs/files: batch refname availability checks for initial transactions Patrick Steinhardt
                     ` (10 subsequent siblings)
  16 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-28  9:26 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Just as we have adapted the "reftable" backend in the preceding commit
to use batched refname availability checks, we can do the same for the
"files" backend. Things are a bit more intricate here though, as we
call `refs_verify_refname_available()` in a set of different contexts:

  1. `lock_raw_ref()` when it hits either EEXIST or EISDIR when creating
     a new reference, mostly to create a nice, user-readable error
     message. This is nothing we have to care about too much, as we only
     hit this code path at most once when we hit a conflict.

  2. `lock_raw_ref()` when it _could_ create the lockfile to check
     whether it is conflicting with any packed refs. In the general case,
     this code path will be hit once for every (successful) reference
     update.

  3. `lock_ref_oid_basic()`, but it is only executed when copying or
     renaming references or when expiring reflogs. It will thus not be
     called in contexts where we have many references queued up.

  4. `refs_refname_ref_available()`, but again only when copying or
     renaming references. It is thus not interesting due to the same
     reason as the previous case.

  5. `files_transaction_finish_initial()`, which is only executed when
     creating a new repository or migrating references.

So out of these, only (2) and (5) are viable candidates to use the
batched checks.

Adapt `lock_raw_ref()` accordingly by queueing up reference names that
need to be checked for availability and then checking them after we have
processed all updates. This check is done before we (optionally) lock
the `packed-refs` file, which is somewhat flawed because it means that
the `packed-refs` could still change after the availability check and
thus create an undetected conflict. But unconditionally locking the file
would change semantics that users are likely to rely on, so we keep the
current locking sequence intact, even if it's suboptimal.

The refactoring of `files_transaction_finish_initial()` will be done in
the next commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/files-backend.c | 42 +++++++++++++++++++++++++++++++-----------
 1 file changed, 31 insertions(+), 11 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 29f08dced40..6ce79cf0791 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -678,6 +678,7 @@ static void unlock_ref(struct ref_lock *lock)
  */
 static int lock_raw_ref(struct files_ref_store *refs,
 			const char *refname, int mustexist,
+			struct string_list *refnames_to_check,
 			const struct string_list *extras,
 			struct ref_lock **lock_p,
 			struct strbuf *referent,
@@ -855,16 +856,11 @@ static int lock_raw_ref(struct files_ref_store *refs,
 		}
 
 		/*
-		 * If the ref did not exist and we are creating it,
-		 * make sure there is no existing packed ref that
-		 * conflicts with refname:
+		 * If the ref did not exist and we are creating it, we have to
+		 * make sure there is no existing packed ref that conflicts
+		 * with refname. This check is deferred so that we can batch it.
 		 */
-		if (refs_verify_refname_available(
-				    refs->packed_ref_store, refname,
-				    extras, NULL, 0, err)) {
-			ret = TRANSACTION_NAME_CONFLICT;
-			goto error_return;
-		}
+		string_list_insert(refnames_to_check, refname);
 	}
 
 	ret = 0;
@@ -2569,6 +2565,7 @@ static int lock_ref_for_update(struct files_ref_store *refs,
 			       struct ref_update *update,
 			       struct ref_transaction *transaction,
 			       const char *head_ref,
+			       struct string_list *refnames_to_check,
 			       struct string_list *affected_refnames,
 			       struct strbuf *err)
 {
@@ -2597,7 +2594,7 @@ static int lock_ref_for_update(struct files_ref_store *refs,
 		lock->count++;
 	} else {
 		ret = lock_raw_ref(refs, update->refname, mustexist,
-				   affected_refnames,
+				   refnames_to_check, affected_refnames,
 				   &lock, &referent,
 				   &update->type, err);
 		if (ret) {
@@ -2811,6 +2808,7 @@ static int files_transaction_prepare(struct ref_store *ref_store,
 	size_t i;
 	int ret = 0;
 	struct string_list affected_refnames = STRING_LIST_INIT_NODUP;
+	struct string_list refnames_to_check = STRING_LIST_INIT_NODUP;
 	char *head_ref = NULL;
 	int head_type;
 	struct files_transaction_backend_data *backend_data;
@@ -2898,7 +2896,8 @@ static int files_transaction_prepare(struct ref_store *ref_store,
 		struct ref_update *update = transaction->updates[i];
 
 		ret = lock_ref_for_update(refs, update, transaction,
-					  head_ref, &affected_refnames, err);
+					  head_ref, &refnames_to_check,
+					  &affected_refnames, err);
 		if (ret)
 			goto cleanup;
 
@@ -2930,6 +2929,26 @@ static int files_transaction_prepare(struct ref_store *ref_store,
 		}
 	}
 
+	/*
+	 * Verify that none of the loose reference that we're about to write
+	 * conflict with any existing packed references. Ideally, we'd do this
+	 * check after the packed-refs are locked so that the file cannot
+	 * change underneath our feet. But introducing such a lock now would
+	 * probably do more harm than good as users rely on there not being a
+	 * global lock with the "files" backend.
+	 *
+	 * Another alternative would be to do the check after the (optional)
+	 * lock, but that would extend the time we spend in the globally-locked
+	 * state.
+	 *
+	 * So instead, we accept the race for now.
+	 */
+	if (refs_verify_refnames_available(refs->packed_ref_store, &refnames_to_check,
+					   &affected_refnames, NULL, 0, err)) {
+		ret = TRANSACTION_NAME_CONFLICT;
+		goto cleanup;
+	}
+
 	if (packed_transaction) {
 		if (packed_refs_lock(refs->packed_ref_store, 0, err)) {
 			ret = TRANSACTION_GENERIC_ERROR;
@@ -2972,6 +2991,7 @@ static int files_transaction_prepare(struct ref_store *ref_store,
 cleanup:
 	free(head_ref);
 	string_list_clear(&affected_refnames, 0);
+	string_list_clear(&refnames_to_check, 0);
 
 	if (ret)
 		files_transaction_cleanup(refs, transaction);

-- 
2.49.0.rc0.375.gae4b89d849.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v4 07/16] refs/files: batch refname availability checks for initial transactions
  2025-02-28  9:26 ` [PATCH v4 00/16] refs: batch refname availability checks Patrick Steinhardt
                     ` (5 preceding siblings ...)
  2025-02-28  9:26   ` [PATCH v4 06/16] refs/files: batch refname availability checks for normal transactions Patrick Steinhardt
@ 2025-02-28  9:26   ` Patrick Steinhardt
  2025-03-06 14:10     ` Karthik Nayak
  2025-02-28  9:26   ` [PATCH v4 08/16] refs: stop re-verifying common prefixes for availability Patrick Steinhardt
                     ` (9 subsequent siblings)
  16 siblings, 1 reply; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-28  9:26 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

The "files" backend explicitly carves out special logic for its initial
transaction so that it can avoid writing out every single reference as
a loose reference. While the assumption is that there shouldn't be any
preexisting references, we still have to verify that none of the newly
written references will conflict with any other new reference in the
same transaction.

Refactor the initial transaction to use batched refname availability
checks. This does not yet have an effect on performance as we still call
`refs_verify_refname_available()` in a loop. But this will change in
subsequent commits and then impact performance when cloning a repository
with many references or when migrating references to the "files" format.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/files-backend.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 6ce79cf0791..11a620ea11a 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3056,6 +3056,7 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
 	size_t i;
 	int ret = 0;
 	struct string_list affected_refnames = STRING_LIST_INIT_NODUP;
+	struct string_list refnames_to_check = STRING_LIST_INIT_NODUP;
 	struct ref_transaction *packed_transaction = NULL;
 	struct ref_transaction *loose_transaction = NULL;
 
@@ -3105,11 +3106,7 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
 		    !is_null_oid(&update->old_oid))
 			BUG("initial ref transaction with old_sha1 set");
 
-		if (refs_verify_refname_available(&refs->base, update->refname,
-						  &affected_refnames, NULL, 1, err)) {
-			ret = TRANSACTION_NAME_CONFLICT;
-			goto cleanup;
-		}
+		string_list_append(&refnames_to_check, update->refname);
 
 		/*
 		 * packed-refs don't support symbolic refs, root refs and reflogs,
@@ -3145,8 +3142,19 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
 		}
 	}
 
-	if (packed_refs_lock(refs->packed_ref_store, 0, err) ||
-	    ref_transaction_commit(packed_transaction, err)) {
+	if (packed_refs_lock(refs->packed_ref_store, 0, err)) {
+		ret = TRANSACTION_GENERIC_ERROR;
+		goto cleanup;
+	}
+
+	if (refs_verify_refnames_available(&refs->base, &refnames_to_check,
+					   &affected_refnames, NULL, 1, err)) {
+		packed_refs_unlock(refs->packed_ref_store);
+		ret = TRANSACTION_NAME_CONFLICT;
+		goto cleanup;
+	}
+
+	if (ref_transaction_commit(packed_transaction, err)) {
 		ret = TRANSACTION_GENERIC_ERROR;
 		goto cleanup;
 	}
@@ -3167,6 +3175,7 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
 		ref_transaction_free(packed_transaction);
 	transaction->state = REF_TRANSACTION_CLOSED;
 	string_list_clear(&affected_refnames, 0);
+	string_list_clear(&refnames_to_check, 0);
 	return ret;
 }
 

-- 
2.49.0.rc0.375.gae4b89d849.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v4 08/16] refs: stop re-verifying common prefixes for availability
  2025-02-28  9:26 ` [PATCH v4 00/16] refs: batch refname availability checks Patrick Steinhardt
                     ` (6 preceding siblings ...)
  2025-02-28  9:26   ` [PATCH v4 07/16] refs/files: batch refname availability checks for initial transactions Patrick Steinhardt
@ 2025-02-28  9:26   ` Patrick Steinhardt
  2025-02-28  9:26   ` [PATCH v4 09/16] refs/iterator: separate lifecycle from iteration Patrick Steinhardt
                     ` (8 subsequent siblings)
  16 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-28  9:26 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

One of the checks done by `refs_verify_refnames_available()` is whether
any of the prefixes of a reference already exists. For example, given a
reference "refs/heads/main", we'd check whether "refs/heads" or "refs"
already exist, and if so we'd abort the transaction.

When updating multiple references at once, this check is performed for
each of the references individually. Consequently, because references
tend to have common prefixes like "refs/heads/" or "refs/tags/", we
evaluate the availability of these prefixes repeatedly. Naturally this
is a waste of compute, as the availability of those prefixes should in
general not change in the middle of a transaction. And if it did,
backends would notice at a later point in time.

Optimize this pattern by storing prefixes in a `strset` so that we can
trivially track those prefixes that we have already checked. This leads
to a significant speedup with the "reftable" backend when creating many
references that all share a common prefix:

    Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
      Time (mean ± σ):      63.1 ms ±   1.8 ms    [User: 41.0 ms, System: 21.6 ms]
      Range (min … max):    60.6 ms …  69.5 ms    38 runs

    Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):      40.0 ms ±   1.3 ms    [User: 29.3 ms, System: 10.3 ms]
      Range (min … max):    38.1 ms …  47.3 ms    61 runs

    Summary
      update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
        1.58 ± 0.07 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)

For the "files" backend we see an improvement, but a much smaller one:

    Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
      Time (mean ± σ):     395.8 ms ±   5.3 ms    [User: 63.6 ms, System: 330.5 ms]
      Range (min … max):   387.0 ms … 404.6 ms    10 runs

    Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):     386.0 ms ±   4.0 ms    [User: 51.5 ms, System: 332.8 ms]
      Range (min … max):   380.8 ms … 392.6 ms    10 runs

    Summary
      update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
        1.03 ± 0.02 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)

This change also leads to a modest improvement when writing references
with "initial" semantics, for example when migrating references. The
following benchmarks are migrating 1m references from the "reftable" to
the "files" backend:

    Benchmark 1: migrate reftable:files (refcount = 1000000, revision = HEAD~)
      Time (mean ± σ):     836.6 ms ±   5.6 ms    [User: 645.2 ms, System: 185.2 ms]
      Range (min … max):   829.6 ms … 845.9 ms    10 runs

    Benchmark 2: migrate reftable:files (refcount = 1000000, revision = HEAD)
      Time (mean ± σ):     759.8 ms ±   5.1 ms    [User: 574.9 ms, System: 178.9 ms]
      Range (min … max):   753.1 ms … 768.8 ms    10 runs

    Summary
      migrate reftable:files (refcount = 1000000, revision = HEAD) ran
        1.10 ± 0.01 times faster than migrate reftable:files (refcount = 1000000, revision = HEAD~)

And vice versa:

    Benchmark 1: migrate files:reftable (refcount = 1000000, revision = HEAD~)
      Time (mean ± σ):     870.7 ms ±   5.7 ms    [User: 735.2 ms, System: 127.4 ms]
      Range (min … max):   861.6 ms … 883.2 ms    10 runs

    Benchmark 2: migrate files:reftable (refcount = 1000000, revision = HEAD)
      Time (mean ± σ):     799.1 ms ±   8.5 ms    [User: 661.1 ms, System: 130.2 ms]
      Range (min … max):   787.5 ms … 812.6 ms    10 runs

    Summary
      migrate files:reftable (refcount = 1000000, revision = HEAD) ran
        1.09 ± 0.01 times faster than migrate files:reftable (refcount = 1000000, revision = HEAD~)

The impact here is significantly smaller given that we don't perform any
reference reads with "initial" semantics, so the speedup only comes from
us doing less string list lookups.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/refs.c b/refs.c
index 5a9b0f2fa1e..eaf41421f50 100644
--- a/refs.c
+++ b/refs.c
@@ -2476,6 +2476,7 @@ int refs_verify_refnames_available(struct ref_store *refs,
 {
 	struct strbuf dirname = STRBUF_INIT;
 	struct strbuf referent = STRBUF_INIT;
+	struct strset dirnames;
 	int ret = -1;
 
 	/*
@@ -2485,6 +2486,8 @@ int refs_verify_refnames_available(struct ref_store *refs,
 
 	assert(err);
 
+	strset_init(&dirnames);
+
 	for (size_t i = 0; i < refnames->nr; i++) {
 		const char *refname = refnames->items[i].string;
 		const char *extra_refname;
@@ -2514,6 +2517,14 @@ int refs_verify_refnames_available(struct ref_store *refs,
 			if (skip && string_list_has_string(skip, dirname.buf))
 				continue;
 
+			/*
+			 * If we've already seen the directory, we don't need to
+			 * process it again. Skip it to avoid checking common
+			 * prefixes like "refs/heads/" repeatedly.
+			 */
+			if (!strset_add(&dirnames, dirname.buf))
+				continue;
+
 			if (!initial_transaction &&
 			    !refs_read_raw_ref(refs, dirname.buf, &oid, &referent,
 					       &type, &ignore_errno)) {
@@ -2574,6 +2585,7 @@ int refs_verify_refnames_available(struct ref_store *refs,
 cleanup:
 	strbuf_release(&referent);
 	strbuf_release(&dirname);
+	strset_clear(&dirnames);
 	return ret;
 }
 

-- 
2.49.0.rc0.375.gae4b89d849.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v4 09/16] refs/iterator: separate lifecycle from iteration
  2025-02-28  9:26 ` [PATCH v4 00/16] refs: batch refname availability checks Patrick Steinhardt
                     ` (7 preceding siblings ...)
  2025-02-28  9:26   ` [PATCH v4 08/16] refs: stop re-verifying common prefixes for availability Patrick Steinhardt
@ 2025-02-28  9:26   ` Patrick Steinhardt
  2025-02-28  9:26   ` [PATCH v4 10/16] refs/iterator: provide infrastructure to re-seek iterators Patrick Steinhardt
                     ` (7 subsequent siblings)
  16 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-28  9:26 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

The ref and reflog iterators have their lifecycle attached to iteration:
once the iterator reaches its end, it is automatically released and the
caller doesn't have to care about that anymore. When the iterator should
be released before it has been exhausted, callers must explicitly abort
the iterator via `ref_iterator_abort()`.

This lifecycle is somewhat unusual in the Git codebase and creates two
problems:

  - Callsites need to be very careful about when exactly they call
    `ref_iterator_abort()`, as calling the function is only valid when
    the iterator itself still is. This leads to somewhat awkward calling
    patterns in some situations.

  - It is impossible to reuse iterators and re-seek them to a different
    prefix. This feature isn't supported by any iterator implementation
    except for the reftable iterators anyway, but if it were implemented
    it would allow us to optimize cases where we need to search for
    specific references repeatedly by reusing internal state.

Detangle the lifecycle from iteration so that we don't deallocate the
iterator anymore once it is exhausted. Instead, callers are now expected
to always call a newly introduced `ref_iterator_free()` function that
deallocates the iterator and its internal state.

Note that the `dir_iterator` is somewhat special because it does not
implement the `ref_iterator` interface, but is only used to implement
other iterators. Consequently, we have to provide `dir_iterator_free()`
instead of `dir_iterator_release()` as the allocated structure itself is
managed by the `dir_iterator` interfaces, as well, and not freed by
`ref_iterator_free()` like in all the other cases.

While at it, drop the return value of `ref_iterator_abort()`, which
wasn't really required by any of the iterator implementations anyway.
Furthermore, stop calling `base_ref_iterator_free()` in any of the
backends, but instead call it in `ref_iterator_free()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/clone.c              |  2 +
 dir-iterator.c               | 24 +++++------
 dir-iterator.h               | 11 ++---
 iterator.h                   |  2 +-
 refs.c                       |  7 +++-
 refs/debug.c                 |  9 ++---
 refs/files-backend.c         | 36 +++++------------
 refs/iterator.c              | 95 ++++++++++++++------------------------------
 refs/packed-backend.c        | 27 ++++++-------
 refs/ref-cache.c             |  9 ++---
 refs/refs-internal.h         | 29 +++++---------
 refs/reftable-backend.c      | 34 ++++------------
 t/helper/test-dir-iterator.c |  1 +
 13 files changed, 100 insertions(+), 186 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index fd001d800c6..ac3e84b2b18 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -426,6 +426,8 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 		strbuf_setlen(src, src_len);
 		die(_("failed to iterate over '%s'"), src->buf);
 	}
+
+	dir_iterator_free(iter);
 }
 
 static void clone_local(const char *src_repo, const char *dest_repo)
diff --git a/dir-iterator.c b/dir-iterator.c
index de619846f29..857e1d9bdaf 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -193,9 +193,9 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 
 	if (S_ISDIR(iter->base.st.st_mode) && push_level(iter)) {
 		if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
-			goto error_out;
+			return ITER_ERROR;
 		if (iter->levels_nr == 0)
-			goto error_out;
+			return ITER_ERROR;
 	}
 
 	/* Loop until we find an entry that we can give back to the caller. */
@@ -211,11 +211,11 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			int ret = next_directory_entry(level->dir, iter->base.path.buf, &de);
 			if (ret < 0) {
 				if (iter->flags & DIR_ITERATOR_PEDANTIC)
-					goto error_out;
+					return ITER_ERROR;
 				continue;
 			} else if (ret > 0) {
 				if (pop_level(iter) == 0)
-					return dir_iterator_abort(dir_iterator);
+					return ITER_DONE;
 				continue;
 			}
 
@@ -223,7 +223,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 		} else {
 			if (level->entries_idx >= level->entries.nr) {
 				if (pop_level(iter) == 0)
-					return dir_iterator_abort(dir_iterator);
+					return ITER_DONE;
 				continue;
 			}
 
@@ -232,22 +232,21 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 
 		if (prepare_next_entry_data(iter, name)) {
 			if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
-				goto error_out;
+				return ITER_ERROR;
 			continue;
 		}
 
 		return ITER_OK;
 	}
-
-error_out:
-	dir_iterator_abort(dir_iterator);
-	return ITER_ERROR;
 }
 
-int dir_iterator_abort(struct dir_iterator *dir_iterator)
+void dir_iterator_free(struct dir_iterator *dir_iterator)
 {
 	struct dir_iterator_int *iter = (struct dir_iterator_int *)dir_iterator;
 
+	if (!iter)
+		return;
+
 	for (; iter->levels_nr; iter->levels_nr--) {
 		struct dir_iterator_level *level =
 			&iter->levels[iter->levels_nr - 1];
@@ -266,7 +265,6 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
 	free(iter->levels);
 	strbuf_release(&iter->base.path);
 	free(iter);
-	return ITER_DONE;
 }
 
 struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags)
@@ -301,7 +299,7 @@ struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags)
 	return dir_iterator;
 
 error_out:
-	dir_iterator_abort(dir_iterator);
+	dir_iterator_free(dir_iterator);
 	errno = saved_errno;
 	return NULL;
 }
diff --git a/dir-iterator.h b/dir-iterator.h
index 6d438809b6e..ccd6a197343 100644
--- a/dir-iterator.h
+++ b/dir-iterator.h
@@ -28,7 +28,7 @@
  *
  *     while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
  *             if (want_to_stop_iteration()) {
- *                     ok = dir_iterator_abort(iter);
+ *                     ok = ITER_DONE;
  *                     break;
  *             }
  *
@@ -39,6 +39,7 @@
  *
  *     if (ok != ITER_DONE)
  *             handle_error();
+ *     dir_iterator_free(iter);
  *
  * Callers are allowed to modify iter->path while they are working,
  * but they must restore it to its original contents before calling
@@ -107,11 +108,7 @@ struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags);
  */
 int dir_iterator_advance(struct dir_iterator *iterator);
 
-/*
- * End the iteration before it has been exhausted. Free the
- * dir_iterator and any associated resources and return ITER_DONE. On
- * error, free the dir_iterator and return ITER_ERROR.
- */
-int dir_iterator_abort(struct dir_iterator *iterator);
+/* Free the dir_iterator and any associated resources. */
+void dir_iterator_free(struct dir_iterator *iterator);
 
 #endif
diff --git a/iterator.h b/iterator.h
index 0f6900e43ad..6b77dcc2626 100644
--- a/iterator.h
+++ b/iterator.h
@@ -12,7 +12,7 @@
 #define ITER_OK 0
 
 /*
- * The iterator is exhausted and has been freed.
+ * The iterator is exhausted.
  */
 #define ITER_DONE -1
 
diff --git a/refs.c b/refs.c
index eaf41421f50..8eff60a2186 100644
--- a/refs.c
+++ b/refs.c
@@ -2476,6 +2476,7 @@ int refs_verify_refnames_available(struct ref_store *refs,
 {
 	struct strbuf dirname = STRBUF_INIT;
 	struct strbuf referent = STRBUF_INIT;
+	struct ref_iterator *iter = NULL;
 	struct strset dirnames;
 	int ret = -1;
 
@@ -2552,7 +2553,6 @@ int refs_verify_refnames_available(struct ref_store *refs,
 		strbuf_addch(&dirname, '/');
 
 		if (!initial_transaction) {
-			struct ref_iterator *iter;
 			int ok;
 
 			iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
@@ -2564,12 +2564,14 @@ int refs_verify_refnames_available(struct ref_store *refs,
 
 				strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
 					    iter->refname, refname);
-				ref_iterator_abort(iter);
 				goto cleanup;
 			}
 
 			if (ok != ITER_DONE)
 				BUG("error while iterating over references");
+
+			ref_iterator_free(iter);
+			iter = NULL;
 		}
 
 		extra_refname = find_descendant_ref(dirname.buf, extras, skip);
@@ -2586,6 +2588,7 @@ int refs_verify_refnames_available(struct ref_store *refs,
 	strbuf_release(&referent);
 	strbuf_release(&dirname);
 	strset_clear(&dirnames);
+	ref_iterator_free(iter);
 	return ret;
 }
 
diff --git a/refs/debug.c b/refs/debug.c
index fbc4df08b43..a9786da4ba1 100644
--- a/refs/debug.c
+++ b/refs/debug.c
@@ -179,19 +179,18 @@ static int debug_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return res;
 }
 
-static int debug_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void debug_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct debug_ref_iterator *diter =
 		(struct debug_ref_iterator *)ref_iterator;
-	int res = diter->iter->vtable->abort(diter->iter);
-	trace_printf_key(&trace_refs, "iterator_abort: %d\n", res);
-	return res;
+	diter->iter->vtable->release(diter->iter);
+	trace_printf_key(&trace_refs, "iterator_release\n");
 }
 
 static struct ref_iterator_vtable debug_ref_iterator_vtable = {
 	.advance = debug_ref_iterator_advance,
 	.peel = debug_ref_iterator_peel,
-	.abort = debug_ref_iterator_abort,
+	.release = debug_ref_iterator_release,
 };
 
 static struct ref_iterator *
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 11a620ea11a..859f1c11941 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -915,10 +915,6 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		return ITER_OK;
 	}
 
-	iter->iter0 = NULL;
-	if (ref_iterator_abort(ref_iterator) != ITER_DONE)
-		ok = ITER_ERROR;
-
 	return ok;
 }
 
@@ -931,23 +927,17 @@ static int files_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return ref_iterator_peel(iter->iter0, peeled);
 }
 
-static int files_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void files_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct files_ref_iterator *iter =
 		(struct files_ref_iterator *)ref_iterator;
-	int ok = ITER_DONE;
-
-	if (iter->iter0)
-		ok = ref_iterator_abort(iter->iter0);
-
-	base_ref_iterator_free(ref_iterator);
-	return ok;
+	ref_iterator_free(iter->iter0);
 }
 
 static struct ref_iterator_vtable files_ref_iterator_vtable = {
 	.advance = files_ref_iterator_advance,
 	.peel = files_ref_iterator_peel,
-	.abort = files_ref_iterator_abort,
+	.release = files_ref_iterator_release,
 };
 
 static struct ref_iterator *files_ref_iterator_begin(
@@ -1378,7 +1368,7 @@ static int should_pack_refs(struct files_ref_store *refs,
 				    iter->flags, opts))
 			refcount++;
 		if (refcount >= limit) {
-			ref_iterator_abort(iter);
+			ref_iterator_free(iter);
 			return 1;
 		}
 	}
@@ -1386,6 +1376,7 @@ static int should_pack_refs(struct files_ref_store *refs,
 	if (ret != ITER_DONE)
 		die("error while iterating over references");
 
+	ref_iterator_free(iter);
 	return 0;
 }
 
@@ -1452,6 +1443,7 @@ static int files_pack_refs(struct ref_store *ref_store,
 	packed_refs_unlock(refs->packed_ref_store);
 
 	prune_refs(refs, &refs_to_prune);
+	ref_iterator_free(iter);
 	strbuf_release(&err);
 	return 0;
 }
@@ -2299,9 +2291,6 @@ static int files_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 		return ITER_OK;
 	}
 
-	iter->dir_iterator = NULL;
-	if (ref_iterator_abort(ref_iterator) == ITER_ERROR)
-		ok = ITER_ERROR;
 	return ok;
 }
 
@@ -2311,23 +2300,17 @@ static int files_reflog_iterator_peel(struct ref_iterator *ref_iterator UNUSED,
 	BUG("ref_iterator_peel() called for reflog_iterator");
 }
 
-static int files_reflog_iterator_abort(struct ref_iterator *ref_iterator)
+static void files_reflog_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct files_reflog_iterator *iter =
 		(struct files_reflog_iterator *)ref_iterator;
-	int ok = ITER_DONE;
-
-	if (iter->dir_iterator)
-		ok = dir_iterator_abort(iter->dir_iterator);
-
-	base_ref_iterator_free(ref_iterator);
-	return ok;
+	dir_iterator_free(iter->dir_iterator);
 }
 
 static struct ref_iterator_vtable files_reflog_iterator_vtable = {
 	.advance = files_reflog_iterator_advance,
 	.peel = files_reflog_iterator_peel,
-	.abort = files_reflog_iterator_abort,
+	.release = files_reflog_iterator_release,
 };
 
 static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,
@@ -3837,6 +3820,7 @@ static int files_fsck_refs_dir(struct ref_store *ref_store,
 		ret = error(_("failed to iterate over '%s'"), sb.buf);
 
 out:
+	dir_iterator_free(iter);
 	strbuf_release(&sb);
 	strbuf_release(&refname);
 	return ret;
diff --git a/refs/iterator.c b/refs/iterator.c
index d25e568bf0b..aaeff270437 100644
--- a/refs/iterator.c
+++ b/refs/iterator.c
@@ -21,9 +21,14 @@ int ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return ref_iterator->vtable->peel(ref_iterator, peeled);
 }
 
-int ref_iterator_abort(struct ref_iterator *ref_iterator)
+void ref_iterator_free(struct ref_iterator *ref_iterator)
 {
-	return ref_iterator->vtable->abort(ref_iterator);
+	if (ref_iterator) {
+		ref_iterator->vtable->release(ref_iterator);
+		/* Help make use-after-free bugs fail quickly: */
+		ref_iterator->vtable = NULL;
+		free(ref_iterator);
+	}
 }
 
 void base_ref_iterator_init(struct ref_iterator *iter,
@@ -36,20 +41,13 @@ void base_ref_iterator_init(struct ref_iterator *iter,
 	iter->flags = 0;
 }
 
-void base_ref_iterator_free(struct ref_iterator *iter)
-{
-	/* Help make use-after-free bugs fail quickly: */
-	iter->vtable = NULL;
-	free(iter);
-}
-
 struct empty_ref_iterator {
 	struct ref_iterator base;
 };
 
-static int empty_ref_iterator_advance(struct ref_iterator *ref_iterator)
+static int empty_ref_iterator_advance(struct ref_iterator *ref_iterator UNUSED)
 {
-	return ref_iterator_abort(ref_iterator);
+	return ITER_DONE;
 }
 
 static int empty_ref_iterator_peel(struct ref_iterator *ref_iterator UNUSED,
@@ -58,16 +56,14 @@ static int empty_ref_iterator_peel(struct ref_iterator *ref_iterator UNUSED,
 	BUG("peel called for empty iterator");
 }
 
-static int empty_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void empty_ref_iterator_release(struct ref_iterator *ref_iterator UNUSED)
 {
-	base_ref_iterator_free(ref_iterator);
-	return ITER_DONE;
 }
 
 static struct ref_iterator_vtable empty_ref_iterator_vtable = {
 	.advance = empty_ref_iterator_advance,
 	.peel = empty_ref_iterator_peel,
-	.abort = empty_ref_iterator_abort,
+	.release = empty_ref_iterator_release,
 };
 
 struct ref_iterator *empty_ref_iterator_begin(void)
@@ -151,11 +147,13 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	if (!iter->current) {
 		/* Initialize: advance both iterators to their first entries */
 		if ((ok = ref_iterator_advance(iter->iter0)) != ITER_OK) {
+			ref_iterator_free(iter->iter0);
 			iter->iter0 = NULL;
 			if (ok == ITER_ERROR)
 				goto error;
 		}
 		if ((ok = ref_iterator_advance(iter->iter1)) != ITER_OK) {
+			ref_iterator_free(iter->iter1);
 			iter->iter1 = NULL;
 			if (ok == ITER_ERROR)
 				goto error;
@@ -166,6 +164,7 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		 * entry:
 		 */
 		if ((ok = ref_iterator_advance(*iter->current)) != ITER_OK) {
+			ref_iterator_free(*iter->current);
 			*iter->current = NULL;
 			if (ok == ITER_ERROR)
 				goto error;
@@ -179,9 +178,8 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 			iter->select(iter->iter0, iter->iter1, iter->cb_data);
 
 		if (selection == ITER_SELECT_DONE) {
-			return ref_iterator_abort(ref_iterator);
+			return ITER_DONE;
 		} else if (selection == ITER_SELECT_ERROR) {
-			ref_iterator_abort(ref_iterator);
 			return ITER_ERROR;
 		}
 
@@ -195,6 +193,7 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 
 		if (selection & ITER_SKIP_SECONDARY) {
 			if ((ok = ref_iterator_advance(*secondary)) != ITER_OK) {
+				ref_iterator_free(*secondary);
 				*secondary = NULL;
 				if (ok == ITER_ERROR)
 					goto error;
@@ -211,7 +210,6 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	}
 
 error:
-	ref_iterator_abort(ref_iterator);
 	return ITER_ERROR;
 }
 
@@ -227,28 +225,18 @@ static int merge_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return ref_iterator_peel(*iter->current, peeled);
 }
 
-static int merge_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void merge_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct merge_ref_iterator *iter =
 		(struct merge_ref_iterator *)ref_iterator;
-	int ok = ITER_DONE;
-
-	if (iter->iter0) {
-		if (ref_iterator_abort(iter->iter0) != ITER_DONE)
-			ok = ITER_ERROR;
-	}
-	if (iter->iter1) {
-		if (ref_iterator_abort(iter->iter1) != ITER_DONE)
-			ok = ITER_ERROR;
-	}
-	base_ref_iterator_free(ref_iterator);
-	return ok;
+	ref_iterator_free(iter->iter0);
+	ref_iterator_free(iter->iter1);
 }
 
 static struct ref_iterator_vtable merge_ref_iterator_vtable = {
 	.advance = merge_ref_iterator_advance,
 	.peel = merge_ref_iterator_peel,
-	.abort = merge_ref_iterator_abort,
+	.release = merge_ref_iterator_release,
 };
 
 struct ref_iterator *merge_ref_iterator_begin(
@@ -310,10 +298,10 @@ struct ref_iterator *overlay_ref_iterator_begin(
 	 * them.
 	 */
 	if (is_empty_ref_iterator(front)) {
-		ref_iterator_abort(front);
+		ref_iterator_free(front);
 		return back;
 	} else if (is_empty_ref_iterator(back)) {
-		ref_iterator_abort(back);
+		ref_iterator_free(back);
 		return front;
 	}
 
@@ -350,19 +338,10 @@ static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator)
 
 	while ((ok = ref_iterator_advance(iter->iter0)) == ITER_OK) {
 		int cmp = compare_prefix(iter->iter0->refname, iter->prefix);
-
 		if (cmp < 0)
 			continue;
-
-		if (cmp > 0) {
-			/*
-			 * As the source iterator is ordered, we
-			 * can stop the iteration as soon as we see a
-			 * refname that comes after the prefix:
-			 */
-			ok = ref_iterator_abort(iter->iter0);
-			break;
-		}
+		if (cmp > 0)
+			return ITER_DONE;
 
 		if (iter->trim) {
 			/*
@@ -386,9 +365,6 @@ static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		return ITER_OK;
 	}
 
-	iter->iter0 = NULL;
-	if (ref_iterator_abort(ref_iterator) != ITER_DONE)
-		return ITER_ERROR;
 	return ok;
 }
 
@@ -401,23 +377,18 @@ static int prefix_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return ref_iterator_peel(iter->iter0, peeled);
 }
 
-static int prefix_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void prefix_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct prefix_ref_iterator *iter =
 		(struct prefix_ref_iterator *)ref_iterator;
-	int ok = ITER_DONE;
-
-	if (iter->iter0)
-		ok = ref_iterator_abort(iter->iter0);
+	ref_iterator_free(iter->iter0);
 	free(iter->prefix);
-	base_ref_iterator_free(ref_iterator);
-	return ok;
 }
 
 static struct ref_iterator_vtable prefix_ref_iterator_vtable = {
 	.advance = prefix_ref_iterator_advance,
 	.peel = prefix_ref_iterator_peel,
-	.abort = prefix_ref_iterator_abort,
+	.release = prefix_ref_iterator_release,
 };
 
 struct ref_iterator *prefix_ref_iterator_begin(struct ref_iterator *iter0,
@@ -453,20 +424,14 @@ int do_for_each_ref_iterator(struct ref_iterator *iter,
 	current_ref_iter = iter;
 	while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
 		retval = fn(iter->refname, iter->referent, iter->oid, iter->flags, cb_data);
-		if (retval) {
-			/*
-			 * If ref_iterator_abort() returns ITER_ERROR,
-			 * we ignore that error in deference to the
-			 * callback function's return value.
-			 */
-			ref_iterator_abort(iter);
+		if (retval)
 			goto out;
-		}
 	}
 
 out:
 	current_ref_iter = old_ref_iter;
 	if (ok == ITER_ERROR)
-		return -1;
+		retval = -1;
+	ref_iterator_free(iter);
 	return retval;
 }
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index a7b6f74b6e3..38a1956d1a8 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -954,9 +954,6 @@ static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		return ITER_OK;
 	}
 
-	if (ref_iterator_abort(ref_iterator) != ITER_DONE)
-		ok = ITER_ERROR;
-
 	return ok;
 }
 
@@ -976,23 +973,19 @@ static int packed_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	}
 }
 
-static int packed_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void packed_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct packed_ref_iterator *iter =
 		(struct packed_ref_iterator *)ref_iterator;
-	int ok = ITER_DONE;
-
 	strbuf_release(&iter->refname_buf);
 	free(iter->jump);
 	release_snapshot(iter->snapshot);
-	base_ref_iterator_free(ref_iterator);
-	return ok;
 }
 
 static struct ref_iterator_vtable packed_ref_iterator_vtable = {
 	.advance = packed_ref_iterator_advance,
 	.peel = packed_ref_iterator_peel,
-	.abort = packed_ref_iterator_abort
+	.release = packed_ref_iterator_release,
 };
 
 static int jump_list_entry_cmp(const void *va, const void *vb)
@@ -1362,8 +1355,10 @@ static int write_with_updates(struct packed_ref_store *refs,
 	 */
 	iter = packed_ref_iterator_begin(&refs->base, "", NULL,
 					 DO_FOR_EACH_INCLUDE_BROKEN);
-	if ((ok = ref_iterator_advance(iter)) != ITER_OK)
+	if ((ok = ref_iterator_advance(iter)) != ITER_OK) {
+		ref_iterator_free(iter);
 		iter = NULL;
+	}
 
 	i = 0;
 
@@ -1411,8 +1406,10 @@ static int write_with_updates(struct packed_ref_store *refs,
 				 * the iterator over the unneeded
 				 * value.
 				 */
-				if ((ok = ref_iterator_advance(iter)) != ITER_OK)
+				if ((ok = ref_iterator_advance(iter)) != ITER_OK) {
+					ref_iterator_free(iter);
 					iter = NULL;
+				}
 				cmp = +1;
 			} else {
 				/*
@@ -1449,8 +1446,10 @@ static int write_with_updates(struct packed_ref_store *refs,
 					       peel_error ? NULL : &peeled))
 				goto write_error;
 
-			if ((ok = ref_iterator_advance(iter)) != ITER_OK)
+			if ((ok = ref_iterator_advance(iter)) != ITER_OK) {
+				ref_iterator_free(iter);
 				iter = NULL;
+			}
 		} else if (is_null_oid(&update->new_oid)) {
 			/*
 			 * The update wants to delete the reference,
@@ -1499,9 +1498,7 @@ static int write_with_updates(struct packed_ref_store *refs,
 		    get_tempfile_path(refs->tempfile), strerror(errno));
 
 error:
-	if (iter)
-		ref_iterator_abort(iter);
-
+	ref_iterator_free(iter);
 	delete_tempfile(&refs->tempfile);
 	return -1;
 }
diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index 02f09e4df88..6457e02c1ea 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -409,7 +409,7 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		if (++level->index == level->dir->nr) {
 			/* This level is exhausted; pop up a level */
 			if (--iter->levels_nr == 0)
-				return ref_iterator_abort(ref_iterator);
+				return ITER_DONE;
 
 			continue;
 		}
@@ -452,21 +452,18 @@ static int cache_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return peel_object(iter->repo, ref_iterator->oid, peeled) ? -1 : 0;
 }
 
-static int cache_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void cache_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct cache_ref_iterator *iter =
 		(struct cache_ref_iterator *)ref_iterator;
-
 	free((char *)iter->prefix);
 	free(iter->levels);
-	base_ref_iterator_free(ref_iterator);
-	return ITER_DONE;
 }
 
 static struct ref_iterator_vtable cache_ref_iterator_vtable = {
 	.advance = cache_ref_iterator_advance,
 	.peel = cache_ref_iterator_peel,
-	.abort = cache_ref_iterator_abort
+	.release = cache_ref_iterator_release,
 };
 
 struct ref_iterator *cache_ref_iterator_begin(struct ref_cache *cache,
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index aaab711bb96..74e2c03cef1 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -273,11 +273,11 @@ enum do_for_each_ref_flags {
  * the next reference and returns ITER_OK. The data pointed at by
  * refname and oid belong to the iterator; if you want to retain them
  * after calling ref_iterator_advance() again or calling
- * ref_iterator_abort(), you must make a copy. When the iteration has
+ * ref_iterator_free(), you must make a copy. When the iteration has
  * been exhausted, ref_iterator_advance() releases any resources
  * associated with the iteration, frees the ref_iterator object, and
  * returns ITER_DONE. If you want to abort the iteration early, call
- * ref_iterator_abort(), which also frees the ref_iterator object and
+ * ref_iterator_free(), which also frees the ref_iterator object and
  * any associated resources. If there was an internal error advancing
  * to the next entry, ref_iterator_advance() aborts the iteration,
  * frees the ref_iterator, and returns ITER_ERROR.
@@ -293,7 +293,7 @@ enum do_for_each_ref_flags {
  *
  *     while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
  *             if (want_to_stop_iteration()) {
- *                     ok = ref_iterator_abort(iter);
+ *                     ok = ITER_DONE;
  *                     break;
  *             }
  *
@@ -307,6 +307,7 @@ enum do_for_each_ref_flags {
  *
  *     if (ok != ITER_DONE)
  *             handle_error();
+ *     ref_iterator_free(iter);
  */
 struct ref_iterator {
 	struct ref_iterator_vtable *vtable;
@@ -333,12 +334,8 @@ int ref_iterator_advance(struct ref_iterator *ref_iterator);
 int ref_iterator_peel(struct ref_iterator *ref_iterator,
 		      struct object_id *peeled);
 
-/*
- * End the iteration before it has been exhausted, freeing the
- * reference iterator and any associated resources and returning
- * ITER_DONE. If the abort itself failed, return ITER_ERROR.
- */
-int ref_iterator_abort(struct ref_iterator *ref_iterator);
+/* Free the reference iterator and any associated resources. */
+void ref_iterator_free(struct ref_iterator *ref_iterator);
 
 /*
  * An iterator over nothing (its first ref_iterator_advance() call
@@ -438,13 +435,6 @@ struct ref_iterator *prefix_ref_iterator_begin(struct ref_iterator *iter0,
 void base_ref_iterator_init(struct ref_iterator *iter,
 			    struct ref_iterator_vtable *vtable);
 
-/*
- * Base class destructor for ref_iterators. Destroy the ref_iterator
- * part of iter and shallow-free the object. This is meant to be
- * called only by the destructors of derived classes.
- */
-void base_ref_iterator_free(struct ref_iterator *iter);
-
 /* Virtual function declarations for ref_iterators: */
 
 /*
@@ -463,15 +453,14 @@ typedef int ref_iterator_peel_fn(struct ref_iterator *ref_iterator,
 
 /*
  * Implementations of this function should free any resources specific
- * to the derived class, then call base_ref_iterator_free() to clean
- * up and free the ref_iterator object.
+ * to the derived class.
  */
-typedef int ref_iterator_abort_fn(struct ref_iterator *ref_iterator);
+typedef void ref_iterator_release_fn(struct ref_iterator *ref_iterator);
 
 struct ref_iterator_vtable {
 	ref_iterator_advance_fn *advance;
 	ref_iterator_peel_fn *peel;
-	ref_iterator_abort_fn *abort;
+	ref_iterator_release_fn *release;
 };
 
 /*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 2a90e7cb391..06543f79c64 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -711,17 +711,10 @@ static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		break;
 	}
 
-	if (iter->err > 0) {
-		if (ref_iterator_abort(ref_iterator) != ITER_DONE)
-			return ITER_ERROR;
+	if (iter->err > 0)
 		return ITER_DONE;
-	}
-
-	if (iter->err < 0) {
-		ref_iterator_abort(ref_iterator);
+	if (iter->err < 0)
 		return ITER_ERROR;
-	}
-
 	return ITER_OK;
 }
 
@@ -740,7 +733,7 @@ static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return -1;
 }
 
-static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void reftable_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct reftable_ref_iterator *iter =
 		(struct reftable_ref_iterator *)ref_iterator;
@@ -751,14 +744,12 @@ static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
 			free(iter->exclude_patterns[i]);
 		free(iter->exclude_patterns);
 	}
-	free(iter);
-	return ITER_DONE;
 }
 
 static struct ref_iterator_vtable reftable_ref_iterator_vtable = {
 	.advance = reftable_ref_iterator_advance,
 	.peel = reftable_ref_iterator_peel,
-	.abort = reftable_ref_iterator_abort
+	.release = reftable_ref_iterator_release,
 };
 
 static int qsort_strcmp(const void *va, const void *vb)
@@ -2017,17 +2008,10 @@ static int reftable_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 		break;
 	}
 
-	if (iter->err > 0) {
-		if (ref_iterator_abort(ref_iterator) != ITER_DONE)
-			return ITER_ERROR;
+	if (iter->err > 0)
 		return ITER_DONE;
-	}
-
-	if (iter->err < 0) {
-		ref_iterator_abort(ref_iterator);
+	if (iter->err < 0)
 		return ITER_ERROR;
-	}
-
 	return ITER_OK;
 }
 
@@ -2038,21 +2022,19 @@ static int reftable_reflog_iterator_peel(struct ref_iterator *ref_iterator UNUSE
 	return -1;
 }
 
-static int reftable_reflog_iterator_abort(struct ref_iterator *ref_iterator)
+static void reftable_reflog_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct reftable_reflog_iterator *iter =
 		(struct reftable_reflog_iterator *)ref_iterator;
 	reftable_log_record_release(&iter->log);
 	reftable_iterator_destroy(&iter->iter);
 	strbuf_release(&iter->last_name);
-	free(iter);
-	return ITER_DONE;
 }
 
 static struct ref_iterator_vtable reftable_reflog_iterator_vtable = {
 	.advance = reftable_reflog_iterator_advance,
 	.peel = reftable_reflog_iterator_peel,
-	.abort = reftable_reflog_iterator_abort
+	.release = reftable_reflog_iterator_release,
 };
 
 static struct reftable_reflog_iterator *reflog_iterator_for_stack(struct reftable_ref_store *refs,
diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
index 6b297bd7536..8d46e8ba409 100644
--- a/t/helper/test-dir-iterator.c
+++ b/t/helper/test-dir-iterator.c
@@ -53,6 +53,7 @@ int cmd__dir_iterator(int argc, const char **argv)
 		printf("(%s) [%s] %s\n", diter->relative_path, diter->basename,
 		       diter->path.buf);
 	}
+	dir_iterator_free(diter);
 
 	if (iter_status != ITER_DONE) {
 		printf("dir_iterator_advance failure\n");

-- 
2.49.0.rc0.375.gae4b89d849.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v4 10/16] refs/iterator: provide infrastructure to re-seek iterators
  2025-02-28  9:26 ` [PATCH v4 00/16] refs: batch refname availability checks Patrick Steinhardt
                     ` (8 preceding siblings ...)
  2025-02-28  9:26   ` [PATCH v4 09/16] refs/iterator: separate lifecycle from iteration Patrick Steinhardt
@ 2025-02-28  9:26   ` Patrick Steinhardt
  2025-02-28  9:26   ` [PATCH v4 11/16] refs/iterator: implement seeking for merged iterators Patrick Steinhardt
                     ` (6 subsequent siblings)
  16 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-28  9:26 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Ref iterators need to be scrapped after they have either been
exhausted or aren't useful to the caller anymore, and it is explicitly
not possible to reuse them for multiple iterations. But allowing
iterators to be reused may let us optimize them by retaining internal
state across iterations. The reftable iterators, for example, can
already be reused internally, but we are not able to expose this to any
users outside of the reftable backend.

Introduce a new `.seek` function in the ref iterator vtable that allows
callers to seek an iterator multiple times. It is expected to be
functionally the same as calling `refs_ref_iterator_begin()` with a
different (or the same) prefix.

Note that for now it is not possible to adjust parameters other than
the prefix being sought, so exclude patterns, trimmed prefixes and
flags will remain unchanged. We do not have a use case for changing
these parameters right now, but if we ever find one we can adapt
accordingly.

Implement the callback for trivial cases. The other iterators will be
implemented in subsequent commits.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/debug.c         | 11 +++++++++++
 refs/iterator.c      | 24 ++++++++++++++++++++++++
 refs/refs-internal.h | 24 ++++++++++++++++++++++++
 3 files changed, 59 insertions(+)

diff --git a/refs/debug.c b/refs/debug.c
index a9786da4ba1..5390fa9c187 100644
--- a/refs/debug.c
+++ b/refs/debug.c
@@ -169,6 +169,16 @@ static int debug_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	return res;
 }
 
+static int debug_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				   const char *prefix)
+{
+	struct debug_ref_iterator *diter =
+		(struct debug_ref_iterator *)ref_iterator;
+	int res = diter->iter->vtable->seek(diter->iter, prefix);
+	trace_printf_key(&trace_refs, "iterator_seek: %s: %d\n", prefix ? prefix : "", res);
+	return res;
+}
+
 static int debug_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				   struct object_id *peeled)
 {
@@ -189,6 +199,7 @@ static void debug_ref_iterator_release(struct ref_iterator *ref_iterator)
 
 static struct ref_iterator_vtable debug_ref_iterator_vtable = {
 	.advance = debug_ref_iterator_advance,
+	.seek = debug_ref_iterator_seek,
 	.peel = debug_ref_iterator_peel,
 	.release = debug_ref_iterator_release,
 };
diff --git a/refs/iterator.c b/refs/iterator.c
index aaeff270437..757b105261a 100644
--- a/refs/iterator.c
+++ b/refs/iterator.c
@@ -15,6 +15,12 @@ int ref_iterator_advance(struct ref_iterator *ref_iterator)
 	return ref_iterator->vtable->advance(ref_iterator);
 }
 
+int ref_iterator_seek(struct ref_iterator *ref_iterator,
+		      const char *prefix)
+{
+	return ref_iterator->vtable->seek(ref_iterator, prefix);
+}
+
 int ref_iterator_peel(struct ref_iterator *ref_iterator,
 		      struct object_id *peeled)
 {
@@ -50,6 +56,12 @@ static int empty_ref_iterator_advance(struct ref_iterator *ref_iterator UNUSED)
 	return ITER_DONE;
 }
 
+static int empty_ref_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
+				   const char *prefix UNUSED)
+{
+	return 0;
+}
+
 static int empty_ref_iterator_peel(struct ref_iterator *ref_iterator UNUSED,
 				   struct object_id *peeled UNUSED)
 {
@@ -62,6 +74,7 @@ static void empty_ref_iterator_release(struct ref_iterator *ref_iterator UNUSED)
 
 static struct ref_iterator_vtable empty_ref_iterator_vtable = {
 	.advance = empty_ref_iterator_advance,
+	.seek = empty_ref_iterator_seek,
 	.peel = empty_ref_iterator_peel,
 	.release = empty_ref_iterator_release,
 };
@@ -368,6 +381,16 @@ static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	return ok;
 }
 
+static int prefix_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				    const char *prefix)
+{
+	struct prefix_ref_iterator *iter =
+		(struct prefix_ref_iterator *)ref_iterator;
+	free(iter->prefix);
+	iter->prefix = xstrdup_or_null(prefix);
+	return ref_iterator_seek(iter->iter0, prefix);
+}
+
 static int prefix_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				    struct object_id *peeled)
 {
@@ -387,6 +410,7 @@ static void prefix_ref_iterator_release(struct ref_iterator *ref_iterator)
 
 static struct ref_iterator_vtable prefix_ref_iterator_vtable = {
 	.advance = prefix_ref_iterator_advance,
+	.seek = prefix_ref_iterator_seek,
 	.peel = prefix_ref_iterator_peel,
 	.release = prefix_ref_iterator_release,
 };
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 74e2c03cef1..8f18274a165 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -327,6 +327,22 @@ struct ref_iterator {
  */
 int ref_iterator_advance(struct ref_iterator *ref_iterator);
 
+/*
+ * Seek the iterator to the first reference with the given prefix.
+ * The prefix is matched as a literal string, without regard for path
+ * separators. If prefix is NULL or the empty string, seek the iterator to the
+ * first reference again.
+ *
+ * This function is expected to behave as if a new ref iterator with the same
+ * prefix had been created, but allows reuse of iterators and thus may allow
+ * the backend to optimize. Parameters other than the prefix that have been
+ * passed when creating the iterator will remain unchanged.
+ *
+ * Returns 0 on success, a negative error code otherwise.
+ */
+int ref_iterator_seek(struct ref_iterator *ref_iterator,
+		      const char *prefix);
+
 /*
  * If possible, peel the reference currently being viewed by the
  * iterator. Return 0 on success.
@@ -445,6 +461,13 @@ void base_ref_iterator_init(struct ref_iterator *iter,
  */
 typedef int ref_iterator_advance_fn(struct ref_iterator *ref_iterator);
 
+/*
+ * Seek the iterator to the first reference matching the given prefix. Should
+ * behave the same as if a new iterator was created with the same prefix.
+ */
+typedef int ref_iterator_seek_fn(struct ref_iterator *ref_iterator,
+				 const char *prefix);
+
 /*
  * Peels the current ref, returning 0 for success or -1 for failure.
  */
@@ -459,6 +482,7 @@ typedef void ref_iterator_release_fn(struct ref_iterator *ref_iterator);
 
 struct ref_iterator_vtable {
 	ref_iterator_advance_fn *advance;
+	ref_iterator_seek_fn *seek;
 	ref_iterator_peel_fn *peel;
 	ref_iterator_release_fn *release;
 };

-- 
2.49.0.rc0.375.gae4b89d849.dirty




* [PATCH v4 11/16] refs/iterator: implement seeking for merged iterators
  2025-02-28  9:26 ` [PATCH v4 00/16] refs: batch refname availability checks Patrick Steinhardt
                     ` (9 preceding siblings ...)
  2025-02-28  9:26   ` [PATCH v4 10/16] refs/iterator: provide infrastructure to re-seek iterators Patrick Steinhardt
@ 2025-02-28  9:26   ` Patrick Steinhardt
  2025-02-28  9:26   ` [PATCH v4 12/16] refs/iterator: implement seeking for reftable iterators Patrick Steinhardt
                     ` (5 subsequent siblings)
  16 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-28  9:26 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Implement seeking for merged iterators. The implementation is rather
straightforward, the one caveat being that we must not deallocate the
underlying iterators once they have been exhausted.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/iterator.c | 38 +++++++++++++++++++++++++++++---------
 1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/refs/iterator.c b/refs/iterator.c
index 757b105261a..63608ef9907 100644
--- a/refs/iterator.c
+++ b/refs/iterator.c
@@ -96,7 +96,8 @@ int is_empty_ref_iterator(struct ref_iterator *ref_iterator)
 struct merge_ref_iterator {
 	struct ref_iterator base;
 
-	struct ref_iterator *iter0, *iter1;
+	struct ref_iterator *iter0, *iter0_owned;
+	struct ref_iterator *iter1, *iter1_owned;
 
 	ref_iterator_select_fn *select;
 	void *cb_data;
@@ -160,13 +161,11 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	if (!iter->current) {
 		/* Initialize: advance both iterators to their first entries */
 		if ((ok = ref_iterator_advance(iter->iter0)) != ITER_OK) {
-			ref_iterator_free(iter->iter0);
 			iter->iter0 = NULL;
 			if (ok == ITER_ERROR)
 				goto error;
 		}
 		if ((ok = ref_iterator_advance(iter->iter1)) != ITER_OK) {
-			ref_iterator_free(iter->iter1);
 			iter->iter1 = NULL;
 			if (ok == ITER_ERROR)
 				goto error;
@@ -177,7 +176,6 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		 * entry:
 		 */
 		if ((ok = ref_iterator_advance(*iter->current)) != ITER_OK) {
-			ref_iterator_free(*iter->current);
 			*iter->current = NULL;
 			if (ok == ITER_ERROR)
 				goto error;
@@ -206,7 +204,6 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 
 		if (selection & ITER_SKIP_SECONDARY) {
 			if ((ok = ref_iterator_advance(*secondary)) != ITER_OK) {
-				ref_iterator_free(*secondary);
 				*secondary = NULL;
 				if (ok == ITER_ERROR)
 					goto error;
@@ -226,6 +223,28 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	return ITER_ERROR;
 }
 
+static int merge_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				   const char *prefix)
+{
+	struct merge_ref_iterator *iter =
+		(struct merge_ref_iterator *)ref_iterator;
+	int ret;
+
+	iter->current = NULL;
+	iter->iter0 = iter->iter0_owned;
+	iter->iter1 = iter->iter1_owned;
+
+	ret = ref_iterator_seek(iter->iter0, prefix);
+	if (ret < 0)
+		return ret;
+
+	ret = ref_iterator_seek(iter->iter1, prefix);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
 static int merge_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				   struct object_id *peeled)
 {
@@ -242,12 +261,13 @@ static void merge_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct merge_ref_iterator *iter =
 		(struct merge_ref_iterator *)ref_iterator;
-	ref_iterator_free(iter->iter0);
-	ref_iterator_free(iter->iter1);
+	ref_iterator_free(iter->iter0_owned);
+	ref_iterator_free(iter->iter1_owned);
 }
 
 static struct ref_iterator_vtable merge_ref_iterator_vtable = {
 	.advance = merge_ref_iterator_advance,
+	.seek = merge_ref_iterator_seek,
 	.peel = merge_ref_iterator_peel,
 	.release = merge_ref_iterator_release,
 };
@@ -268,8 +288,8 @@ struct ref_iterator *merge_ref_iterator_begin(
 	 */
 
 	base_ref_iterator_init(ref_iterator, &merge_ref_iterator_vtable);
-	iter->iter0 = iter0;
-	iter->iter1 = iter1;
+	iter->iter0 = iter->iter0_owned = iter0;
+	iter->iter1 = iter->iter1_owned = iter1;
 	iter->select = select;
 	iter->cb_data = cb_data;
 	iter->current = NULL;

-- 
2.49.0.rc0.375.gae4b89d849.dirty




* [PATCH v4 12/16] refs/iterator: implement seeking for reftable iterators
  2025-02-28  9:26 ` [PATCH v4 00/16] refs: batch refname availability checks Patrick Steinhardt
                     ` (10 preceding siblings ...)
  2025-02-28  9:26   ` [PATCH v4 11/16] refs/iterator: implement seeking for merged iterators Patrick Steinhardt
@ 2025-02-28  9:26   ` Patrick Steinhardt
  2025-03-06 14:16     ` Karthik Nayak
  2025-02-28  9:26   ` [PATCH v4 13/16] refs/iterator: implement seeking for ref-cache iterators Patrick Steinhardt
                     ` (4 subsequent siblings)
  16 siblings, 1 reply; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-28  9:26 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Implement seeking of reftable iterators. As the low-level reftable
iterators already support seeking, this change is straightforward. Two
notes though:

  - We do not support seeking on reflog iterators. It is unclear what
    seeking would even look like in this context, as one would
    typically want to seek to a specific entry in the reflog of a
    specific ref. There is currently no use case for this, but if one
    ever arises we can implement seeking in the future.

  - We now check whether `reftable_stack_init_ref_iterator()` is
    successful.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/reftable-backend.c | 35 ++++++++++++++++++++++++++++++-----
 1 file changed, 30 insertions(+), 5 deletions(-)

diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 06543f79c64..b0c09f34433 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -547,7 +547,7 @@ struct reftable_ref_iterator {
 	struct reftable_ref_record ref;
 	struct object_id oid;
 
-	const char *prefix;
+	char *prefix;
 	size_t prefix_len;
 	char **exclude_patterns;
 	size_t exclude_patterns_index;
@@ -718,6 +718,20 @@ static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	return ITER_OK;
 }
 
+static int reftable_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				      const char *prefix)
+{
+	struct reftable_ref_iterator *iter =
+		(struct reftable_ref_iterator *)ref_iterator;
+
+	free(iter->prefix);
+	iter->prefix = xstrdup_or_null(prefix);
+	iter->prefix_len = prefix ? strlen(prefix) : 0;
+	iter->err = reftable_iterator_seek_ref(&iter->iter, prefix);
+
+	return iter->err;
+}
+
 static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				      struct object_id *peeled)
 {
@@ -744,10 +758,12 @@ static void reftable_ref_iterator_release(struct ref_iterator *ref_iterator)
 			free(iter->exclude_patterns[i]);
 		free(iter->exclude_patterns);
 	}
+	free(iter->prefix);
 }
 
 static struct ref_iterator_vtable reftable_ref_iterator_vtable = {
 	.advance = reftable_ref_iterator_advance,
+	.seek = reftable_ref_iterator_seek,
 	.peel = reftable_ref_iterator_peel,
 	.release = reftable_ref_iterator_release,
 };
@@ -806,8 +822,6 @@ static struct reftable_ref_iterator *ref_iterator_for_stack(struct reftable_ref_
 
 	iter = xcalloc(1, sizeof(*iter));
 	base_ref_iterator_init(&iter->base, &reftable_ref_iterator_vtable);
-	iter->prefix = prefix;
-	iter->prefix_len = prefix ? strlen(prefix) : 0;
 	iter->base.oid = &iter->oid;
 	iter->flags = flags;
 	iter->refs = refs;
@@ -821,8 +835,11 @@ static struct reftable_ref_iterator *ref_iterator_for_stack(struct reftable_ref_
 	if (ret)
 		goto done;
 
-	reftable_stack_init_ref_iterator(stack, &iter->iter);
-	ret = reftable_iterator_seek_ref(&iter->iter, prefix);
+	ret = reftable_stack_init_ref_iterator(stack, &iter->iter);
+	if (ret)
+		goto done;
+
+	ret = reftable_ref_iterator_seek(&iter->base, prefix);
 	if (ret)
 		goto done;
 
@@ -2015,6 +2032,13 @@ static int reftable_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 	return ITER_OK;
 }
 
+static int reftable_reflog_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
+					 const char *prefix UNUSED)
+{
+	BUG("reftable reflog iterator cannot be seeked");
+	return -1;
+}
+
 static int reftable_reflog_iterator_peel(struct ref_iterator *ref_iterator UNUSED,
 					 struct object_id *peeled UNUSED)
 {
@@ -2033,6 +2057,7 @@ static void reftable_reflog_iterator_release(struct ref_iterator *ref_iterator)
 
 static struct ref_iterator_vtable reftable_reflog_iterator_vtable = {
 	.advance = reftable_reflog_iterator_advance,
+	.seek = reftable_reflog_iterator_seek,
 	.peel = reftable_reflog_iterator_peel,
 	.release = reftable_reflog_iterator_release,
 };

-- 
2.49.0.rc0.375.gae4b89d849.dirty




* [PATCH v4 13/16] refs/iterator: implement seeking for ref-cache iterators
  2025-02-28  9:26 ` [PATCH v4 00/16] refs: batch refname availability checks Patrick Steinhardt
                     ` (11 preceding siblings ...)
  2025-02-28  9:26   ` [PATCH v4 12/16] refs/iterator: implement seeking for reftable iterators Patrick Steinhardt
@ 2025-02-28  9:26   ` Patrick Steinhardt
  2025-02-28  9:26   ` [PATCH v4 14/16] refs/iterator: implement seeking for packed-ref iterators Patrick Steinhardt
                     ` (3 subsequent siblings)
  16 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-28  9:26 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Implement seeking of ref-cache iterators. This is done by splitting most
of the logic to seek iterators out of `cache_ref_iterator_begin()` and
putting it into `cache_ref_iterator_seek()` so that we can reuse the
logic.

Note that we cannot use the optimization anymore where we return an
empty ref iterator when there aren't any references, as it would
otherwise be impossible to reseek the iterator to a different prefix
that does exist. This shouldn't be much of a performance concern
though, as we now bail out early in case `advance()` sees that there
are no more directories to be searched.
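
That early bail-out can be sketched with a reduced model (toy names;
the real advance() then goes on to walk `iter->levels`):

```c
#include <stddef.h>

enum { TOY_ITER_OK, TOY_ITER_DONE };

/*
 * Reduced model of the early bail-out: a seek that finds no directory
 * matching the prefix sets `levels_nr` to zero, and advance() then
 * returns immediately instead of relying on a separate empty iterator.
 */
struct toy_cache_iter {
	size_t levels_nr;
};

static int toy_cache_iter_advance(struct toy_cache_iter *iter)
{
	if (!iter->levels_nr)
		return TOY_ITER_DONE;
	/* the real code would walk the level stack here */
	return TOY_ITER_OK;
}
```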

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/ref-cache.c | 79 ++++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 51 insertions(+), 28 deletions(-)

diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index 6457e02c1ea..c1f1bab1d50 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -362,9 +362,7 @@ struct cache_ref_iterator {
 	struct ref_iterator base;
 
 	/*
-	 * The number of levels currently on the stack. This is always
-	 * at least 1, because when it becomes zero the iteration is
-	 * ended and this struct is freed.
+	 * The number of levels currently on the stack.
 	 */
 	size_t levels_nr;
 
@@ -376,7 +374,7 @@ struct cache_ref_iterator {
 	 * The prefix is matched textually, without regard for path
 	 * component boundaries.
 	 */
-	const char *prefix;
+	char *prefix;
 
 	/*
 	 * A stack of levels. levels[0] is the uppermost level that is
@@ -389,6 +387,9 @@ struct cache_ref_iterator {
 	struct cache_ref_iterator_level *levels;
 
 	struct repository *repo;
+	struct ref_cache *cache;
+
+	int prime_dir;
 };
 
 static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
@@ -396,6 +397,9 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	struct cache_ref_iterator *iter =
 		(struct cache_ref_iterator *)ref_iterator;
 
+	if (!iter->levels_nr)
+		return ITER_DONE;
+
 	while (1) {
 		struct cache_ref_iterator_level *level =
 			&iter->levels[iter->levels_nr - 1];
@@ -444,6 +448,41 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	}
 }
 
+static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				   const char *prefix)
+{
+	struct cache_ref_iterator *iter =
+		(struct cache_ref_iterator *)ref_iterator;
+	struct cache_ref_iterator_level *level;
+	struct ref_dir *dir;
+
+	dir = get_ref_dir(iter->cache->root);
+	if (prefix && *prefix)
+		dir = find_containing_dir(dir, prefix);
+	if (!dir) {
+		iter->levels_nr = 0;
+		return 0;
+	}
+
+	if (iter->prime_dir)
+		prime_ref_dir(dir, prefix);
+	iter->levels_nr = 1;
+	level = &iter->levels[0];
+	level->index = -1;
+	level->dir = dir;
+
+	if (prefix && *prefix) {
+		free(iter->prefix);
+		iter->prefix = xstrdup(prefix);
+		level->prefix_state = PREFIX_WITHIN_DIR;
+	} else {
+		FREE_AND_NULL(iter->prefix);
+		level->prefix_state = PREFIX_CONTAINS_DIR;
+	}
+
+	return 0;
+}
+
 static int cache_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				   struct object_id *peeled)
 {
@@ -456,12 +495,13 @@ static void cache_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct cache_ref_iterator *iter =
 		(struct cache_ref_iterator *)ref_iterator;
-	free((char *)iter->prefix);
+	free(iter->prefix);
 	free(iter->levels);
 }
 
 static struct ref_iterator_vtable cache_ref_iterator_vtable = {
 	.advance = cache_ref_iterator_advance,
+	.seek = cache_ref_iterator_seek,
 	.peel = cache_ref_iterator_peel,
 	.release = cache_ref_iterator_release,
 };
@@ -471,39 +511,22 @@ struct ref_iterator *cache_ref_iterator_begin(struct ref_cache *cache,
 					      struct repository *repo,
 					      int prime_dir)
 {
-	struct ref_dir *dir;
 	struct cache_ref_iterator *iter;
 	struct ref_iterator *ref_iterator;
-	struct cache_ref_iterator_level *level;
-
-	dir = get_ref_dir(cache->root);
-	if (prefix && *prefix)
-		dir = find_containing_dir(dir, prefix);
-	if (!dir)
-		/* There's nothing to iterate over. */
-		return empty_ref_iterator_begin();
-
-	if (prime_dir)
-		prime_ref_dir(dir, prefix);
 
 	CALLOC_ARRAY(iter, 1);
 	ref_iterator = &iter->base;
 	base_ref_iterator_init(ref_iterator, &cache_ref_iterator_vtable);
 	ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
 
-	iter->levels_nr = 1;
-	level = &iter->levels[0];
-	level->index = -1;
-	level->dir = dir;
+	iter->repo = repo;
+	iter->cache = cache;
+	iter->prime_dir = prime_dir;
 
-	if (prefix && *prefix) {
-		iter->prefix = xstrdup(prefix);
-		level->prefix_state = PREFIX_WITHIN_DIR;
-	} else {
-		level->prefix_state = PREFIX_CONTAINS_DIR;
+	if (cache_ref_iterator_seek(&iter->base, prefix) < 0) {
+		ref_iterator_free(&iter->base);
+		return NULL;
 	}
 
-	iter->repo = repo;
-
 	return ref_iterator;
 }

-- 
2.49.0.rc0.375.gae4b89d849.dirty




* [PATCH v4 14/16] refs/iterator: implement seeking for packed-ref iterators
  2025-02-28  9:26 ` [PATCH v4 00/16] refs: batch refname availability checks Patrick Steinhardt
                     ` (12 preceding siblings ...)
  2025-02-28  9:26   ` [PATCH v4 13/16] refs/iterator: implement seeking for ref-cache iterators Patrick Steinhardt
@ 2025-02-28  9:26   ` Patrick Steinhardt
  2025-02-28  9:26   ` [PATCH v4 15/16] refs/iterator: implement seeking for files iterators Patrick Steinhardt
                     ` (2 subsequent siblings)
  16 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-28  9:26 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Implement seeking of `packed-ref` iterators. The implementation is
again straightforward, except that we cannot keep using the prefix
iterator, as we would otherwise be unable to reseek the iterator when
one first asks for an empty and then for a non-empty prefix. Instead,
we open-code the logic in `advance()`.
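
The open-coded check can be sketched in isolation (toy function name;
the real code operates on `iter->base.refname` and `iter->prefix`).
Because the snapshot is sorted, a yielded refname byte smaller than
the prefix would indicate a bug, while a larger byte means iteration
has moved past the prefix:

```c
/*
 * Returns 1 to keep the record (still within the prefix), 0 to stop
 * iterating (past the prefix), and -1 for the "impossible" case that
 * the backend yielded a record preceding the prefix, which the real
 * code turns into a BUG().
 */
static int toy_prefix_check(const char *refname, const char *prefix)
{
	while (prefix && *prefix) {
		if (*refname < *prefix)
			return -1; /* BUG() in the real code */
		else if (*refname > *prefix)
			return 0;  /* ITER_DONE in the real code */
		prefix++;
		refname++;
	}
	return 1; /* ITER_OK in the real code */
}
```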

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/packed-backend.c | 65 ++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 43 insertions(+), 22 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 38a1956d1a8..f4c82ba2c7d 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -819,6 +819,8 @@ struct packed_ref_iterator {
 
 	struct snapshot *snapshot;
 
+	char *prefix;
+
 	/* The current position in the snapshot's buffer: */
 	const char *pos;
 
@@ -841,11 +843,9 @@ struct packed_ref_iterator {
 };
 
 /*
- * Move the iterator to the next record in the snapshot, without
- * respect for whether the record is actually required by the current
- * iteration. Adjust the fields in `iter` and return `ITER_OK` or
- * `ITER_DONE`. This function does not free the iterator in the case
- * of `ITER_DONE`.
+ * Move the iterator to the next record in the snapshot. Adjust the fields in
+ * `iter` and return `ITER_OK` or `ITER_DONE`. This function does not free the
+ * iterator in the case of `ITER_DONE`.
  */
 static int next_record(struct packed_ref_iterator *iter)
 {
@@ -942,6 +942,9 @@ static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	int ok;
 
 	while ((ok = next_record(iter)) == ITER_OK) {
+		const char *refname = iter->base.refname;
+		const char *prefix = iter->prefix;
+
 		if (iter->flags & DO_FOR_EACH_PER_WORKTREE_ONLY &&
 		    !is_per_worktree_ref(iter->base.refname))
 			continue;
@@ -951,12 +954,41 @@ static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator)
 					    &iter->oid, iter->flags))
 			continue;
 
+		while (prefix && *prefix) {
+			if (*refname < *prefix)
+				BUG("packed-refs backend yielded reference preceding its prefix");
+			else if (*refname > *prefix)
+				return ITER_DONE;
+			prefix++;
+			refname++;
+		}
+
 		return ITER_OK;
 	}
 
 	return ok;
 }
 
+static int packed_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				    const char *prefix)
+{
+	struct packed_ref_iterator *iter =
+		(struct packed_ref_iterator *)ref_iterator;
+	const char *start;
+
+	if (prefix && *prefix)
+		start = find_reference_location(iter->snapshot, prefix, 0);
+	else
+		start = iter->snapshot->start;
+
+	free(iter->prefix);
+	iter->prefix = xstrdup_or_null(prefix);
+	iter->pos = start;
+	iter->eof = iter->snapshot->eof;
+
+	return 0;
+}
+
 static int packed_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				   struct object_id *peeled)
 {
@@ -979,11 +1011,13 @@ static void packed_ref_iterator_release(struct ref_iterator *ref_iterator)
 		(struct packed_ref_iterator *)ref_iterator;
 	strbuf_release(&iter->refname_buf);
 	free(iter->jump);
+	free(iter->prefix);
 	release_snapshot(iter->snapshot);
 }
 
 static struct ref_iterator_vtable packed_ref_iterator_vtable = {
 	.advance = packed_ref_iterator_advance,
+	.seek = packed_ref_iterator_seek,
 	.peel = packed_ref_iterator_peel,
 	.release = packed_ref_iterator_release,
 };
@@ -1097,7 +1131,6 @@ static struct ref_iterator *packed_ref_iterator_begin(
 {
 	struct packed_ref_store *refs;
 	struct snapshot *snapshot;
-	const char *start;
 	struct packed_ref_iterator *iter;
 	struct ref_iterator *ref_iterator;
 	unsigned int required_flags = REF_STORE_READ;
@@ -1113,14 +1146,6 @@ static struct ref_iterator *packed_ref_iterator_begin(
 	 */
 	snapshot = get_snapshot(refs);
 
-	if (prefix && *prefix)
-		start = find_reference_location(snapshot, prefix, 0);
-	else
-		start = snapshot->start;
-
-	if (start == snapshot->eof)
-		return empty_ref_iterator_begin();
-
 	CALLOC_ARRAY(iter, 1);
 	ref_iterator = &iter->base;
 	base_ref_iterator_init(ref_iterator, &packed_ref_iterator_vtable);
@@ -1130,19 +1155,15 @@ static struct ref_iterator *packed_ref_iterator_begin(
 
 	iter->snapshot = snapshot;
 	acquire_snapshot(snapshot);
-
-	iter->pos = start;
-	iter->eof = snapshot->eof;
 	strbuf_init(&iter->refname_buf, 0);
-
 	iter->base.oid = &iter->oid;
-
 	iter->repo = ref_store->repo;
 	iter->flags = flags;
 
-	if (prefix && *prefix)
-		/* Stop iteration after we've gone *past* prefix: */
-		ref_iterator = prefix_ref_iterator_begin(ref_iterator, prefix, 0);
+	if (packed_ref_iterator_seek(&iter->base, prefix) < 0) {
+		ref_iterator_free(&iter->base);
+		return NULL;
+	}
 
 	return ref_iterator;
 }

-- 
2.49.0.rc0.375.gae4b89d849.dirty




* [PATCH v4 15/16] refs/iterator: implement seeking for files iterators
  2025-02-28  9:26 ` [PATCH v4 00/16] refs: batch refname availability checks Patrick Steinhardt
                     ` (13 preceding siblings ...)
  2025-02-28  9:26   ` [PATCH v4 14/16] refs/iterator: implement seeking for packed-ref iterators Patrick Steinhardt
@ 2025-02-28  9:26   ` Patrick Steinhardt
  2025-02-28  9:26   ` [PATCH v4 16/16] refs: reuse iterators when determining refname availability Patrick Steinhardt
  2025-03-06 14:20   ` [PATCH v4 00/16] refs: batch refname availability checks Karthik Nayak
  16 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-28  9:26 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Implement seeking for "files" iterators. As we simply use a ref-cache
iterator under the hood, the implementation is straightforward. Note
that we do not implement seeking on reflog iterators, same as with the
"reftable" backend.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/files-backend.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 859f1c11941..4e1c50fead3 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -918,6 +918,14 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	return ok;
 }
 
+static int files_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				   const char *prefix)
+{
+	struct files_ref_iterator *iter =
+		(struct files_ref_iterator *)ref_iterator;
+	return ref_iterator_seek(iter->iter0, prefix);
+}
+
 static int files_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				   struct object_id *peeled)
 {
@@ -936,6 +944,7 @@ static void files_ref_iterator_release(struct ref_iterator *ref_iterator)
 
 static struct ref_iterator_vtable files_ref_iterator_vtable = {
 	.advance = files_ref_iterator_advance,
+	.seek = files_ref_iterator_seek,
 	.peel = files_ref_iterator_peel,
 	.release = files_ref_iterator_release,
 };
@@ -2294,6 +2303,12 @@ static int files_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 	return ok;
 }
 
+static int files_reflog_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
+				      const char *prefix UNUSED)
+{
+	BUG("ref_iterator_seek() called for reflog_iterator");
+}
+
 static int files_reflog_iterator_peel(struct ref_iterator *ref_iterator UNUSED,
 				      struct object_id *peeled UNUSED)
 {
@@ -2309,6 +2324,7 @@ static void files_reflog_iterator_release(struct ref_iterator *ref_iterator)
 
 static struct ref_iterator_vtable files_reflog_iterator_vtable = {
 	.advance = files_reflog_iterator_advance,
+	.seek = files_reflog_iterator_seek,
 	.peel = files_reflog_iterator_peel,
 	.release = files_reflog_iterator_release,
 };

-- 
2.49.0.rc0.375.gae4b89d849.dirty




* [PATCH v4 16/16] refs: reuse iterators when determining refname availability
  2025-02-28  9:26 ` [PATCH v4 00/16] refs: batch refname availability checks Patrick Steinhardt
                     ` (14 preceding siblings ...)
  2025-02-28  9:26   ` [PATCH v4 15/16] refs/iterator: implement seeking for files iterators Patrick Steinhardt
@ 2025-02-28  9:26   ` Patrick Steinhardt
  2025-03-06 14:20   ` [PATCH v4 00/16] refs: batch refname availability checks Karthik Nayak
  16 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-02-28  9:26 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

When verifying whether refnames are available we have to verify whether
any reference exists that is nested under the current reference. E.g.
given a reference "refs/heads/foo", we must make sure that there is no
other reference "refs/heads/foo/*".

This check is performed using a ref iterator with the prefix set to the
nested reference namespace. Until now it was not possible to reseek
iterators, so we always had to reallocate the iterator for every single
reference we're about to check. This keeps us from reusing state the
iterator may hold that could make it work more efficiently.
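
The resulting reuse pattern can be sketched with a toy model
(hypothetical helpers standing in for `refs_ref_iterator_begin()` and
`ref_iterator_seek()`):

```c
#include <stddef.h>

static int begin_calls, seek_calls;

struct toy_iter {
	const char *prefix;
};

/* stand-in for refs_ref_iterator_begin(): allocate once */
static struct toy_iter *toy_begin(const char *prefix)
{
	static struct toy_iter it;
	begin_calls++;
	it.prefix = prefix;
	return &it;
}

/* stand-in for ref_iterator_seek(): repositions the same iterator */
static void toy_seek(struct toy_iter *it, const char *prefix)
{
	seek_calls++;
	it->prefix = prefix;
}

/*
 * Check several prefixes, creating the iterator lazily on the first
 * one and merely reseeking it for each subsequent one.
 */
static void check_prefixes(const char **prefixes, size_t nr)
{
	struct toy_iter *it = NULL;
	for (size_t i = 0; i < nr; i++) {
		if (!it)
			it = toy_begin(prefixes[i]);
		else
			toy_seek(it, prefixes[i]);
		/* ... iterate and verify availability here ... */
	}
}
```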

Refactor the logic to reseek iterators. This leads to a sizeable speedup
with the "reftable" backend:

    Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
      Time (mean ± σ):      39.8 ms ±   0.9 ms    [User: 29.7 ms, System: 9.8 ms]
      Range (min … max):    38.4 ms …  42.0 ms    62 runs

    Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):      31.9 ms ±   1.1 ms    [User: 27.0 ms, System: 4.5 ms]
      Range (min … max):    29.8 ms …  34.3 ms    74 runs

    Summary
      update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
        1.25 ± 0.05 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)

The "files" backend doesn't really show a huge impact:

    Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
      Time (mean ± σ):     392.3 ms ±   7.1 ms    [User: 59.7 ms, System: 328.8 ms]
      Range (min … max):   384.6 ms … 404.5 ms    10 runs

    Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):     387.7 ms ±   7.4 ms    [User: 54.6 ms, System: 329.6 ms]
      Range (min … max):   377.0 ms … 397.7 ms    10 runs

    Summary
      update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
        1.01 ± 0.03 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)

This is mostly because the "files" backend is way slower to begin with,
as it has to create a separate file for each new reference, so the
milliseconds we shave off by reseeking the iterator don't translate
into a significant relative improvement.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/refs.c b/refs.c
index 8eff60a2186..6cbb9decdb0 100644
--- a/refs.c
+++ b/refs.c
@@ -2555,8 +2555,13 @@ int refs_verify_refnames_available(struct ref_store *refs,
 		if (!initial_transaction) {
 			int ok;
 
-			iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
-						       DO_FOR_EACH_INCLUDE_BROKEN);
+			if (!iter) {
+				iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
+							       DO_FOR_EACH_INCLUDE_BROKEN);
+			} else if (ref_iterator_seek(iter, dirname.buf) < 0) {
+				goto cleanup;
+			}
+
 			while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
 				if (skip &&
 				    string_list_has_string(skip, iter->refname))
@@ -2569,9 +2574,6 @@ int refs_verify_refnames_available(struct ref_store *refs,
 
 			if (ok != ITER_DONE)
 				BUG("error while iterating over references");
-
-			ref_iterator_free(iter);
-			iter = NULL;
 		}
 
 		extra_refname = find_descendant_ref(dirname.buf, extras, skip);

-- 
2.49.0.rc0.375.gae4b89d849.dirty




* Re: [PATCH v4 02/16] object-name: allow skipping ambiguity checks in `get_oid()` family
  2025-02-28  9:26   ` [PATCH v4 02/16] object-name: allow skipping ambiguity checks in `get_oid()` family Patrick Steinhardt
@ 2025-03-06 13:21     ` Karthik Nayak
  0 siblings, 0 replies; 163+ messages in thread
From: Karthik Nayak @ 2025-03-06 13:21 UTC (permalink / raw)
  To: Patrick Steinhardt, git
  Cc: brian m. carlson, Jeff King, Junio C Hamano, shejialuo,
	Christian Couder


Patrick Steinhardt <ps@pks.im> writes:

> When reading an object ID via `get_oid_basic()` or any of its related
> functions we perform a check whether the object ID is ambiguous, which
> can be the case when a reference with the same name exists. While the
> check is generally helpful, there are cases where it only adds to the
> runtime overhead without providing much of a benefit.
>
> Add a new flag that allows us to disable the check. The flag will be
> used in a subsequent commit.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  hash.h        | 1 +
>  object-name.c | 4 +++-
>  2 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/hash.h b/hash.h
> index 4367acfec50..79419016513 100644
> --- a/hash.h
> +++ b/hash.h
> @@ -204,6 +204,7 @@ struct object_id {
>  #define GET_OID_ONLY_TO_DIE    04000
>  #define GET_OID_REQUIRE_PATH  010000
>  #define GET_OID_HASH_ANY      020000
> +#define GET_OID_SKIP_AMBIGUITY_CHECK 040000
>

Nit: not worth re-rolling for, but the other macros are aligned. Our
styling guide is to align them and this is only set in our
'clang-format' [1].

[1] : https://clang.llvm.org/docs/ClangFormatStyleOptions.html#alignconsecutiveassignments

>  #define GET_OID_DISAMBIGUATORS \
>  	(GET_OID_COMMIT | GET_OID_COMMITTISH | \
> diff --git a/object-name.c b/object-name.c
> index 233f3f861e3..85444dbb15b 100644
> --- a/object-name.c
> +++ b/object-name.c
> @@ -961,7 +961,9 @@ static int get_oid_basic(struct repository *r, const char *str, int len,
>  	int fatal = !(flags & GET_OID_QUIETLY);
>
>  	if (len == r->hash_algo->hexsz && !get_oid_hex(str, oid)) {
> -		if (repo_settings_get_warn_ambiguous_refs(r) && warn_on_object_refname_ambiguity) {
> +		if (!(flags & GET_OID_SKIP_AMBIGUITY_CHECK) &&
> +		    repo_settings_get_warn_ambiguous_refs(r) &&
> +		    warn_on_object_refname_ambiguity) {
>  			refs_found = repo_dwim_ref(r, str, len, &tmp_oid, &real_ref, 0);
>  			if (refs_found > 0) {
>  				warning(warn_msg, len, str);
>
> --
> 2.49.0.rc0.375.gae4b89d849.dirty



* Re: [PATCH v4 04/16] refs: introduce function to batch refname availability checks
  2025-02-28  9:26   ` [PATCH v4 04/16] refs: introduce function to batch refname availability checks Patrick Steinhardt
@ 2025-03-06 13:47     ` Karthik Nayak
  0 siblings, 0 replies; 163+ messages in thread
From: Karthik Nayak @ 2025-03-06 13:47 UTC (permalink / raw)
  To: Patrick Steinhardt, git
  Cc: brian m. carlson, Jeff King, Junio C Hamano, shejialuo,
	Christian Couder


Patrick Steinhardt <ps@pks.im> writes:

[snip]

> diff --git a/refs.h b/refs.h
> index a0cdd99250e..185aed5a461 100644
> --- a/refs.h
> +++ b/refs.h
> @@ -124,6 +124,18 @@ int refs_verify_refname_available(struct ref_store *refs,
>  				  unsigned int initial_transaction,
>  				  struct strbuf *err);
>
> +/*
> + * Same as `refs_verify_refname_available()`, but checking for a list of
> + * refnames instead of only a single item. This is more efficient in the case
> + * where one needs to check multiple refnames.
> + */
> +int refs_verify_refnames_available(struct ref_store *refs,
> +				   const struct string_list *refnames,
> +				   const struct string_list *extras,
> +				   const struct string_list *skip,
> +				   unsigned int initial_transaction,
> +				   struct strbuf *err);
> +
>  int refs_ref_exists(struct ref_store *refs, const char *refname);
>
>  int should_autocreate_reflog(enum log_refs_config log_all_ref_updates,

FYI: In my patch-series to add partial transaction support (based on top
of this series), I move this function to 'refs-internal.h', because I
also pass in the transaction to it.

The patch looks good!



* Re: [PATCH v4 05/16] refs/reftable: batch refname availability checks
  2025-02-28  9:26   ` [PATCH v4 05/16] refs/reftable: " Patrick Steinhardt
@ 2025-03-06 14:00     ` Karthik Nayak
  2025-03-06 14:12       ` Karthik Nayak
  0 siblings, 1 reply; 163+ messages in thread
From: Karthik Nayak @ 2025-03-06 14:00 UTC (permalink / raw)
  To: Patrick Steinhardt, git
  Cc: brian m. carlson, Jeff King, Junio C Hamano, shejialuo,
	Christian Couder


Patrick Steinhardt <ps@pks.im> writes:

> Refactor the "reftable" backend to batch the availability check for
> refnames. This does not yet have an effect on performance as we
> essentially still call `refs_verify_refname_available()` in a loop, but
> this will change in subsequent commits.
>

I thought this patch removes it from the loop. Which loop are you
talking about?

> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  refs/reftable-backend.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
> index d39a14c5a46..2a90e7cb391 100644
> --- a/refs/reftable-backend.c
> +++ b/refs/reftable-backend.c
> @@ -1069,6 +1069,7 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
>  		reftable_be_downcast(ref_store, REF_STORE_WRITE|REF_STORE_MAIN, "ref_transaction_prepare");
>  	struct strbuf referent = STRBUF_INIT, head_referent = STRBUF_INIT;
>  	struct string_list affected_refnames = STRING_LIST_INIT_NODUP;
> +	struct string_list refnames_to_check = STRING_LIST_INIT_NODUP;
>  	struct reftable_transaction_data *tx_data = NULL;
>  	struct reftable_backend *be;
>  	struct object_id head_oid;
> @@ -1224,12 +1225,7 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
>  			 * can output a proper error message instead of failing
>  			 * at a later point.
>  			 */
> -			ret = refs_verify_refname_available(ref_store, u->refname,
> -							    &affected_refnames, NULL,
> -							    transaction->flags & REF_TRANSACTION_FLAG_INITIAL,
> -							    err);
> -			if (ret < 0)
> -				goto done;
> +			string_list_append(&refnames_to_check, u->refname);
>
>  			/*
>  			 * There is no need to write the reference deletion
> @@ -1379,6 +1375,13 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
>  		}
>  	}
>
> +	string_list_sort(&refnames_to_check);
> +	ret = refs_verify_refnames_available(ref_store, &refnames_to_check, &affected_refnames, NULL,
> +					     transaction->flags & REF_TRANSACTION_FLAG_INITIAL,
> +					     err);
> +	if (ret < 0)
> +		goto done;
> +
>  	transaction->backend_data = tx_data;
>  	transaction->state = REF_TRANSACTION_PREPARED;
>
> @@ -1394,6 +1397,7 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
>  	string_list_clear(&affected_refnames, 0);
>  	strbuf_release(&referent);
>  	strbuf_release(&head_referent);
> +	string_list_clear(&refnames_to_check, 0);
>
>  	return ret;
>  }
>
> --
> 2.49.0.rc0.375.gae4b89d849.dirty


* Re: [PATCH v4 07/16] refs/files: batch refname availability checks for initial transactions
  2025-02-28  9:26   ` [PATCH v4 07/16] refs/files: batch refname availability checks for initial transactions Patrick Steinhardt
@ 2025-03-06 14:10     ` Karthik Nayak
  0 siblings, 0 replies; 163+ messages in thread
From: Karthik Nayak @ 2025-03-06 14:10 UTC (permalink / raw)
  To: Patrick Steinhardt, git
  Cc: brian m. carlson, Jeff King, Junio C Hamano, shejialuo,
	Christian Couder


Patrick Steinhardt <ps@pks.im> writes:

> The "files" backend explicitly carves out special logic for its initial
> transaction so that it can avoid writing out every single reference as
> a loose reference. While the assumption is that there shouldn't be any
> preexisting references, we still have to verify that none of the newly
> written references will conflict with any other new reference in the
> same transaction.
>
> Refactor the initial transaction to use batched refname availability
> checks. This does not yet have an effect on performance as we still call
> `refs_verify_refname_available()` in a loop. But this will change in
> subsequent commits and then impact performance when cloning a repository
> with many references or when migrating references to the "files" format.
>
> This doesn't yet have an effect on performance as the underlying
> logic simply calls This will improve performance when cloning a repository with

Seems like this sentence needs to be re-written.

> many references or when migrating references from any format to the
> "files" format.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  refs/files-backend.c | 23 ++++++++++++++++-------
>  1 file changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 6ce79cf0791..11a620ea11a 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -3056,6 +3056,7 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
>  	size_t i;
>  	int ret = 0;
>  	struct string_list affected_refnames = STRING_LIST_INIT_NODUP;
> +	struct string_list refnames_to_check = STRING_LIST_INIT_NODUP;
>  	struct ref_transaction *packed_transaction = NULL;
>  	struct ref_transaction *loose_transaction = NULL;
>
> @@ -3105,11 +3106,7 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
>  		    !is_null_oid(&update->old_oid))
>  			BUG("initial ref transaction with old_sha1 set");
>
> -		if (refs_verify_refname_available(&refs->base, update->refname,
> -						  &affected_refnames, NULL, 1, err)) {
> -			ret = TRANSACTION_NAME_CONFLICT;
> -			goto cleanup;
> -		}
> +		string_list_append(&refnames_to_check, update->refname);
>
>  		/*
>  		 * packed-refs don't support symbolic refs, root refs and reflogs,
> @@ -3145,8 +3142,19 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
>  		}
>  	}
>
> -	if (packed_refs_lock(refs->packed_ref_store, 0, err) ||
> -	    ref_transaction_commit(packed_transaction, err)) {
> +	if (packed_refs_lock(refs->packed_ref_store, 0, err)) {
> +		ret = TRANSACTION_GENERIC_ERROR;
> +		goto cleanup;
> +	}
> +
> +	if (refs_verify_refnames_available(&refs->base, &refnames_to_check,
> +					   &affected_refnames, NULL, 1, err)) {
> +		packed_refs_unlock(refs->packed_ref_store);
> +		ret = TRANSACTION_NAME_CONFLICT;
> +		goto cleanup;
> +	}
> +
> +	if (ref_transaction_commit(packed_transaction, err)) {
>  		ret = TRANSACTION_GENERIC_ERROR;
>  		goto cleanup;
>  	}
> @@ -3167,6 +3175,7 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
>  		ref_transaction_free(packed_transaction);
>  	transaction->state = REF_TRANSACTION_CLOSED;
>  	string_list_clear(&affected_refnames, 0);
> +	string_list_clear(&refnames_to_check, 0);
>  	return ret;
>  }
>
>
> --
> 2.49.0.rc0.375.gae4b89d849.dirty


* Re: [PATCH v4 05/16] refs/reftable: batch refname availability checks
  2025-03-06 14:00     ` Karthik Nayak
@ 2025-03-06 14:12       ` Karthik Nayak
  2025-03-06 15:13         ` Patrick Steinhardt
  0 siblings, 1 reply; 163+ messages in thread
From: Karthik Nayak @ 2025-03-06 14:12 UTC (permalink / raw)
  To: Patrick Steinhardt, git
  Cc: brian m. carlson, Jeff King, Junio C Hamano, shejialuo,
	Christian Couder


Karthik Nayak <karthik.188@gmail.com> writes:

> Patrick Steinhardt <ps@pks.im> writes:
>
>> Refactor the "reftable" backend to batch the availability check for
>> refnames. This does not yet have an effect on performance as we
>> essentially still call `refs_verify_refname_available()` in a loop, but
>> this will change in subsequent commits.
>>
>
> I thought this patch removes it from the loop. Which loop are you
> talking about?
>

Looking at future patches, maybe this 'loop' refers to how
'refs_verify_refnames_available()' still loops over all references,
which we start optimizing from patch 08 onward?

[snip]


* Re: [PATCH v4 12/16] refs/iterator: implement seeking for reftable iterators
  2025-02-28  9:26   ` [PATCH v4 12/16] refs/iterator: implement seeking for reftable iterators Patrick Steinhardt
@ 2025-03-06 14:16     ` Karthik Nayak
  0 siblings, 0 replies; 163+ messages in thread
From: Karthik Nayak @ 2025-03-06 14:16 UTC (permalink / raw)
  To: Patrick Steinhardt, git
  Cc: brian m. carlson, Jeff King, Junio C Hamano, shejialuo,
	Christian Couder


Patrick Steinhardt <ps@pks.im> writes:

> Implement seeking of reftable iterators. As the low-level reftable
> iterators already support seeking this change is straight-forward. Two
> notes though:
>
>   - We do not support seeking on reflog iterators. It is unclear what
>     seeking would even look like in this context, as you typically would
>     want to seek to a specific entry in the reflog for a specific ref.
>     There is not currently a usecase for this, but if there ever is we
>     can implement seeking in the future.
>

Nit: This last sentence reads a little weird, perhaps:

  There is currently no use case for this, but if one arises in the
  future, we can implement seeking.

>   - We start to check whether `reftable_stack_init_ref_iterator()` is
>     successful.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  refs/reftable-backend.c | 35 ++++++++++++++++++++++++++++++-----
>  1 file changed, 30 insertions(+), 5 deletions(-)
>

The patch looks good.

[snip]


* Re: [PATCH v4 00/16] refs: batch refname availability checks
  2025-02-28  9:26 ` [PATCH v4 00/16] refs: batch refname availability checks Patrick Steinhardt
                     ` (15 preceding siblings ...)
  2025-02-28  9:26   ` [PATCH v4 16/16] refs: reuse iterators when determining refname availability Patrick Steinhardt
@ 2025-03-06 14:20   ` Karthik Nayak
  16 siblings, 0 replies; 163+ messages in thread
From: Karthik Nayak @ 2025-03-06 14:20 UTC (permalink / raw)
  To: Patrick Steinhardt, git
  Cc: brian m. carlson, Jeff King, Junio C Hamano, shejialuo,
	Christian Couder


Patrick Steinhardt <ps@pks.im> writes:

> Hi,
>
> this patch series has been inspired by brian's report that the reftable
> backend is significantly slower when writing many references compared to
> the files backend. As explained in that thread, the underlying issue is
> the design of tombstone references: when we first delete all references
> in a repository and then recreate them, we still have all the tombstones
> and thus we need to churn through all of them to figure out that they
> have been deleted in the first place. The files backend does not have
> this issue.
>
> I consider the benchmark itself to be kind of broken, as it stems from
> us deleting all refs and then recreating them. And if you pack refs in
> between then the "reftable" backend outperforms the "files" backend.
>
> But there are a couple of opportunities here anyway. While we cannot
> make the underlying issue of tombstones being less efficient go away,
> this has prompted me to have a deeper look at where we spend all the
> time. There are three ideas in this series:
>
>   - git-update-ref(1) performs ambiguity checks for any full-size object
>     ID, which triggers a lot of reads. This is somewhat pointless though
>     given that the manpage explicitly points out that the command is
>     about object IDs, even though it does know to parse refs. But being
>     part of plumbing, emitting the warning here does not make a ton of
>     sense, and favoring object IDs over references in these cases is the
>     obvious thing to do anyway.
>
>   - For each ref "refs/heads/bar", we need to verify that neither
>     "refs/heads" nor "refs" exists. This was repeated for every refname,
>     but because most refnames use common prefixes this made us re-check
>     a lot of prefixes. This is addressed by using a `strset` of already
>     checked prefixes.
>
>   - For each ref "refs/heads/bar", we need to verify that no ref
>     "refs/heads/bar/*" exists. We always created a new ref iterator for
>     this check, which requires us to discard all internal state and then
>     recreate it. The reftable library has already been refactored though
>     to have reseekable iterators, so we backfill this functionality to
>     all the other iterators and then reuse the iterator.
>
> With the (somewhat broken) benchmark we see a small speedup with the
> "files" backend:
>
>     Benchmark 1: update-ref (refformat = files, revision = master)
>       Time (mean ± σ):     234.4 ms ±   1.9 ms    [User: 75.6 ms, System: 157.2 ms]
>       Range (min … max):   232.2 ms … 236.9 ms    10 runs
>
>     Benchmark 2: update-ref (refformat = files, revision = HEAD)
>       Time (mean ± σ):     184.2 ms ±   2.0 ms    [User: 62.8 ms, System: 119.9 ms]
>       Range (min … max):   181.1 ms … 187.0 ms    10 runs
>
>     Summary
>       update-ref (refformat = files, revision = HEAD) ran
>         1.27 ± 0.02 times faster than update-ref (refformat = files, revision = master)
>
> And a huge speedup with the "reftable" backend:
>
>     Benchmark 1: update-ref (refformat = reftable, revision = master)
>       Time (mean ± σ):     16.852 s ±  0.061 s    [User: 16.754 s, System: 0.059 s]
>       Range (min … max):   16.785 s … 16.982 s    10 runs
>
>     Benchmark 2: update-ref (refformat = reftable, revision = HEAD)
>       Time (mean ± σ):      2.230 s ±  0.009 s    [User: 2.192 s, System: 0.029 s]
>       Range (min … max):    2.215 s …  2.244 s    10 runs
>
>     Summary
>       update-ref (refformat = reftable, revision = HEAD) ran
>         7.56 ± 0.04 times faster than update-ref (refformat = reftable, revision = master)
>
> We're still not up to speed with the "files" backend, but considerably
> better. Given that this is an extreme edge case and not reflective of
> the general case I'm okay with this result for now.
>
> But more importantly, this refactoring also has a positive effect when
> updating references in a repository with preexisting refs, which I
> consider to be the more realistic scenario. The following benchmark
> creates 10k refs with 100k preexisting refs.
>
> With the "files" backend we see a modest improvement:
>
>     Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = master)
>       Time (mean ± σ):     478.4 ms ±  11.9 ms    [User: 96.7 ms, System: 379.6 ms]
>       Range (min … max):   465.4 ms … 496.6 ms    10 runs
>
>     Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
>       Time (mean ± σ):     388.5 ms ±  10.3 ms    [User: 52.0 ms, System: 333.8 ms]
>       Range (min … max):   376.5 ms … 403.1 ms    10 runs
>
>     Summary
>       update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
>         1.23 ± 0.04 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = master)
>
> But with the "reftable" backend we see an almost 5x improvement, where
> it's now ~15x faster than the "files" backend:
>
>     Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = master)
>       Time (mean ± σ):     153.9 ms ±   2.0 ms    [User: 96.5 ms, System: 56.6 ms]
>       Range (min … max):   150.5 ms … 158.4 ms    18 runs
>
>     Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
>       Time (mean ± σ):      32.2 ms ±   1.2 ms    [User: 27.6 ms, System: 4.3 ms]
>       Range (min … max):    29.8 ms …  38.6 ms    71 runs
>
>     Summary
>       update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
>         4.78 ± 0.19 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = master)
>
> The series is structured as follows:
>
>   - Patches 1 to 4 implement the logic to skip ambiguity checks in
>     git-update-ref(1).
>
>   - Patch 5 to 8 introduce batched checks.
>
>   - Patch 9 deduplicates the ref prefix checks.
>
>   - Patch 10 to 16 implement the infrastructure to reseek iterators.
>
>   - Patch 17 starts to reuse iterators for nested ref checks.
>
> Changes in v2:
>   - Point out why we also have to touch up the `dir_iterator`.
>   - Fix up the comment explaining `ITER_DONE`.
>   - Fix up comments that show usage patterns of the ref and dir iterator
>     interfaces.
>   - Start batching availability checks in the "files" backend, as well.
>   - Improve the commit message that drops the ambiguity check so that we
>     also point to 25fba78d36b (cat-file: disable object/refname
>     ambiguity check for batch mode, 2013-07-12).
>   - Link to v1: https://lore.kernel.org/r/20250217-pks-update-ref-optimization-v1-0-a2b6d87a24af@pks.im
>
> Changes in v3:
>   - Fix one case where we didn't skip ambiguity checks in
>     git-update-ref(1).
>   - Document better that only the prefix can change on reseeking
>     iterators. Other internal state will remain the same.
>   - Fix a memory leak in the ref-cache iterator.
>   - Don't ignore errors returned by `packed_ref_iterator_seek()`.
>   - Link to v2: https://lore.kernel.org/r/20250219-pks-update-ref-optimization-v2-0-e696e7220b22@pks.im
>
> Changes in v4:
>   - A couple of clarifications in the commit message that disabled
>     ambiguity warnings.
>   - Link to v3: https://lore.kernel.org/r/20250225-pks-update-ref-optimization-v3-0-77c3687cda75@pks.im
>
> Thanks!
>

I did a review of the 4th version and it generally looks great. I've
noted some nits, but I don't see anything that warrants a re-roll.
Thanks

[snip]


* [PATCH v5 00/16] refs: batch refname availability checks
  2025-02-17 15:50 [PATCH 00/14] refs: batch refname availability checks Patrick Steinhardt
                   ` (17 preceding siblings ...)
  2025-02-28  9:26 ` [PATCH v4 00/16] refs: batch refname availability checks Patrick Steinhardt
@ 2025-03-06 15:08 ` Patrick Steinhardt
  2025-03-06 15:08   ` [PATCH v5 01/16] object-name: introduce `repo_get_oid_with_flags()` Patrick Steinhardt
                     ` (17 more replies)
  2025-03-12 15:56 ` [PATCH v6 " Patrick Steinhardt
  19 siblings, 18 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-03-06 15:08 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Hi,

this patch series has been inspired by brian's report that the reftable
backend is significantly slower when writing many references compared to
the files backend. As explained in that thread, the underlying issue is
the design of tombstone references: when we first delete all references
in a repository and then recreate them, we still have all the tombstones
and thus we need to churn through all of them to figure out that they
have been deleted in the first place. The files backend does not have
this issue.

I consider the benchmark itself to be kind of broken, as it stems from
us deleting all refs and then recreating them. And if you pack refs in
between then the "reftable" backend outperforms the "files" backend.

But there are a couple of opportunities here anyway. While we cannot
make the underlying issue of tombstones being less efficient go away,
this has prompted me to have a deeper look at where we spend all the
time. There are three ideas in this series:

  - git-update-ref(1) performs ambiguity checks for any full-size object
    ID, which triggers a lot of reads. This is somewhat pointless though
    given that the manpage explicitly points out that the command is
    about object IDs, even though it does know to parse refs. But being
    part of plumbing, emitting the warning here does not make a ton of
    sense, and favoring object IDs over references in these cases is the
    obvious thing to do anyway.

  - For each ref "refs/heads/bar", we need to verify that neither
    "refs/heads" nor "refs" exists. This was repeated for every refname,
    but because most refnames use common prefixes this made us re-check
    a lot of prefixes. This is addressed by using a `strset` of already
    checked prefixes.

  - For each ref "refs/heads/bar", we need to verify that no ref
    "refs/heads/bar/*" exists. We always created a new ref iterator for
    this check, which requires us to discard all internal state and then
    recreate it. The reftable library has already been refactored though
    to have reseekable iterators, so we backfill this functionality to
    all the other iterators and then reuse the iterator.

With the (somewhat broken) benchmark we see a small speedup with the
"files" backend:

    Benchmark 1: update-ref (refformat = files, revision = master)
      Time (mean ± σ):     234.4 ms ±   1.9 ms    [User: 75.6 ms, System: 157.2 ms]
      Range (min … max):   232.2 ms … 236.9 ms    10 runs

    Benchmark 2: update-ref (refformat = files, revision = HEAD)
      Time (mean ± σ):     184.2 ms ±   2.0 ms    [User: 62.8 ms, System: 119.9 ms]
      Range (min … max):   181.1 ms … 187.0 ms    10 runs

    Summary
      update-ref (refformat = files, revision = HEAD) ran
        1.27 ± 0.02 times faster than update-ref (refformat = files, revision = master)

And a huge speedup with the "reftable" backend:

    Benchmark 1: update-ref (refformat = reftable, revision = master)
      Time (mean ± σ):     16.852 s ±  0.061 s    [User: 16.754 s, System: 0.059 s]
      Range (min … max):   16.785 s … 16.982 s    10 runs

    Benchmark 2: update-ref (refformat = reftable, revision = HEAD)
      Time (mean ± σ):      2.230 s ±  0.009 s    [User: 2.192 s, System: 0.029 s]
      Range (min … max):    2.215 s …  2.244 s    10 runs

    Summary
      update-ref (refformat = reftable, revision = HEAD) ran
        7.56 ± 0.04 times faster than update-ref (refformat = reftable, revision = master)

We're still not up to speed with the "files" backend, but considerably
better. Given that this is an extreme edge case and not reflective of
the general case I'm okay with this result for now.

But more importantly, this refactoring also has a positive effect when
updating references in a repository with preexisting refs, which I
consider to be the more realistic scenario. The following benchmark
creates 10k refs with 100k preexisting refs.

With the "files" backend we see a modest improvement:

    Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = master)
      Time (mean ± σ):     478.4 ms ±  11.9 ms    [User: 96.7 ms, System: 379.6 ms]
      Range (min … max):   465.4 ms … 496.6 ms    10 runs

    Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):     388.5 ms ±  10.3 ms    [User: 52.0 ms, System: 333.8 ms]
      Range (min … max):   376.5 ms … 403.1 ms    10 runs

    Summary
      update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
        1.23 ± 0.04 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = master)

But with the "reftable" backend we see an almost 5x improvement, where
it's now ~15x faster than the "files" backend:

    Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = master)
      Time (mean ± σ):     153.9 ms ±   2.0 ms    [User: 96.5 ms, System: 56.6 ms]
      Range (min … max):   150.5 ms … 158.4 ms    18 runs

    Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):      32.2 ms ±   1.2 ms    [User: 27.6 ms, System: 4.3 ms]
      Range (min … max):    29.8 ms …  38.6 ms    71 runs

    Summary
      update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
        4.78 ± 0.19 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = master)

The series is structured as follows:

  - Patches 1 to 3 implement the logic to skip ambiguity checks in
    git-update-ref(1).

  - Patches 4 to 7 introduce batched checks.

  - Patch 8 deduplicates the ref prefix checks.

  - Patches 9 to 15 implement the infrastructure to reseek iterators.

  - Patch 16 starts to reuse iterators for nested ref checks.

Changes in v2:
  - Point out why we also have to touch up the `dir_iterator`.
  - Fix up the comment explaining `ITER_DONE`.
  - Fix up comments that show usage patterns of the ref and dir iterator
    interfaces.
  - Start batching availability checks in the "files" backend, as well.
  - Improve the commit message that drops the ambiguity check so that we
    also point to 25fba78d36b (cat-file: disable object/refname
    ambiguity check for batch mode, 2013-07-12).
  - Link to v1: https://lore.kernel.org/r/20250217-pks-update-ref-optimization-v1-0-a2b6d87a24af@pks.im

Changes in v3:
  - Fix one case where we didn't skip ambiguity checks in
    git-update-ref(1).
  - Document better that only the prefix can change on reseeking
    iterators. Other internal state will remain the same.
  - Fix a memory leak in the ref-cache iterator.
  - Don't ignore errors returned by `packed_ref_iterator_seek()`.
  - Link to v2: https://lore.kernel.org/r/20250219-pks-update-ref-optimization-v2-0-e696e7220b22@pks.im

Changes in v4:
  - A couple of clarifications in the commit message that disabled
    ambiguity warnings.
  - Link to v3: https://lore.kernel.org/r/20250225-pks-update-ref-optimization-v3-0-77c3687cda75@pks.im

Changes in v5:
  - Improve a couple of commit messages.
  - Align `GET_OID_*` flag values.
  - Link to v4: https://lore.kernel.org/r/20250228-pks-update-ref-optimization-v4-0-6425c04268b5@pks.im

Thanks!

Patrick

[1]: <Z602dzQggtDdcgCX@tapette.crustytoothpaste.net>

---
Patrick Steinhardt (16):
      object-name: introduce `repo_get_oid_with_flags()`
      object-name: allow skipping ambiguity checks in `get_oid()` family
      builtin/update-ref: skip ambiguity checks when parsing object IDs
      refs: introduce function to batch refname availability checks
      refs/reftable: batch refname availability checks
      refs/files: batch refname availability checks for normal transactions
      refs/files: batch refname availability checks for initial transactions
      refs: stop re-verifying common prefixes for availability
      refs/iterator: separate lifecycle from iteration
      refs/iterator: provide infrastructure to re-seek iterators
      refs/iterator: implement seeking for merged iterators
      refs/iterator: implement seeking for reftable iterators
      refs/iterator: implement seeking for ref-cache iterators
      refs/iterator: implement seeking for packed-ref iterators
      refs/iterator: implement seeking for files iterators
      refs: reuse iterators when determining refname availability

 builtin/clone.c              |   2 +
 builtin/update-ref.c         |  15 ++--
 dir-iterator.c               |  24 +++---
 dir-iterator.h               |  11 +--
 hash.h                       |  23 +++---
 iterator.h                   |   2 +-
 object-name.c                |  18 +++--
 object-name.h                |   6 ++
 refs.c                       | 186 ++++++++++++++++++++++++++-----------------
 refs.h                       |  12 +++
 refs/debug.c                 |  20 +++--
 refs/files-backend.c         | 117 +++++++++++++++++----------
 refs/iterator.c              | 145 +++++++++++++++++----------------
 refs/packed-backend.c        |  92 ++++++++++++---------
 refs/ref-cache.c             |  88 ++++++++++++--------
 refs/refs-internal.h         |  53 +++++++-----
 refs/reftable-backend.c      |  85 +++++++++++---------
 t/helper/test-dir-iterator.c |   1 +
 18 files changed, 539 insertions(+), 361 deletions(-)

Range-diff versus v4:

 1:  0d104cd63d5 =  1:  0fa8574b155 object-name: introduce `repo_get_oid_with_flags()`
 2:  3fc310597db !  2:  9bc87107e05 object-name: allow skipping ambiguity checks in `get_oid()` family
    @@ Commit message
     
      ## hash.h ##
     @@ hash.h: struct object_id {
    - #define GET_OID_ONLY_TO_DIE    04000
    - #define GET_OID_REQUIRE_PATH  010000
    - #define GET_OID_HASH_ANY      020000
    + 	int algo;	/* XXX requires 4-byte alignment */
    + };
    + 
    +-#define GET_OID_QUIETLY           01
    +-#define GET_OID_COMMIT            02
    +-#define GET_OID_COMMITTISH        04
    +-#define GET_OID_TREE             010
    +-#define GET_OID_TREEISH          020
    +-#define GET_OID_BLOB             040
    +-#define GET_OID_FOLLOW_SYMLINKS 0100
    +-#define GET_OID_RECORD_PATH     0200
    +-#define GET_OID_ONLY_TO_DIE    04000
    +-#define GET_OID_REQUIRE_PATH  010000
    +-#define GET_OID_HASH_ANY      020000
    ++#define GET_OID_QUIETLY                  01
    ++#define GET_OID_COMMIT                   02
    ++#define GET_OID_COMMITTISH               04
    ++#define GET_OID_TREE                    010
    ++#define GET_OID_TREEISH                 020
    ++#define GET_OID_BLOB                    040
    ++#define GET_OID_FOLLOW_SYMLINKS        0100
    ++#define GET_OID_RECORD_PATH            0200
    ++#define GET_OID_ONLY_TO_DIE           04000
    ++#define GET_OID_REQUIRE_PATH         010000
    ++#define GET_OID_HASH_ANY             020000
     +#define GET_OID_SKIP_AMBIGUITY_CHECK 040000
      
      #define GET_OID_DISAMBIGUATORS \
 3:  f702ed7ece6 =  3:  345af4b7012 builtin/update-ref: skip ambiguity checks when parsing object IDs
 4:  eead30e64d7 =  4:  f1422bc1ef6 refs: introduce function to batch refname availability checks
 5:  1bd5eb82167 !  5:  1aa1b95b5d8 refs/reftable: batch refname availability checks
    @@ Commit message
         refs/reftable: batch refname availability checks
     
         Refactor the "reftable" backend to batch the availability check for
    -    refnames. This does not yet have an effect on performance as we
    -    essentially still call `refs_verify_refname_available()` in a loop, but
    -    this will change in subsequent commits.
    +    refnames. This does not yet have an effect on performance as
    +    `refs_verify_refnames_available()` effectively still performs the
    +    availability check for each refname individually. But this will be
    +    optimized in subsequent commits, where we learn to optimize some parts
    +    of the logic when checking multiple refnames for availability.
     
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
 6:  02d02c9c219 =  6:  2cf0c6a3bbe refs/files: batch refname availability checks for normal transactions
 7:  a3c4a4f751d !  7:  84d7830c5e5 refs/files: batch refname availability checks for initial transactions
    @@ Commit message
         subsequent commits and then impact performance when cloning a repository
         with many references or when migrating references to the "files" format.
     
    -    This doesn't yet have an effect on performance as the underlying
    -    logic simply calls This will improve performance when cloning a repository with
    -    many references or when migrating references from any format to the
    -    "files" format.
    +    This will improve performance when cloning a repository with many
    +    references or when migrating references from any format to the "files"
    +    format once the availability checks have learned to optimize checks for
    +    many references in a subsequent commit.
     
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
 8:  595d98a6cd3 =  8:  a0d99e8a3a0 refs: stop re-verifying common prefixes for availability
 9:  7dd99377266 =  9:  b73d1ffa761 refs/iterator: separate lifecycle from iteration
10:  8935bed6fcb = 10:  a1de5c7819c refs/iterator: provide infrastructure to re-seek iterators
11:  c9bfdde3c89 = 11:  90698a9237b refs/iterator: implement seeking for merged iterators
12:  25fe519bff6 ! 12:  420d6e08fa9 refs/iterator: implement seeking for reftable iterators
    @@ Commit message
           - We do not support seeking on reflog iterators. It is unclear what
             seeking would even look like in this context, as you typically would
             want to seek to a specific entry in the reflog for a specific ref.
    -        There is not currently a usecase for this, but if there ever is we
    -        can implement seeking in the future.
    +        There is currently no use case for this, but if one arises in the
    +        future, we can still implement seeking at that later point.
     
           - We start to check whether `reftable_stack_init_ref_iterator()` is
             successful.
13:  8e8d5b4b91e = 13:  0f16f1bc5f6 refs/iterator: implement seeking for ref-cache iterators
14:  1670074af7c = 14:  01c6a5bf45d refs/iterator: implement seeking for packed-ref iterators
15:  537ff4ae54c = 15:  4179771a9cb refs/iterator: implement seeking for files iterators
16:  93084536803 = 16:  e51b3985882 refs: reuse iterators when determining refname availability

---
base-commit: e2067b49ecaef9b7f51a17ce251f9207f72ef52d
change-id: 20250217-pks-update-ref-optimization-15c795e66e2b



^ permalink raw reply	[flat|nested] 163+ messages in thread

* [PATCH v5 01/16] object-name: introduce `repo_get_oid_with_flags()`
  2025-03-06 15:08 ` [PATCH v5 " Patrick Steinhardt
@ 2025-03-06 15:08   ` Patrick Steinhardt
  2025-03-06 15:08   ` [PATCH v5 02/16] object-name: allow skipping ambiguity checks in `get_oid()` family Patrick Steinhardt
                     ` (16 subsequent siblings)
  17 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-03-06 15:08 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Introduce a new function `repo_get_oid_with_flags()`. This function
behaves the same as `repo_get_oid()`, except that it takes an extra
`flags` parameter that it ends up passing to `get_oid_with_context()`.

This function will be used in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 object-name.c | 14 ++++++++------
 object-name.h |  6 ++++++
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/object-name.c b/object-name.c
index 945d5bdef25..233f3f861e3 100644
--- a/object-name.c
+++ b/object-name.c
@@ -1794,18 +1794,20 @@ void object_context_release(struct object_context *ctx)
 	strbuf_release(&ctx->symlink_path);
 }
 
-/*
- * This is like "get_oid_basic()", except it allows "object ID expressions",
- * notably "xyz^" for "parent of xyz"
- */
-int repo_get_oid(struct repository *r, const char *name, struct object_id *oid)
+int repo_get_oid_with_flags(struct repository *r, const char *name,
+			    struct object_id *oid, unsigned flags)
 {
 	struct object_context unused;
-	int ret = get_oid_with_context(r, name, 0, oid, &unused);
+	int ret = get_oid_with_context(r, name, flags, oid, &unused);
 	object_context_release(&unused);
 	return ret;
 }
 
+int repo_get_oid(struct repository *r, const char *name, struct object_id *oid)
+{
+	return repo_get_oid_with_flags(r, name, oid, 0);
+}
+
 /*
  * This returns a non-zero value if the string (built using printf
  * format and the given arguments) is not a valid object.
diff --git a/object-name.h b/object-name.h
index 8dba4a47a47..cda4934cd5f 100644
--- a/object-name.h
+++ b/object-name.h
@@ -51,6 +51,12 @@ void strbuf_repo_add_unique_abbrev(struct strbuf *sb, struct repository *repo,
 void strbuf_add_unique_abbrev(struct strbuf *sb, const struct object_id *oid,
 			      int abbrev_len);
 
+/*
+ * This is like "get_oid_basic()", except it allows "object ID expressions",
+ * notably "xyz^" for "parent of xyz". Accepts GET_OID_* flags.
+ */
+int repo_get_oid_with_flags(struct repository *r, const char *str,
+			    struct object_id *oid, unsigned flags);
 int repo_get_oid(struct repository *r, const char *str, struct object_id *oid);
 __attribute__((format (printf, 2, 3)))
 int get_oidf(struct object_id *oid, const char *fmt, ...);

-- 
2.49.0.rc0.416.g627208d89d.dirty




* [PATCH v5 02/16] object-name: allow skipping ambiguity checks in `get_oid()` family
  2025-03-06 15:08 ` [PATCH v5 " Patrick Steinhardt
  2025-03-06 15:08   ` [PATCH v5 01/16] object-name: introduce `repo_get_oid_with_flags()` Patrick Steinhardt
@ 2025-03-06 15:08   ` Patrick Steinhardt
  2025-03-12 12:12     ` shejialuo
  2025-03-06 15:08   ` [PATCH v5 03/16] builtin/update-ref: skip ambiguity checks when parsing object IDs Patrick Steinhardt
                     ` (15 subsequent siblings)
  17 siblings, 1 reply; 163+ messages in thread
From: Patrick Steinhardt @ 2025-03-06 15:08 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

When reading an object ID via `get_oid_basic()` or any of its related
functions we perform a check whether the object ID is ambiguous, which
can be the case when a reference with the same name exists. While the
check is generally helpful, there are cases where it only adds to the
runtime overhead without providing much of a benefit.

Add a new flag that allows us to disable the check. The flag will be
used in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 hash.h        | 23 ++++++++++++-----------
 object-name.c |  4 +++-
 2 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/hash.h b/hash.h
index 4367acfec50..5e3c462dc5e 100644
--- a/hash.h
+++ b/hash.h
@@ -193,17 +193,18 @@ struct object_id {
 	int algo;	/* XXX requires 4-byte alignment */
 };
 
-#define GET_OID_QUIETLY           01
-#define GET_OID_COMMIT            02
-#define GET_OID_COMMITTISH        04
-#define GET_OID_TREE             010
-#define GET_OID_TREEISH          020
-#define GET_OID_BLOB             040
-#define GET_OID_FOLLOW_SYMLINKS 0100
-#define GET_OID_RECORD_PATH     0200
-#define GET_OID_ONLY_TO_DIE    04000
-#define GET_OID_REQUIRE_PATH  010000
-#define GET_OID_HASH_ANY      020000
+#define GET_OID_QUIETLY                  01
+#define GET_OID_COMMIT                   02
+#define GET_OID_COMMITTISH               04
+#define GET_OID_TREE                    010
+#define GET_OID_TREEISH                 020
+#define GET_OID_BLOB                    040
+#define GET_OID_FOLLOW_SYMLINKS        0100
+#define GET_OID_RECORD_PATH            0200
+#define GET_OID_ONLY_TO_DIE           04000
+#define GET_OID_REQUIRE_PATH         010000
+#define GET_OID_HASH_ANY             020000
+#define GET_OID_SKIP_AMBIGUITY_CHECK 040000
 
 #define GET_OID_DISAMBIGUATORS \
 	(GET_OID_COMMIT | GET_OID_COMMITTISH | \
diff --git a/object-name.c b/object-name.c
index 233f3f861e3..85444dbb15b 100644
--- a/object-name.c
+++ b/object-name.c
@@ -961,7 +961,9 @@ static int get_oid_basic(struct repository *r, const char *str, int len,
 	int fatal = !(flags & GET_OID_QUIETLY);
 
 	if (len == r->hash_algo->hexsz && !get_oid_hex(str, oid)) {
-		if (repo_settings_get_warn_ambiguous_refs(r) && warn_on_object_refname_ambiguity) {
+		if (!(flags & GET_OID_SKIP_AMBIGUITY_CHECK) &&
+		    repo_settings_get_warn_ambiguous_refs(r) &&
+		    warn_on_object_refname_ambiguity) {
 			refs_found = repo_dwim_ref(r, str, len, &tmp_oid, &real_ref, 0);
 			if (refs_found > 0) {
 				warning(warn_msg, len, str);

-- 
2.49.0.rc0.416.g627208d89d.dirty




* [PATCH v5 03/16] builtin/update-ref: skip ambiguity checks when parsing object IDs
  2025-03-06 15:08 ` [PATCH v5 " Patrick Steinhardt
  2025-03-06 15:08   ` [PATCH v5 01/16] object-name: introduce `repo_get_oid_with_flags()` Patrick Steinhardt
  2025-03-06 15:08   ` [PATCH v5 02/16] object-name: allow skipping ambiguity checks in `get_oid()` family Patrick Steinhardt
@ 2025-03-06 15:08   ` Patrick Steinhardt
  2025-03-06 15:08   ` [PATCH v5 04/16] refs: introduce function to batch refname availability checks Patrick Steinhardt
                     ` (14 subsequent siblings)
  17 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-03-06 15:08 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Most of the commands in git-update-ref(1) accept an old and/or new
object ID to update a specific reference to. These object IDs get parsed
via `repo_get_oid()`, which not only handles plain object IDs, but also
those that have a suffix like "~" or "^2". More surprisingly though, it
even knows to resolve arbitrary revisions, despite the fact that its
manpage does not mention this fact even once.

One consequence of this is that we also check for ambiguous references:
when parsing a full object ID where the DWIM mechanism would also cause
us to resolve it as a branch, we'd end up printing a warning. While this
check makes sense to have in general, it is arguably less useful in the
context of git-update-ref(1). This is due to multiple reasons:

  - The manpage is explicitly structured around object IDs. So if we see
    a fully blown object ID, the intent should be quite clear in
    general.

  - The command is part of our plumbing layer and not a tool that users
    would generally use in interactive workflows. As such, the warning
    will likely not be visible to anybody in the first place.

  - Users can and should use the fully-qualified refname in case there
    is any potential for ambiguity. And given that this command is part
    of our plumbing layer, one should always try to be as defensive as
    possible and use fully-qualified refnames.

Furthermore, this check can be quite expensive when updating lots of
references via `--stdin`, because we try to read multiple references per
object ID that we parse according to the DWIM rules. This effect can be
seen both with the "files" and "reftable" backend.

The issue is not unique to git-update-ref(1), but was also an issue in
git-cat-file(1), where it was addressed by disabling the ambiguity check
in 25fba78d36b (cat-file: disable object/refname ambiguity check for
batch mode, 2013-07-12).

Disable the warning in git-update-ref(1), which provides a significant
speedup with both backends. The user-visible outcome is unchanged even
when ambiguity exists, except that we don't show the warning anymore.

The following benchmark creates 10000 new references with a 100000
preexisting refs with the "files" backend:

    Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
      Time (mean ± σ):     467.3 ms ±   5.1 ms    [User: 100.0 ms, System: 365.1 ms]
      Range (min … max):   461.9 ms … 479.3 ms    10 runs

    Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):     394.1 ms ±   5.8 ms    [User: 63.3 ms, System: 327.6 ms]
      Range (min … max):   384.9 ms … 405.7 ms    10 runs

    Summary
      update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
        1.19 ± 0.02 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)

And with the "reftable" backend:

    Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
      Time (mean ± σ):     146.9 ms ±   2.2 ms    [User: 90.4 ms, System: 56.0 ms]
      Range (min … max):   142.7 ms … 150.8 ms    19 runs

    Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):      63.2 ms ±   1.1 ms    [User: 41.0 ms, System: 21.8 ms]
      Range (min … max):    61.1 ms …  66.6 ms    41 runs

    Summary
      update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
        2.32 ± 0.05 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)

Note that the absolute improvement with both backends is roughly in the
same ballpark, but the relative improvement for the "reftable" backend
is more significant because writing the new table to disk is faster in
the first place.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/update-ref.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/builtin/update-ref.c b/builtin/update-ref.c
index 4d35bdc4b4b..1d541e13ade 100644
--- a/builtin/update-ref.c
+++ b/builtin/update-ref.c
@@ -179,7 +179,8 @@ static int parse_next_oid(const char **next, const char *end,
 		(*next)++;
 		*next = parse_arg(*next, &arg);
 		if (arg.len) {
-			if (repo_get_oid(the_repository, arg.buf, oid))
+			if (repo_get_oid_with_flags(the_repository, arg.buf, oid,
+						    GET_OID_SKIP_AMBIGUITY_CHECK))
 				goto invalid;
 		} else {
 			/* Without -z, an empty value means all zeros: */
@@ -197,7 +198,8 @@ static int parse_next_oid(const char **next, const char *end,
 		*next += arg.len;
 
 		if (arg.len) {
-			if (repo_get_oid(the_repository, arg.buf, oid))
+			if (repo_get_oid_with_flags(the_repository, arg.buf, oid,
+						    GET_OID_SKIP_AMBIGUITY_CHECK))
 				goto invalid;
 		} else if (flags & PARSE_SHA1_ALLOW_EMPTY) {
 			/* With -z, treat an empty value as all zeros: */
@@ -299,7 +301,8 @@ static void parse_cmd_symref_update(struct ref_transaction *transaction,
 			die("symref-update %s: expected old value", refname);
 
 		if (!strcmp(old_arg, "oid")) {
-			if (repo_get_oid(the_repository, old_target, &old_oid))
+			if (repo_get_oid_with_flags(the_repository, old_target, &old_oid,
+						    GET_OID_SKIP_AMBIGUITY_CHECK))
 				die("symref-update %s: invalid oid: %s", refname, old_target);
 
 			have_old_oid = 1;
@@ -772,7 +775,8 @@ int cmd_update_ref(int argc,
 		refname = argv[0];
 		value = argv[1];
 		oldval = argv[2];
-		if (repo_get_oid(the_repository, value, &oid))
+		if (repo_get_oid_with_flags(the_repository, value, &oid,
+					    GET_OID_SKIP_AMBIGUITY_CHECK))
 			die("%s: not a valid SHA1", value);
 	}
 
@@ -783,7 +787,8 @@ int cmd_update_ref(int argc,
 			 * must not already exist:
 			 */
 			oidclr(&oldoid, the_repository->hash_algo);
-		else if (repo_get_oid(the_repository, oldval, &oldoid))
+		else if (repo_get_oid_with_flags(the_repository, oldval, &oldoid,
+						 GET_OID_SKIP_AMBIGUITY_CHECK))
 			die("%s: not a valid old SHA1", oldval);
 	}
 

-- 
2.49.0.rc0.416.g627208d89d.dirty




* [PATCH v5 04/16] refs: introduce function to batch refname availability checks
  2025-03-06 15:08 ` [PATCH v5 " Patrick Steinhardt
                     ` (2 preceding siblings ...)
  2025-03-06 15:08   ` [PATCH v5 03/16] builtin/update-ref: skip ambiguity checks when parsing object IDs Patrick Steinhardt
@ 2025-03-06 15:08   ` Patrick Steinhardt
  2025-03-12 12:36     ` shejialuo
  2025-03-06 15:08   ` [PATCH v5 05/16] refs/reftable: " Patrick Steinhardt
                     ` (13 subsequent siblings)
  17 siblings, 1 reply; 163+ messages in thread
From: Patrick Steinhardt @ 2025-03-06 15:08 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

The `refs_verify_refname_available()` function checks whether a
reference update can be committed or whether it would conflict with
either a prefix or suffix thereof. This function needs to be called once
per reference that one wants to check, which requires us to redo a
couple of checks every time the function is called.

Introduce a new function `refs_verify_refnames_available()` that does
the same, but for a list of references. For now, the new function uses
the exact same implementation, except that we loop through all refnames
provided by the caller. This will be tuned in subsequent commits.

The existing `refs_verify_refname_available()` function is reimplemented
on top of the new function. As such, the diff is best viewed with the
`--ignore-space-change` option.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs.c | 169 +++++++++++++++++++++++++++++++++++++----------------------------
 refs.h |  12 +++++
 2 files changed, 109 insertions(+), 72 deletions(-)

diff --git a/refs.c b/refs.c
index f4094a326a9..5a9b0f2fa1e 100644
--- a/refs.c
+++ b/refs.c
@@ -2467,19 +2467,15 @@ int ref_transaction_commit(struct ref_transaction *transaction,
 	return ret;
 }
 
-int refs_verify_refname_available(struct ref_store *refs,
-				  const char *refname,
-				  const struct string_list *extras,
-				  const struct string_list *skip,
-				  unsigned int initial_transaction,
-				  struct strbuf *err)
+int refs_verify_refnames_available(struct ref_store *refs,
+				   const struct string_list *refnames,
+				   const struct string_list *extras,
+				   const struct string_list *skip,
+				   unsigned int initial_transaction,
+				   struct strbuf *err)
 {
-	const char *slash;
-	const char *extra_refname;
 	struct strbuf dirname = STRBUF_INIT;
 	struct strbuf referent = STRBUF_INIT;
-	struct object_id oid;
-	unsigned int type;
 	int ret = -1;
 
 	/*
@@ -2489,79 +2485,91 @@ int refs_verify_refname_available(struct ref_store *refs,
 
 	assert(err);
 
-	strbuf_grow(&dirname, strlen(refname) + 1);
-	for (slash = strchr(refname, '/'); slash; slash = strchr(slash + 1, '/')) {
-		/*
-		 * Just saying "Is a directory" when we e.g. can't
-		 * lock some multi-level ref isn't very informative,
-		 * the user won't be told *what* is a directory, so
-		 * let's not use strerror() below.
-		 */
-		int ignore_errno;
-		/* Expand dirname to the new prefix, not including the trailing slash: */
-		strbuf_add(&dirname, refname + dirname.len, slash - refname - dirname.len);
+	for (size_t i = 0; i < refnames->nr; i++) {
+		const char *refname = refnames->items[i].string;
+		const char *extra_refname;
+		struct object_id oid;
+		unsigned int type;
+		const char *slash;
+
+		strbuf_reset(&dirname);
+
+		for (slash = strchr(refname, '/'); slash; slash = strchr(slash + 1, '/')) {
+			/*
+			 * Just saying "Is a directory" when we e.g. can't
+			 * lock some multi-level ref isn't very informative,
+			 * the user won't be told *what* is a directory, so
+			 * let's not use strerror() below.
+			 */
+			int ignore_errno;
+
+			/* Expand dirname to the new prefix, not including the trailing slash: */
+			strbuf_add(&dirname, refname + dirname.len, slash - refname - dirname.len);
+
+			/*
+			 * We are still at a leading dir of the refname (e.g.,
+			 * "refs/foo"; if there is a reference with that name,
+			 * it is a conflict, *unless* it is in skip.
+			 */
+			if (skip && string_list_has_string(skip, dirname.buf))
+				continue;
+
+			if (!initial_transaction &&
+			    !refs_read_raw_ref(refs, dirname.buf, &oid, &referent,
+					       &type, &ignore_errno)) {
+				strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
+					    dirname.buf, refname);
+				goto cleanup;
+			}
+
+			if (extras && string_list_has_string(extras, dirname.buf)) {
+				strbuf_addf(err, _("cannot process '%s' and '%s' at the same time"),
+					    refname, dirname.buf);
+				goto cleanup;
+			}
+		}
 
 		/*
-		 * We are still at a leading dir of the refname (e.g.,
-		 * "refs/foo"; if there is a reference with that name,
-		 * it is a conflict, *unless* it is in skip.
+		 * We are at the leaf of our refname (e.g., "refs/foo/bar").
+		 * There is no point in searching for a reference with that
+		 * name, because a refname isn't considered to conflict with
+		 * itself. But we still need to check for references whose
+		 * names are in the "refs/foo/bar/" namespace, because they
+		 * *do* conflict.
 		 */
-		if (skip && string_list_has_string(skip, dirname.buf))
-			continue;
+		strbuf_addstr(&dirname, refname + dirname.len);
+		strbuf_addch(&dirname, '/');
+
+		if (!initial_transaction) {
+			struct ref_iterator *iter;
+			int ok;
+
+			iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
+						       DO_FOR_EACH_INCLUDE_BROKEN);
+			while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
+				if (skip &&
+				    string_list_has_string(skip, iter->refname))
+					continue;
+
+				strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
+					    iter->refname, refname);
+				ref_iterator_abort(iter);
+				goto cleanup;
+			}
 
-		if (!initial_transaction &&
-		    !refs_read_raw_ref(refs, dirname.buf, &oid, &referent,
-				       &type, &ignore_errno)) {
-			strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
-				    dirname.buf, refname);
-			goto cleanup;
+			if (ok != ITER_DONE)
+				BUG("error while iterating over references");
 		}
 
-		if (extras && string_list_has_string(extras, dirname.buf)) {
+		extra_refname = find_descendant_ref(dirname.buf, extras, skip);
+		if (extra_refname) {
 			strbuf_addf(err, _("cannot process '%s' and '%s' at the same time"),
-				    refname, dirname.buf);
+				    refname, extra_refname);
 			goto cleanup;
 		}
 	}
 
-	/*
-	 * We are at the leaf of our refname (e.g., "refs/foo/bar").
-	 * There is no point in searching for a reference with that
-	 * name, because a refname isn't considered to conflict with
-	 * itself. But we still need to check for references whose
-	 * names are in the "refs/foo/bar/" namespace, because they
-	 * *do* conflict.
-	 */
-	strbuf_addstr(&dirname, refname + dirname.len);
-	strbuf_addch(&dirname, '/');
-
-	if (!initial_transaction) {
-		struct ref_iterator *iter;
-		int ok;
-
-		iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
-					       DO_FOR_EACH_INCLUDE_BROKEN);
-		while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
-			if (skip &&
-			    string_list_has_string(skip, iter->refname))
-				continue;
-
-			strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
-				    iter->refname, refname);
-			ref_iterator_abort(iter);
-			goto cleanup;
-		}
-
-		if (ok != ITER_DONE)
-			BUG("error while iterating over references");
-	}
-
-	extra_refname = find_descendant_ref(dirname.buf, extras, skip);
-	if (extra_refname)
-		strbuf_addf(err, _("cannot process '%s' and '%s' at the same time"),
-			    refname, extra_refname);
-	else
-		ret = 0;
+	ret = 0;
 
 cleanup:
 	strbuf_release(&referent);
@@ -2569,6 +2577,23 @@ int refs_verify_refname_available(struct ref_store *refs,
 	return ret;
 }
 
+int refs_verify_refname_available(struct ref_store *refs,
+				  const char *refname,
+				  const struct string_list *extras,
+				  const struct string_list *skip,
+				  unsigned int initial_transaction,
+				  struct strbuf *err)
+{
+	struct string_list_item item = { .string = (char *) refname };
+	struct string_list refnames = {
+		.items = &item,
+		.nr = 1,
+	};
+
+	return refs_verify_refnames_available(refs, &refnames, extras, skip,
+					      initial_transaction, err);
+}
+
 struct do_for_each_reflog_help {
 	each_reflog_fn *fn;
 	void *cb_data;
diff --git a/refs.h b/refs.h
index a0cdd99250e..185aed5a461 100644
--- a/refs.h
+++ b/refs.h
@@ -124,6 +124,18 @@ int refs_verify_refname_available(struct ref_store *refs,
 				  unsigned int initial_transaction,
 				  struct strbuf *err);
 
+/*
+ * Same as `refs_verify_refname_available()`, but checking for a list of
+ * refnames instead of only a single item. This is more efficient in the case
+ * where one needs to check multiple refnames.
+ */
+int refs_verify_refnames_available(struct ref_store *refs,
+				   const struct string_list *refnames,
+				   const struct string_list *extras,
+				   const struct string_list *skip,
+				   unsigned int initial_transaction,
+				   struct strbuf *err);
+
 int refs_ref_exists(struct ref_store *refs, const char *refname);
 
 int should_autocreate_reflog(enum log_refs_config log_all_ref_updates,

-- 
2.49.0.rc0.416.g627208d89d.dirty




* [PATCH v5 05/16] refs/reftable: batch refname availability checks
  2025-03-06 15:08 ` [PATCH v5 " Patrick Steinhardt
                     ` (3 preceding siblings ...)
  2025-03-06 15:08   ` [PATCH v5 04/16] refs: introduce function to batch refname availability checks Patrick Steinhardt
@ 2025-03-06 15:08   ` Patrick Steinhardt
  2025-03-12 12:54     ` shejialuo
  2025-03-06 15:08   ` [PATCH v5 06/16] refs/files: batch refname availability checks for normal transactions Patrick Steinhardt
                     ` (12 subsequent siblings)
  17 siblings, 1 reply; 163+ messages in thread
From: Patrick Steinhardt @ 2025-03-06 15:08 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Refactor the "reftable" backend to batch the availability check for
refnames. This does not yet have an effect on performance as
`refs_verify_refnames_available()` effectively still performs the
availability check for each refname individually. But this will be
optimized in subsequent commits, where we learn to optimize some parts
of the logic when checking multiple refnames for availability.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/reftable-backend.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index d39a14c5a46..2a90e7cb391 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -1069,6 +1069,7 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
 		reftable_be_downcast(ref_store, REF_STORE_WRITE|REF_STORE_MAIN, "ref_transaction_prepare");
 	struct strbuf referent = STRBUF_INIT, head_referent = STRBUF_INIT;
 	struct string_list affected_refnames = STRING_LIST_INIT_NODUP;
+	struct string_list refnames_to_check = STRING_LIST_INIT_NODUP;
 	struct reftable_transaction_data *tx_data = NULL;
 	struct reftable_backend *be;
 	struct object_id head_oid;
@@ -1224,12 +1225,7 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
 			 * can output a proper error message instead of failing
 			 * at a later point.
 			 */
-			ret = refs_verify_refname_available(ref_store, u->refname,
-							    &affected_refnames, NULL,
-							    transaction->flags & REF_TRANSACTION_FLAG_INITIAL,
-							    err);
-			if (ret < 0)
-				goto done;
+			string_list_append(&refnames_to_check, u->refname);
 
 			/*
 			 * There is no need to write the reference deletion
@@ -1379,6 +1375,13 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
 		}
 	}
 
+	string_list_sort(&refnames_to_check);
+	ret = refs_verify_refnames_available(ref_store, &refnames_to_check, &affected_refnames, NULL,
+					     transaction->flags & REF_TRANSACTION_FLAG_INITIAL,
+					     err);
+	if (ret < 0)
+		goto done;
+
 	transaction->backend_data = tx_data;
 	transaction->state = REF_TRANSACTION_PREPARED;
 
@@ -1394,6 +1397,7 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
 	string_list_clear(&affected_refnames, 0);
 	strbuf_release(&referent);
 	strbuf_release(&head_referent);
+	string_list_clear(&refnames_to_check, 0);
 
 	return ret;
 }

-- 
2.49.0.rc0.416.g627208d89d.dirty




* [PATCH v5 06/16] refs/files: batch refname availability checks for normal transactions
  2025-03-06 15:08 ` [PATCH v5 " Patrick Steinhardt
                     ` (4 preceding siblings ...)
  2025-03-06 15:08   ` [PATCH v5 05/16] refs/reftable: " Patrick Steinhardt
@ 2025-03-06 15:08   ` Patrick Steinhardt
  2025-03-12 12:58     ` shejialuo
  2025-03-06 15:08   ` [PATCH v5 07/16] refs/files: batch refname availability checks for initial transactions Patrick Steinhardt
                     ` (11 subsequent siblings)
  17 siblings, 1 reply; 163+ messages in thread
From: Patrick Steinhardt @ 2025-03-06 15:08 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Same as the "reftable" backend that we have adapted in the preceding
commit to use batched refname availability checks, we can also do so for
the "files" backend. Things are a bit more intricate here though, as we
call `refs_verify_refname_available()` in a set of different contexts:

  1. `lock_raw_ref()` when it hits either EEXIST or EISDIR when creating
     a new reference, mostly to create a nice, user-readable error
     message. This is nothing we have to care about too much, as we only
     hit this code path at most once when we hit a conflict.

  2. `lock_raw_ref()` when it _could_ create the lockfile to check
     whether it is conflicting with any packed refs. In the general case,
     this code path will be hit once for every (successful) reference
     update.

  3. `lock_ref_oid_basic()`, but it is only executed when copying or
     renaming references or when expiring reflogs. It will thus not be
     called in contexts where we have many references queued up.

  4. `refs_refname_ref_available()`, but again only when copying or
     renaming references. It is thus not interesting due to the same
     reason as the previous case.

  5. `files_transaction_finish_initial()`, which is only executed when
     creating a new repository or migrating references.

So out of these, only (2) and (5) are viable candidates to use the
batched checks.

Adapt `lock_raw_ref()` accordingly by queueing up reference names that
need to be checked for availability and then checking them after we have
processed all updates. This check is done before we (optionally) lock
the `packed-refs` file, which is somewhat flawed because it means that
the `packed-refs` could still change after the availability check and
thus create an undetected conflict. But unconditionally locking the file
would change semantics that users are likely to rely on, so we keep the
current locking sequence intact, even if it's suboptimal.

The refactoring of `files_transaction_finish_initial()` will be done in
the next commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/files-backend.c | 42 +++++++++++++++++++++++++++++++-----------
 1 file changed, 31 insertions(+), 11 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 29f08dced40..6ce79cf0791 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -678,6 +678,7 @@ static void unlock_ref(struct ref_lock *lock)
  */
 static int lock_raw_ref(struct files_ref_store *refs,
 			const char *refname, int mustexist,
+			struct string_list *refnames_to_check,
 			const struct string_list *extras,
 			struct ref_lock **lock_p,
 			struct strbuf *referent,
@@ -855,16 +856,11 @@ static int lock_raw_ref(struct files_ref_store *refs,
 		}
 
 		/*
-		 * If the ref did not exist and we are creating it,
-		 * make sure there is no existing packed ref that
-		 * conflicts with refname:
+		 * If the ref did not exist and we are creating it, we have to
+		 * make sure there is no existing packed ref that conflicts
+		 * with refname. This check is deferred so that we can batch it.
 		 */
-		if (refs_verify_refname_available(
-				    refs->packed_ref_store, refname,
-				    extras, NULL, 0, err)) {
-			ret = TRANSACTION_NAME_CONFLICT;
-			goto error_return;
-		}
+		string_list_insert(refnames_to_check, refname);
 	}
 
 	ret = 0;
@@ -2569,6 +2565,7 @@ static int lock_ref_for_update(struct files_ref_store *refs,
 			       struct ref_update *update,
 			       struct ref_transaction *transaction,
 			       const char *head_ref,
+			       struct string_list *refnames_to_check,
 			       struct string_list *affected_refnames,
 			       struct strbuf *err)
 {
@@ -2597,7 +2594,7 @@ static int lock_ref_for_update(struct files_ref_store *refs,
 		lock->count++;
 	} else {
 		ret = lock_raw_ref(refs, update->refname, mustexist,
-				   affected_refnames,
+				   refnames_to_check, affected_refnames,
 				   &lock, &referent,
 				   &update->type, err);
 		if (ret) {
@@ -2811,6 +2808,7 @@ static int files_transaction_prepare(struct ref_store *ref_store,
 	size_t i;
 	int ret = 0;
 	struct string_list affected_refnames = STRING_LIST_INIT_NODUP;
+	struct string_list refnames_to_check = STRING_LIST_INIT_NODUP;
 	char *head_ref = NULL;
 	int head_type;
 	struct files_transaction_backend_data *backend_data;
@@ -2898,7 +2896,8 @@ static int files_transaction_prepare(struct ref_store *ref_store,
 		struct ref_update *update = transaction->updates[i];
 
 		ret = lock_ref_for_update(refs, update, transaction,
-					  head_ref, &affected_refnames, err);
+					  head_ref, &refnames_to_check,
+					  &affected_refnames, err);
 		if (ret)
 			goto cleanup;
 
@@ -2930,6 +2929,26 @@ static int files_transaction_prepare(struct ref_store *ref_store,
 		}
 	}
 
+	/*
+	 * Verify that none of the loose references that we're about to write
+	 * conflict with any existing packed references. Ideally, we'd do this
+	 * check after the packed-refs are locked so that the file cannot
+	 * change underneath our feet. But introducing such a lock now would
+	 * probably do more harm than good as users rely on there not being a
+	 * global lock with the "files" backend.
+	 *
+	 * Another alternative would be to do the check after the (optional)
+	 * lock, but that would extend the time we spend in the globally-locked
+	 * state.
+	 *
+	 * So instead, we accept the race for now.
+	 */
+	if (refs_verify_refnames_available(refs->packed_ref_store, &refnames_to_check,
+					   &affected_refnames, NULL, 0, err)) {
+		ret = TRANSACTION_NAME_CONFLICT;
+		goto cleanup;
+	}
+
 	if (packed_transaction) {
 		if (packed_refs_lock(refs->packed_ref_store, 0, err)) {
 			ret = TRANSACTION_GENERIC_ERROR;
@@ -2972,6 +2991,7 @@ static int files_transaction_prepare(struct ref_store *ref_store,
 cleanup:
 	free(head_ref);
 	string_list_clear(&affected_refnames, 0);
+	string_list_clear(&refnames_to_check, 0);
 
 	if (ret)
 		files_transaction_cleanup(refs, transaction);

-- 
2.49.0.rc0.416.g627208d89d.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v5 07/16] refs/files: batch refname availability checks for initial transactions
  2025-03-06 15:08 ` [PATCH v5 " Patrick Steinhardt
                     ` (5 preceding siblings ...)
  2025-03-06 15:08   ` [PATCH v5 06/16] refs/files: batch refname availability checks for normal transactions Patrick Steinhardt
@ 2025-03-06 15:08   ` Patrick Steinhardt
  2025-03-12 13:06     ` shejialuo
  2025-03-06 15:08   ` [PATCH v5 08/16] refs: stop re-verifying common prefixes for availability Patrick Steinhardt
                     ` (10 subsequent siblings)
  17 siblings, 1 reply; 163+ messages in thread
From: Patrick Steinhardt @ 2025-03-06 15:08 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

The "files" backend explicitly carves out special logic for its initial
transaction so that it can avoid writing out every single reference as
a loose reference. While the assumption is that there shouldn't be any
preexisting references, we still have to verify that none of the newly
written references will conflict with any other new reference in the
same transaction.

Refactor the initial transaction to use batched refname availability
checks. This does not yet have an effect on performance as we still call
`refs_verify_refname_available()` in a loop. But once the availability
checks have learned to optimize for many references in a subsequent
commit, this will improve performance when cloning a repository with
many references or when migrating references from any format to the
"files" format.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/files-backend.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 6ce79cf0791..11a620ea11a 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3056,6 +3056,7 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
 	size_t i;
 	int ret = 0;
 	struct string_list affected_refnames = STRING_LIST_INIT_NODUP;
+	struct string_list refnames_to_check = STRING_LIST_INIT_NODUP;
 	struct ref_transaction *packed_transaction = NULL;
 	struct ref_transaction *loose_transaction = NULL;
 
@@ -3105,11 +3106,7 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
 		    !is_null_oid(&update->old_oid))
 			BUG("initial ref transaction with old_sha1 set");
 
-		if (refs_verify_refname_available(&refs->base, update->refname,
-						  &affected_refnames, NULL, 1, err)) {
-			ret = TRANSACTION_NAME_CONFLICT;
-			goto cleanup;
-		}
+		string_list_append(&refnames_to_check, update->refname);
 
 		/*
 		 * packed-refs don't support symbolic refs, root refs and reflogs,
@@ -3145,8 +3142,19 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
 		}
 	}
 
-	if (packed_refs_lock(refs->packed_ref_store, 0, err) ||
-	    ref_transaction_commit(packed_transaction, err)) {
+	if (packed_refs_lock(refs->packed_ref_store, 0, err)) {
+		ret = TRANSACTION_GENERIC_ERROR;
+		goto cleanup;
+	}
+
+	if (refs_verify_refnames_available(&refs->base, &refnames_to_check,
+					   &affected_refnames, NULL, 1, err)) {
+		packed_refs_unlock(refs->packed_ref_store);
+		ret = TRANSACTION_NAME_CONFLICT;
+		goto cleanup;
+	}
+
+	if (ref_transaction_commit(packed_transaction, err)) {
 		ret = TRANSACTION_GENERIC_ERROR;
 		goto cleanup;
 	}
@@ -3167,6 +3175,7 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
 		ref_transaction_free(packed_transaction);
 	transaction->state = REF_TRANSACTION_CLOSED;
 	string_list_clear(&affected_refnames, 0);
+	string_list_clear(&refnames_to_check, 0);
 	return ret;
 }
 

-- 
2.49.0.rc0.416.g627208d89d.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v5 08/16] refs: stop re-verifying common prefixes for availability
  2025-03-06 15:08 ` [PATCH v5 " Patrick Steinhardt
                     ` (6 preceding siblings ...)
  2025-03-06 15:08   ` [PATCH v5 07/16] refs/files: batch refname availability checks for initial transactions Patrick Steinhardt
@ 2025-03-06 15:08   ` Patrick Steinhardt
  2025-03-12 13:22     ` shejialuo
  2025-03-06 15:08   ` [PATCH v5 09/16] refs/iterator: separate lifecycle from iteration Patrick Steinhardt
                     ` (9 subsequent siblings)
  17 siblings, 1 reply; 163+ messages in thread
From: Patrick Steinhardt @ 2025-03-06 15:08 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

One of the checks done by `refs_verify_refnames_available()` is whether
any of the prefixes of a reference already exists. For example, given a
reference "refs/heads/main", we'd check whether "refs/heads" or "refs"
already exist, and if so we'd abort the transaction.

When updating multiple references at once, this check is performed for
each of the references individually. Consequently, because references
tend to have common prefixes like "refs/heads/" or "refs/tags/", we
evaluate the availability of these prefixes repeatedly. Naturally this
is a waste of compute, as the availability of those prefixes should in
general not change in the middle of a transaction. And if it did,
backends would notice at a later point in time.

Optimize this pattern by storing prefixes in a `strset` so that we can
trivially track those prefixes that we have already checked. This leads
to a significant speedup with the "reftable" backend when creating many
references that all share a common prefix:

    Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
      Time (mean ± σ):      63.1 ms ±   1.8 ms    [User: 41.0 ms, System: 21.6 ms]
      Range (min … max):    60.6 ms …  69.5 ms    38 runs

    Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):      40.0 ms ±   1.3 ms    [User: 29.3 ms, System: 10.3 ms]
      Range (min … max):    38.1 ms …  47.3 ms    61 runs

    Summary
      update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
        1.58 ± 0.07 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)

For the "files" backend we see an improvement, but a much smaller one:

    Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
      Time (mean ± σ):     395.8 ms ±   5.3 ms    [User: 63.6 ms, System: 330.5 ms]
      Range (min … max):   387.0 ms … 404.6 ms    10 runs

    Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):     386.0 ms ±   4.0 ms    [User: 51.5 ms, System: 332.8 ms]
      Range (min … max):   380.8 ms … 392.6 ms    10 runs

    Summary
      update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
        1.03 ± 0.02 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)

This change also leads to a modest improvement when writing references
with "initial" semantics, for example when migrating references. The
following benchmarks are migrating 1m references from the "reftable" to
the "files" backend:

    Benchmark 1: migrate reftable:files (refcount = 1000000, revision = HEAD~)
      Time (mean ± σ):     836.6 ms ±   5.6 ms    [User: 645.2 ms, System: 185.2 ms]
      Range (min … max):   829.6 ms … 845.9 ms    10 runs

    Benchmark 2: migrate reftable:files (refcount = 1000000, revision = HEAD)
      Time (mean ± σ):     759.8 ms ±   5.1 ms    [User: 574.9 ms, System: 178.9 ms]
      Range (min … max):   753.1 ms … 768.8 ms    10 runs

    Summary
      migrate reftable:files (refcount = 1000000, revision = HEAD) ran
        1.10 ± 0.01 times faster than migrate reftable:files (refcount = 1000000, revision = HEAD~)

And vice versa:

    Benchmark 1: migrate files:reftable (refcount = 1000000, revision = HEAD~)
      Time (mean ± σ):     870.7 ms ±   5.7 ms    [User: 735.2 ms, System: 127.4 ms]
      Range (min … max):   861.6 ms … 883.2 ms    10 runs

    Benchmark 2: migrate files:reftable (refcount = 1000000, revision = HEAD)
      Time (mean ± σ):     799.1 ms ±   8.5 ms    [User: 661.1 ms, System: 130.2 ms]
      Range (min … max):   787.5 ms … 812.6 ms    10 runs

    Summary
      migrate files:reftable (refcount = 1000000, revision = HEAD) ran
        1.09 ± 0.01 times faster than migrate files:reftable (refcount = 1000000, revision = HEAD~)

The impact here is significantly smaller given that we don't perform any
reference reads with "initial" semantics, so the speedup only comes from
us doing fewer string list lookups.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/refs.c b/refs.c
index 5a9b0f2fa1e..eaf41421f50 100644
--- a/refs.c
+++ b/refs.c
@@ -2476,6 +2476,7 @@ int refs_verify_refnames_available(struct ref_store *refs,
 {
 	struct strbuf dirname = STRBUF_INIT;
 	struct strbuf referent = STRBUF_INIT;
+	struct strset dirnames;
 	int ret = -1;
 
 	/*
@@ -2485,6 +2486,8 @@ int refs_verify_refnames_available(struct ref_store *refs,
 
 	assert(err);
 
+	strset_init(&dirnames);
+
 	for (size_t i = 0; i < refnames->nr; i++) {
 		const char *refname = refnames->items[i].string;
 		const char *extra_refname;
@@ -2514,6 +2517,14 @@ int refs_verify_refnames_available(struct ref_store *refs,
 			if (skip && string_list_has_string(skip, dirname.buf))
 				continue;
 
+			/*
+			 * If we've already seen the directory we don't need to
+			 * process it again. Skip it to avoid checking checking
+			 * common prefixes like "refs/heads/" repeatedly.
+			 */
+			if (!strset_add(&dirnames, dirname.buf))
+				continue;
+
 			if (!initial_transaction &&
 			    !refs_read_raw_ref(refs, dirname.buf, &oid, &referent,
 					       &type, &ignore_errno)) {
@@ -2574,6 +2585,7 @@ int refs_verify_refnames_available(struct ref_store *refs,
 cleanup:
 	strbuf_release(&referent);
 	strbuf_release(&dirname);
+	strset_clear(&dirnames);
 	return ret;
 }
 

-- 
2.49.0.rc0.416.g627208d89d.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v5 09/16] refs/iterator: separate lifecycle from iteration
  2025-03-06 15:08 ` [PATCH v5 " Patrick Steinhardt
                     ` (7 preceding siblings ...)
  2025-03-06 15:08   ` [PATCH v5 08/16] refs: stop re-verifying common prefixes for availability Patrick Steinhardt
@ 2025-03-06 15:08   ` Patrick Steinhardt
  2025-03-12 13:45     ` shejialuo
  2025-03-06 15:08   ` [PATCH v5 10/16] refs/iterator: provide infrastructure to re-seek iterators Patrick Steinhardt
                     ` (8 subsequent siblings)
  17 siblings, 1 reply; 163+ messages in thread
From: Patrick Steinhardt @ 2025-03-06 15:08 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

The ref and reflog iterators have their lifecycle attached to iteration:
once the iterator reaches its end, it is automatically released and the
caller doesn't have to care about that anymore. When the iterator should
be released before it has been exhausted, callers must explicitly abort
the iterator via `ref_iterator_abort()`.

This lifecycle is somewhat unusual in the Git codebase and creates two
problems:

  - Callsites need to be very careful about when exactly they call
    `ref_iterator_abort()`, as calling the function is only valid while
    the iterator itself still exists. This leads to somewhat awkward
    calling patterns in some situations.

  - It is impossible to reuse iterators and re-seek them to a different
    prefix. This feature isn't supported by any iterator implementation
    except for the reftable iterators anyway, but if it were implemented
    it would allow us to optimize cases where we need to search for
    specific references repeatedly by reusing internal state.

Detangle the lifecycle from iteration so that we don't deallocate the
iterator anymore once it is exhausted. Instead, callers are now expected
to always call a newly introduced `ref_iterator_free()` function that
deallocates the iterator and its internal state.

Note that the `dir_iterator` is somewhat special because it does not
implement the `ref_iterator` interface, but is only used to implement
other iterators. Consequently, we have to provide `dir_iterator_free()`
instead of `dir_iterator_release()`: the allocated structure itself is
also managed by the `dir_iterator` interface and is not freed by
`ref_iterator_free()` like in all the other cases.

While at it, drop the return value of `ref_iterator_abort()`, which
wasn't really required by any of the iterator implementations anyway.
Furthermore, stop calling `base_ref_iterator_free()` in any of the
backends, but instead call it in `ref_iterator_free()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/clone.c              |  2 +
 dir-iterator.c               | 24 +++++------
 dir-iterator.h               | 11 ++---
 iterator.h                   |  2 +-
 refs.c                       |  7 +++-
 refs/debug.c                 |  9 ++---
 refs/files-backend.c         | 36 +++++------------
 refs/iterator.c              | 95 ++++++++++++++------------------------------
 refs/packed-backend.c        | 27 ++++++-------
 refs/ref-cache.c             |  9 ++---
 refs/refs-internal.h         | 29 +++++---------
 refs/reftable-backend.c      | 34 ++++------------
 t/helper/test-dir-iterator.c |  1 +
 13 files changed, 100 insertions(+), 186 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index fd001d800c6..ac3e84b2b18 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -426,6 +426,8 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 		strbuf_setlen(src, src_len);
 		die(_("failed to iterate over '%s'"), src->buf);
 	}
+
+	dir_iterator_free(iter);
 }
 
 static void clone_local(const char *src_repo, const char *dest_repo)
diff --git a/dir-iterator.c b/dir-iterator.c
index de619846f29..857e1d9bdaf 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -193,9 +193,9 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 
 	if (S_ISDIR(iter->base.st.st_mode) && push_level(iter)) {
 		if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
-			goto error_out;
+			return ITER_ERROR;
 		if (iter->levels_nr == 0)
-			goto error_out;
+			return ITER_ERROR;
 	}
 
 	/* Loop until we find an entry that we can give back to the caller. */
@@ -211,11 +211,11 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			int ret = next_directory_entry(level->dir, iter->base.path.buf, &de);
 			if (ret < 0) {
 				if (iter->flags & DIR_ITERATOR_PEDANTIC)
-					goto error_out;
+					return ITER_ERROR;
 				continue;
 			} else if (ret > 0) {
 				if (pop_level(iter) == 0)
-					return dir_iterator_abort(dir_iterator);
+					return ITER_DONE;
 				continue;
 			}
 
@@ -223,7 +223,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 		} else {
 			if (level->entries_idx >= level->entries.nr) {
 				if (pop_level(iter) == 0)
-					return dir_iterator_abort(dir_iterator);
+					return ITER_DONE;
 				continue;
 			}
 
@@ -232,22 +232,21 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 
 		if (prepare_next_entry_data(iter, name)) {
 			if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
-				goto error_out;
+				return ITER_ERROR;
 			continue;
 		}
 
 		return ITER_OK;
 	}
-
-error_out:
-	dir_iterator_abort(dir_iterator);
-	return ITER_ERROR;
 }
 
-int dir_iterator_abort(struct dir_iterator *dir_iterator)
+void dir_iterator_free(struct dir_iterator *dir_iterator)
 {
 	struct dir_iterator_int *iter = (struct dir_iterator_int *)dir_iterator;
 
+	if (!iter)
+		return;
+
 	for (; iter->levels_nr; iter->levels_nr--) {
 		struct dir_iterator_level *level =
 			&iter->levels[iter->levels_nr - 1];
@@ -266,7 +265,6 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
 	free(iter->levels);
 	strbuf_release(&iter->base.path);
 	free(iter);
-	return ITER_DONE;
 }
 
 struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags)
@@ -301,7 +299,7 @@ struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags)
 	return dir_iterator;
 
 error_out:
-	dir_iterator_abort(dir_iterator);
+	dir_iterator_free(dir_iterator);
 	errno = saved_errno;
 	return NULL;
 }
diff --git a/dir-iterator.h b/dir-iterator.h
index 6d438809b6e..ccd6a197343 100644
--- a/dir-iterator.h
+++ b/dir-iterator.h
@@ -28,7 +28,7 @@
  *
  *     while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
  *             if (want_to_stop_iteration()) {
- *                     ok = dir_iterator_abort(iter);
+ *                     ok = ITER_DONE;
  *                     break;
  *             }
  *
@@ -39,6 +39,7 @@
  *
  *     if (ok != ITER_DONE)
  *             handle_error();
+ *     dir_iterator_free(iter);
  *
  * Callers are allowed to modify iter->path while they are working,
  * but they must restore it to its original contents before calling
@@ -107,11 +108,7 @@ struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags);
  */
 int dir_iterator_advance(struct dir_iterator *iterator);
 
-/*
- * End the iteration before it has been exhausted. Free the
- * dir_iterator and any associated resources and return ITER_DONE. On
- * error, free the dir_iterator and return ITER_ERROR.
- */
-int dir_iterator_abort(struct dir_iterator *iterator);
+/* Free the dir_iterator and any associated resources. */
+void dir_iterator_free(struct dir_iterator *iterator);
 
 #endif
diff --git a/iterator.h b/iterator.h
index 0f6900e43ad..6b77dcc2626 100644
--- a/iterator.h
+++ b/iterator.h
@@ -12,7 +12,7 @@
 #define ITER_OK 0
 
 /*
- * The iterator is exhausted and has been freed.
+ * The iterator is exhausted.
  */
 #define ITER_DONE -1
 
diff --git a/refs.c b/refs.c
index eaf41421f50..8eff60a2186 100644
--- a/refs.c
+++ b/refs.c
@@ -2476,6 +2476,7 @@ int refs_verify_refnames_available(struct ref_store *refs,
 {
 	struct strbuf dirname = STRBUF_INIT;
 	struct strbuf referent = STRBUF_INIT;
+	struct ref_iterator *iter = NULL;
 	struct strset dirnames;
 	int ret = -1;
 
@@ -2552,7 +2553,6 @@ int refs_verify_refnames_available(struct ref_store *refs,
 		strbuf_addch(&dirname, '/');
 
 		if (!initial_transaction) {
-			struct ref_iterator *iter;
 			int ok;
 
 			iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
@@ -2564,12 +2564,14 @@ int refs_verify_refnames_available(struct ref_store *refs,
 
 				strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
 					    iter->refname, refname);
-				ref_iterator_abort(iter);
 				goto cleanup;
 			}
 
 			if (ok != ITER_DONE)
 				BUG("error while iterating over references");
+
+			ref_iterator_free(iter);
+			iter = NULL;
 		}
 
 		extra_refname = find_descendant_ref(dirname.buf, extras, skip);
@@ -2586,6 +2588,7 @@ int refs_verify_refnames_available(struct ref_store *refs,
 	strbuf_release(&referent);
 	strbuf_release(&dirname);
 	strset_clear(&dirnames);
+	ref_iterator_free(iter);
 	return ret;
 }
 
diff --git a/refs/debug.c b/refs/debug.c
index fbc4df08b43..a9786da4ba1 100644
--- a/refs/debug.c
+++ b/refs/debug.c
@@ -179,19 +179,18 @@ static int debug_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return res;
 }
 
-static int debug_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void debug_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct debug_ref_iterator *diter =
 		(struct debug_ref_iterator *)ref_iterator;
-	int res = diter->iter->vtable->abort(diter->iter);
-	trace_printf_key(&trace_refs, "iterator_abort: %d\n", res);
-	return res;
+	diter->iter->vtable->release(diter->iter);
+	trace_printf_key(&trace_refs, "iterator_abort\n");
 }
 
 static struct ref_iterator_vtable debug_ref_iterator_vtable = {
 	.advance = debug_ref_iterator_advance,
 	.peel = debug_ref_iterator_peel,
-	.abort = debug_ref_iterator_abort,
+	.release = debug_ref_iterator_release,
 };
 
 static struct ref_iterator *
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 11a620ea11a..859f1c11941 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -915,10 +915,6 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		return ITER_OK;
 	}
 
-	iter->iter0 = NULL;
-	if (ref_iterator_abort(ref_iterator) != ITER_DONE)
-		ok = ITER_ERROR;
-
 	return ok;
 }
 
@@ -931,23 +927,17 @@ static int files_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return ref_iterator_peel(iter->iter0, peeled);
 }
 
-static int files_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void files_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct files_ref_iterator *iter =
 		(struct files_ref_iterator *)ref_iterator;
-	int ok = ITER_DONE;
-
-	if (iter->iter0)
-		ok = ref_iterator_abort(iter->iter0);
-
-	base_ref_iterator_free(ref_iterator);
-	return ok;
+	ref_iterator_free(iter->iter0);
 }
 
 static struct ref_iterator_vtable files_ref_iterator_vtable = {
 	.advance = files_ref_iterator_advance,
 	.peel = files_ref_iterator_peel,
-	.abort = files_ref_iterator_abort,
+	.release = files_ref_iterator_release,
 };
 
 static struct ref_iterator *files_ref_iterator_begin(
@@ -1378,7 +1368,7 @@ static int should_pack_refs(struct files_ref_store *refs,
 				    iter->flags, opts))
 			refcount++;
 		if (refcount >= limit) {
-			ref_iterator_abort(iter);
+			ref_iterator_free(iter);
 			return 1;
 		}
 	}
@@ -1386,6 +1376,7 @@ static int should_pack_refs(struct files_ref_store *refs,
 	if (ret != ITER_DONE)
 		die("error while iterating over references");
 
+	ref_iterator_free(iter);
 	return 0;
 }
 
@@ -1452,6 +1443,7 @@ static int files_pack_refs(struct ref_store *ref_store,
 	packed_refs_unlock(refs->packed_ref_store);
 
 	prune_refs(refs, &refs_to_prune);
+	ref_iterator_free(iter);
 	strbuf_release(&err);
 	return 0;
 }
@@ -2299,9 +2291,6 @@ static int files_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 		return ITER_OK;
 	}
 
-	iter->dir_iterator = NULL;
-	if (ref_iterator_abort(ref_iterator) == ITER_ERROR)
-		ok = ITER_ERROR;
 	return ok;
 }
 
@@ -2311,23 +2300,17 @@ static int files_reflog_iterator_peel(struct ref_iterator *ref_iterator UNUSED,
 	BUG("ref_iterator_peel() called for reflog_iterator");
 }
 
-static int files_reflog_iterator_abort(struct ref_iterator *ref_iterator)
+static void files_reflog_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct files_reflog_iterator *iter =
 		(struct files_reflog_iterator *)ref_iterator;
-	int ok = ITER_DONE;
-
-	if (iter->dir_iterator)
-		ok = dir_iterator_abort(iter->dir_iterator);
-
-	base_ref_iterator_free(ref_iterator);
-	return ok;
+	dir_iterator_free(iter->dir_iterator);
 }
 
 static struct ref_iterator_vtable files_reflog_iterator_vtable = {
 	.advance = files_reflog_iterator_advance,
 	.peel = files_reflog_iterator_peel,
-	.abort = files_reflog_iterator_abort,
+	.release = files_reflog_iterator_release,
 };
 
 static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,
@@ -3837,6 +3820,7 @@ static int files_fsck_refs_dir(struct ref_store *ref_store,
 		ret = error(_("failed to iterate over '%s'"), sb.buf);
 
 out:
+	dir_iterator_free(iter);
 	strbuf_release(&sb);
 	strbuf_release(&refname);
 	return ret;
diff --git a/refs/iterator.c b/refs/iterator.c
index d25e568bf0b..aaeff270437 100644
--- a/refs/iterator.c
+++ b/refs/iterator.c
@@ -21,9 +21,14 @@ int ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return ref_iterator->vtable->peel(ref_iterator, peeled);
 }
 
-int ref_iterator_abort(struct ref_iterator *ref_iterator)
+void ref_iterator_free(struct ref_iterator *ref_iterator)
 {
-	return ref_iterator->vtable->abort(ref_iterator);
+	if (ref_iterator) {
+		ref_iterator->vtable->release(ref_iterator);
+		/* Help make use-after-free bugs fail quickly: */
+		ref_iterator->vtable = NULL;
+		free(ref_iterator);
+	}
 }
 
 void base_ref_iterator_init(struct ref_iterator *iter,
@@ -36,20 +41,13 @@ void base_ref_iterator_init(struct ref_iterator *iter,
 	iter->flags = 0;
 }
 
-void base_ref_iterator_free(struct ref_iterator *iter)
-{
-	/* Help make use-after-free bugs fail quickly: */
-	iter->vtable = NULL;
-	free(iter);
-}
-
 struct empty_ref_iterator {
 	struct ref_iterator base;
 };
 
-static int empty_ref_iterator_advance(struct ref_iterator *ref_iterator)
+static int empty_ref_iterator_advance(struct ref_iterator *ref_iterator UNUSED)
 {
-	return ref_iterator_abort(ref_iterator);
+	return ITER_DONE;
 }
 
 static int empty_ref_iterator_peel(struct ref_iterator *ref_iterator UNUSED,
@@ -58,16 +56,14 @@ static int empty_ref_iterator_peel(struct ref_iterator *ref_iterator UNUSED,
 	BUG("peel called for empty iterator");
 }
 
-static int empty_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void empty_ref_iterator_release(struct ref_iterator *ref_iterator UNUSED)
 {
-	base_ref_iterator_free(ref_iterator);
-	return ITER_DONE;
 }
 
 static struct ref_iterator_vtable empty_ref_iterator_vtable = {
 	.advance = empty_ref_iterator_advance,
 	.peel = empty_ref_iterator_peel,
-	.abort = empty_ref_iterator_abort,
+	.release = empty_ref_iterator_release,
 };
 
 struct ref_iterator *empty_ref_iterator_begin(void)
@@ -151,11 +147,13 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	if (!iter->current) {
 		/* Initialize: advance both iterators to their first entries */
 		if ((ok = ref_iterator_advance(iter->iter0)) != ITER_OK) {
+			ref_iterator_free(iter->iter0);
 			iter->iter0 = NULL;
 			if (ok == ITER_ERROR)
 				goto error;
 		}
 		if ((ok = ref_iterator_advance(iter->iter1)) != ITER_OK) {
+			ref_iterator_free(iter->iter1);
 			iter->iter1 = NULL;
 			if (ok == ITER_ERROR)
 				goto error;
@@ -166,6 +164,7 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		 * entry:
 		 */
 		if ((ok = ref_iterator_advance(*iter->current)) != ITER_OK) {
+			ref_iterator_free(*iter->current);
 			*iter->current = NULL;
 			if (ok == ITER_ERROR)
 				goto error;
@@ -179,9 +178,8 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 			iter->select(iter->iter0, iter->iter1, iter->cb_data);
 
 		if (selection == ITER_SELECT_DONE) {
-			return ref_iterator_abort(ref_iterator);
+			return ITER_DONE;
 		} else if (selection == ITER_SELECT_ERROR) {
-			ref_iterator_abort(ref_iterator);
 			return ITER_ERROR;
 		}
 
@@ -195,6 +193,7 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 
 		if (selection & ITER_SKIP_SECONDARY) {
 			if ((ok = ref_iterator_advance(*secondary)) != ITER_OK) {
+				ref_iterator_free(*secondary);
 				*secondary = NULL;
 				if (ok == ITER_ERROR)
 					goto error;
@@ -211,7 +210,6 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	}
 
 error:
-	ref_iterator_abort(ref_iterator);
 	return ITER_ERROR;
 }
 
@@ -227,28 +225,18 @@ static int merge_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return ref_iterator_peel(*iter->current, peeled);
 }
 
-static int merge_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void merge_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct merge_ref_iterator *iter =
 		(struct merge_ref_iterator *)ref_iterator;
-	int ok = ITER_DONE;
-
-	if (iter->iter0) {
-		if (ref_iterator_abort(iter->iter0) != ITER_DONE)
-			ok = ITER_ERROR;
-	}
-	if (iter->iter1) {
-		if (ref_iterator_abort(iter->iter1) != ITER_DONE)
-			ok = ITER_ERROR;
-	}
-	base_ref_iterator_free(ref_iterator);
-	return ok;
+	ref_iterator_free(iter->iter0);
+	ref_iterator_free(iter->iter1);
 }
 
 static struct ref_iterator_vtable merge_ref_iterator_vtable = {
 	.advance = merge_ref_iterator_advance,
 	.peel = merge_ref_iterator_peel,
-	.abort = merge_ref_iterator_abort,
+	.release = merge_ref_iterator_release,
 };
 
 struct ref_iterator *merge_ref_iterator_begin(
@@ -310,10 +298,10 @@ struct ref_iterator *overlay_ref_iterator_begin(
 	 * them.
 	 */
 	if (is_empty_ref_iterator(front)) {
-		ref_iterator_abort(front);
+		ref_iterator_free(front);
 		return back;
 	} else if (is_empty_ref_iterator(back)) {
-		ref_iterator_abort(back);
+		ref_iterator_free(back);
 		return front;
 	}
 
@@ -350,19 +338,10 @@ static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator)
 
 	while ((ok = ref_iterator_advance(iter->iter0)) == ITER_OK) {
 		int cmp = compare_prefix(iter->iter0->refname, iter->prefix);
-
 		if (cmp < 0)
 			continue;
-
-		if (cmp > 0) {
-			/*
-			 * As the source iterator is ordered, we
-			 * can stop the iteration as soon as we see a
-			 * refname that comes after the prefix:
-			 */
-			ok = ref_iterator_abort(iter->iter0);
-			break;
-		}
+		if (cmp > 0)
+			return ITER_DONE;
 
 		if (iter->trim) {
 			/*
@@ -386,9 +365,6 @@ static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		return ITER_OK;
 	}
 
-	iter->iter0 = NULL;
-	if (ref_iterator_abort(ref_iterator) != ITER_DONE)
-		return ITER_ERROR;
 	return ok;
 }
 
@@ -401,23 +377,18 @@ static int prefix_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return ref_iterator_peel(iter->iter0, peeled);
 }
 
-static int prefix_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void prefix_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct prefix_ref_iterator *iter =
 		(struct prefix_ref_iterator *)ref_iterator;
-	int ok = ITER_DONE;
-
-	if (iter->iter0)
-		ok = ref_iterator_abort(iter->iter0);
+	ref_iterator_free(iter->iter0);
 	free(iter->prefix);
-	base_ref_iterator_free(ref_iterator);
-	return ok;
 }
 
 static struct ref_iterator_vtable prefix_ref_iterator_vtable = {
 	.advance = prefix_ref_iterator_advance,
 	.peel = prefix_ref_iterator_peel,
-	.abort = prefix_ref_iterator_abort,
+	.release = prefix_ref_iterator_release,
 };
 
 struct ref_iterator *prefix_ref_iterator_begin(struct ref_iterator *iter0,
@@ -453,20 +424,14 @@ int do_for_each_ref_iterator(struct ref_iterator *iter,
 	current_ref_iter = iter;
 	while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
 		retval = fn(iter->refname, iter->referent, iter->oid, iter->flags, cb_data);
-		if (retval) {
-			/*
-			 * If ref_iterator_abort() returns ITER_ERROR,
-			 * we ignore that error in deference to the
-			 * callback function's return value.
-			 */
-			ref_iterator_abort(iter);
+		if (retval)
 			goto out;
-		}
 	}
 
 out:
 	current_ref_iter = old_ref_iter;
 	if (ok == ITER_ERROR)
-		return -1;
+		retval = -1;
+	ref_iterator_free(iter);
 	return retval;
 }
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index a7b6f74b6e3..38a1956d1a8 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -954,9 +954,6 @@ static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		return ITER_OK;
 	}
 
-	if (ref_iterator_abort(ref_iterator) != ITER_DONE)
-		ok = ITER_ERROR;
-
 	return ok;
 }
 
@@ -976,23 +973,19 @@ static int packed_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	}
 }
 
-static int packed_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void packed_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct packed_ref_iterator *iter =
 		(struct packed_ref_iterator *)ref_iterator;
-	int ok = ITER_DONE;
-
 	strbuf_release(&iter->refname_buf);
 	free(iter->jump);
 	release_snapshot(iter->snapshot);
-	base_ref_iterator_free(ref_iterator);
-	return ok;
 }
 
 static struct ref_iterator_vtable packed_ref_iterator_vtable = {
 	.advance = packed_ref_iterator_advance,
 	.peel = packed_ref_iterator_peel,
-	.abort = packed_ref_iterator_abort
+	.release = packed_ref_iterator_release,
 };
 
 static int jump_list_entry_cmp(const void *va, const void *vb)
@@ -1362,8 +1355,10 @@ static int write_with_updates(struct packed_ref_store *refs,
 	 */
 	iter = packed_ref_iterator_begin(&refs->base, "", NULL,
 					 DO_FOR_EACH_INCLUDE_BROKEN);
-	if ((ok = ref_iterator_advance(iter)) != ITER_OK)
+	if ((ok = ref_iterator_advance(iter)) != ITER_OK) {
+		ref_iterator_free(iter);
 		iter = NULL;
+	}
 
 	i = 0;
 
@@ -1411,8 +1406,10 @@ static int write_with_updates(struct packed_ref_store *refs,
 				 * the iterator over the unneeded
 				 * value.
 				 */
-				if ((ok = ref_iterator_advance(iter)) != ITER_OK)
+				if ((ok = ref_iterator_advance(iter)) != ITER_OK) {
+					ref_iterator_free(iter);
 					iter = NULL;
+				}
 				cmp = +1;
 			} else {
 				/*
@@ -1449,8 +1446,10 @@ static int write_with_updates(struct packed_ref_store *refs,
 					       peel_error ? NULL : &peeled))
 				goto write_error;
 
-			if ((ok = ref_iterator_advance(iter)) != ITER_OK)
+			if ((ok = ref_iterator_advance(iter)) != ITER_OK) {
+				ref_iterator_free(iter);
 				iter = NULL;
+			}
 		} else if (is_null_oid(&update->new_oid)) {
 			/*
 			 * The update wants to delete the reference,
@@ -1499,9 +1498,7 @@ static int write_with_updates(struct packed_ref_store *refs,
 		    get_tempfile_path(refs->tempfile), strerror(errno));
 
 error:
-	if (iter)
-		ref_iterator_abort(iter);
-
+	ref_iterator_free(iter);
 	delete_tempfile(&refs->tempfile);
 	return -1;
 }
diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index 02f09e4df88..6457e02c1ea 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -409,7 +409,7 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		if (++level->index == level->dir->nr) {
 			/* This level is exhausted; pop up a level */
 			if (--iter->levels_nr == 0)
-				return ref_iterator_abort(ref_iterator);
+				return ITER_DONE;
 
 			continue;
 		}
@@ -452,21 +452,18 @@ static int cache_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return peel_object(iter->repo, ref_iterator->oid, peeled) ? -1 : 0;
 }
 
-static int cache_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void cache_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct cache_ref_iterator *iter =
 		(struct cache_ref_iterator *)ref_iterator;
-
 	free((char *)iter->prefix);
 	free(iter->levels);
-	base_ref_iterator_free(ref_iterator);
-	return ITER_DONE;
 }
 
 static struct ref_iterator_vtable cache_ref_iterator_vtable = {
 	.advance = cache_ref_iterator_advance,
 	.peel = cache_ref_iterator_peel,
-	.abort = cache_ref_iterator_abort
+	.release = cache_ref_iterator_release,
 };
 
 struct ref_iterator *cache_ref_iterator_begin(struct ref_cache *cache,
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index aaab711bb96..74e2c03cef1 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -273,11 +273,11 @@ enum do_for_each_ref_flags {
  * the next reference and returns ITER_OK. The data pointed at by
  * refname and oid belong to the iterator; if you want to retain them
  * after calling ref_iterator_advance() again or calling
- * ref_iterator_abort(), you must make a copy. When the iteration has
+ * ref_iterator_free(), you must make a copy. When the iteration has
  * been exhausted, ref_iterator_advance() releases any resources
  * associated with the iteration, frees the ref_iterator object, and
  * returns ITER_DONE. If you want to abort the iteration early, call
- * ref_iterator_abort(), which also frees the ref_iterator object and
+ * ref_iterator_free(), which also frees the ref_iterator object and
  * any associated resources. If there was an internal error advancing
  * to the next entry, ref_iterator_advance() aborts the iteration,
  * frees the ref_iterator, and returns ITER_ERROR.
@@ -293,7 +293,7 @@ enum do_for_each_ref_flags {
  *
  *     while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
  *             if (want_to_stop_iteration()) {
- *                     ok = ref_iterator_abort(iter);
+ *                     ok = ITER_DONE;
  *                     break;
  *             }
  *
@@ -307,6 +307,7 @@ enum do_for_each_ref_flags {
  *
  *     if (ok != ITER_DONE)
  *             handle_error();
+ *     ref_iterator_free(iter);
  */
 struct ref_iterator {
 	struct ref_iterator_vtable *vtable;
@@ -333,12 +334,8 @@ int ref_iterator_advance(struct ref_iterator *ref_iterator);
 int ref_iterator_peel(struct ref_iterator *ref_iterator,
 		      struct object_id *peeled);
 
-/*
- * End the iteration before it has been exhausted, freeing the
- * reference iterator and any associated resources and returning
- * ITER_DONE. If the abort itself failed, return ITER_ERROR.
- */
-int ref_iterator_abort(struct ref_iterator *ref_iterator);
+/* Free the reference iterator and any associated resources. */
+void ref_iterator_free(struct ref_iterator *ref_iterator);
 
 /*
  * An iterator over nothing (its first ref_iterator_advance() call
@@ -438,13 +435,6 @@ struct ref_iterator *prefix_ref_iterator_begin(struct ref_iterator *iter0,
 void base_ref_iterator_init(struct ref_iterator *iter,
 			    struct ref_iterator_vtable *vtable);
 
-/*
- * Base class destructor for ref_iterators. Destroy the ref_iterator
- * part of iter and shallow-free the object. This is meant to be
- * called only by the destructors of derived classes.
- */
-void base_ref_iterator_free(struct ref_iterator *iter);
-
 /* Virtual function declarations for ref_iterators: */
 
 /*
@@ -463,15 +453,14 @@ typedef int ref_iterator_peel_fn(struct ref_iterator *ref_iterator,
 
 /*
  * Implementations of this function should free any resources specific
- * to the derived class, then call base_ref_iterator_free() to clean
- * up and free the ref_iterator object.
+ * to the derived class.
  */
-typedef int ref_iterator_abort_fn(struct ref_iterator *ref_iterator);
+typedef void ref_iterator_release_fn(struct ref_iterator *ref_iterator);
 
 struct ref_iterator_vtable {
 	ref_iterator_advance_fn *advance;
 	ref_iterator_peel_fn *peel;
-	ref_iterator_abort_fn *abort;
+	ref_iterator_release_fn *release;
 };
 
 /*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 2a90e7cb391..06543f79c64 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -711,17 +711,10 @@ static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		break;
 	}
 
-	if (iter->err > 0) {
-		if (ref_iterator_abort(ref_iterator) != ITER_DONE)
-			return ITER_ERROR;
+	if (iter->err > 0)
 		return ITER_DONE;
-	}
-
-	if (iter->err < 0) {
-		ref_iterator_abort(ref_iterator);
+	if (iter->err < 0)
 		return ITER_ERROR;
-	}
-
 	return ITER_OK;
 }
 
@@ -740,7 +733,7 @@ static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return -1;
 }
 
-static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void reftable_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct reftable_ref_iterator *iter =
 		(struct reftable_ref_iterator *)ref_iterator;
@@ -751,14 +744,12 @@ static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
 			free(iter->exclude_patterns[i]);
 		free(iter->exclude_patterns);
 	}
-	free(iter);
-	return ITER_DONE;
 }
 
 static struct ref_iterator_vtable reftable_ref_iterator_vtable = {
 	.advance = reftable_ref_iterator_advance,
 	.peel = reftable_ref_iterator_peel,
-	.abort = reftable_ref_iterator_abort
+	.release = reftable_ref_iterator_release,
 };
 
 static int qsort_strcmp(const void *va, const void *vb)
@@ -2017,17 +2008,10 @@ static int reftable_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 		break;
 	}
 
-	if (iter->err > 0) {
-		if (ref_iterator_abort(ref_iterator) != ITER_DONE)
-			return ITER_ERROR;
+	if (iter->err > 0)
 		return ITER_DONE;
-	}
-
-	if (iter->err < 0) {
-		ref_iterator_abort(ref_iterator);
+	if (iter->err < 0)
 		return ITER_ERROR;
-	}
-
 	return ITER_OK;
 }
 
@@ -2038,21 +2022,19 @@ static int reftable_reflog_iterator_peel(struct ref_iterator *ref_iterator UNUSE
 	return -1;
 }
 
-static int reftable_reflog_iterator_abort(struct ref_iterator *ref_iterator)
+static void reftable_reflog_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct reftable_reflog_iterator *iter =
 		(struct reftable_reflog_iterator *)ref_iterator;
 	reftable_log_record_release(&iter->log);
 	reftable_iterator_destroy(&iter->iter);
 	strbuf_release(&iter->last_name);
-	free(iter);
-	return ITER_DONE;
 }
 
 static struct ref_iterator_vtable reftable_reflog_iterator_vtable = {
 	.advance = reftable_reflog_iterator_advance,
 	.peel = reftable_reflog_iterator_peel,
-	.abort = reftable_reflog_iterator_abort
+	.release = reftable_reflog_iterator_release,
 };
 
 static struct reftable_reflog_iterator *reflog_iterator_for_stack(struct reftable_ref_store *refs,
diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
index 6b297bd7536..8d46e8ba409 100644
--- a/t/helper/test-dir-iterator.c
+++ b/t/helper/test-dir-iterator.c
@@ -53,6 +53,7 @@ int cmd__dir_iterator(int argc, const char **argv)
 		printf("(%s) [%s] %s\n", diter->relative_path, diter->basename,
 		       diter->path.buf);
 	}
+	dir_iterator_free(diter);
 
 	if (iter_status != ITER_DONE) {
 		printf("dir_iterator_advance failure\n");

-- 
2.49.0.rc0.416.g627208d89d.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v5 10/16] refs/iterator: provide infrastructure to re-seek iterators
  2025-03-06 15:08 ` [PATCH v5 " Patrick Steinhardt
                     ` (8 preceding siblings ...)
  2025-03-06 15:08   ` [PATCH v5 09/16] refs/iterator: separate lifecycle from iteration Patrick Steinhardt
@ 2025-03-06 15:08   ` Patrick Steinhardt
  2025-03-06 15:08   ` [PATCH v5 11/16] refs/iterator: implement seeking for merged iterators Patrick Steinhardt
                     ` (7 subsequent siblings)
  17 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-03-06 15:08 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Ref iterators need to be scrapped once they have either been exhausted
or aren't useful to the caller anymore, and it is explicitly not
possible to reuse them for new iterations. But enabling reuse of
iterators may allow us to tune them by reusing their internal state.
The reftable iterators for example can already be reused internally,
but we're not able to expose this to any users outside of the reftable
backend.

Introduce a new `.seek` function in the ref iterator vtable that allows
callers to seek an iterator multiple times. It is expected to be
functionally the same as calling `refs_ref_iterator_begin()` with a
different (or the same) prefix.

Note that for now it is not possible to adjust parameters other than
the prefix being sought, so exclude patterns, trimmed prefixes and
flags will remain unchanged. We do not have a use case for changing
these parameters right now, but if one ever arises we can adapt
accordingly.

Implement the callback for trivial cases. The other iterators will be
implemented in subsequent commits.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/debug.c         | 11 +++++++++++
 refs/iterator.c      | 24 ++++++++++++++++++++++++
 refs/refs-internal.h | 24 ++++++++++++++++++++++++
 3 files changed, 59 insertions(+)

diff --git a/refs/debug.c b/refs/debug.c
index a9786da4ba1..5390fa9c187 100644
--- a/refs/debug.c
+++ b/refs/debug.c
@@ -169,6 +169,16 @@ static int debug_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	return res;
 }
 
+static int debug_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				   const char *prefix)
+{
+	struct debug_ref_iterator *diter =
+		(struct debug_ref_iterator *)ref_iterator;
+	int res = diter->iter->vtable->seek(diter->iter, prefix);
+	trace_printf_key(&trace_refs, "iterator_seek: %s: %d\n", prefix ? prefix : "", res);
+	return res;
+}
+
 static int debug_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				   struct object_id *peeled)
 {
@@ -189,6 +199,7 @@ static void debug_ref_iterator_release(struct ref_iterator *ref_iterator)
 
 static struct ref_iterator_vtable debug_ref_iterator_vtable = {
 	.advance = debug_ref_iterator_advance,
+	.seek = debug_ref_iterator_seek,
 	.peel = debug_ref_iterator_peel,
 	.release = debug_ref_iterator_release,
 };
diff --git a/refs/iterator.c b/refs/iterator.c
index aaeff270437..757b105261a 100644
--- a/refs/iterator.c
+++ b/refs/iterator.c
@@ -15,6 +15,12 @@ int ref_iterator_advance(struct ref_iterator *ref_iterator)
 	return ref_iterator->vtable->advance(ref_iterator);
 }
 
+int ref_iterator_seek(struct ref_iterator *ref_iterator,
+		      const char *prefix)
+{
+	return ref_iterator->vtable->seek(ref_iterator, prefix);
+}
+
 int ref_iterator_peel(struct ref_iterator *ref_iterator,
 		      struct object_id *peeled)
 {
@@ -50,6 +56,12 @@ static int empty_ref_iterator_advance(struct ref_iterator *ref_iterator UNUSED)
 	return ITER_DONE;
 }
 
+static int empty_ref_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
+				   const char *prefix UNUSED)
+{
+	return 0;
+}
+
 static int empty_ref_iterator_peel(struct ref_iterator *ref_iterator UNUSED,
 				   struct object_id *peeled UNUSED)
 {
@@ -62,6 +74,7 @@ static void empty_ref_iterator_release(struct ref_iterator *ref_iterator UNUSED)
 
 static struct ref_iterator_vtable empty_ref_iterator_vtable = {
 	.advance = empty_ref_iterator_advance,
+	.seek = empty_ref_iterator_seek,
 	.peel = empty_ref_iterator_peel,
 	.release = empty_ref_iterator_release,
 };
@@ -368,6 +381,16 @@ static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	return ok;
 }
 
+static int prefix_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				    const char *prefix)
+{
+	struct prefix_ref_iterator *iter =
+		(struct prefix_ref_iterator *)ref_iterator;
+	free(iter->prefix);
+	iter->prefix = xstrdup_or_null(prefix);
+	return ref_iterator_seek(iter->iter0, prefix);
+}
+
 static int prefix_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				    struct object_id *peeled)
 {
@@ -387,6 +410,7 @@ static void prefix_ref_iterator_release(struct ref_iterator *ref_iterator)
 
 static struct ref_iterator_vtable prefix_ref_iterator_vtable = {
 	.advance = prefix_ref_iterator_advance,
+	.seek = prefix_ref_iterator_seek,
 	.peel = prefix_ref_iterator_peel,
 	.release = prefix_ref_iterator_release,
 };
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 74e2c03cef1..8f18274a165 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -327,6 +327,22 @@ struct ref_iterator {
  */
 int ref_iterator_advance(struct ref_iterator *ref_iterator);
 
+/*
+ * Seek the iterator to the first reference with the given prefix.
+ * The prefix is matched as a literal string, without regard for path
+ * separators. If prefix is NULL or the empty string, seek the iterator to the
+ * first reference again.
+ *
+ * This function is expected to behave as if a new ref iterator with the same
+ * prefix had been created, but allows reuse of iterators and thus may allow
+ * the backend to optimize. Parameters other than the prefix that have been
+ * passed when creating the iterator will remain unchanged.
+ *
+ * Returns 0 on success, a negative error code otherwise.
+ */
+int ref_iterator_seek(struct ref_iterator *ref_iterator,
+		      const char *prefix);
+
 /*
  * If possible, peel the reference currently being viewed by the
  * iterator. Return 0 on success.
@@ -445,6 +461,13 @@ void base_ref_iterator_init(struct ref_iterator *iter,
  */
 typedef int ref_iterator_advance_fn(struct ref_iterator *ref_iterator);
 
+/*
+ * Seek the iterator to the first reference matching the given prefix. Should
+ * behave the same as if a new iterator was created with the same prefix.
+ */
+typedef int ref_iterator_seek_fn(struct ref_iterator *ref_iterator,
+				 const char *prefix);
+
 /*
  * Peels the current ref, returning 0 for success or -1 for failure.
  */
@@ -459,6 +482,7 @@ typedef void ref_iterator_release_fn(struct ref_iterator *ref_iterator);
 
 struct ref_iterator_vtable {
 	ref_iterator_advance_fn *advance;
+	ref_iterator_seek_fn *seek;
 	ref_iterator_peel_fn *peel;
 	ref_iterator_release_fn *release;
 };

-- 
2.49.0.rc0.416.g627208d89d.dirty




* [PATCH v5 11/16] refs/iterator: implement seeking for merged iterators
  2025-03-06 15:08 ` [PATCH v5 " Patrick Steinhardt
                     ` (9 preceding siblings ...)
  2025-03-06 15:08   ` [PATCH v5 10/16] refs/iterator: provide infrastructure to re-seek iterators Patrick Steinhardt
@ 2025-03-06 15:08   ` Patrick Steinhardt
  2025-03-06 15:08   ` [PATCH v5 12/16] refs/iterator: implement seeking for reftable iterators Patrick Steinhardt
                     ` (6 subsequent siblings)
  17 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-03-06 15:08 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Implement seeking on merged iterators. The implementation is rather
straightforward, the only caveat being that we must not deallocate the
underlying iterators once they have been exhausted.
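The constraint that exhausted sub-iterators must not be deallocated is
what the `iter0_owned`/`iter1_owned` split in the diff below is for.
Here is a minimal sketch of that ownership pattern, using invented toy
types rather than Git's actual structures:

```c
#include <stddef.h>

struct toy_iter { int exhausted; };

struct toy_merge {
	/* Working pointers, NULLed once a sub-iterator is exhausted. */
	struct toy_iter *iter0, *iter1;
	/* Owning pointers, which keep the sub-iterators alive for
	 * reseeking and for the final release. */
	struct toy_iter *iter0_owned, *iter1_owned;
};

void toy_merge_init(struct toy_merge *m, struct toy_iter *a, struct toy_iter *b)
{
	m->iter0 = m->iter0_owned = a;
	m->iter1 = m->iter1_owned = b;
}

/* What advance() does when a sub-iterator runs dry: it drops the
 * working pointer but must NOT free the object. */
void toy_merge_exhaust0(struct toy_merge *m)
{
	m->iter0->exhausted = 1;
	m->iter0 = NULL;
}

/* seek() re-arms both sub-iterators from the owning pointers. */
int toy_merge_seek(struct toy_merge *m)
{
	m->iter0 = m->iter0_owned;
	m->iter1 = m->iter1_owned;
	m->iter0->exhausted = 0;
	m->iter1->exhausted = 0;
	return 0;
}
```

Only the owning pointers are ever freed on release, so an exhausted
sub-iterator can always be brought back by a later seek.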

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/iterator.c | 38 +++++++++++++++++++++++++++++---------
 1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/refs/iterator.c b/refs/iterator.c
index 757b105261a..63608ef9907 100644
--- a/refs/iterator.c
+++ b/refs/iterator.c
@@ -96,7 +96,8 @@ int is_empty_ref_iterator(struct ref_iterator *ref_iterator)
 struct merge_ref_iterator {
 	struct ref_iterator base;
 
-	struct ref_iterator *iter0, *iter1;
+	struct ref_iterator *iter0, *iter0_owned;
+	struct ref_iterator *iter1, *iter1_owned;
 
 	ref_iterator_select_fn *select;
 	void *cb_data;
@@ -160,13 +161,11 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	if (!iter->current) {
 		/* Initialize: advance both iterators to their first entries */
 		if ((ok = ref_iterator_advance(iter->iter0)) != ITER_OK) {
-			ref_iterator_free(iter->iter0);
 			iter->iter0 = NULL;
 			if (ok == ITER_ERROR)
 				goto error;
 		}
 		if ((ok = ref_iterator_advance(iter->iter1)) != ITER_OK) {
-			ref_iterator_free(iter->iter1);
 			iter->iter1 = NULL;
 			if (ok == ITER_ERROR)
 				goto error;
@@ -177,7 +176,6 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		 * entry:
 		 */
 		if ((ok = ref_iterator_advance(*iter->current)) != ITER_OK) {
-			ref_iterator_free(*iter->current);
 			*iter->current = NULL;
 			if (ok == ITER_ERROR)
 				goto error;
@@ -206,7 +204,6 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 
 		if (selection & ITER_SKIP_SECONDARY) {
 			if ((ok = ref_iterator_advance(*secondary)) != ITER_OK) {
-				ref_iterator_free(*secondary);
 				*secondary = NULL;
 				if (ok == ITER_ERROR)
 					goto error;
@@ -226,6 +223,28 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	return ITER_ERROR;
 }
 
+static int merge_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				   const char *prefix)
+{
+	struct merge_ref_iterator *iter =
+		(struct merge_ref_iterator *)ref_iterator;
+	int ret;
+
+	iter->current = NULL;
+	iter->iter0 = iter->iter0_owned;
+	iter->iter1 = iter->iter1_owned;
+
+	ret = ref_iterator_seek(iter->iter0, prefix);
+	if (ret < 0)
+		return ret;
+
+	ret = ref_iterator_seek(iter->iter1, prefix);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
 static int merge_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				   struct object_id *peeled)
 {
@@ -242,12 +261,13 @@ static void merge_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct merge_ref_iterator *iter =
 		(struct merge_ref_iterator *)ref_iterator;
-	ref_iterator_free(iter->iter0);
-	ref_iterator_free(iter->iter1);
+	ref_iterator_free(iter->iter0_owned);
+	ref_iterator_free(iter->iter1_owned);
 }
 
 static struct ref_iterator_vtable merge_ref_iterator_vtable = {
 	.advance = merge_ref_iterator_advance,
+	.seek = merge_ref_iterator_seek,
 	.peel = merge_ref_iterator_peel,
 	.release = merge_ref_iterator_release,
 };
@@ -268,8 +288,8 @@ struct ref_iterator *merge_ref_iterator_begin(
 	 */
 
 	base_ref_iterator_init(ref_iterator, &merge_ref_iterator_vtable);
-	iter->iter0 = iter0;
-	iter->iter1 = iter1;
+	iter->iter0 = iter->iter0_owned = iter0;
+	iter->iter1 = iter->iter1_owned = iter1;
 	iter->select = select;
 	iter->cb_data = cb_data;
 	iter->current = NULL;

-- 
2.49.0.rc0.416.g627208d89d.dirty




* [PATCH v5 12/16] refs/iterator: implement seeking for reftable iterators
  2025-03-06 15:08 ` [PATCH v5 " Patrick Steinhardt
                     ` (10 preceding siblings ...)
  2025-03-06 15:08   ` [PATCH v5 11/16] refs/iterator: implement seeking for merged iterators Patrick Steinhardt
@ 2025-03-06 15:08   ` Patrick Steinhardt
  2025-03-06 15:08   ` [PATCH v5 13/16] refs/iterator: implement seeking for ref-cache iterators Patrick Steinhardt
                     ` (5 subsequent siblings)
  17 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-03-06 15:08 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Implement seeking of reftable iterators. As the low-level reftable
iterators already support seeking, this change is straightforward. Two
notes though:

  - We do not support seeking on reflog iterators. It is unclear what
    seeking would even look like in this context, as you typically would
    want to seek to a specific entry in the reflog for a specific ref.
    There is currently no use case for this, but if one arises in the
    future, we can still implement seeking at that later point.

  - We start to check whether `reftable_stack_init_ref_iterator()` is
    successful.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/reftable-backend.c | 35 ++++++++++++++++++++++++++++++-----
 1 file changed, 30 insertions(+), 5 deletions(-)

diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 06543f79c64..b0c09f34433 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -547,7 +547,7 @@ struct reftable_ref_iterator {
 	struct reftable_ref_record ref;
 	struct object_id oid;
 
-	const char *prefix;
+	char *prefix;
 	size_t prefix_len;
 	char **exclude_patterns;
 	size_t exclude_patterns_index;
@@ -718,6 +718,20 @@ static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	return ITER_OK;
 }
 
+static int reftable_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				      const char *prefix)
+{
+	struct reftable_ref_iterator *iter =
+		(struct reftable_ref_iterator *)ref_iterator;
+
+	free(iter->prefix);
+	iter->prefix = xstrdup_or_null(prefix);
+	iter->prefix_len = prefix ? strlen(prefix) : 0;
+	iter->err = reftable_iterator_seek_ref(&iter->iter, prefix);
+
+	return iter->err;
+}
+
 static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				      struct object_id *peeled)
 {
@@ -744,10 +758,12 @@ static void reftable_ref_iterator_release(struct ref_iterator *ref_iterator)
 			free(iter->exclude_patterns[i]);
 		free(iter->exclude_patterns);
 	}
+	free(iter->prefix);
 }
 
 static struct ref_iterator_vtable reftable_ref_iterator_vtable = {
 	.advance = reftable_ref_iterator_advance,
+	.seek = reftable_ref_iterator_seek,
 	.peel = reftable_ref_iterator_peel,
 	.release = reftable_ref_iterator_release,
 };
@@ -806,8 +822,6 @@ static struct reftable_ref_iterator *ref_iterator_for_stack(struct reftable_ref_
 
 	iter = xcalloc(1, sizeof(*iter));
 	base_ref_iterator_init(&iter->base, &reftable_ref_iterator_vtable);
-	iter->prefix = prefix;
-	iter->prefix_len = prefix ? strlen(prefix) : 0;
 	iter->base.oid = &iter->oid;
 	iter->flags = flags;
 	iter->refs = refs;
@@ -821,8 +835,11 @@ static struct reftable_ref_iterator *ref_iterator_for_stack(struct reftable_ref_
 	if (ret)
 		goto done;
 
-	reftable_stack_init_ref_iterator(stack, &iter->iter);
-	ret = reftable_iterator_seek_ref(&iter->iter, prefix);
+	ret = reftable_stack_init_ref_iterator(stack, &iter->iter);
+	if (ret)
+		goto done;
+
+	ret = reftable_ref_iterator_seek(&iter->base, prefix);
 	if (ret)
 		goto done;
 
@@ -2015,6 +2032,13 @@ static int reftable_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 	return ITER_OK;
 }
 
+static int reftable_reflog_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
+					 const char *prefix UNUSED)
+{
+	BUG("reftable reflog iterator cannot be seeked");
+	return -1;
+}
+
 static int reftable_reflog_iterator_peel(struct ref_iterator *ref_iterator UNUSED,
 					 struct object_id *peeled UNUSED)
 {
@@ -2033,6 +2057,7 @@ static void reftable_reflog_iterator_release(struct ref_iterator *ref_iterator)
 
 static struct ref_iterator_vtable reftable_reflog_iterator_vtable = {
 	.advance = reftable_reflog_iterator_advance,
+	.seek = reftable_reflog_iterator_seek,
 	.peel = reftable_reflog_iterator_peel,
 	.release = reftable_reflog_iterator_release,
 };

-- 
2.49.0.rc0.416.g627208d89d.dirty




* [PATCH v5 13/16] refs/iterator: implement seeking for ref-cache iterators
  2025-03-06 15:08 ` [PATCH v5 " Patrick Steinhardt
                     ` (11 preceding siblings ...)
  2025-03-06 15:08   ` [PATCH v5 12/16] refs/iterator: implement seeking for reftable iterators Patrick Steinhardt
@ 2025-03-06 15:08   ` Patrick Steinhardt
  2025-03-06 15:08   ` [PATCH v5 14/16] refs/iterator: implement seeking for packed-ref iterators Patrick Steinhardt
                     ` (4 subsequent siblings)
  17 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-03-06 15:08 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Implement seeking of ref-cache iterators. This is done by splitting most
of the logic to seek iterators out of `cache_ref_iterator_begin()` and
putting it into `cache_ref_iterator_seek()` so that we can reuse the
logic.

Note that we can no longer use the optimization where we return an
empty ref iterator when there aren't any references, as otherwise it
wouldn't be possible to reseek the iterator to a different prefix that
does exist. This shouldn't be much of a performance concern though, as
`advance()` now bails out early once it sees that there are no more
directories to be searched.
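The trade-off described above can be sketched with toy types: instead
of substituting a special empty iterator, seek records "nothing to
iterate" by zeroing the level count, and advance bails out immediately.
Names and values are illustrative, not Git's actual code:

```c
#include <stddef.h>

/* Illustrative return codes; Git's actual ITER_* values differ. */
enum { TOY_ITER_OK, TOY_ITER_DONE };

struct toy_cache_iter {
	size_t levels_nr; /* 0 means there is nothing to iterate over */
	size_t remaining; /* stand-in for the real directory traversal */
};

/* dir_found models whether find_containing_dir() located the prefix. */
int toy_cache_seek(struct toy_cache_iter *it, int dir_found, size_t entries)
{
	if (!dir_found) {
		it->levels_nr = 0; /* prefix matches nothing (for now) */
		return 0;          /* still a valid, reseekable iterator */
	}
	it->levels_nr = 1;
	it->remaining = entries;
	return 0;
}

int toy_cache_advance(struct toy_cache_iter *it)
{
	if (!it->levels_nr)
		return TOY_ITER_DONE; /* early bail replaces the old shortcut */
	if (!it->remaining) {
		it->levels_nr = 0;
		return TOY_ITER_DONE;
	}
	it->remaining--;
	return TOY_ITER_OK;
}
```

Because the object survives an empty result, a later seek to a prefix
that does have matches works without recreating the iterator.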

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/ref-cache.c | 79 ++++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 51 insertions(+), 28 deletions(-)

diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index 6457e02c1ea..c1f1bab1d50 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -362,9 +362,7 @@ struct cache_ref_iterator {
 	struct ref_iterator base;
 
 	/*
-	 * The number of levels currently on the stack. This is always
-	 * at least 1, because when it becomes zero the iteration is
-	 * ended and this struct is freed.
+	 * The number of levels currently on the stack.
 	 */
 	size_t levels_nr;
 
@@ -376,7 +374,7 @@ struct cache_ref_iterator {
 	 * The prefix is matched textually, without regard for path
 	 * component boundaries.
 	 */
-	const char *prefix;
+	char *prefix;
 
 	/*
 	 * A stack of levels. levels[0] is the uppermost level that is
@@ -389,6 +387,9 @@ struct cache_ref_iterator {
 	struct cache_ref_iterator_level *levels;
 
 	struct repository *repo;
+	struct ref_cache *cache;
+
+	int prime_dir;
 };
 
 static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
@@ -396,6 +397,9 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	struct cache_ref_iterator *iter =
 		(struct cache_ref_iterator *)ref_iterator;
 
+	if (!iter->levels_nr)
+		return ITER_DONE;
+
 	while (1) {
 		struct cache_ref_iterator_level *level =
 			&iter->levels[iter->levels_nr - 1];
@@ -444,6 +448,41 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	}
 }
 
+static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				   const char *prefix)
+{
+	struct cache_ref_iterator *iter =
+		(struct cache_ref_iterator *)ref_iterator;
+	struct cache_ref_iterator_level *level;
+	struct ref_dir *dir;
+
+	dir = get_ref_dir(iter->cache->root);
+	if (prefix && *prefix)
+		dir = find_containing_dir(dir, prefix);
+	if (!dir) {
+		iter->levels_nr = 0;
+		return 0;
+	}
+
+	if (iter->prime_dir)
+		prime_ref_dir(dir, prefix);
+	iter->levels_nr = 1;
+	level = &iter->levels[0];
+	level->index = -1;
+	level->dir = dir;
+
+	if (prefix && *prefix) {
+		free(iter->prefix);
+		iter->prefix = xstrdup(prefix);
+		level->prefix_state = PREFIX_WITHIN_DIR;
+	} else {
+		FREE_AND_NULL(iter->prefix);
+		level->prefix_state = PREFIX_CONTAINS_DIR;
+	}
+
+	return 0;
+}
+
 static int cache_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				   struct object_id *peeled)
 {
@@ -456,12 +495,13 @@ static void cache_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct cache_ref_iterator *iter =
 		(struct cache_ref_iterator *)ref_iterator;
-	free((char *)iter->prefix);
+	free(iter->prefix);
 	free(iter->levels);
 }
 
 static struct ref_iterator_vtable cache_ref_iterator_vtable = {
 	.advance = cache_ref_iterator_advance,
+	.seek = cache_ref_iterator_seek,
 	.peel = cache_ref_iterator_peel,
 	.release = cache_ref_iterator_release,
 };
@@ -471,39 +511,22 @@ struct ref_iterator *cache_ref_iterator_begin(struct ref_cache *cache,
 					      struct repository *repo,
 					      int prime_dir)
 {
-	struct ref_dir *dir;
 	struct cache_ref_iterator *iter;
 	struct ref_iterator *ref_iterator;
-	struct cache_ref_iterator_level *level;
-
-	dir = get_ref_dir(cache->root);
-	if (prefix && *prefix)
-		dir = find_containing_dir(dir, prefix);
-	if (!dir)
-		/* There's nothing to iterate over. */
-		return empty_ref_iterator_begin();
-
-	if (prime_dir)
-		prime_ref_dir(dir, prefix);
 
 	CALLOC_ARRAY(iter, 1);
 	ref_iterator = &iter->base;
 	base_ref_iterator_init(ref_iterator, &cache_ref_iterator_vtable);
 	ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
 
-	iter->levels_nr = 1;
-	level = &iter->levels[0];
-	level->index = -1;
-	level->dir = dir;
+	iter->repo = repo;
+	iter->cache = cache;
+	iter->prime_dir = prime_dir;
 
-	if (prefix && *prefix) {
-		iter->prefix = xstrdup(prefix);
-		level->prefix_state = PREFIX_WITHIN_DIR;
-	} else {
-		level->prefix_state = PREFIX_CONTAINS_DIR;
+	if (cache_ref_iterator_seek(&iter->base, prefix) < 0) {
+		ref_iterator_free(&iter->base);
+		return NULL;
 	}
 
-	iter->repo = repo;
-
 	return ref_iterator;
 }

-- 
2.49.0.rc0.416.g627208d89d.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v5 14/16] refs/iterator: implement seeking for packed-ref iterators
  2025-03-06 15:08 ` [PATCH v5 " Patrick Steinhardt
                     ` (12 preceding siblings ...)
  2025-03-06 15:08   ` [PATCH v5 13/16] refs/iterator: implement seeking for ref-cache iterators Patrick Steinhardt
@ 2025-03-06 15:08   ` Patrick Steinhardt
  2025-03-06 15:08   ` [PATCH v5 15/16] refs/iterator: implement seeking for files iterators Patrick Steinhardt
                     ` (3 subsequent siblings)
  17 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-03-06 15:08 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Implement seeking for `packed-ref` iterators. The implementation is
again straightforward, except that we cannot continue to use the prefix
iterator, as we would otherwise not be able to reseek the iterator in
case one first asks for an empty and then for a non-empty prefix.
Instead, we open-code the logic in `advance()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/packed-backend.c | 65 ++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 43 insertions(+), 22 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 38a1956d1a8..f4c82ba2c7d 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -819,6 +819,8 @@ struct packed_ref_iterator {
 
 	struct snapshot *snapshot;
 
+	char *prefix;
+
 	/* The current position in the snapshot's buffer: */
 	const char *pos;
 
@@ -841,11 +843,9 @@ struct packed_ref_iterator {
 };
 
 /*
- * Move the iterator to the next record in the snapshot, without
- * respect for whether the record is actually required by the current
- * iteration. Adjust the fields in `iter` and return `ITER_OK` or
- * `ITER_DONE`. This function does not free the iterator in the case
- * of `ITER_DONE`.
+ * Move the iterator to the next record in the snapshot. Adjust the fields in
+ * `iter` and return `ITER_OK` or `ITER_DONE`. This function does not free the
+ * iterator in the case of `ITER_DONE`.
  */
 static int next_record(struct packed_ref_iterator *iter)
 {
@@ -942,6 +942,9 @@ static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	int ok;
 
 	while ((ok = next_record(iter)) == ITER_OK) {
+		const char *refname = iter->base.refname;
+		const char *prefix = iter->prefix;
+
 		if (iter->flags & DO_FOR_EACH_PER_WORKTREE_ONLY &&
 		    !is_per_worktree_ref(iter->base.refname))
 			continue;
@@ -951,12 +954,41 @@ static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator)
 					    &iter->oid, iter->flags))
 			continue;
 
+		while (prefix && *prefix) {
+			if (*refname < *prefix)
+				BUG("packed-refs backend yielded reference preceding its prefix");
+			else if (*refname > *prefix)
+				return ITER_DONE;
+			prefix++;
+			refname++;
+		}
+
 		return ITER_OK;
 	}
 
 	return ok;
 }
 
+static int packed_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				    const char *prefix)
+{
+	struct packed_ref_iterator *iter =
+		(struct packed_ref_iterator *)ref_iterator;
+	const char *start;
+
+	if (prefix && *prefix)
+		start = find_reference_location(iter->snapshot, prefix, 0);
+	else
+		start = iter->snapshot->start;
+
+	free(iter->prefix);
+	iter->prefix = xstrdup_or_null(prefix);
+	iter->pos = start;
+	iter->eof = iter->snapshot->eof;
+
+	return 0;
+}
+
 static int packed_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				   struct object_id *peeled)
 {
@@ -979,11 +1011,13 @@ static void packed_ref_iterator_release(struct ref_iterator *ref_iterator)
 		(struct packed_ref_iterator *)ref_iterator;
 	strbuf_release(&iter->refname_buf);
 	free(iter->jump);
+	free(iter->prefix);
 	release_snapshot(iter->snapshot);
 }
 
 static struct ref_iterator_vtable packed_ref_iterator_vtable = {
 	.advance = packed_ref_iterator_advance,
+	.seek = packed_ref_iterator_seek,
 	.peel = packed_ref_iterator_peel,
 	.release = packed_ref_iterator_release,
 };
@@ -1097,7 +1131,6 @@ static struct ref_iterator *packed_ref_iterator_begin(
 {
 	struct packed_ref_store *refs;
 	struct snapshot *snapshot;
-	const char *start;
 	struct packed_ref_iterator *iter;
 	struct ref_iterator *ref_iterator;
 	unsigned int required_flags = REF_STORE_READ;
@@ -1113,14 +1146,6 @@ static struct ref_iterator *packed_ref_iterator_begin(
 	 */
 	snapshot = get_snapshot(refs);
 
-	if (prefix && *prefix)
-		start = find_reference_location(snapshot, prefix, 0);
-	else
-		start = snapshot->start;
-
-	if (start == snapshot->eof)
-		return empty_ref_iterator_begin();
-
 	CALLOC_ARRAY(iter, 1);
 	ref_iterator = &iter->base;
 	base_ref_iterator_init(ref_iterator, &packed_ref_iterator_vtable);
@@ -1130,19 +1155,15 @@ static struct ref_iterator *packed_ref_iterator_begin(
 
 	iter->snapshot = snapshot;
 	acquire_snapshot(snapshot);
-
-	iter->pos = start;
-	iter->eof = snapshot->eof;
 	strbuf_init(&iter->refname_buf, 0);
-
 	iter->base.oid = &iter->oid;
-
 	iter->repo = ref_store->repo;
 	iter->flags = flags;
 
-	if (prefix && *prefix)
-		/* Stop iteration after we've gone *past* prefix: */
-		ref_iterator = prefix_ref_iterator_begin(ref_iterator, prefix, 0);
+	if (packed_ref_iterator_seek(&iter->base, prefix) < 0) {
+		ref_iterator_free(&iter->base);
+		return NULL;
+	}
 
 	return ref_iterator;
 }

-- 
2.49.0.rc0.416.g627208d89d.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v5 15/16] refs/iterator: implement seeking for files iterators
  2025-03-06 15:08 ` [PATCH v5 " Patrick Steinhardt
                     ` (13 preceding siblings ...)
  2025-03-06 15:08   ` [PATCH v5 14/16] refs/iterator: implement seeking for packed-ref iterators Patrick Steinhardt
@ 2025-03-06 15:08   ` Patrick Steinhardt
  2025-03-06 15:08   ` [PATCH v5 16/16] refs: reuse iterators when determining refname availability Patrick Steinhardt
                     ` (2 subsequent siblings)
  17 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-03-06 15:08 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Implement seeking for "files" iterators. As we simply use a ref-cache
iterator under the hood, the implementation is straightforward. Note
that we do not implement seeking on reflog iterators, same as with the
"reftable" backend.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/files-backend.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 859f1c11941..4e1c50fead3 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -918,6 +918,14 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	return ok;
 }
 
+static int files_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				   const char *prefix)
+{
+	struct files_ref_iterator *iter =
+		(struct files_ref_iterator *)ref_iterator;
+	return ref_iterator_seek(iter->iter0, prefix);
+}
+
 static int files_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				   struct object_id *peeled)
 {
@@ -936,6 +944,7 @@ static void files_ref_iterator_release(struct ref_iterator *ref_iterator)
 
 static struct ref_iterator_vtable files_ref_iterator_vtable = {
 	.advance = files_ref_iterator_advance,
+	.seek = files_ref_iterator_seek,
 	.peel = files_ref_iterator_peel,
 	.release = files_ref_iterator_release,
 };
@@ -2294,6 +2303,12 @@ static int files_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 	return ok;
 }
 
+static int files_reflog_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
+				      const char *prefix UNUSED)
+{
+	BUG("ref_iterator_seek() called for reflog_iterator");
+}
+
 static int files_reflog_iterator_peel(struct ref_iterator *ref_iterator UNUSED,
 				      struct object_id *peeled UNUSED)
 {
@@ -2309,6 +2324,7 @@ static void files_reflog_iterator_release(struct ref_iterator *ref_iterator)
 
 static struct ref_iterator_vtable files_reflog_iterator_vtable = {
 	.advance = files_reflog_iterator_advance,
+	.seek = files_reflog_iterator_seek,
 	.peel = files_reflog_iterator_peel,
 	.release = files_reflog_iterator_release,
 };

-- 
2.49.0.rc0.416.g627208d89d.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v5 16/16] refs: reuse iterators when determining refname availability
  2025-03-06 15:08 ` [PATCH v5 " Patrick Steinhardt
                     ` (14 preceding siblings ...)
  2025-03-06 15:08   ` [PATCH v5 15/16] refs/iterator: implement seeking for files iterators Patrick Steinhardt
@ 2025-03-06 15:08   ` Patrick Steinhardt
  2025-03-06 15:32   ` [PATCH v5 00/16] refs: batch refname availability checks Karthik Nayak
  2025-03-12 14:03   ` shejialuo
  17 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-03-06 15:08 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

When verifying whether refnames are available we have to verify whether
any reference exists that is nested under the current reference. E.g.
given a reference "refs/heads/foo", we must make sure that there is no
other reference "refs/heads/foo/*".

This check is performed using a ref iterator with the prefix set to the
nested reference namespace. Until now it was not possible to reseek
iterators, so we had to reallocate the iterator for every single
reference we were about to check. This kept us from reusing state that
the iterator may hold and that may make it work more efficiently.

Refactor the logic to reseek iterators. This leads to a sizeable speedup
with the "reftable" backend:

    Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
      Time (mean ± σ):      39.8 ms ±   0.9 ms    [User: 29.7 ms, System: 9.8 ms]
      Range (min … max):    38.4 ms …  42.0 ms    62 runs

    Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):      31.9 ms ±   1.1 ms    [User: 27.0 ms, System: 4.5 ms]
      Range (min … max):    29.8 ms …  34.3 ms    74 runs

    Summary
      update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
        1.25 ± 0.05 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)

The "files" backend doesn't really show a huge impact:

    Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
      Time (mean ± σ):     392.3 ms ±   7.1 ms    [User: 59.7 ms, System: 328.8 ms]
      Range (min … max):   384.6 ms … 404.5 ms    10 runs

    Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):     387.7 ms ±   7.4 ms    [User: 54.6 ms, System: 329.6 ms]
      Range (min … max):   377.0 ms … 397.7 ms    10 runs

    Summary
      update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
        1.01 ± 0.03 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)

This is mostly because the "files" backend is way slower to begin with,
as it has to create a separate file for each new reference, so the
milliseconds we shave off by reseeking the iterator don't really
translate into a significant relative improvement.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/refs.c b/refs.c
index 8eff60a2186..6cbb9decdb0 100644
--- a/refs.c
+++ b/refs.c
@@ -2555,8 +2555,13 @@ int refs_verify_refnames_available(struct ref_store *refs,
 		if (!initial_transaction) {
 			int ok;
 
-			iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
-						       DO_FOR_EACH_INCLUDE_BROKEN);
+			if (!iter) {
+				iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
+							       DO_FOR_EACH_INCLUDE_BROKEN);
+			} else if (ref_iterator_seek(iter, dirname.buf) < 0) {
+				goto cleanup;
+			}
+
 			while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
 				if (skip &&
 				    string_list_has_string(skip, iter->refname))
@@ -2569,9 +2574,6 @@ int refs_verify_refnames_available(struct ref_store *refs,
 
 			if (ok != ITER_DONE)
 				BUG("error while iterating over references");
-
-			ref_iterator_free(iter);
-			iter = NULL;
 		}
 
 		extra_refname = find_descendant_ref(dirname.buf, extras, skip);

-- 
2.49.0.rc0.416.g627208d89d.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* Re: [PATCH v4 05/16] refs/reftable: batch refname availability checks
  2025-03-06 14:12       ` Karthik Nayak
@ 2025-03-06 15:13         ` Patrick Steinhardt
  0 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-03-06 15:13 UTC (permalink / raw)
  To: Karthik Nayak
  Cc: git, brian m. carlson, Jeff King, Junio C Hamano, shejialuo,
	Christian Couder

On Thu, Mar 06, 2025 at 09:12:41AM -0500, Karthik Nayak wrote:
> Karthik Nayak <karthik.188@gmail.com> writes:
> 
> > Patrick Steinhardt <ps@pks.im> writes:
> >
> >> Refactor the "reftable" backend to batch the availability check for
> >> refnames. This does not yet have an effect on performance as we
> >> essentially still call `refs_verify_refname_available()` in a loop, but
> >> this will change in subsequent commits.
> >>
> >
> > I thought this patch removes it from the loop. Which loop are you
> > talking about?
> >
> 
> Looking at future patches, maybe this 'loop' is a reference to how
> 'refs_verify_refnames_available()' still loops over all references,
> which we start optimizing in patch 08 and onward?

Yes, exactly. I'll clarify.

Patrick


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v5 00/16] refs: batch refname availability checks
  2025-03-06 15:08 ` [PATCH v5 " Patrick Steinhardt
                     ` (15 preceding siblings ...)
  2025-03-06 15:08   ` [PATCH v5 16/16] refs: reuse iterators when determining refname availability Patrick Steinhardt
@ 2025-03-06 15:32   ` Karthik Nayak
  2025-03-12 14:03   ` shejialuo
  17 siblings, 0 replies; 163+ messages in thread
From: Karthik Nayak @ 2025-03-06 15:32 UTC (permalink / raw)
  To: Patrick Steinhardt, git
  Cc: brian m. carlson, Jeff King, Junio C Hamano, shejialuo,
	Christian Couder

[-- Attachment #1: Type: text/plain, Size: 378 bytes --]

Patrick Steinhardt <ps@pks.im> writes:

[snip]

> Changes in v5:
>   - Improve a couple of commit messages.
>   - Align `GET_OID_*` flag values.
>   - Link to v4: https://lore.kernel.org/r/20250228-pks-update-ref-optimization-v4-0-6425c04268b5@pks.im
>

The range-diff looks good (snipped to keep my email small and direct).
Thanks for addressing the comments!

Karthik

[snip]

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v5 02/16] object-name: allow skipping ambiguity checks in `get_oid()` family
  2025-03-06 15:08   ` [PATCH v5 02/16] object-name: allow skipping ambiguity checks in `get_oid()` family Patrick Steinhardt
@ 2025-03-12 12:12     ` shejialuo
  0 siblings, 0 replies; 163+ messages in thread
From: shejialuo @ 2025-03-12 12:12 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

On Thu, Mar 06, 2025 at 04:08:33PM +0100, Patrick Steinhardt wrote:
> When reading an object ID via `get_oid_basic()` or any of its related
> functions we perform a check whether the object ID is ambiguous, which
> can be the case when a reference with the same name exists. While the
> check is generally helpful, there are cases where it only adds to the
> runtime overhead without providing much of a benefit.
> 

When reading this, I wondered which cases those are. I think we might
combine this patch with the next one; with this standalone commit, the
reader cannot easily see the motivation until reading the next patch.

However, I also understand why you chose a standalone commit. So, both
ways are OK from my side.

> Add a new flag that allows us to disable the check. The flag will be
> used in a subsequent commit.
> 
> Signed-off-by: Patrick Steinhardt <ps@pks.im>


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v5 04/16] refs: introduce function to batch refname availability checks
  2025-03-06 15:08   ` [PATCH v5 04/16] refs: introduce function to batch refname availability checks Patrick Steinhardt
@ 2025-03-12 12:36     ` shejialuo
  2025-03-12 12:44       ` shejialuo
  2025-03-12 15:36       ` Patrick Steinhardt
  0 siblings, 2 replies; 163+ messages in thread
From: shejialuo @ 2025-03-12 12:36 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

On Thu, Mar 06, 2025 at 04:08:35PM +0100, Patrick Steinhardt wrote:
> The `refs_verify_refname_available()` functions checks whether a
> reference update can be committed or whether it would conflict with
> either a prefix or suffix thereof. This function needs to be called once
> per reference that one wants to check, which requires us to redo a
> couple of checks every time the function is called.
> 
> Introduce a new function `refs_verify_refnames_available()` that does
> the same, but for a list of references. For now, the new function uses
> the exact same implementation, except that we loop through all refnames
> provided by the caller. This will be tuned in subsequent commits.
> 

After reading this patch, I think we could add more motivation here.
What is the advantage of batching the refname checks? From my
understanding, we want to check a group of refnames, and if we find one
that is not good we just return without checking any further, thus
avoiding unnecessary checks and improving speed.

When I read the commit message, I wondered: if we loop through all
refnames, we still need to handle the same number of refnames. So how do
we avoid this?

I think we should make this clear.

> The existing `refs_verify_refname_available()` function is reimplemented
> on top of the new function. As such, the diff is best viewed with the
> `--ignore-space-change option`.
> 
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  refs.c | 169 +++++++++++++++++++++++++++++++++++++----------------------------
>  refs.h |  12 +++++
>  2 files changed, 109 insertions(+), 72 deletions(-)
> 
> diff --git a/refs.c b/refs.c
> index f4094a326a9..5a9b0f2fa1e 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -2467,19 +2467,15 @@ int ref_transaction_commit(struct ref_transaction *transaction,
>  	return ret;
>  }
>  
> -int refs_verify_refname_available(struct ref_store *refs,
> -				  const char *refname,
> -				  const struct string_list *extras,
> -				  const struct string_list *skip,
> -				  unsigned int initial_transaction,
> -				  struct strbuf *err)
> +int refs_verify_refnames_available(struct ref_store *refs,
> +				   const struct string_list *refnames,
> +				   const struct string_list *extras,
> +				   const struct string_list *skip,
> +				   unsigned int initial_transaction,
> +				   struct strbuf *err)
>  {
> -	const char *slash;
> -	const char *extra_refname;
>  	struct strbuf dirname = STRBUF_INIT;
>  	struct strbuf referent = STRBUF_INIT;
> -	struct object_id oid;
> -	unsigned int type;
>  	int ret = -1;
>  
>  	/*
> @@ -2489,79 +2485,91 @@ int refs_verify_refname_available(struct ref_store *refs,
>  
>  	assert(err);
>  
> -	strbuf_grow(&dirname, strlen(refname) + 1);
> -	for (slash = strchr(refname, '/'); slash; slash = strchr(slash + 1, '/')) {
> -		/*
> -		 * Just saying "Is a directory" when we e.g. can't
> -		 * lock some multi-level ref isn't very informative,
> -		 * the user won't be told *what* is a directory, so
> -		 * let's not use strerror() below.
> -		 */
> -		int ignore_errno;
> -		/* Expand dirname to the new prefix, not including the trailing slash: */
> -		strbuf_add(&dirname, refname + dirname.len, slash - refname - dirname.len);
> +	for (size_t i = 0; i < refnames->nr; i++) {

Nit: we could just use `for_each_string_list_item` instead of the raw
"for" loop.

> +		const char *refname = refnames->items[i].string;
> +		const char *extra_refname;
> +		struct object_id oid;
> +		unsigned int type;
> +		const char *slash;
> +
> +		strbuf_reset(&dirname);
> +
> +		for (slash = strchr(refname, '/'); slash; slash = strchr(slash + 1, '/')) {
> +			/*
> +			 * Just saying "Is a directory" when we e.g. can't
> +			 * lock some multi-level ref isn't very informative,
> +			 * the user won't be told *what* is a directory, so
> +			 * let's not use strerror() below.
> +			 */
> +			int ignore_errno;
> +
> +			/* Expand dirname to the new prefix, not including the trailing slash: */
> +			strbuf_add(&dirname, refname + dirname.len, slash - refname - dirname.len);
> +
> +			/*
> +			 * We are still at a leading dir of the refname (e.g.,
> +			 * "refs/foo"; if there is a reference with that name,
> +			 * it is a conflict, *unless* it is in skip.
> +			 */
> +			if (skip && string_list_has_string(skip, dirname.buf))
> +				continue;
> +
> +			if (!initial_transaction &&
> +			    !refs_read_raw_ref(refs, dirname.buf, &oid, &referent,
> +					       &type, &ignore_errno)) {
> +				strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
> +					    dirname.buf, refname);
> +				goto cleanup;
> +			}
> +
> +			if (extras && string_list_has_string(extras, dirname.buf)) {
> +				strbuf_addf(err, _("cannot process '%s' and '%s' at the same time"),
> +					    refname, dirname.buf);
> +				goto cleanup;
> +			}
> +		}
>  
>  		/*
> -		 * We are still at a leading dir of the refname (e.g.,
> -		 * "refs/foo"; if there is a reference with that name,
> -		 * it is a conflict, *unless* it is in skip.
> +		 * We are at the leaf of our refname (e.g., "refs/foo/bar").
> +		 * There is no point in searching for a reference with that
> +		 * name, because a refname isn't considered to conflict with
> +		 * itself. But we still need to check for references whose
> +		 * names are in the "refs/foo/bar/" namespace, because they
> +		 * *do* conflict.
>  		 */
> -		if (skip && string_list_has_string(skip, dirname.buf))
> -			continue;
> +		strbuf_addstr(&dirname, refname + dirname.len);
> +		strbuf_addch(&dirname, '/');
> +
> +		if (!initial_transaction) {
> +			struct ref_iterator *iter;
> +			int ok;
> +
> +			iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
> +						       DO_FOR_EACH_INCLUDE_BROKEN);
> +			while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
> +				if (skip &&
> +				    string_list_has_string(skip, iter->refname))
> +					continue;
> +
> +				strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
> +					    iter->refname, refname);
> +				ref_iterator_abort(iter);
> +				goto cleanup;
> +			}
>  
> -		if (!initial_transaction &&
> -		    !refs_read_raw_ref(refs, dirname.buf, &oid, &referent,
> -				       &type, &ignore_errno)) {
> -			strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
> -				    dirname.buf, refname);
> -			goto cleanup;
> +			if (ok != ITER_DONE)
> +				BUG("error while iterating over references");
>  		}
>  
> -		if (extras && string_list_has_string(extras, dirname.buf)) {
> +		extra_refname = find_descendant_ref(dirname.buf, extras, skip);
> +		if (extra_refname) {
>  			strbuf_addf(err, _("cannot process '%s' and '%s' at the same time"),
> -				    refname, dirname.buf);
> +				    refname, extra_refname);
>  			goto cleanup;
>  		}
>  	}
>  
> -	/*
> -	 * We are at the leaf of our refname (e.g., "refs/foo/bar").
> -	 * There is no point in searching for a reference with that
> -	 * name, because a refname isn't considered to conflict with
> -	 * itself. But we still need to check for references whose
> -	 * names are in the "refs/foo/bar/" namespace, because they
> -	 * *do* conflict.
> -	 */
> -	strbuf_addstr(&dirname, refname + dirname.len);
> -	strbuf_addch(&dirname, '/');
> -
> -	if (!initial_transaction) {
> -		struct ref_iterator *iter;
> -		int ok;
> -
> -		iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
> -					       DO_FOR_EACH_INCLUDE_BROKEN);
> -		while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
> -			if (skip &&
> -			    string_list_has_string(skip, iter->refname))
> -				continue;
> -
> -			strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
> -				    iter->refname, refname);
> -			ref_iterator_abort(iter);
> -			goto cleanup;
> -		}
> -
> -		if (ok != ITER_DONE)
> -			BUG("error while iterating over references");
> -	}
> -
> -	extra_refname = find_descendant_ref(dirname.buf, extras, skip);
> -	if (extra_refname)
> -		strbuf_addf(err, _("cannot process '%s' and '%s' at the same time"),
> -			    refname, extra_refname);
> -	else
> -		ret = 0;
> +	ret = 0;
>  
>  cleanup:
>  	strbuf_release(&referent);
> @@ -2569,6 +2577,23 @@ int refs_verify_refname_available(struct ref_store *refs,
>  	return ret;
>  }
>  
> +int refs_verify_refname_available(struct ref_store *refs,
> +				  const char *refname,
> +				  const struct string_list *extras,
> +				  const struct string_list *skip,
> +				  unsigned int initial_transaction,
> +				  struct strbuf *err)
> +{
> +	struct string_list_item item = { .string = (char *) refname };
> +	struct string_list refnames = {
> +		.items = &item,
> +		.nr = 1,
> +	};
> +
> +	return refs_verify_refnames_available(refs, &refnames, extras, skip,
> +					      initial_transaction, err);
> +}
> +
>  struct do_for_each_reflog_help {
>  	each_reflog_fn *fn;
>  	void *cb_data;
> diff --git a/refs.h b/refs.h
> index a0cdd99250e..185aed5a461 100644
> --- a/refs.h
> +++ b/refs.h
> @@ -124,6 +124,18 @@ int refs_verify_refname_available(struct ref_store *refs,
>  				  unsigned int initial_transaction,
>  				  struct strbuf *err);
>  
> +/*
> + * Same as `refs_verify_refname_available()`, but checking for a list of
> + * refnames instead of only a single item. This is more efficient in the case
> + * where one needs to check multiple refnames.
> + */

Should we say more about why this is more efficient?

> +int refs_verify_refnames_available(struct ref_store *refs,
> +				   const struct string_list *refnames,
> +				   const struct string_list *extras,
> +				   const struct string_list *skip,
> +				   unsigned int initial_transaction,
> +				   struct strbuf *err);
> +
>  int refs_ref_exists(struct ref_store *refs, const char *refname);
>  
>  int should_autocreate_reflog(enum log_refs_config log_all_ref_updates,
> 
> -- 
> 2.49.0.rc0.416.g627208d89d.dirty


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v5 04/16] refs: introduce function to batch refname availability checks
  2025-03-12 12:36     ` shejialuo
@ 2025-03-12 12:44       ` shejialuo
  2025-03-12 15:36       ` Patrick Steinhardt
  1 sibling, 0 replies; 163+ messages in thread
From: shejialuo @ 2025-03-12 12:44 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

On Wed, Mar 12, 2025 at 08:36:47PM +0800, shejialuo wrote:
> On Thu, Mar 06, 2025 at 04:08:35PM +0100, Patrick Steinhardt wrote:
> > The `refs_verify_refname_available()` functions checks whether a
> > reference update can be committed or whether it would conflict with
> > either a prefix or suffix thereof. This function needs to be called once
> > per reference that one wants to check, which requires us to redo a
> > couple of checks every time the function is called.
> > 
> > Introduce a new function `refs_verify_refnames_available()` that does
> > the same, but for a list of references. For now, the new function uses
> > the exact same implementation, except that we loop through all refnames
> > provided by the caller. This will be tuned in subsequent commits.
> > 
> 
> After reading this patch, I think we could add more motivation here.
> What is the advantage of batching the refname checks? From my
> understanding, we want to check a group of refnames, and if we find one
> that is not good we just return without checking any further, thus
> avoiding unnecessary checks and improving speed.
> 
> When I read the commit message, I wondered: if we loop through all
> refnames, we still need to handle the same number of refnames. So how do
> we avoid this?
> 
> I think we should make this clear.
> 

I totally misunderstood. After reading the commit message of the next
patch, I have realized that this is simply a refactoring. Please ignore
my comment. Sorry.


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v5 05/16] refs/reftable: batch refname availability checks
  2025-03-06 15:08   ` [PATCH v5 05/16] refs/reftable: " Patrick Steinhardt
@ 2025-03-12 12:54     ` shejialuo
  2025-03-12 15:36       ` Patrick Steinhardt
  0 siblings, 1 reply; 163+ messages in thread
From: shejialuo @ 2025-03-12 12:54 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

On Thu, Mar 06, 2025 at 04:08:36PM +0100, Patrick Steinhardt wrote:
> Refactor the "reftable" backend to batch the availability check for
> refnames. This does not yet have an effect on performance as
> `refs_verify_refnames_available()` effectively still performs the
> availability check for each refname individually. But this will be
> optimized in subsequent commits, where we learn to optimize some parts
> of the logic when checking multiple refnames for availability.
> 
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  refs/reftable-backend.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
> index d39a14c5a46..2a90e7cb391 100644
> --- a/refs/reftable-backend.c
> +++ b/refs/reftable-backend.c
> @@ -1069,6 +1069,7 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
>  		reftable_be_downcast(ref_store, REF_STORE_WRITE|REF_STORE_MAIN, "ref_transaction_prepare");
>  	struct strbuf referent = STRBUF_INIT, head_referent = STRBUF_INIT;
>  	struct string_list affected_refnames = STRING_LIST_INIT_NODUP;
> +	struct string_list refnames_to_check = STRING_LIST_INIT_NODUP;
>  	struct reftable_transaction_data *tx_data = NULL;
>  	struct reftable_backend *be;
>  	struct object_id head_oid;
> @@ -1224,12 +1225,7 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
>  			 * can output a proper error message instead of failing
>  			 * at a later point.
>  			 */
> -			ret = refs_verify_refname_available(ref_store, u->refname,
> -							    &affected_refnames, NULL,
> -							    transaction->flags & REF_TRANSACTION_FLAG_INITIAL,
> -							    err);
> -			if (ret < 0)
> -				goto done;
> +			string_list_append(&refnames_to_check, u->refname);
>  

Instead of checking in `prepare`, we will just append the refname to the
list.

>  			/*
>  			 * There is no need to write the reference deletion
> @@ -1379,6 +1375,13 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
>  		}
>  	}
>  
> +	string_list_sort(&refnames_to_check);

I am curious why we need to sort the refnames here. At this point we
haven't optimized the `refs_verify_refnames_available()` function yet,
so whether `refnames_to_check` is sorted should not change its result,
right? I guess this statement may be related to the later optimization
part. If so, I think we should delete this line and add it back in the
later commit.

However, I am not sure.

> +	ret = refs_verify_refnames_available(ref_store, &refnames_to_check, &affected_refnames, NULL,
> +					     transaction->flags & REF_TRANSACTION_FLAG_INITIAL,
> +					     err);
> +	if (ret < 0)
> +		goto done;
> +
>  	transaction->backend_data = tx_data;
>  	transaction->state = REF_TRANSACTION_PREPARED;
>  
> @@ -1394,6 +1397,7 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
>  	string_list_clear(&affected_refnames, 0);
>  	strbuf_release(&referent);
>  	strbuf_release(&head_referent);
> +	string_list_clear(&refnames_to_check, 0);
>  
>  	return ret;
>  }


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v5 06/16] refs/files: batch refname availability checks for normal transactions
  2025-03-06 15:08   ` [PATCH v5 06/16] refs/files: batch refname availability checks for normal transactions Patrick Steinhardt
@ 2025-03-12 12:58     ` shejialuo
  2025-03-12 15:36       ` Patrick Steinhardt
  0 siblings, 1 reply; 163+ messages in thread
From: shejialuo @ 2025-03-12 12:58 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

On Thu, Mar 06, 2025 at 04:08:37PM +0100, Patrick Steinhardt wrote:
> @@ -2811,6 +2808,7 @@ static int files_transaction_prepare(struct ref_store *ref_store,
>  	size_t i;
>  	int ret = 0;
>  	struct string_list affected_refnames = STRING_LIST_INIT_NODUP;
> +	struct string_list refnames_to_check = STRING_LIST_INIT_NODUP;
>  	char *head_ref = NULL;
>  	int head_type;
>  	struct files_transaction_backend_data *backend_data;
> @@ -2898,7 +2896,8 @@ static int files_transaction_prepare(struct ref_store *ref_store,
>  		struct ref_update *update = transaction->updates[i];
>  
>  		ret = lock_ref_for_update(refs, update, transaction,
> -					  head_ref, &affected_refnames, err);
> +					  head_ref, &refnames_to_check,
> +					  &affected_refnames, err);
>  		if (ret)
>  			goto cleanup;
>  
> @@ -2930,6 +2929,26 @@ static int files_transaction_prepare(struct ref_store *ref_store,
>  		}
>  	}
>  
> +	/*
>  	 * Verify that none of the loose references that we're about to write
> +	 * conflict with any existing packed references. Ideally, we'd do this
> +	 * check after the packed-refs are locked so that the file cannot
> +	 * change underneath our feet. But introducing such a lock now would
> +	 * probably do more harm than good as users rely on there not being a
> +	 * global lock with the "files" backend.
> +	 *
> +	 * Another alternative would be to do the check after the (optional)
> +	 * lock, but that would extend the time we spend in the globally-locked
> +	 * state.
> +	 *
> +	 * So instead, we accept the race for now.
> +	 */

I am curious why we don't sort the `refnames_to_check` here. What is
the difference between the reftable backend and the files backend?

> +	if (refs_verify_refnames_available(refs->packed_ref_store, &refnames_to_check,
> +					   &affected_refnames, NULL, 0, err)) {
> +		ret = TRANSACTION_NAME_CONFLICT;
> +		goto cleanup;
> +	}
> +
>  	if (packed_transaction) {
>  		if (packed_refs_lock(refs->packed_ref_store, 0, err)) {
>  			ret = TRANSACTION_GENERIC_ERROR;


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v5 07/16] refs/files: batch refname availability checks for initial transactions
  2025-03-06 15:08   ` [PATCH v5 07/16] refs/files: batch refname availability checks for initial transactions Patrick Steinhardt
@ 2025-03-12 13:06     ` shejialuo
  2025-03-12 15:36       ` Patrick Steinhardt
  0 siblings, 1 reply; 163+ messages in thread
From: shejialuo @ 2025-03-12 13:06 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

On Thu, Mar 06, 2025 at 04:08:38PM +0100, Patrick Steinhardt wrote:
> The "files" backend explicitly carves out special logic for its initial
> transaction so that it can avoid writing out every single reference as
> a loose reference. While the assumption is that there shouldn't be any
> preexisting references, we still have to verify that none of the newly
> written references will conflict with any other new reference in the
> same transaction.
> 
> Refactor the initial transaction to use batched refname availability
> checks. This does not yet have an effect on performance as we still call
> `refs_verify_refname_available()` in a loop. But this will change in
> subsequent commits and then impact performance when cloning a repository
> with many references or when migrating references to the "files" format.
> 
> This will improve performance when cloning a repository with many
> references or when migrating references from any format to the "files"
> format once the availability checks have learned to optimize checks for
> many references in a subsequent commit.

I guess you forgot to delete some sentences from the commit message.
This paragraph is a little redundant: I think the second paragraph has
already covered this.

> 
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  refs/files-backend.c | 23 ++++++++++++++++-------
>  1 file changed, 16 insertions(+), 7 deletions(-)
> 
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 6ce79cf0791..11a620ea11a 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -3056,6 +3056,7 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
>  	size_t i;
>  	int ret = 0;
>  	struct string_list affected_refnames = STRING_LIST_INIT_NODUP;
> +	struct string_list refnames_to_check = STRING_LIST_INIT_NODUP;
>  	struct ref_transaction *packed_transaction = NULL;
>  	struct ref_transaction *loose_transaction = NULL;
>  
> @@ -3105,11 +3106,7 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
>  		    !is_null_oid(&update->old_oid))
>  			BUG("initial ref transaction with old_sha1 set");
>  
> -		if (refs_verify_refname_available(&refs->base, update->refname,
> -						  &affected_refnames, NULL, 1, err)) {
> -			ret = TRANSACTION_NAME_CONFLICT;
> -			goto cleanup;
> -		}
> +		string_list_append(&refnames_to_check, update->refname);
>  
>  		/*
>  		 * packed-refs don't support symbolic refs, root refs and reflogs,
> @@ -3145,8 +3142,19 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
>  		}
>  	}
>  
> -	if (packed_refs_lock(refs->packed_ref_store, 0, err) ||
> -	    ref_transaction_commit(packed_transaction, err)) {
> +	if (packed_refs_lock(refs->packed_ref_store, 0, err)) {
> +		ret = TRANSACTION_GENERIC_ERROR;
> +		goto cleanup;
> +	}
> +

Still the same question: why don't we sort `refnames_to_check` here?

> +	if (refs_verify_refnames_available(&refs->base, &refnames_to_check,
> +					   &affected_refnames, NULL, 1, err)) {
> +		packed_refs_unlock(refs->packed_ref_store);
> +		ret = TRANSACTION_NAME_CONFLICT;
> +		goto cleanup;
> +	}
> +
> +	if (ref_transaction_commit(packed_transaction, err)) {
>  		ret = TRANSACTION_GENERIC_ERROR;
>  		goto cleanup;
>  	}
> @@ -3167,6 +3175,7 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
>  		ref_transaction_free(packed_transaction);
>  	transaction->state = REF_TRANSACTION_CLOSED;
>  	string_list_clear(&affected_refnames, 0);
> +	string_list_clear(&refnames_to_check, 0);
>  	return ret;
>  }


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v5 08/16] refs: stop re-verifying common prefixes for availability
  2025-03-06 15:08   ` [PATCH v5 08/16] refs: stop re-verifying common prefixes for availability Patrick Steinhardt
@ 2025-03-12 13:22     ` shejialuo
  2025-03-12 15:36       ` Patrick Steinhardt
  0 siblings, 1 reply; 163+ messages in thread
From: shejialuo @ 2025-03-12 13:22 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

On Thu, Mar 06, 2025 at 04:08:39PM +0100, Patrick Steinhardt wrote:
>  refs.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/refs.c b/refs.c
> index 5a9b0f2fa1e..eaf41421f50 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -2476,6 +2476,7 @@ int refs_verify_refnames_available(struct ref_store *refs,
>  {
>  	struct strbuf dirname = STRBUF_INIT;
>  	struct strbuf referent = STRBUF_INIT;
> +	struct strset dirnames;
>  	int ret = -1;
>  
>  	/*
> @@ -2485,6 +2486,8 @@ int refs_verify_refnames_available(struct ref_store *refs,
>  
>  	assert(err);
>  
> +	strset_init(&dirnames);
> +
>  	for (size_t i = 0; i < refnames->nr; i++) {
>  		const char *refname = refnames->items[i].string;
>  		const char *extra_refname;
> @@ -2514,6 +2517,14 @@ int refs_verify_refnames_available(struct ref_store *refs,
>  			if (skip && string_list_has_string(skip, dirname.buf))
>  				continue;
>  
> +			/*
> +			 * If we've already seen the directory we don't need to
> +			 * process it again. Skip it to avoid checking
> +			 * common prefixes like "refs/heads/" repeatedly.
> +			 */
> +			if (!strset_add(&dirnames, dirname.buf))
> +				continue;
> +

Reading this, I think we should not sort the refnames for the
"reftable" backend. Anyway, really a nice job optimizing the speed.

>  			if (!initial_transaction &&
>  			    !refs_read_raw_ref(refs, dirname.buf, &oid, &referent,
>  					       &type, &ignore_errno)) {
> @@ -2574,6 +2585,7 @@ int refs_verify_refnames_available(struct ref_store *refs,
>  cleanup:
>  	strbuf_release(&referent);
>  	strbuf_release(&dirname);
> +	strset_clear(&dirnames);
>  	return ret;
>  }
>  
> 
> -- 
> 2.49.0.rc0.416.g627208d89d.dirty
> 


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v5 09/16] refs/iterator: separate lifecycle from iteration
  2025-03-06 15:08   ` [PATCH v5 09/16] refs/iterator: separate lifecycle from iteration Patrick Steinhardt
@ 2025-03-12 13:45     ` shejialuo
  2025-03-12 15:36       ` Patrick Steinhardt
  0 siblings, 1 reply; 163+ messages in thread
From: shejialuo @ 2025-03-12 13:45 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

On Thu, Mar 06, 2025 at 04:08:40PM +0100, Patrick Steinhardt wrote:

> @@ -350,19 +338,10 @@ static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator)
>  
>  	while ((ok = ref_iterator_advance(iter->iter0)) == ITER_OK) {
>  		int cmp = compare_prefix(iter->iter0->refname, iter->prefix);
> -
>  		if (cmp < 0)
>  			continue;
> -
> -		if (cmp > 0) {
> -			/*
> -			 * As the source iterator is ordered, we
> -			 * can stop the iteration as soon as we see a
> -			 * refname that comes after the prefix:
> -			 */
> -			ok = ref_iterator_abort(iter->iter0);
> -			break;
> -		}
> +		if (cmp > 0)
> +			return ITER_DONE;

Should we keep the above comment? Why do we delete it? I somehow think
the comment still makes sense.

>  
>  		if (iter->trim) {
>  			/*


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v5 00/16] refs: batch refname availability checks
  2025-03-06 15:08 ` [PATCH v5 " Patrick Steinhardt
                     ` (16 preceding siblings ...)
  2025-03-06 15:32   ` [PATCH v5 00/16] refs: batch refname availability checks Karthik Nayak
@ 2025-03-12 14:03   ` shejialuo
  17 siblings, 0 replies; 163+ messages in thread
From: shejialuo @ 2025-03-12 14:03 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

On Thu, Mar 06, 2025 at 04:08:31PM +0100, Patrick Steinhardt wrote:

> Changes in v3:
>   - Fix one case where we didn't skip ambiguity checks in
>     git-update-ref(1).
>   - Document better that only the prefix can change on reseeking
>     iterators. Other internal state will remain the same.
>   - Fix a memory leak in the ref-cache iterator.
>   - Don't ignore errors returned by `packed_ref_iterator_seek()`.
>   - Link to v2: https://lore.kernel.org/r/20250219-pks-update-ref-optimization-v2-0-e696e7220b22@pks.im
> 
> Changes in v4:
>   - A couple of clarifications in the commit message that disabled
>     ambiguity warnings.
>   - Link to v3: https://lore.kernel.org/r/20250225-pks-update-ref-optimization-v3-0-77c3687cda75@pks.im
> 
> Changes in v5:
>   - Improve a couple of commit messages.
>   - Align `GET_OID_*` flag values.
>   - Link to v4: https://lore.kernel.org/r/20250228-pks-update-ref-optimization-v4-0-6425c04268b5@pks.im
> 
> Thanks!
> 
> Patrick

I didn't look at the range-diff, but reviewed this version again. I
mainly looked carefully at patches 1-8, which I hadn't reviewed in the
previous versions, and left some comments.

Thanks,
Jialuo


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v5 04/16] refs: introduce function to batch refname availability checks
  2025-03-12 12:36     ` shejialuo
  2025-03-12 12:44       ` shejialuo
@ 2025-03-12 15:36       ` Patrick Steinhardt
  1 sibling, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-03-12 15:36 UTC (permalink / raw)
  To: shejialuo
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

On Wed, Mar 12, 2025 at 08:36:47PM +0800, shejialuo wrote:
> On Thu, Mar 06, 2025 at 04:08:35PM +0100, Patrick Steinhardt wrote:
> > diff --git a/refs.c b/refs.c
> > index f4094a326a9..5a9b0f2fa1e 100644
> > --- a/refs.c
> > +++ b/refs.c
> > @@ -2489,79 +2485,91 @@ int refs_verify_refname_available(struct ref_store *refs,
> >  
> >  	assert(err);
> >  
> > -	strbuf_grow(&dirname, strlen(refname) + 1);
> > -	for (slash = strchr(refname, '/'); slash; slash = strchr(slash + 1, '/')) {
> > -		/*
> > -		 * Just saying "Is a directory" when we e.g. can't
> > -		 * lock some multi-level ref isn't very informative,
> > -		 * the user won't be told *what* is a directory, so
> > -		 * let's not use strerror() below.
> > -		 */
> > -		int ignore_errno;
> > -		/* Expand dirname to the new prefix, not including the trailing slash: */
> > -		strbuf_add(&dirname, refname + dirname.len, slash - refname - dirname.len);
> > +	for (size_t i = 0; i < refnames->nr; i++) {
> 
Nit: we may just use `for_each_string_list_item` instead of using the
raw "for" loop.

Fair, can do.

> > diff --git a/refs.h b/refs.h
> > index a0cdd99250e..185aed5a461 100644
> > --- a/refs.h
> > +++ b/refs.h
> > @@ -124,6 +124,18 @@ int refs_verify_refname_available(struct ref_store *refs,
> >  				  unsigned int initial_transaction,
> >  				  struct strbuf *err);
> >  
> > +/*
> > + * Same as `refs_verify_refname_available()`, but checking for a list of
> > + * refnames instead of only a single item. This is more efficient in the case
> > + * where one needs to check multiple refnames.
> > + */
> 
Should we talk more about why this is more efficient?

I don't think the caller needs to be aware of why specifically it is
faster. All they should care about is that it does the same as the
other function, but that it knows to optimize better.

Patrick


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v5 05/16] refs/reftable: batch refname availability checks
  2025-03-12 12:54     ` shejialuo
@ 2025-03-12 15:36       ` Patrick Steinhardt
  0 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-03-12 15:36 UTC (permalink / raw)
  To: shejialuo
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

On Wed, Mar 12, 2025 at 08:54:18PM +0800, shejialuo wrote:
> On Thu, Mar 06, 2025 at 04:08:36PM +0100, Patrick Steinhardt wrote:
> > diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
> > index d39a14c5a46..2a90e7cb391 100644
> > --- a/refs/reftable-backend.c
> > +++ b/refs/reftable-backend.c
> > @@ -1379,6 +1375,13 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
> >  		}
> >  	}
> >  
> > +	string_list_sort(&refnames_to_check);
> 
> I am curious why we need to sort the refnames here. At this point we
> haven't optimized the `refs_verify_refnames_available()` function yet,
> so whether `refnames_to_check` is sorted should not change its result,
> right? I guess this statement may be related to the later optimization
> part. If so, I think we should delete this line and add it back in the
> later commit.
> 
> However, I am not sure.

You're right, sorting shouldn't be necessary. It was needed in a
previous version of my patch series, but now that it isn't we can drop
it.

We may at one point introduce an optimization that does depend on refs
being sorted. But until there is a need we should skip unnecessary work.

Patrick


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v5 06/16] refs/files: batch refname availability checks for normal transactions
  2025-03-12 12:58     ` shejialuo
@ 2025-03-12 15:36       ` Patrick Steinhardt
  0 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-03-12 15:36 UTC (permalink / raw)
  To: shejialuo
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

On Wed, Mar 12, 2025 at 08:58:46PM +0800, shejialuo wrote:
> On Thu, Mar 06, 2025 at 04:08:37PM +0100, Patrick Steinhardt wrote:
> > @@ -2930,6 +2929,26 @@ static int files_transaction_prepare(struct ref_store *ref_store,
> >  		}
> >  	}
> >  
> > +	/*
> > +	 * Verify that none of the loose references that we're about to write
> > +	 * conflict with any existing packed references. Ideally, we'd do this
> > +	 * check after the packed-refs are locked so that the file cannot
> > +	 * change underneath our feet. But introducing such a lock now would
> > +	 * probably do more harm than good as users rely on there not being a
> > +	 * global lock with the "files" backend.
> > +	 *
> > +	 * Another alternative would be to do the check after the (optional)
> > +	 * lock, but that would extend the time we spend in the globally-locked
> > +	 * state.
> > +	 *
> > +	 * So instead, we accept the race for now.
> > +	 */
> 
> I am curious why we don't sort the `refnames_to_check` here. What is
> the difference between the reftable backend and the files backend?

We do sort because we use `string_list_insert()` here. But in fact, we
don't have to sort at all, so I'll stop sorting in the preceding commit
and convert this callsite here to use `string_list_append()` instead.

Patrick


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v5 07/16] refs/files: batch refname availability checks for initial transactions
  2025-03-12 13:06     ` shejialuo
@ 2025-03-12 15:36       ` Patrick Steinhardt
  0 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-03-12 15:36 UTC (permalink / raw)
  To: shejialuo
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

On Wed, Mar 12, 2025 at 09:06:08PM +0800, shejialuo wrote:
> On Thu, Mar 06, 2025 at 04:08:38PM +0100, Patrick Steinhardt wrote:
> > The "files" backend explicitly carves out special logic for its initial
> > transaction so that it can avoid writing out every single reference as
> > a loose reference. While the assumption is that there shouldn't be any
> > preexisting references, we still have to verify that none of the newly
> > written references will conflict with any other new reference in the
> > same transaction.
> > 
> > Refactor the initial transaction to use batched refname availability
> > checks. This does not yet have an effect on performance as we still call
> > `refs_verify_refname_available()` in a loop. But this will change in
> > subsequent commits and then impact performance when cloning a repository
> > with many references or when migrating references to the "files" format.
> > 
> > This will improve performance when cloning a repository with many
> > references or when migrating references from any format to the "files"
> > format once the availability checks have learned to optimize checks for
> > many references in a subsequent commit.
> 
> I guess you forgot to delete some sentences from the commit message.
> This paragraph is a little redundant: I think the second paragraph has
> already covered this.

Not quite: the second paragraph talks about how this does not yet have
an effect, but will have an effect once we have optimized this. The
third paragraph on the other hand talks about which specific parts will
benefit from the optimizations.

Patrick


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v5 09/16] refs/iterator: separate lifecycle from iteration
  2025-03-12 13:45     ` shejialuo
@ 2025-03-12 15:36       ` Patrick Steinhardt
  0 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-03-12 15:36 UTC (permalink / raw)
  To: shejialuo
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

On Wed, Mar 12, 2025 at 09:45:48PM +0800, shejialuo wrote:
> On Thu, Mar 06, 2025 at 04:08:40PM +0100, Patrick Steinhardt wrote:
> 
> > @@ -350,19 +338,10 @@ static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator)
> >  
> >  	while ((ok = ref_iterator_advance(iter->iter0)) == ITER_OK) {
> >  		int cmp = compare_prefix(iter->iter0->refname, iter->prefix);
> > -
> >  		if (cmp < 0)
> >  			continue;
> > -
> > -		if (cmp > 0) {
> > -			/*
> > -			 * As the source iterator is ordered, we
> > -			 * can stop the iteration as soon as we see a
> > -			 * refname that comes after the prefix:
> > -			 */
> > -			ok = ref_iterator_abort(iter->iter0);
> > -			break;
> > -		}
> > +		if (cmp > 0)
> > +			return ITER_DONE;
> 
> Should we keep the above comment? Why do we delete it? I somehow think
> the comment still makes sense.

Yeah, let's. I'll add it back in.

Patrick


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v5 08/16] refs: stop re-verifying common prefixes for availability
  2025-03-12 13:22     ` shejialuo
@ 2025-03-12 15:36       ` Patrick Steinhardt
  0 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-03-12 15:36 UTC (permalink / raw)
  To: shejialuo
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

On Wed, Mar 12, 2025 at 09:22:59PM +0800, shejialuo wrote:
> On Thu, Mar 06, 2025 at 04:08:39PM +0100, Patrick Steinhardt wrote:
> > diff --git a/refs.c b/refs.c
> > index 5a9b0f2fa1e..eaf41421f50 100644
> > --- a/refs.c
> > +++ b/refs.c
> > @@ -2514,6 +2517,14 @@ int refs_verify_refnames_available(struct ref_store *refs,
> >  			if (skip && string_list_has_string(skip, dirname.buf))
> >  				continue;
> >  
> > +			/*
> > +			 * If we've already seen the directory we don't need to
> > +			 * process it again. Skip it to avoid checking
> > +			 * common prefixes like "refs/heads/" repeatedly.
> > +			 */
> > +			if (!strset_add(&dirnames, dirname.buf))
> > +				continue;
> > +
> 
> Reading this, I think we should not sort the refnames for the
> "reftable" backend. Anyway, really a nice job optimizing the speed.

Agreed, it's unnecessary for this optimization.

Patrick


^ permalink raw reply	[flat|nested] 163+ messages in thread

* [PATCH v6 00/16] refs: batch refname availability checks
  2025-02-17 15:50 [PATCH 00/14] refs: batch refname availability checks Patrick Steinhardt
                   ` (18 preceding siblings ...)
  2025-03-06 15:08 ` [PATCH v5 " Patrick Steinhardt
@ 2025-03-12 15:56 ` Patrick Steinhardt
  2025-03-12 15:56   ` [PATCH v6 01/16] object-name: introduce `repo_get_oid_with_flags()` Patrick Steinhardt
                     ` (16 more replies)
  19 siblings, 17 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-03-12 15:56 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Hi,

this patch series has been inspired by brian's report that the reftable
backend is significantly slower when writing many references compared to
the files backend. As explained in that thread, the underlying issue is
the design of tombstone references: when we first delete all references
in a repository and then recreate them, we still have all the tombstones
and thus we need to churn through all of them to figure out that they
have been deleted in the first place. The files backend does not have
this issue.

I consider the benchmark itself to be kind of broken, as it stems from
us deleting all refs and then recreating them. And if you pack refs in
between then the "reftable" backend outperforms the "files" backend.

But there are a couple of opportunities here anyway. While we cannot
make the underlying inefficiency of tombstone references go away,
this has prompted me to have a deeper look at where we spend all the
time. There are three ideas in this series:

  - git-update-ref(1) performs ambiguity checks for any full-size object
    ID, which triggers a lot of reads. This is somewhat pointless though
    given that the manpage explicitly points out that the command is
    about object IDs, even though it does know to parse refs. But being
    part of plumbing, emitting the warning here does not make a ton of
    sense, and favoring object IDs over references in these cases is the
    obvious thing to do anyway.

  - For each ref "refs/heads/bar", we need to verify that neither
    "refs/heads" nor "refs" exists. This was repeated for every refname,
    but because most refnames use common prefixes this made us re-check
    a lot of prefixes. This is addressed by using a `strset` of already
    checked prefixes.

  - For each ref "refs/heads/bar", we need to verify that no ref
    "refs/heads/bar/*" exists. We always created a new ref iterator for
    this check, which requires us to discard all internal state and then
    recreate it. The reftable library has already been refactored though
    to have reseekable iterators, so we backfill this functionality to
    all the other iterators and then reuse the iterator.

With the (somewhat broken) benchmark we see a small speedup with the
"files" backend:

    Benchmark 1: update-ref (refformat = files, revision = master)
      Time (mean ± σ):     234.4 ms ±   1.9 ms    [User: 75.6 ms, System: 157.2 ms]
      Range (min … max):   232.2 ms … 236.9 ms    10 runs

    Benchmark 2: update-ref (refformat = files, revision = HEAD)
      Time (mean ± σ):     184.2 ms ±   2.0 ms    [User: 62.8 ms, System: 119.9 ms]
      Range (min … max):   181.1 ms … 187.0 ms    10 runs

    Summary
      update-ref (refformat = files, revision = HEAD) ran
        1.27 ± 0.02 times faster than update-ref (refformat = files, revision = master)

And a huge speedup with the "reftable" backend:

    Benchmark 1: update-ref (refformat = reftable, revision = master)
      Time (mean ± σ):     16.852 s ±  0.061 s    [User: 16.754 s, System: 0.059 s]
      Range (min … max):   16.785 s … 16.982 s    10 runs

    Benchmark 2: update-ref (refformat = reftable, revision = HEAD)
      Time (mean ± σ):      2.230 s ±  0.009 s    [User: 2.192 s, System: 0.029 s]
      Range (min … max):    2.215 s …  2.244 s    10 runs

    Summary
      update-ref (refformat = reftable, revision = HEAD) ran
        7.56 ± 0.04 times faster than update-ref (refformat = reftable, revision = master)

We're still not up to speed with the "files" backend, but considerably
better. Given that this is an extreme edge case and not reflective of
the general case I'm okay with this result for now.

But more importantly, this refactoring also has a positive effect when
updating references in a repository with preexisting refs, which I
consider to be the more realistic scenario. The following benchmark
creates 10k refs with 100k preexisting refs.

With the "files" backend we see a modest improvement:

    Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = master)
      Time (mean ± σ):     478.4 ms ±  11.9 ms    [User: 96.7 ms, System: 379.6 ms]
      Range (min … max):   465.4 ms … 496.6 ms    10 runs

    Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):     388.5 ms ±  10.3 ms    [User: 52.0 ms, System: 333.8 ms]
      Range (min … max):   376.5 ms … 403.1 ms    10 runs

    Summary
      update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
        1.23 ± 0.04 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = master)

But with the "reftable" backend we see an almost 5x improvement, where
it's now ~15x faster than the "files" backend:

    Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = master)
      Time (mean ± σ):     153.9 ms ±   2.0 ms    [User: 96.5 ms, System: 56.6 ms]
      Range (min … max):   150.5 ms … 158.4 ms    18 runs

    Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):      32.2 ms ±   1.2 ms    [User: 27.6 ms, System: 4.3 ms]
      Range (min … max):    29.8 ms …  38.6 ms    71 runs

    Summary
      update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
        4.78 ± 0.19 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = master)

The series is structured as follows:

  - Patches 1 to 3 implement the logic to skip ambiguity checks in
    git-update-ref(1).

  - Patches 4 to 7 introduce batched checks.

  - Patch 8 deduplicates the ref prefix checks.

  - Patches 9 to 15 implement the infrastructure to reseek iterators.

  - Patch 16 starts to reuse iterators for nested ref checks.

Changes in v2:
  - Point out why we also have to touch up the `dir_iterator`.
  - Fix up the comment explaining `ITER_DONE`.
  - Fix up comments that show usage patterns of the ref and dir iterator
    interfaces.
  - Start batching availability checks in the "files" backend, as well.
  - Improve the commit message that drops the ambiguity check so that we
    also point to 25fba78d36b (cat-file: disable object/refname
    ambiguity check for batch mode, 2013-07-12).
  - Link to v1: https://lore.kernel.org/r/20250217-pks-update-ref-optimization-v1-0-a2b6d87a24af@pks.im

Changes in v3:
  - Fix one case where we didn't skip ambiguity checks in
    git-update-ref(1).
  - Document better that only the prefix can change on reseeking
    iterators. Other internal state will remain the same.
  - Fix a memory leak in the ref-cache iterator.
  - Don't ignore errors returned by `packed_ref_iterator_seek()`.
  - Link to v2: https://lore.kernel.org/r/20250219-pks-update-ref-optimization-v2-0-e696e7220b22@pks.im

Changes in v4:
  - A couple of clarifications in the commit message that disabled
    ambiguity warnings.
  - Link to v3: https://lore.kernel.org/r/20250225-pks-update-ref-optimization-v3-0-77c3687cda75@pks.im

Changes in v5:
  - Improve a couple of commit messages.
  - Align `GET_OID_*` flag values.
  - Link to v4: https://lore.kernel.org/r/20250228-pks-update-ref-optimization-v4-0-6425c04268b5@pks.im

Changes in v6:
  - Use `for_each_string_list()` instead of manually iterating through
    the string list.
  - Stop sorting refs passed to `refs_verify_refnames_available()`.
  - Revive a comment that has been deleted during one of the
    refactorings.
  - Link to v5: https://lore.kernel.org/r/20250306-pks-update-ref-optimization-v5-0-dcb2ee037e97@pks.im

Thanks!

Patrick

[1]: <Z602dzQggtDdcgCX@tapette.crustytoothpaste.net>

---
Patrick Steinhardt (16):
      object-name: introduce `repo_get_oid_with_flags()`
      object-name: allow skipping ambiguity checks in `get_oid()` family
      builtin/update-ref: skip ambiguity checks when parsing object IDs
      refs: introduce function to batch refname availability checks
      refs/reftable: batch refname availability checks
      refs/files: batch refname availability checks for normal transactions
      refs/files: batch refname availability checks for initial transactions
      refs: stop re-verifying common prefixes for availability
      refs/iterator: separate lifecycle from iteration
      refs/iterator: provide infrastructure to re-seek iterators
      refs/iterator: implement seeking for merged iterators
      refs/iterator: implement seeking for reftable iterators
      refs/iterator: implement seeking for ref-cache iterators
      refs/iterator: implement seeking for packed-ref iterators
      refs/iterator: implement seeking for files iterators
      refs: reuse iterators when determining refname availability

 builtin/clone.c              |   2 +
 builtin/update-ref.c         |  15 ++--
 dir-iterator.c               |  24 +++---
 dir-iterator.h               |  11 +--
 hash.h                       |  23 +++---
 iterator.h                   |   2 +-
 object-name.c                |  18 +++--
 object-name.h                |   6 ++
 refs.c                       | 187 ++++++++++++++++++++++++++-----------------
 refs.h                       |  12 +++
 refs/debug.c                 |  20 +++--
 refs/files-backend.c         | 117 +++++++++++++++++----------
 refs/iterator.c              | 150 ++++++++++++++++++----------------
 refs/packed-backend.c        |  92 ++++++++++++---------
 refs/ref-cache.c             |  88 ++++++++++++--------
 refs/refs-internal.h         |  53 +++++++-----
 refs/reftable-backend.c      |  84 ++++++++++---------
 t/helper/test-dir-iterator.c |   1 +
 18 files changed, 544 insertions(+), 361 deletions(-)

Range-diff versus v5:

 1:  d23737dfca2 =  1:  07ebc2f03d9 object-name: introduce `repo_get_oid_with_flags()`
 2:  33f4548d2c0 =  2:  b722efbe3dc object-name: allow skipping ambiguity checks in `get_oid()` family
 3:  ea0234047ab =  3:  a0171a61808 builtin/update-ref: skip ambiguity checks when parsing object IDs
 4:  3dfef6655f3 !  4:  91f0c75ac7b refs: introduce function to batch refname availability checks
    @@ refs.c: int ref_transaction_commit(struct ref_transaction *transaction,
      	struct strbuf referent = STRBUF_INIT;
     -	struct object_id oid;
     -	unsigned int type;
    ++	struct string_list_item *item;
      	int ret = -1;
      
      	/*
    @@ refs.c: int refs_verify_refname_available(struct ref_store *refs,
     -		int ignore_errno;
     -		/* Expand dirname to the new prefix, not including the trailing slash: */
     -		strbuf_add(&dirname, refname + dirname.len, slash - refname - dirname.len);
    -+	for (size_t i = 0; i < refnames->nr; i++) {
    -+		const char *refname = refnames->items[i].string;
    ++	for_each_string_list_item(item, refnames) {
    ++		const char *refname = item->string;
     +		const char *extra_refname;
     +		struct object_id oid;
     +		unsigned int type;
 5:  671ef35018c !  5:  8caf4f99ecd refs/reftable: batch refname availability checks
    @@ refs/reftable-backend.c: static int reftable_be_transaction_prepare(struct ref_s
      		}
      	}
      
    -+	string_list_sort(&refnames_to_check);
     +	ret = refs_verify_refnames_available(ref_store, &refnames_to_check, &affected_refnames, NULL,
     +					     transaction->flags & REF_TRANSACTION_FLAG_INITIAL,
     +					     err);
 6:  4ff7d2b821e !  6:  a10538db712 refs/files: batch refname availability checks for normal transactions
    @@ refs/files-backend.c: static int lock_raw_ref(struct files_ref_store *refs,
     -			ret = TRANSACTION_NAME_CONFLICT;
     -			goto error_return;
     -		}
    -+		string_list_insert(refnames_to_check, refname);
    ++		string_list_append(refnames_to_check, refname);
      	}
      
      	ret = 0;
 7:  901664612bd =  7:  dccee563e60 refs/files: batch refname availability checks for initial transactions
 8:  1ddc9372a77 !  8:  c7b2ce15afe refs: stop re-verifying common prefixes for availability
    @@ Commit message
     
      ## refs.c ##
     @@ refs.c: int refs_verify_refnames_available(struct ref_store *refs,
    - {
      	struct strbuf dirname = STRBUF_INIT;
      	struct strbuf referent = STRBUF_INIT;
    + 	struct string_list_item *item;
     +	struct strset dirnames;
      	int ret = -1;
      
    @@ refs.c: int refs_verify_refnames_available(struct ref_store *refs,
      
     +	strset_init(&dirnames);
     +
    - 	for (size_t i = 0; i < refnames->nr; i++) {
    - 		const char *refname = refnames->items[i].string;
    + 	for_each_string_list_item(item, refnames) {
    + 		const char *refname = item->string;
      		const char *extra_refname;
     @@ refs.c: int refs_verify_refnames_available(struct ref_store *refs,
      			if (skip && string_list_has_string(skip, dirname.buf))
 9:  b3cd0baed29 !  9:  22730c09f01 refs/iterator: separate lifecycle from iteration
    @@ iterator.h
     
      ## refs.c ##
     @@ refs.c: int refs_verify_refnames_available(struct ref_store *refs,
    - {
      	struct strbuf dirname = STRBUF_INIT;
      	struct strbuf referent = STRBUF_INIT;
    + 	struct string_list_item *item;
     +	struct ref_iterator *iter = NULL;
      	struct strset dirnames;
      	int ret = -1;
    @@ refs/iterator.c: static int prefix_ref_iterator_advance(struct ref_iterator *ref
     -			ok = ref_iterator_abort(iter->iter0);
     -			break;
     -		}
    ++		/*
    ++		 * As the source iterator is ordered, we
    ++		 * can stop the iteration as soon as we see a
    ++		 * refname that comes after the prefix:
    ++		 */
     +		if (cmp > 0)
     +			return ITER_DONE;
      
10:  c1ceade2ba2 = 10:  e65b53f5b52 refs/iterator: provide infrastructure to re-seek iterators
11:  77642dfdade = 11:  66b8aa94cd7 refs/iterator: implement seeking for merged iterators
12:  fae0048ced3 = 12:  42192c4d4b6 refs/iterator: implement seeking for reftable iterators
13:  ff9384ee7b0 = 13:  76078359277 refs/iterator: implement seeking for ref-cache iterators
14:  105a01b3a59 = 14:  83d6f90c90a refs/iterator: implement seeking for packed-ref iterators
15:  1bf2e76b4c8 = 15:  b66bf0a9966 refs/iterator: implement seeking for files iterators
16:  24548c33e5c = 16:  67457a1c03f refs: reuse iterators when determining refname availability

---
base-commit: e2067b49ecaef9b7f51a17ce251f9207f72ef52d
change-id: 20250217-pks-update-ref-optimization-15c795e66e2b



^ permalink raw reply	[flat|nested] 163+ messages in thread

* [PATCH v6 01/16] object-name: introduce `repo_get_oid_with_flags()`
  2025-03-12 15:56 ` [PATCH v6 " Patrick Steinhardt
@ 2025-03-12 15:56   ` Patrick Steinhardt
  2025-03-12 15:56   ` [PATCH v6 02/16] object-name: allow skipping ambiguity checks in `get_oid()` family Patrick Steinhardt
                     ` (15 subsequent siblings)
  16 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-03-12 15:56 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Introduce a new function `repo_get_oid_with_flags()`. This function
behaves the same as `repo_get_oid()`, except that it takes an extra
`flags` parameter that it ends up passing to `get_oid_with_context()`.

This function will be used in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 object-name.c | 14 ++++++++------
 object-name.h |  6 ++++++
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/object-name.c b/object-name.c
index 945d5bdef25..233f3f861e3 100644
--- a/object-name.c
+++ b/object-name.c
@@ -1794,18 +1794,20 @@ void object_context_release(struct object_context *ctx)
 	strbuf_release(&ctx->symlink_path);
 }
 
-/*
- * This is like "get_oid_basic()", except it allows "object ID expressions",
- * notably "xyz^" for "parent of xyz"
- */
-int repo_get_oid(struct repository *r, const char *name, struct object_id *oid)
+int repo_get_oid_with_flags(struct repository *r, const char *name,
+			    struct object_id *oid, unsigned flags)
 {
 	struct object_context unused;
-	int ret = get_oid_with_context(r, name, 0, oid, &unused);
+	int ret = get_oid_with_context(r, name, flags, oid, &unused);
 	object_context_release(&unused);
 	return ret;
 }
 
+int repo_get_oid(struct repository *r, const char *name, struct object_id *oid)
+{
+	return repo_get_oid_with_flags(r, name, oid, 0);
+}
+
 /*
  * This returns a non-zero value if the string (built using printf
  * format and the given arguments) is not a valid object.
diff --git a/object-name.h b/object-name.h
index 8dba4a47a47..cda4934cd5f 100644
--- a/object-name.h
+++ b/object-name.h
@@ -51,6 +51,12 @@ void strbuf_repo_add_unique_abbrev(struct strbuf *sb, struct repository *repo,
 void strbuf_add_unique_abbrev(struct strbuf *sb, const struct object_id *oid,
 			      int abbrev_len);
 
+/*
+ * This is like "get_oid_basic()", except it allows "object ID expressions",
+ * notably "xyz^" for "parent of xyz". Accepts GET_OID_* flags.
+ */
+int repo_get_oid_with_flags(struct repository *r, const char *str,
+			    struct object_id *oid, unsigned flags);
 int repo_get_oid(struct repository *r, const char *str, struct object_id *oid);
 __attribute__((format (printf, 2, 3)))
 int get_oidf(struct object_id *oid, const char *fmt, ...);

-- 
2.49.0.rc2.394.gf6994c5077.dirty




* [PATCH v6 02/16] object-name: allow skipping ambiguity checks in `get_oid()` family
  2025-03-12 15:56 ` [PATCH v6 " Patrick Steinhardt
  2025-03-12 15:56   ` [PATCH v6 01/16] object-name: introduce `repo_get_oid_with_flags()` Patrick Steinhardt
@ 2025-03-12 15:56   ` Patrick Steinhardt
  2025-03-12 15:56   ` [PATCH v6 03/16] builtin/update-ref: skip ambiguity checks when parsing object IDs Patrick Steinhardt
                     ` (14 subsequent siblings)
  16 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-03-12 15:56 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

When reading an object ID via `get_oid_basic()` or any of its related
functions, we check whether the object ID is ambiguous, which
can be the case when a reference with the same name exists. While the
check is generally helpful, there are cases where it only adds to the
runtime overhead without providing much of a benefit.

Add a new flag that allows us to disable the check. The flag will be
used in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 hash.h        | 23 ++++++++++++-----------
 object-name.c |  4 +++-
 2 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/hash.h b/hash.h
index 4367acfec50..5e3c462dc5e 100644
--- a/hash.h
+++ b/hash.h
@@ -193,17 +193,18 @@ struct object_id {
 	int algo;	/* XXX requires 4-byte alignment */
 };
 
-#define GET_OID_QUIETLY           01
-#define GET_OID_COMMIT            02
-#define GET_OID_COMMITTISH        04
-#define GET_OID_TREE             010
-#define GET_OID_TREEISH          020
-#define GET_OID_BLOB             040
-#define GET_OID_FOLLOW_SYMLINKS 0100
-#define GET_OID_RECORD_PATH     0200
-#define GET_OID_ONLY_TO_DIE    04000
-#define GET_OID_REQUIRE_PATH  010000
-#define GET_OID_HASH_ANY      020000
+#define GET_OID_QUIETLY                  01
+#define GET_OID_COMMIT                   02
+#define GET_OID_COMMITTISH               04
+#define GET_OID_TREE                    010
+#define GET_OID_TREEISH                 020
+#define GET_OID_BLOB                    040
+#define GET_OID_FOLLOW_SYMLINKS        0100
+#define GET_OID_RECORD_PATH            0200
+#define GET_OID_ONLY_TO_DIE           04000
+#define GET_OID_REQUIRE_PATH         010000
+#define GET_OID_HASH_ANY             020000
+#define GET_OID_SKIP_AMBIGUITY_CHECK 040000
 
 #define GET_OID_DISAMBIGUATORS \
 	(GET_OID_COMMIT | GET_OID_COMMITTISH | \
diff --git a/object-name.c b/object-name.c
index 233f3f861e3..85444dbb15b 100644
--- a/object-name.c
+++ b/object-name.c
@@ -961,7 +961,9 @@ static int get_oid_basic(struct repository *r, const char *str, int len,
 	int fatal = !(flags & GET_OID_QUIETLY);
 
 	if (len == r->hash_algo->hexsz && !get_oid_hex(str, oid)) {
-		if (repo_settings_get_warn_ambiguous_refs(r) && warn_on_object_refname_ambiguity) {
+		if (!(flags & GET_OID_SKIP_AMBIGUITY_CHECK) &&
+		    repo_settings_get_warn_ambiguous_refs(r) &&
+		    warn_on_object_refname_ambiguity) {
 			refs_found = repo_dwim_ref(r, str, len, &tmp_oid, &real_ref, 0);
 			if (refs_found > 0) {
 				warning(warn_msg, len, str);

-- 
2.49.0.rc2.394.gf6994c5077.dirty




* [PATCH v6 03/16] builtin/update-ref: skip ambiguity checks when parsing object IDs
  2025-03-12 15:56 ` [PATCH v6 " Patrick Steinhardt
  2025-03-12 15:56   ` [PATCH v6 01/16] object-name: introduce `repo_get_oid_with_flags()` Patrick Steinhardt
  2025-03-12 15:56   ` [PATCH v6 02/16] object-name: allow skipping ambiguity checks in `get_oid()` family Patrick Steinhardt
@ 2025-03-12 15:56   ` Patrick Steinhardt
  2025-03-12 15:56   ` [PATCH v6 04/16] refs: introduce function to batch refname availability checks Patrick Steinhardt
                     ` (13 subsequent siblings)
  16 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-03-12 15:56 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Most of the commands in git-update-ref(1) accept an old and/or new
object ID to update a specific reference to. These object IDs get parsed
via `repo_get_oid()`, which not only handles plain object IDs, but also
those that have a suffix like "~" or "^2". More surprisingly though, it
even knows to resolve arbitrary revisions, despite the fact that its
manpage does not mention this fact even once.

One consequence of this is that we also check for ambiguous references:
when parsing a full object ID where the DWIM mechanism would also cause
us to resolve it as a branch, we'd end up printing a warning. While this
check makes sense to have in general, it is arguably less useful in the
context of git-update-ref(1), for several reasons:

  - The manpage is explicitly structured around object IDs. So if we see
    a fully blown object ID, the intent should be quite clear in
    general.

  - The command is part of our plumbing layer and not a tool that users
    would generally use in interactive workflows. As such, the warning
    will likely not be visible to anybody in the first place.

  - Users can and should use the fully-qualified refname in case there
    is any potential for ambiguity. And given that this command is part
    of our plumbing layer, one should always try to be as defensive as
    possible and use fully-qualified refnames.

Furthermore, this check can be quite expensive when updating lots of
references via `--stdin`, because we try to read multiple references per
object ID that we parse according to the DWIM rules. This effect can be
seen both with the "files" and "reftable" backend.

The issue is not unique to git-update-ref(1), but was also an issue in
git-cat-file(1), where it was addressed by disabling the ambiguity check
in 25fba78d36b (cat-file: disable object/refname ambiguity check for
batch mode, 2013-07-12).

Disable the warning in git-update-ref(1), which provides a significant
speedup with both backends. The user-visible outcome is unchanged even
when ambiguity exists, except that we don't show the warning anymore.

The following benchmark creates 10000 new references with a 100000
preexisting refs with the "files" backend:

    Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
      Time (mean ± σ):     467.3 ms ±   5.1 ms    [User: 100.0 ms, System: 365.1 ms]
      Range (min … max):   461.9 ms … 479.3 ms    10 runs

    Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):     394.1 ms ±   5.8 ms    [User: 63.3 ms, System: 327.6 ms]
      Range (min … max):   384.9 ms … 405.7 ms    10 runs

    Summary
      update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
        1.19 ± 0.02 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)

And with the "reftable" backend:

    Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
      Time (mean ± σ):     146.9 ms ±   2.2 ms    [User: 90.4 ms, System: 56.0 ms]
      Range (min … max):   142.7 ms … 150.8 ms    19 runs

    Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):      63.2 ms ±   1.1 ms    [User: 41.0 ms, System: 21.8 ms]
      Range (min … max):    61.1 ms …  66.6 ms    41 runs

    Summary
      update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
        2.32 ± 0.05 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)

Note that the absolute improvement with both backends is roughly in the
same ballpark, but the relative improvement for the "reftable" backend
is more significant because writing the new table to disk is faster in
the first place.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/update-ref.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/builtin/update-ref.c b/builtin/update-ref.c
index 4d35bdc4b4b..1d541e13ade 100644
--- a/builtin/update-ref.c
+++ b/builtin/update-ref.c
@@ -179,7 +179,8 @@ static int parse_next_oid(const char **next, const char *end,
 		(*next)++;
 		*next = parse_arg(*next, &arg);
 		if (arg.len) {
-			if (repo_get_oid(the_repository, arg.buf, oid))
+			if (repo_get_oid_with_flags(the_repository, arg.buf, oid,
+						    GET_OID_SKIP_AMBIGUITY_CHECK))
 				goto invalid;
 		} else {
 			/* Without -z, an empty value means all zeros: */
@@ -197,7 +198,8 @@ static int parse_next_oid(const char **next, const char *end,
 		*next += arg.len;
 
 		if (arg.len) {
-			if (repo_get_oid(the_repository, arg.buf, oid))
+			if (repo_get_oid_with_flags(the_repository, arg.buf, oid,
+						    GET_OID_SKIP_AMBIGUITY_CHECK))
 				goto invalid;
 		} else if (flags & PARSE_SHA1_ALLOW_EMPTY) {
 			/* With -z, treat an empty value as all zeros: */
@@ -299,7 +301,8 @@ static void parse_cmd_symref_update(struct ref_transaction *transaction,
 			die("symref-update %s: expected old value", refname);
 
 		if (!strcmp(old_arg, "oid")) {
-			if (repo_get_oid(the_repository, old_target, &old_oid))
+			if (repo_get_oid_with_flags(the_repository, old_target, &old_oid,
+						    GET_OID_SKIP_AMBIGUITY_CHECK))
 				die("symref-update %s: invalid oid: %s", refname, old_target);
 
 			have_old_oid = 1;
@@ -772,7 +775,8 @@ int cmd_update_ref(int argc,
 		refname = argv[0];
 		value = argv[1];
 		oldval = argv[2];
-		if (repo_get_oid(the_repository, value, &oid))
+		if (repo_get_oid_with_flags(the_repository, value, &oid,
+					    GET_OID_SKIP_AMBIGUITY_CHECK))
 			die("%s: not a valid SHA1", value);
 	}
 
@@ -783,7 +787,8 @@ int cmd_update_ref(int argc,
 			 * must not already exist:
 			 */
 			oidclr(&oldoid, the_repository->hash_algo);
-		else if (repo_get_oid(the_repository, oldval, &oldoid))
+		else if (repo_get_oid_with_flags(the_repository, oldval, &oldoid,
+						 GET_OID_SKIP_AMBIGUITY_CHECK))
 			die("%s: not a valid old SHA1", oldval);
 	}
 

-- 
2.49.0.rc2.394.gf6994c5077.dirty




* [PATCH v6 04/16] refs: introduce function to batch refname availability checks
  2025-03-12 15:56 ` [PATCH v6 " Patrick Steinhardt
                     ` (2 preceding siblings ...)
  2025-03-12 15:56   ` [PATCH v6 03/16] builtin/update-ref: skip ambiguity checks when parsing object IDs Patrick Steinhardt
@ 2025-03-12 15:56   ` Patrick Steinhardt
  2025-03-12 15:56   ` [PATCH v6 05/16] refs/reftable: " Patrick Steinhardt
                     ` (12 subsequent siblings)
  16 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-03-12 15:56 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

The `refs_verify_refname_available()` function checks whether a
reference update can be committed or whether it would conflict with
either a prefix or suffix thereof. This function needs to be called once
per reference that one wants to check, which requires us to redo a
couple of checks every time the function is called.

Introduce a new function `refs_verify_refnames_available()` that does
the same, but for a list of references. For now, the new function uses
the exact same implementation, except that we loop through all refnames
provided by the caller. This will be tuned in subsequent commits.

The existing `refs_verify_refname_available()` function is reimplemented
on top of the new function. As such, the diff is best viewed with the
`--ignore-space-change` option.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs.c | 170 +++++++++++++++++++++++++++++++++++++----------------------------
 refs.h |  12 +++++
 2 files changed, 110 insertions(+), 72 deletions(-)

diff --git a/refs.c b/refs.c
index f4094a326a9..d91a2184e06 100644
--- a/refs.c
+++ b/refs.c
@@ -2467,19 +2467,16 @@ int ref_transaction_commit(struct ref_transaction *transaction,
 	return ret;
 }
 
-int refs_verify_refname_available(struct ref_store *refs,
-				  const char *refname,
-				  const struct string_list *extras,
-				  const struct string_list *skip,
-				  unsigned int initial_transaction,
-				  struct strbuf *err)
+int refs_verify_refnames_available(struct ref_store *refs,
+				   const struct string_list *refnames,
+				   const struct string_list *extras,
+				   const struct string_list *skip,
+				   unsigned int initial_transaction,
+				   struct strbuf *err)
 {
-	const char *slash;
-	const char *extra_refname;
 	struct strbuf dirname = STRBUF_INIT;
 	struct strbuf referent = STRBUF_INIT;
-	struct object_id oid;
-	unsigned int type;
+	struct string_list_item *item;
 	int ret = -1;
 
 	/*
@@ -2489,79 +2486,91 @@ int refs_verify_refname_available(struct ref_store *refs,
 
 	assert(err);
 
-	strbuf_grow(&dirname, strlen(refname) + 1);
-	for (slash = strchr(refname, '/'); slash; slash = strchr(slash + 1, '/')) {
-		/*
-		 * Just saying "Is a directory" when we e.g. can't
-		 * lock some multi-level ref isn't very informative,
-		 * the user won't be told *what* is a directory, so
-		 * let's not use strerror() below.
-		 */
-		int ignore_errno;
-		/* Expand dirname to the new prefix, not including the trailing slash: */
-		strbuf_add(&dirname, refname + dirname.len, slash - refname - dirname.len);
+	for_each_string_list_item(item, refnames) {
+		const char *refname = item->string;
+		const char *extra_refname;
+		struct object_id oid;
+		unsigned int type;
+		const char *slash;
+
+		strbuf_reset(&dirname);
+
+		for (slash = strchr(refname, '/'); slash; slash = strchr(slash + 1, '/')) {
+			/*
+			 * Just saying "Is a directory" when we e.g. can't
+			 * lock some multi-level ref isn't very informative,
+			 * the user won't be told *what* is a directory, so
+			 * let's not use strerror() below.
+			 */
+			int ignore_errno;
+
+			/* Expand dirname to the new prefix, not including the trailing slash: */
+			strbuf_add(&dirname, refname + dirname.len, slash - refname - dirname.len);
+
+			/*
+			 * We are still at a leading dir of the refname (e.g.,
+			 * "refs/foo"; if there is a reference with that name,
+			 * it is a conflict, *unless* it is in skip.
+			 */
+			if (skip && string_list_has_string(skip, dirname.buf))
+				continue;
+
+			if (!initial_transaction &&
+			    !refs_read_raw_ref(refs, dirname.buf, &oid, &referent,
+					       &type, &ignore_errno)) {
+				strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
+					    dirname.buf, refname);
+				goto cleanup;
+			}
+
+			if (extras && string_list_has_string(extras, dirname.buf)) {
+				strbuf_addf(err, _("cannot process '%s' and '%s' at the same time"),
+					    refname, dirname.buf);
+				goto cleanup;
+			}
+		}
 
 		/*
-		 * We are still at a leading dir of the refname (e.g.,
-		 * "refs/foo"; if there is a reference with that name,
-		 * it is a conflict, *unless* it is in skip.
+		 * We are at the leaf of our refname (e.g., "refs/foo/bar").
+		 * There is no point in searching for a reference with that
+		 * name, because a refname isn't considered to conflict with
+		 * itself. But we still need to check for references whose
+		 * names are in the "refs/foo/bar/" namespace, because they
+		 * *do* conflict.
 		 */
-		if (skip && string_list_has_string(skip, dirname.buf))
-			continue;
+		strbuf_addstr(&dirname, refname + dirname.len);
+		strbuf_addch(&dirname, '/');
+
+		if (!initial_transaction) {
+			struct ref_iterator *iter;
+			int ok;
+
+			iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
+						       DO_FOR_EACH_INCLUDE_BROKEN);
+			while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
+				if (skip &&
+				    string_list_has_string(skip, iter->refname))
+					continue;
+
+				strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
+					    iter->refname, refname);
+				ref_iterator_abort(iter);
+				goto cleanup;
+			}
 
-		if (!initial_transaction &&
-		    !refs_read_raw_ref(refs, dirname.buf, &oid, &referent,
-				       &type, &ignore_errno)) {
-			strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
-				    dirname.buf, refname);
-			goto cleanup;
+			if (ok != ITER_DONE)
+				BUG("error while iterating over references");
 		}
 
-		if (extras && string_list_has_string(extras, dirname.buf)) {
+		extra_refname = find_descendant_ref(dirname.buf, extras, skip);
+		if (extra_refname) {
 			strbuf_addf(err, _("cannot process '%s' and '%s' at the same time"),
-				    refname, dirname.buf);
+				    refname, extra_refname);
 			goto cleanup;
 		}
 	}
 
-	/*
-	 * We are at the leaf of our refname (e.g., "refs/foo/bar").
-	 * There is no point in searching for a reference with that
-	 * name, because a refname isn't considered to conflict with
-	 * itself. But we still need to check for references whose
-	 * names are in the "refs/foo/bar/" namespace, because they
-	 * *do* conflict.
-	 */
-	strbuf_addstr(&dirname, refname + dirname.len);
-	strbuf_addch(&dirname, '/');
-
-	if (!initial_transaction) {
-		struct ref_iterator *iter;
-		int ok;
-
-		iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
-					       DO_FOR_EACH_INCLUDE_BROKEN);
-		while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
-			if (skip &&
-			    string_list_has_string(skip, iter->refname))
-				continue;
-
-			strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
-				    iter->refname, refname);
-			ref_iterator_abort(iter);
-			goto cleanup;
-		}
-
-		if (ok != ITER_DONE)
-			BUG("error while iterating over references");
-	}
-
-	extra_refname = find_descendant_ref(dirname.buf, extras, skip);
-	if (extra_refname)
-		strbuf_addf(err, _("cannot process '%s' and '%s' at the same time"),
-			    refname, extra_refname);
-	else
-		ret = 0;
+	ret = 0;
 
 cleanup:
 	strbuf_release(&referent);
@@ -2569,6 +2578,23 @@ int refs_verify_refname_available(struct ref_store *refs,
 	return ret;
 }
 
+int refs_verify_refname_available(struct ref_store *refs,
+				  const char *refname,
+				  const struct string_list *extras,
+				  const struct string_list *skip,
+				  unsigned int initial_transaction,
+				  struct strbuf *err)
+{
+	struct string_list_item item = { .string = (char *) refname };
+	struct string_list refnames = {
+		.items = &item,
+		.nr = 1,
+	};
+
+	return refs_verify_refnames_available(refs, &refnames, extras, skip,
+					      initial_transaction, err);
+}
+
 struct do_for_each_reflog_help {
 	each_reflog_fn *fn;
 	void *cb_data;
diff --git a/refs.h b/refs.h
index a0cdd99250e..185aed5a461 100644
--- a/refs.h
+++ b/refs.h
@@ -124,6 +124,18 @@ int refs_verify_refname_available(struct ref_store *refs,
 				  unsigned int initial_transaction,
 				  struct strbuf *err);
 
+/*
+ * Same as `refs_verify_refname_available()`, but checking for a list of
+ * refnames instead of only a single item. This is more efficient in the case
+ * where one needs to check multiple refnames.
+ */
+int refs_verify_refnames_available(struct ref_store *refs,
+				   const struct string_list *refnames,
+				   const struct string_list *extras,
+				   const struct string_list *skip,
+				   unsigned int initial_transaction,
+				   struct strbuf *err);
+
 int refs_ref_exists(struct ref_store *refs, const char *refname);
 
 int should_autocreate_reflog(enum log_refs_config log_all_ref_updates,

-- 
2.49.0.rc2.394.gf6994c5077.dirty




* [PATCH v6 05/16] refs/reftable: batch refname availability checks
  2025-03-12 15:56 ` [PATCH v6 " Patrick Steinhardt
                     ` (3 preceding siblings ...)
  2025-03-12 15:56   ` [PATCH v6 04/16] refs: introduce function to batch refname availability checks Patrick Steinhardt
@ 2025-03-12 15:56   ` Patrick Steinhardt
  2025-03-12 15:56   ` [PATCH v6 06/16] refs/files: batch refname availability checks for normal transactions Patrick Steinhardt
                     ` (11 subsequent siblings)
  16 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-03-12 15:56 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Refactor the "reftable" backend to batch the availability check for
refnames. This does not yet have an effect on performance as
`refs_verify_refnames_available()` effectively still performs the
availability check for each refname individually. But subsequent commits
will optimize some parts of the logic when checking multiple refnames
for availability.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/reftable-backend.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index d39a14c5a46..5c464b9d143 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -1069,6 +1069,7 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
 		reftable_be_downcast(ref_store, REF_STORE_WRITE|REF_STORE_MAIN, "ref_transaction_prepare");
 	struct strbuf referent = STRBUF_INIT, head_referent = STRBUF_INIT;
 	struct string_list affected_refnames = STRING_LIST_INIT_NODUP;
+	struct string_list refnames_to_check = STRING_LIST_INIT_NODUP;
 	struct reftable_transaction_data *tx_data = NULL;
 	struct reftable_backend *be;
 	struct object_id head_oid;
@@ -1224,12 +1225,7 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
 			 * can output a proper error message instead of failing
 			 * at a later point.
 			 */
-			ret = refs_verify_refname_available(ref_store, u->refname,
-							    &affected_refnames, NULL,
-							    transaction->flags & REF_TRANSACTION_FLAG_INITIAL,
-							    err);
-			if (ret < 0)
-				goto done;
+			string_list_append(&refnames_to_check, u->refname);
 
 			/*
 			 * There is no need to write the reference deletion
@@ -1379,6 +1375,12 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
 		}
 	}
 
+	ret = refs_verify_refnames_available(ref_store, &refnames_to_check, &affected_refnames, NULL,
+					     transaction->flags & REF_TRANSACTION_FLAG_INITIAL,
+					     err);
+	if (ret < 0)
+		goto done;
+
 	transaction->backend_data = tx_data;
 	transaction->state = REF_TRANSACTION_PREPARED;
 
@@ -1394,6 +1396,7 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
 	string_list_clear(&affected_refnames, 0);
 	strbuf_release(&referent);
 	strbuf_release(&head_referent);
+	string_list_clear(&refnames_to_check, 0);
 
 	return ret;
 }

-- 
2.49.0.rc2.394.gf6994c5077.dirty




* [PATCH v6 06/16] refs/files: batch refname availability checks for normal transactions
  2025-03-12 15:56 ` [PATCH v6 " Patrick Steinhardt
                     ` (4 preceding siblings ...)
  2025-03-12 15:56   ` [PATCH v6 05/16] refs/reftable: " Patrick Steinhardt
@ 2025-03-12 15:56   ` Patrick Steinhardt
  2025-03-12 15:56   ` [PATCH v6 07/16] refs/files: batch refname availability checks for initial transactions Patrick Steinhardt
                     ` (10 subsequent siblings)
  16 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-03-12 15:56 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Just as we have adapted the "reftable" backend in the preceding commit
to use batched refname availability checks, we can do the same for the
"files" backend. Things are a bit more intricate here though, as we
call `refs_verify_refname_available()` in a set of different contexts:

  1. `lock_raw_ref()` when it hits either EEXIST or EISDIR while creating
     a new reference, mostly to create a nice, user-readable error
     message. This is nothing we have to care about too much, as we only
     hit this code path at most once per conflict.

  2. `lock_raw_ref()` when it _could_ create the lockfile to check
     whether it is conflicting with any packed refs. In the general case,
     this code path will be hit once for every (successful) reference
     update.

  3. `lock_ref_oid_basic()`, but it is only executed when copying or
     renaming references or when expiring reflogs. It will thus not be
     called in contexts where we have many references queued up.

  4. `refs_refname_ref_available()`, but again only when copying or
     renaming references. It is thus not interesting due to the same
     reason as the previous case.

  5. `files_transaction_finish_initial()`, which is only executed when
     creating a new repository or migrating references.

So out of these, only (2) and (5) are viable candidates to use the
batched checks.

Adapt `lock_raw_ref()` accordingly by queueing up reference names that
need to be checked for availability and then checking them after we have
processed all updates. This check is done before we (optionally) lock
the `packed-refs` file, which is somewhat flawed because it means that
the `packed-refs` could still change after the availability check and
thus create an undetected conflict. But unconditionally locking the file
would change semantics that users are likely to rely on, so we keep the
current locking sequence intact, even if it's suboptimal.

The refactoring of `files_transaction_finish_initial()` will be done in
the next commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/files-backend.c | 42 +++++++++++++++++++++++++++++++-----------
 1 file changed, 31 insertions(+), 11 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 29f08dced40..f798d8dae37 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -678,6 +678,7 @@ static void unlock_ref(struct ref_lock *lock)
  */
 static int lock_raw_ref(struct files_ref_store *refs,
 			const char *refname, int mustexist,
+			struct string_list *refnames_to_check,
 			const struct string_list *extras,
 			struct ref_lock **lock_p,
 			struct strbuf *referent,
@@ -855,16 +856,11 @@ static int lock_raw_ref(struct files_ref_store *refs,
 		}
 
 		/*
-		 * If the ref did not exist and we are creating it,
-		 * make sure there is no existing packed ref that
-		 * conflicts with refname:
+		 * If the ref did not exist and we are creating it, we have to
+		 * make sure there is no existing packed ref that conflicts
+		 * with refname. This check is deferred so that we can batch it.
 		 */
-		if (refs_verify_refname_available(
-				    refs->packed_ref_store, refname,
-				    extras, NULL, 0, err)) {
-			ret = TRANSACTION_NAME_CONFLICT;
-			goto error_return;
-		}
+		string_list_append(refnames_to_check, refname);
 	}
 
 	ret = 0;
@@ -2569,6 +2565,7 @@ static int lock_ref_for_update(struct files_ref_store *refs,
 			       struct ref_update *update,
 			       struct ref_transaction *transaction,
 			       const char *head_ref,
+			       struct string_list *refnames_to_check,
 			       struct string_list *affected_refnames,
 			       struct strbuf *err)
 {
@@ -2597,7 +2594,7 @@ static int lock_ref_for_update(struct files_ref_store *refs,
 		lock->count++;
 	} else {
 		ret = lock_raw_ref(refs, update->refname, mustexist,
-				   affected_refnames,
+				   refnames_to_check, affected_refnames,
 				   &lock, &referent,
 				   &update->type, err);
 		if (ret) {
@@ -2811,6 +2808,7 @@ static int files_transaction_prepare(struct ref_store *ref_store,
 	size_t i;
 	int ret = 0;
 	struct string_list affected_refnames = STRING_LIST_INIT_NODUP;
+	struct string_list refnames_to_check = STRING_LIST_INIT_NODUP;
 	char *head_ref = NULL;
 	int head_type;
 	struct files_transaction_backend_data *backend_data;
@@ -2898,7 +2896,8 @@ static int files_transaction_prepare(struct ref_store *ref_store,
 		struct ref_update *update = transaction->updates[i];
 
 		ret = lock_ref_for_update(refs, update, transaction,
-					  head_ref, &affected_refnames, err);
+					  head_ref, &refnames_to_check,
+					  &affected_refnames, err);
 		if (ret)
 			goto cleanup;
 
@@ -2930,6 +2929,26 @@ static int files_transaction_prepare(struct ref_store *ref_store,
 		}
 	}
 
+	/*
+	 * Verify that none of the loose reference that we're about to write
+	 * conflict with any existing packed references. Ideally, we'd do this
+	 * check after the packed-refs are locked so that the file cannot
+	 * change underneath our feet. But introducing such a lock now would
+	 * probably do more harm than good as users rely on there not being a
+	 * global lock with the "files" backend.
+	 *
+	 * Another alternative would be to do the check after the (optional)
+	 * lock, but that would extend the time we spend in the globally-locked
+	 * state.
+	 *
+	 * So instead, we accept the race for now.
+	 */
+	if (refs_verify_refnames_available(refs->packed_ref_store, &refnames_to_check,
+					   &affected_refnames, NULL, 0, err)) {
+		ret = TRANSACTION_NAME_CONFLICT;
+		goto cleanup;
+	}
+
 	if (packed_transaction) {
 		if (packed_refs_lock(refs->packed_ref_store, 0, err)) {
 			ret = TRANSACTION_GENERIC_ERROR;
@@ -2972,6 +2991,7 @@ static int files_transaction_prepare(struct ref_store *ref_store,
 cleanup:
 	free(head_ref);
 	string_list_clear(&affected_refnames, 0);
+	string_list_clear(&refnames_to_check, 0);
 
 	if (ret)
 		files_transaction_cleanup(refs, transaction);

-- 
2.49.0.rc2.394.gf6994c5077.dirty




* [PATCH v6 07/16] refs/files: batch refname availability checks for initial transactions
  2025-03-12 15:56 ` [PATCH v6 " Patrick Steinhardt
                     ` (5 preceding siblings ...)
  2025-03-12 15:56   ` [PATCH v6 06/16] refs/files: batch refname availability checks for normal transactions Patrick Steinhardt
@ 2025-03-12 15:56   ` Patrick Steinhardt
  2025-03-12 15:56   ` [PATCH v6 08/16] refs: stop re-verifying common prefixes for availability Patrick Steinhardt
                     ` (9 subsequent siblings)
  16 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-03-12 15:56 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

The "files" backend explicitly carves out special logic for its initial
transaction so that it can avoid writing out every single reference as
a loose reference. While the assumption is that there shouldn't be any
preexisting references, we still have to verify that none of the newly
written references will conflict with any other new reference in the
same transaction.

Refactor the initial transaction to use batched refname availability
checks. This does not yet have an effect on performance as we still call
`refs_verify_refname_available()` in a loop. But once the availability
checks have learned to optimize the case of many references in a
subsequent commit, this will improve performance when cloning a
repository with many references or when migrating references from any
format to the "files" format.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/files-backend.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index f798d8dae37..ab6f0af5502 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3056,6 +3056,7 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
 	size_t i;
 	int ret = 0;
 	struct string_list affected_refnames = STRING_LIST_INIT_NODUP;
+	struct string_list refnames_to_check = STRING_LIST_INIT_NODUP;
 	struct ref_transaction *packed_transaction = NULL;
 	struct ref_transaction *loose_transaction = NULL;
 
@@ -3105,11 +3106,7 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
 		    !is_null_oid(&update->old_oid))
 			BUG("initial ref transaction with old_sha1 set");
 
-		if (refs_verify_refname_available(&refs->base, update->refname,
-						  &affected_refnames, NULL, 1, err)) {
-			ret = TRANSACTION_NAME_CONFLICT;
-			goto cleanup;
-		}
+		string_list_append(&refnames_to_check, update->refname);
 
 		/*
 		 * packed-refs don't support symbolic refs, root refs and reflogs,
@@ -3145,8 +3142,19 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
 		}
 	}
 
-	if (packed_refs_lock(refs->packed_ref_store, 0, err) ||
-	    ref_transaction_commit(packed_transaction, err)) {
+	if (packed_refs_lock(refs->packed_ref_store, 0, err)) {
+		ret = TRANSACTION_GENERIC_ERROR;
+		goto cleanup;
+	}
+
+	if (refs_verify_refnames_available(&refs->base, &refnames_to_check,
+					   &affected_refnames, NULL, 1, err)) {
+		packed_refs_unlock(refs->packed_ref_store);
+		ret = TRANSACTION_NAME_CONFLICT;
+		goto cleanup;
+	}
+
+	if (ref_transaction_commit(packed_transaction, err)) {
 		ret = TRANSACTION_GENERIC_ERROR;
 		goto cleanup;
 	}
@@ -3167,6 +3175,7 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
 		ref_transaction_free(packed_transaction);
 	transaction->state = REF_TRANSACTION_CLOSED;
 	string_list_clear(&affected_refnames, 0);
+	string_list_clear(&refnames_to_check, 0);
 	return ret;
 }
 

-- 
2.49.0.rc2.394.gf6994c5077.dirty




* [PATCH v6 08/16] refs: stop re-verifying common prefixes for availability
  2025-03-12 15:56 ` [PATCH v6 " Patrick Steinhardt
                     ` (6 preceding siblings ...)
  2025-03-12 15:56   ` [PATCH v6 07/16] refs/files: batch refname availability checks for initial transactions Patrick Steinhardt
@ 2025-03-12 15:56   ` Patrick Steinhardt
  2025-03-12 15:56   ` [PATCH v6 09/16] refs/iterator: separate lifecycle from iteration Patrick Steinhardt
                     ` (8 subsequent siblings)
  16 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-03-12 15:56 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

One of the checks done by `refs_verify_refnames_available()` is whether
any of the prefixes of a reference already exists. For example, given a
reference "refs/heads/main", we'd check whether "refs/heads" or "refs"
already exist, and if so we'd abort the transaction.

When updating multiple references at once, this check is performed for
each of the references individually. Consequently, because references
tend to have common prefixes like "refs/heads/" or "refs/tags/", we
evaluate the availability of these prefixes repeatedly. Naturally this
is a waste of compute, as the availability of those prefixes should in
general not change in the middle of a transaction. And if it did,
backends would notice at a later point in time anyway.

Optimize this pattern by storing prefixes in a `strset` so that we can
trivially track those prefixes that we have already checked. This leads
to a significant speedup with the "reftable" backend when creating many
references that all share a common prefix:

    Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
      Time (mean ± σ):      63.1 ms ±   1.8 ms    [User: 41.0 ms, System: 21.6 ms]
      Range (min … max):    60.6 ms …  69.5 ms    38 runs

    Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):      40.0 ms ±   1.3 ms    [User: 29.3 ms, System: 10.3 ms]
      Range (min … max):    38.1 ms …  47.3 ms    61 runs

    Summary
      update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
        1.58 ± 0.07 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)

For the "files" backend we see an improvement, but a much smaller one:

    Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
      Time (mean ± σ):     395.8 ms ±   5.3 ms    [User: 63.6 ms, System: 330.5 ms]
      Range (min … max):   387.0 ms … 404.6 ms    10 runs

    Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):     386.0 ms ±   4.0 ms    [User: 51.5 ms, System: 332.8 ms]
      Range (min … max):   380.8 ms … 392.6 ms    10 runs

    Summary
      update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
        1.03 ± 0.02 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)

This change also leads to a modest improvement when writing references
with "initial" semantics, for example when migrating references. The
following benchmarks are migrating 1m references from the "reftable" to
the "files" backend:

    Benchmark 1: migrate reftable:files (refcount = 1000000, revision = HEAD~)
      Time (mean ± σ):     836.6 ms ±   5.6 ms    [User: 645.2 ms, System: 185.2 ms]
      Range (min … max):   829.6 ms … 845.9 ms    10 runs

    Benchmark 2: migrate reftable:files (refcount = 1000000, revision = HEAD)
      Time (mean ± σ):     759.8 ms ±   5.1 ms    [User: 574.9 ms, System: 178.9 ms]
      Range (min … max):   753.1 ms … 768.8 ms    10 runs

    Summary
      migrate reftable:files (refcount = 1000000, revision = HEAD) ran
        1.10 ± 0.01 times faster than migrate reftable:files (refcount = 1000000, revision = HEAD~)

And vice versa:

    Benchmark 1: migrate files:reftable (refcount = 1000000, revision = HEAD~)
      Time (mean ± σ):     870.7 ms ±   5.7 ms    [User: 735.2 ms, System: 127.4 ms]
      Range (min … max):   861.6 ms … 883.2 ms    10 runs

    Benchmark 2: migrate files:reftable (refcount = 1000000, revision = HEAD)
      Time (mean ± σ):     799.1 ms ±   8.5 ms    [User: 661.1 ms, System: 130.2 ms]
      Range (min … max):   787.5 ms … 812.6 ms    10 runs

    Summary
      migrate files:reftable (refcount = 1000000, revision = HEAD) ran
        1.09 ± 0.01 times faster than migrate files:reftable (refcount = 1000000, revision = HEAD~)

The impact here is significantly smaller given that we don't perform any
reference reads with "initial" semantics, so the speedup only comes from
us doing fewer string list lookups.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/refs.c b/refs.c
index d91a2184e06..111e3cf3aa9 100644
--- a/refs.c
+++ b/refs.c
@@ -2477,6 +2477,7 @@ int refs_verify_refnames_available(struct ref_store *refs,
 	struct strbuf dirname = STRBUF_INIT;
 	struct strbuf referent = STRBUF_INIT;
 	struct string_list_item *item;
+	struct strset dirnames;
 	int ret = -1;
 
 	/*
@@ -2486,6 +2487,8 @@ int refs_verify_refnames_available(struct ref_store *refs,
 
 	assert(err);
 
+	strset_init(&dirnames);
+
 	for_each_string_list_item(item, refnames) {
 		const char *refname = item->string;
 		const char *extra_refname;
@@ -2515,6 +2518,14 @@ int refs_verify_refnames_available(struct ref_store *refs,
 			if (skip && string_list_has_string(skip, dirname.buf))
 				continue;
 
+			/*
+			 * If we've already seen the directory we don't need to
+			 * process it again. Skip it to avoid checking
+			 * common prefixes like "refs/heads/" repeatedly.
+			 */
+			if (!strset_add(&dirnames, dirname.buf))
+				continue;
+
 			if (!initial_transaction &&
 			    !refs_read_raw_ref(refs, dirname.buf, &oid, &referent,
 					       &type, &ignore_errno)) {
@@ -2575,6 +2586,7 @@ int refs_verify_refnames_available(struct ref_store *refs,
 cleanup:
 	strbuf_release(&referent);
 	strbuf_release(&dirname);
+	strset_clear(&dirnames);
 	return ret;
 }
 

-- 
2.49.0.rc2.394.gf6994c5077.dirty




* [PATCH v6 09/16] refs/iterator: separate lifecycle from iteration
  2025-03-12 15:56 ` [PATCH v6 " Patrick Steinhardt
                     ` (7 preceding siblings ...)
  2025-03-12 15:56   ` [PATCH v6 08/16] refs: stop re-verifying common prefixes for availability Patrick Steinhardt
@ 2025-03-12 15:56   ` Patrick Steinhardt
  2025-03-12 15:56   ` [PATCH v6 10/16] refs/iterator: provide infrastructure to re-seek iterators Patrick Steinhardt
                     ` (7 subsequent siblings)
  16 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-03-12 15:56 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

The ref and reflog iterators have their lifecycle attached to iteration:
once the iterator reaches its end, it is automatically released and the
caller doesn't have to care about that anymore. When the iterator should
be released before it has been exhausted, callers must explicitly abort
the iterator via `ref_iterator_abort()`.

This lifecycle is somewhat unusual in the Git codebase and creates two
problems:

  - Callsites need to be very careful about when exactly they call
    `ref_iterator_abort()`, as calling the function is only valid when
    the iterator itself still is. This leads to somewhat awkward calling
    patterns in some situations.

  - It is impossible to reuse iterators and re-seek them to a different
    prefix. This feature isn't supported by any iterator implementation
    except for the reftable iterators anyway, but if it was implemented
    it would allow us to optimize cases where we need to search for
    specific references repeatedly by reusing internal state.

Detangle the lifecycle from iteration so that we don't deallocate the
iterator anymore once it is exhausted. Instead, callers are now expected
to always call a newly introduced `ref_iterator_free()` function that
deallocates the iterator and its internal state.

Note that the `dir_iterator` is somewhat special because it does not
implement the `ref_iterator` interface, but is only used to implement
other iterators. Consequently, we have to provide `dir_iterator_free()`
instead of `dir_iterator_release()`, as the allocated structure itself
is also managed by the `dir_iterator` interfaces and is thus not freed
by `ref_iterator_free()` like in all the other cases.

While at it, drop the return value of `ref_iterator_abort()`, which
wasn't really required by any of the iterator implementations anyway.
Furthermore, stop calling `base_ref_iterator_free()` in any of the
backends, but instead call it in `ref_iterator_free()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/clone.c              |   2 +
 dir-iterator.c               |  24 +++++------
 dir-iterator.h               |  11 ++---
 iterator.h                   |   2 +-
 refs.c                       |   7 ++-
 refs/debug.c                 |   9 ++--
 refs/files-backend.c         |  36 +++++-----------
 refs/iterator.c              | 100 +++++++++++++++----------------------------
 refs/packed-backend.c        |  27 ++++++------
 refs/ref-cache.c             |   9 ++--
 refs/refs-internal.h         |  29 ++++---------
 refs/reftable-backend.c      |  34 ++++-----------
 t/helper/test-dir-iterator.c |   1 +
 13 files changed, 105 insertions(+), 186 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index fd001d800c6..ac3e84b2b18 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -426,6 +426,8 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 		strbuf_setlen(src, src_len);
 		die(_("failed to iterate over '%s'"), src->buf);
 	}
+
+	dir_iterator_free(iter);
 }
 
 static void clone_local(const char *src_repo, const char *dest_repo)
diff --git a/dir-iterator.c b/dir-iterator.c
index de619846f29..857e1d9bdaf 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -193,9 +193,9 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 
 	if (S_ISDIR(iter->base.st.st_mode) && push_level(iter)) {
 		if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
-			goto error_out;
+			return ITER_ERROR;
 		if (iter->levels_nr == 0)
-			goto error_out;
+			return ITER_ERROR;
 	}
 
 	/* Loop until we find an entry that we can give back to the caller. */
@@ -211,11 +211,11 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			int ret = next_directory_entry(level->dir, iter->base.path.buf, &de);
 			if (ret < 0) {
 				if (iter->flags & DIR_ITERATOR_PEDANTIC)
-					goto error_out;
+					return ITER_ERROR;
 				continue;
 			} else if (ret > 0) {
 				if (pop_level(iter) == 0)
-					return dir_iterator_abort(dir_iterator);
+					return ITER_DONE;
 				continue;
 			}
 
@@ -223,7 +223,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 		} else {
 			if (level->entries_idx >= level->entries.nr) {
 				if (pop_level(iter) == 0)
-					return dir_iterator_abort(dir_iterator);
+					return ITER_DONE;
 				continue;
 			}
 
@@ -232,22 +232,21 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 
 		if (prepare_next_entry_data(iter, name)) {
 			if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
-				goto error_out;
+				return ITER_ERROR;
 			continue;
 		}
 
 		return ITER_OK;
 	}
-
-error_out:
-	dir_iterator_abort(dir_iterator);
-	return ITER_ERROR;
 }
 
-int dir_iterator_abort(struct dir_iterator *dir_iterator)
+void dir_iterator_free(struct dir_iterator *dir_iterator)
 {
 	struct dir_iterator_int *iter = (struct dir_iterator_int *)dir_iterator;
 
+	if (!iter)
+		return;
+
 	for (; iter->levels_nr; iter->levels_nr--) {
 		struct dir_iterator_level *level =
 			&iter->levels[iter->levels_nr - 1];
@@ -266,7 +265,6 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
 	free(iter->levels);
 	strbuf_release(&iter->base.path);
 	free(iter);
-	return ITER_DONE;
 }
 
 struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags)
@@ -301,7 +299,7 @@ struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags)
 	return dir_iterator;
 
 error_out:
-	dir_iterator_abort(dir_iterator);
+	dir_iterator_free(dir_iterator);
 	errno = saved_errno;
 	return NULL;
 }
diff --git a/dir-iterator.h b/dir-iterator.h
index 6d438809b6e..ccd6a197343 100644
--- a/dir-iterator.h
+++ b/dir-iterator.h
@@ -28,7 +28,7 @@
  *
  *     while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
  *             if (want_to_stop_iteration()) {
- *                     ok = dir_iterator_abort(iter);
+ *                     ok = ITER_DONE;
  *                     break;
  *             }
  *
@@ -39,6 +39,7 @@
  *
  *     if (ok != ITER_DONE)
  *             handle_error();
+ *     dir_iterator_free(iter);
  *
  * Callers are allowed to modify iter->path while they are working,
  * but they must restore it to its original contents before calling
@@ -107,11 +108,7 @@ struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags);
  */
 int dir_iterator_advance(struct dir_iterator *iterator);
 
-/*
- * End the iteration before it has been exhausted. Free the
- * dir_iterator and any associated resources and return ITER_DONE. On
- * error, free the dir_iterator and return ITER_ERROR.
- */
-int dir_iterator_abort(struct dir_iterator *iterator);
+/* Free the dir_iterator and any associated resources. */
+void dir_iterator_free(struct dir_iterator *iterator);
 
 #endif
diff --git a/iterator.h b/iterator.h
index 0f6900e43ad..6b77dcc2626 100644
--- a/iterator.h
+++ b/iterator.h
@@ -12,7 +12,7 @@
 #define ITER_OK 0
 
 /*
- * The iterator is exhausted and has been freed.
+ * The iterator is exhausted.
  */
 #define ITER_DONE -1
 
diff --git a/refs.c b/refs.c
index 111e3cf3aa9..3e65ccad7ac 100644
--- a/refs.c
+++ b/refs.c
@@ -2477,6 +2477,7 @@ int refs_verify_refnames_available(struct ref_store *refs,
 	struct strbuf dirname = STRBUF_INIT;
 	struct strbuf referent = STRBUF_INIT;
 	struct string_list_item *item;
+	struct ref_iterator *iter = NULL;
 	struct strset dirnames;
 	int ret = -1;
 
@@ -2553,7 +2554,6 @@ int refs_verify_refnames_available(struct ref_store *refs,
 		strbuf_addch(&dirname, '/');
 
 		if (!initial_transaction) {
-			struct ref_iterator *iter;
 			int ok;
 
 			iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
@@ -2565,12 +2565,14 @@ int refs_verify_refnames_available(struct ref_store *refs,
 
 				strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
 					    iter->refname, refname);
-				ref_iterator_abort(iter);
 				goto cleanup;
 			}
 
 			if (ok != ITER_DONE)
 				BUG("error while iterating over references");
+
+			ref_iterator_free(iter);
+			iter = NULL;
 		}
 
 		extra_refname = find_descendant_ref(dirname.buf, extras, skip);
@@ -2587,6 +2589,7 @@ int refs_verify_refnames_available(struct ref_store *refs,
 	strbuf_release(&referent);
 	strbuf_release(&dirname);
 	strset_clear(&dirnames);
+	ref_iterator_free(iter);
 	return ret;
 }
 
diff --git a/refs/debug.c b/refs/debug.c
index fbc4df08b43..a9786da4ba1 100644
--- a/refs/debug.c
+++ b/refs/debug.c
@@ -179,19 +179,18 @@ static int debug_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return res;
 }
 
-static int debug_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void debug_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct debug_ref_iterator *diter =
 		(struct debug_ref_iterator *)ref_iterator;
-	int res = diter->iter->vtable->abort(diter->iter);
-	trace_printf_key(&trace_refs, "iterator_abort: %d\n", res);
-	return res;
+	diter->iter->vtable->release(diter->iter);
+	trace_printf_key(&trace_refs, "iterator_abort\n");
 }
 
 static struct ref_iterator_vtable debug_ref_iterator_vtable = {
 	.advance = debug_ref_iterator_advance,
 	.peel = debug_ref_iterator_peel,
-	.abort = debug_ref_iterator_abort,
+	.release = debug_ref_iterator_release,
 };
 
 static struct ref_iterator *
diff --git a/refs/files-backend.c b/refs/files-backend.c
index ab6f0af5502..e97a267ad65 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -915,10 +915,6 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		return ITER_OK;
 	}
 
-	iter->iter0 = NULL;
-	if (ref_iterator_abort(ref_iterator) != ITER_DONE)
-		ok = ITER_ERROR;
-
 	return ok;
 }
 
@@ -931,23 +927,17 @@ static int files_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return ref_iterator_peel(iter->iter0, peeled);
 }
 
-static int files_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void files_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct files_ref_iterator *iter =
 		(struct files_ref_iterator *)ref_iterator;
-	int ok = ITER_DONE;
-
-	if (iter->iter0)
-		ok = ref_iterator_abort(iter->iter0);
-
-	base_ref_iterator_free(ref_iterator);
-	return ok;
+	ref_iterator_free(iter->iter0);
 }
 
 static struct ref_iterator_vtable files_ref_iterator_vtable = {
 	.advance = files_ref_iterator_advance,
 	.peel = files_ref_iterator_peel,
-	.abort = files_ref_iterator_abort,
+	.release = files_ref_iterator_release,
 };
 
 static struct ref_iterator *files_ref_iterator_begin(
@@ -1378,7 +1368,7 @@ static int should_pack_refs(struct files_ref_store *refs,
 				    iter->flags, opts))
 			refcount++;
 		if (refcount >= limit) {
-			ref_iterator_abort(iter);
+			ref_iterator_free(iter);
 			return 1;
 		}
 	}
@@ -1386,6 +1376,7 @@ static int should_pack_refs(struct files_ref_store *refs,
 	if (ret != ITER_DONE)
 		die("error while iterating over references");
 
+	ref_iterator_free(iter);
 	return 0;
 }
 
@@ -1452,6 +1443,7 @@ static int files_pack_refs(struct ref_store *ref_store,
 	packed_refs_unlock(refs->packed_ref_store);
 
 	prune_refs(refs, &refs_to_prune);
+	ref_iterator_free(iter);
 	strbuf_release(&err);
 	return 0;
 }
@@ -2299,9 +2291,6 @@ static int files_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 		return ITER_OK;
 	}
 
-	iter->dir_iterator = NULL;
-	if (ref_iterator_abort(ref_iterator) == ITER_ERROR)
-		ok = ITER_ERROR;
 	return ok;
 }
 
@@ -2311,23 +2300,17 @@ static int files_reflog_iterator_peel(struct ref_iterator *ref_iterator UNUSED,
 	BUG("ref_iterator_peel() called for reflog_iterator");
 }
 
-static int files_reflog_iterator_abort(struct ref_iterator *ref_iterator)
+static void files_reflog_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct files_reflog_iterator *iter =
 		(struct files_reflog_iterator *)ref_iterator;
-	int ok = ITER_DONE;
-
-	if (iter->dir_iterator)
-		ok = dir_iterator_abort(iter->dir_iterator);
-
-	base_ref_iterator_free(ref_iterator);
-	return ok;
+	dir_iterator_free(iter->dir_iterator);
 }
 
 static struct ref_iterator_vtable files_reflog_iterator_vtable = {
 	.advance = files_reflog_iterator_advance,
 	.peel = files_reflog_iterator_peel,
-	.abort = files_reflog_iterator_abort,
+	.release = files_reflog_iterator_release,
 };
 
 static struct ref_iterator *reflog_iterator_begin(struct ref_store *ref_store,
@@ -3837,6 +3820,7 @@ static int files_fsck_refs_dir(struct ref_store *ref_store,
 		ret = error(_("failed to iterate over '%s'"), sb.buf);
 
 out:
+	dir_iterator_free(iter);
 	strbuf_release(&sb);
 	strbuf_release(&refname);
 	return ret;
diff --git a/refs/iterator.c b/refs/iterator.c
index d25e568bf0b..d61474cba75 100644
--- a/refs/iterator.c
+++ b/refs/iterator.c
@@ -21,9 +21,14 @@ int ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return ref_iterator->vtable->peel(ref_iterator, peeled);
 }
 
-int ref_iterator_abort(struct ref_iterator *ref_iterator)
+void ref_iterator_free(struct ref_iterator *ref_iterator)
 {
-	return ref_iterator->vtable->abort(ref_iterator);
+	if (ref_iterator) {
+		ref_iterator->vtable->release(ref_iterator);
+		/* Help make use-after-free bugs fail quickly: */
+		ref_iterator->vtable = NULL;
+		free(ref_iterator);
+	}
 }
 
 void base_ref_iterator_init(struct ref_iterator *iter,
@@ -36,20 +41,13 @@ void base_ref_iterator_init(struct ref_iterator *iter,
 	iter->flags = 0;
 }
 
-void base_ref_iterator_free(struct ref_iterator *iter)
-{
-	/* Help make use-after-free bugs fail quickly: */
-	iter->vtable = NULL;
-	free(iter);
-}
-
 struct empty_ref_iterator {
 	struct ref_iterator base;
 };
 
-static int empty_ref_iterator_advance(struct ref_iterator *ref_iterator)
+static int empty_ref_iterator_advance(struct ref_iterator *ref_iterator UNUSED)
 {
-	return ref_iterator_abort(ref_iterator);
+	return ITER_DONE;
 }
 
 static int empty_ref_iterator_peel(struct ref_iterator *ref_iterator UNUSED,
@@ -58,16 +56,14 @@ static int empty_ref_iterator_peel(struct ref_iterator *ref_iterator UNUSED,
 	BUG("peel called for empty iterator");
 }
 
-static int empty_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void empty_ref_iterator_release(struct ref_iterator *ref_iterator UNUSED)
 {
-	base_ref_iterator_free(ref_iterator);
-	return ITER_DONE;
 }
 
 static struct ref_iterator_vtable empty_ref_iterator_vtable = {
 	.advance = empty_ref_iterator_advance,
 	.peel = empty_ref_iterator_peel,
-	.abort = empty_ref_iterator_abort,
+	.release = empty_ref_iterator_release,
 };
 
 struct ref_iterator *empty_ref_iterator_begin(void)
@@ -151,11 +147,13 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	if (!iter->current) {
 		/* Initialize: advance both iterators to their first entries */
 		if ((ok = ref_iterator_advance(iter->iter0)) != ITER_OK) {
+			ref_iterator_free(iter->iter0);
 			iter->iter0 = NULL;
 			if (ok == ITER_ERROR)
 				goto error;
 		}
 		if ((ok = ref_iterator_advance(iter->iter1)) != ITER_OK) {
+			ref_iterator_free(iter->iter1);
 			iter->iter1 = NULL;
 			if (ok == ITER_ERROR)
 				goto error;
@@ -166,6 +164,7 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		 * entry:
 		 */
 		if ((ok = ref_iterator_advance(*iter->current)) != ITER_OK) {
+			ref_iterator_free(*iter->current);
 			*iter->current = NULL;
 			if (ok == ITER_ERROR)
 				goto error;
@@ -179,9 +178,8 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 			iter->select(iter->iter0, iter->iter1, iter->cb_data);
 
 		if (selection == ITER_SELECT_DONE) {
-			return ref_iterator_abort(ref_iterator);
+			return ITER_DONE;
 		} else if (selection == ITER_SELECT_ERROR) {
-			ref_iterator_abort(ref_iterator);
 			return ITER_ERROR;
 		}
 
@@ -195,6 +193,7 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 
 		if (selection & ITER_SKIP_SECONDARY) {
 			if ((ok = ref_iterator_advance(*secondary)) != ITER_OK) {
+				ref_iterator_free(*secondary);
 				*secondary = NULL;
 				if (ok == ITER_ERROR)
 					goto error;
@@ -211,7 +210,6 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	}
 
 error:
-	ref_iterator_abort(ref_iterator);
 	return ITER_ERROR;
 }
 
@@ -227,28 +225,18 @@ static int merge_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return ref_iterator_peel(*iter->current, peeled);
 }
 
-static int merge_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void merge_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct merge_ref_iterator *iter =
 		(struct merge_ref_iterator *)ref_iterator;
-	int ok = ITER_DONE;
-
-	if (iter->iter0) {
-		if (ref_iterator_abort(iter->iter0) != ITER_DONE)
-			ok = ITER_ERROR;
-	}
-	if (iter->iter1) {
-		if (ref_iterator_abort(iter->iter1) != ITER_DONE)
-			ok = ITER_ERROR;
-	}
-	base_ref_iterator_free(ref_iterator);
-	return ok;
+	ref_iterator_free(iter->iter0);
+	ref_iterator_free(iter->iter1);
 }
 
 static struct ref_iterator_vtable merge_ref_iterator_vtable = {
 	.advance = merge_ref_iterator_advance,
 	.peel = merge_ref_iterator_peel,
-	.abort = merge_ref_iterator_abort,
+	.release = merge_ref_iterator_release,
 };
 
 struct ref_iterator *merge_ref_iterator_begin(
@@ -310,10 +298,10 @@ struct ref_iterator *overlay_ref_iterator_begin(
 	 * them.
 	 */
 	if (is_empty_ref_iterator(front)) {
-		ref_iterator_abort(front);
+		ref_iterator_free(front);
 		return back;
 	} else if (is_empty_ref_iterator(back)) {
-		ref_iterator_abort(back);
+		ref_iterator_free(back);
 		return front;
 	}
 
@@ -350,19 +338,15 @@ static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator)
 
 	while ((ok = ref_iterator_advance(iter->iter0)) == ITER_OK) {
 		int cmp = compare_prefix(iter->iter0->refname, iter->prefix);
-
 		if (cmp < 0)
 			continue;
-
-		if (cmp > 0) {
-			/*
-			 * As the source iterator is ordered, we
-			 * can stop the iteration as soon as we see a
-			 * refname that comes after the prefix:
-			 */
-			ok = ref_iterator_abort(iter->iter0);
-			break;
-		}
+		/*
+		 * As the source iterator is ordered, we
+		 * can stop the iteration as soon as we see a
+		 * refname that comes after the prefix:
+		 */
+		if (cmp > 0)
+			return ITER_DONE;
 
 		if (iter->trim) {
 			/*
@@ -386,9 +370,6 @@ static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		return ITER_OK;
 	}
 
-	iter->iter0 = NULL;
-	if (ref_iterator_abort(ref_iterator) != ITER_DONE)
-		return ITER_ERROR;
 	return ok;
 }
 
@@ -401,23 +382,18 @@ static int prefix_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return ref_iterator_peel(iter->iter0, peeled);
 }
 
-static int prefix_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void prefix_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct prefix_ref_iterator *iter =
 		(struct prefix_ref_iterator *)ref_iterator;
-	int ok = ITER_DONE;
-
-	if (iter->iter0)
-		ok = ref_iterator_abort(iter->iter0);
+	ref_iterator_free(iter->iter0);
 	free(iter->prefix);
-	base_ref_iterator_free(ref_iterator);
-	return ok;
 }
 
 static struct ref_iterator_vtable prefix_ref_iterator_vtable = {
 	.advance = prefix_ref_iterator_advance,
 	.peel = prefix_ref_iterator_peel,
-	.abort = prefix_ref_iterator_abort,
+	.release = prefix_ref_iterator_release,
 };
 
 struct ref_iterator *prefix_ref_iterator_begin(struct ref_iterator *iter0,
@@ -453,20 +429,14 @@ int do_for_each_ref_iterator(struct ref_iterator *iter,
 	current_ref_iter = iter;
 	while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
 		retval = fn(iter->refname, iter->referent, iter->oid, iter->flags, cb_data);
-		if (retval) {
-			/*
-			 * If ref_iterator_abort() returns ITER_ERROR,
-			 * we ignore that error in deference to the
-			 * callback function's return value.
-			 */
-			ref_iterator_abort(iter);
+		if (retval)
 			goto out;
-		}
 	}
 
 out:
 	current_ref_iter = old_ref_iter;
 	if (ok == ITER_ERROR)
-		return -1;
+		retval = -1;
+	ref_iterator_free(iter);
 	return retval;
 }
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index a7b6f74b6e3..38a1956d1a8 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -954,9 +954,6 @@ static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		return ITER_OK;
 	}
 
-	if (ref_iterator_abort(ref_iterator) != ITER_DONE)
-		ok = ITER_ERROR;
-
 	return ok;
 }
 
@@ -976,23 +973,19 @@ static int packed_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	}
 }
 
-static int packed_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void packed_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct packed_ref_iterator *iter =
 		(struct packed_ref_iterator *)ref_iterator;
-	int ok = ITER_DONE;
-
 	strbuf_release(&iter->refname_buf);
 	free(iter->jump);
 	release_snapshot(iter->snapshot);
-	base_ref_iterator_free(ref_iterator);
-	return ok;
 }
 
 static struct ref_iterator_vtable packed_ref_iterator_vtable = {
 	.advance = packed_ref_iterator_advance,
 	.peel = packed_ref_iterator_peel,
-	.abort = packed_ref_iterator_abort
+	.release = packed_ref_iterator_release,
 };
 
 static int jump_list_entry_cmp(const void *va, const void *vb)
@@ -1362,8 +1355,10 @@ static int write_with_updates(struct packed_ref_store *refs,
 	 */
 	iter = packed_ref_iterator_begin(&refs->base, "", NULL,
 					 DO_FOR_EACH_INCLUDE_BROKEN);
-	if ((ok = ref_iterator_advance(iter)) != ITER_OK)
+	if ((ok = ref_iterator_advance(iter)) != ITER_OK) {
+		ref_iterator_free(iter);
 		iter = NULL;
+	}
 
 	i = 0;
 
@@ -1411,8 +1406,10 @@ static int write_with_updates(struct packed_ref_store *refs,
 				 * the iterator over the unneeded
 				 * value.
 				 */
-				if ((ok = ref_iterator_advance(iter)) != ITER_OK)
+				if ((ok = ref_iterator_advance(iter)) != ITER_OK) {
+					ref_iterator_free(iter);
 					iter = NULL;
+				}
 				cmp = +1;
 			} else {
 				/*
@@ -1449,8 +1446,10 @@ static int write_with_updates(struct packed_ref_store *refs,
 					       peel_error ? NULL : &peeled))
 				goto write_error;
 
-			if ((ok = ref_iterator_advance(iter)) != ITER_OK)
+			if ((ok = ref_iterator_advance(iter)) != ITER_OK) {
+				ref_iterator_free(iter);
 				iter = NULL;
+			}
 		} else if (is_null_oid(&update->new_oid)) {
 			/*
 			 * The update wants to delete the reference,
@@ -1499,9 +1498,7 @@ static int write_with_updates(struct packed_ref_store *refs,
 		    get_tempfile_path(refs->tempfile), strerror(errno));
 
 error:
-	if (iter)
-		ref_iterator_abort(iter);
-
+	ref_iterator_free(iter);
 	delete_tempfile(&refs->tempfile);
 	return -1;
 }
diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index 02f09e4df88..6457e02c1ea 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -409,7 +409,7 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		if (++level->index == level->dir->nr) {
 			/* This level is exhausted; pop up a level */
 			if (--iter->levels_nr == 0)
-				return ref_iterator_abort(ref_iterator);
+				return ITER_DONE;
 
 			continue;
 		}
@@ -452,21 +452,18 @@ static int cache_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return peel_object(iter->repo, ref_iterator->oid, peeled) ? -1 : 0;
 }
 
-static int cache_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void cache_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct cache_ref_iterator *iter =
 		(struct cache_ref_iterator *)ref_iterator;
-
 	free((char *)iter->prefix);
 	free(iter->levels);
-	base_ref_iterator_free(ref_iterator);
-	return ITER_DONE;
 }
 
 static struct ref_iterator_vtable cache_ref_iterator_vtable = {
 	.advance = cache_ref_iterator_advance,
 	.peel = cache_ref_iterator_peel,
-	.abort = cache_ref_iterator_abort
+	.release = cache_ref_iterator_release,
 };
 
 struct ref_iterator *cache_ref_iterator_begin(struct ref_cache *cache,
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index aaab711bb96..74e2c03cef1 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -273,11 +273,11 @@ enum do_for_each_ref_flags {
  * the next reference and returns ITER_OK. The data pointed at by
  * refname and oid belong to the iterator; if you want to retain them
  * after calling ref_iterator_advance() again or calling
- * ref_iterator_abort(), you must make a copy. When the iteration has
+ * ref_iterator_free(), you must make a copy. When the iteration has
  * been exhausted, ref_iterator_advance() releases any resources
  * associated with the iteration, frees the ref_iterator object, and
  * returns ITER_DONE. If you want to abort the iteration early, call
- * ref_iterator_abort(), which also frees the ref_iterator object and
+ * ref_iterator_free(), which also frees the ref_iterator object and
  * any associated resources. If there was an internal error advancing
  * to the next entry, ref_iterator_advance() aborts the iteration,
  * frees the ref_iterator, and returns ITER_ERROR.
@@ -293,7 +293,7 @@ enum do_for_each_ref_flags {
  *
  *     while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
  *             if (want_to_stop_iteration()) {
- *                     ok = ref_iterator_abort(iter);
+ *                     ok = ITER_DONE;
  *                     break;
  *             }
  *
@@ -307,6 +307,7 @@ enum do_for_each_ref_flags {
  *
  *     if (ok != ITER_DONE)
  *             handle_error();
+ *     ref_iterator_free(iter);
  */
 struct ref_iterator {
 	struct ref_iterator_vtable *vtable;
@@ -333,12 +334,8 @@ int ref_iterator_advance(struct ref_iterator *ref_iterator);
 int ref_iterator_peel(struct ref_iterator *ref_iterator,
 		      struct object_id *peeled);
 
-/*
- * End the iteration before it has been exhausted, freeing the
- * reference iterator and any associated resources and returning
- * ITER_DONE. If the abort itself failed, return ITER_ERROR.
- */
-int ref_iterator_abort(struct ref_iterator *ref_iterator);
+/* Free the reference iterator and any associated resources. */
+void ref_iterator_free(struct ref_iterator *ref_iterator);
 
 /*
  * An iterator over nothing (its first ref_iterator_advance() call
@@ -438,13 +435,6 @@ struct ref_iterator *prefix_ref_iterator_begin(struct ref_iterator *iter0,
 void base_ref_iterator_init(struct ref_iterator *iter,
 			    struct ref_iterator_vtable *vtable);
 
-/*
- * Base class destructor for ref_iterators. Destroy the ref_iterator
- * part of iter and shallow-free the object. This is meant to be
- * called only by the destructors of derived classes.
- */
-void base_ref_iterator_free(struct ref_iterator *iter);
-
 /* Virtual function declarations for ref_iterators: */
 
 /*
@@ -463,15 +453,14 @@ typedef int ref_iterator_peel_fn(struct ref_iterator *ref_iterator,
 
 /*
  * Implementations of this function should free any resources specific
- * to the derived class, then call base_ref_iterator_free() to clean
- * up and free the ref_iterator object.
+ * to the derived class.
  */
-typedef int ref_iterator_abort_fn(struct ref_iterator *ref_iterator);
+typedef void ref_iterator_release_fn(struct ref_iterator *ref_iterator);
 
 struct ref_iterator_vtable {
 	ref_iterator_advance_fn *advance;
 	ref_iterator_peel_fn *peel;
-	ref_iterator_abort_fn *abort;
+	ref_iterator_release_fn *release;
 };
 
 /*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 5c464b9d143..57d8512fe80 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -711,17 +711,10 @@ static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		break;
 	}
 
-	if (iter->err > 0) {
-		if (ref_iterator_abort(ref_iterator) != ITER_DONE)
-			return ITER_ERROR;
+	if (iter->err > 0)
 		return ITER_DONE;
-	}
-
-	if (iter->err < 0) {
-		ref_iterator_abort(ref_iterator);
+	if (iter->err < 0)
 		return ITER_ERROR;
-	}
-
 	return ITER_OK;
 }
 
@@ -740,7 +733,7 @@ static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	return -1;
 }
 
-static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
+static void reftable_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct reftable_ref_iterator *iter =
 		(struct reftable_ref_iterator *)ref_iterator;
@@ -751,14 +744,12 @@ static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
 			free(iter->exclude_patterns[i]);
 		free(iter->exclude_patterns);
 	}
-	free(iter);
-	return ITER_DONE;
 }
 
 static struct ref_iterator_vtable reftable_ref_iterator_vtable = {
 	.advance = reftable_ref_iterator_advance,
 	.peel = reftable_ref_iterator_peel,
-	.abort = reftable_ref_iterator_abort
+	.release = reftable_ref_iterator_release,
 };
 
 static int qsort_strcmp(const void *va, const void *vb)
@@ -2016,17 +2007,10 @@ static int reftable_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 		break;
 	}
 
-	if (iter->err > 0) {
-		if (ref_iterator_abort(ref_iterator) != ITER_DONE)
-			return ITER_ERROR;
+	if (iter->err > 0)
 		return ITER_DONE;
-	}
-
-	if (iter->err < 0) {
-		ref_iterator_abort(ref_iterator);
+	if (iter->err < 0)
 		return ITER_ERROR;
-	}
-
 	return ITER_OK;
 }
 
@@ -2037,21 +2021,19 @@ static int reftable_reflog_iterator_peel(struct ref_iterator *ref_iterator UNUSE
 	return -1;
 }
 
-static int reftable_reflog_iterator_abort(struct ref_iterator *ref_iterator)
+static void reftable_reflog_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct reftable_reflog_iterator *iter =
 		(struct reftable_reflog_iterator *)ref_iterator;
 	reftable_log_record_release(&iter->log);
 	reftable_iterator_destroy(&iter->iter);
 	strbuf_release(&iter->last_name);
-	free(iter);
-	return ITER_DONE;
 }
 
 static struct ref_iterator_vtable reftable_reflog_iterator_vtable = {
 	.advance = reftable_reflog_iterator_advance,
 	.peel = reftable_reflog_iterator_peel,
-	.abort = reftable_reflog_iterator_abort
+	.release = reftable_reflog_iterator_release,
 };
 
 static struct reftable_reflog_iterator *reflog_iterator_for_stack(struct reftable_ref_store *refs,
diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
index 6b297bd7536..8d46e8ba409 100644
--- a/t/helper/test-dir-iterator.c
+++ b/t/helper/test-dir-iterator.c
@@ -53,6 +53,7 @@ int cmd__dir_iterator(int argc, const char **argv)
 		printf("(%s) [%s] %s\n", diter->relative_path, diter->basename,
 		       diter->path.buf);
 	}
+	dir_iterator_free(diter);
 
 	if (iter_status != ITER_DONE) {
 		printf("dir_iterator_advance failure\n");

-- 
2.49.0.rc2.394.gf6994c5077.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v6 10/16] refs/iterator: provide infrastructure to re-seek iterators
  2025-03-12 15:56 ` [PATCH v6 " Patrick Steinhardt
                     ` (8 preceding siblings ...)
  2025-03-12 15:56   ` [PATCH v6 09/16] refs/iterator: separate lifecycle from iteration Patrick Steinhardt
@ 2025-03-12 15:56   ` Patrick Steinhardt
  2025-03-12 15:56   ` [PATCH v6 11/16] refs/iterator: implement seeking for merged iterators Patrick Steinhardt
                     ` (6 subsequent siblings)
  16 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-03-12 15:56 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Reftable iterators need to be scrapped after they have either been
exhausted or aren't useful to the caller anymore, and it is explicitly
not possible to reuse them for new iterations. But enabling reuse of
iterators may allow us to tune them by carrying over an iterator's
internal state. The reftable iterators, for example, can already be
reused internally, but we're not able to expose this to any users
outside of the reftable backend.

Introduce a new `.seek` function in the ref iterator vtable that allows
callers to seek an iterator multiple times. It is expected to be
functionally the same as calling `refs_ref_iterator_begin()` with a
different (or the same) prefix.

Note that it is not possible to adjust parameters other than the prefix
being sought for now, so exclude patterns, trimmed prefixes and flags
will remain unchanged. We do not have a use case for changing these
parameters right now, but if one ever arises we can adapt accordingly.
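To make the intended calling pattern concrete, here is a minimal,
self-contained sketch of an array-backed iterator with a `seek`
operation. This is purely illustrative: the `demo_iter_*` names and the
array backing are assumptions for the sketch, not Git's actual API, but
the reuse pattern — seek, drain, seek again with a different prefix —
mirrors what the new vtable callback enables.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative refnames a real backend would iterate over in order. */
static const char *demo_refnames[] = {
	"refs/heads/main",
	"refs/heads/topic",
	"refs/tags/v1.0",
};

struct demo_iter {
	const char *prefix;
	size_t pos;
};

/* Seek to the first refname matching the prefix; returns 0 on success. */
static int demo_iter_seek(struct demo_iter *it, const char *prefix)
{
	it->prefix = prefix ? prefix : "";
	it->pos = 0;
	return 0;
}

/* Return the next refname matching the prefix, or NULL when exhausted. */
static const char *demo_iter_advance(struct demo_iter *it)
{
	size_t nr = sizeof(demo_refnames) / sizeof(demo_refnames[0]);
	while (it->pos < nr) {
		const char *name = demo_refnames[it->pos++];
		if (!strncmp(name, it->prefix, strlen(it->prefix)))
			return name;
	}
	return NULL;
}
```

The point of the exercise is that the second seek reuses the same
iterator object instead of allocating a fresh one, which is where a
real backend can retain and reuse internal state.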

Implement the callback for trivial cases. The other iterators will be
implemented in subsequent commits.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/debug.c         | 11 +++++++++++
 refs/iterator.c      | 24 ++++++++++++++++++++++++
 refs/refs-internal.h | 24 ++++++++++++++++++++++++
 3 files changed, 59 insertions(+)

diff --git a/refs/debug.c b/refs/debug.c
index a9786da4ba1..5390fa9c187 100644
--- a/refs/debug.c
+++ b/refs/debug.c
@@ -169,6 +169,16 @@ static int debug_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	return res;
 }
 
+static int debug_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				   const char *prefix)
+{
+	struct debug_ref_iterator *diter =
+		(struct debug_ref_iterator *)ref_iterator;
+	int res = diter->iter->vtable->seek(diter->iter, prefix);
+	trace_printf_key(&trace_refs, "iterator_seek: %s: %d\n", prefix ? prefix : "", res);
+	return res;
+}
+
 static int debug_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				   struct object_id *peeled)
 {
@@ -189,6 +199,7 @@ static void debug_ref_iterator_release(struct ref_iterator *ref_iterator)
 
 static struct ref_iterator_vtable debug_ref_iterator_vtable = {
 	.advance = debug_ref_iterator_advance,
+	.seek = debug_ref_iterator_seek,
 	.peel = debug_ref_iterator_peel,
 	.release = debug_ref_iterator_release,
 };
diff --git a/refs/iterator.c b/refs/iterator.c
index d61474cba75..ea4db59481d 100644
--- a/refs/iterator.c
+++ b/refs/iterator.c
@@ -15,6 +15,12 @@ int ref_iterator_advance(struct ref_iterator *ref_iterator)
 	return ref_iterator->vtable->advance(ref_iterator);
 }
 
+int ref_iterator_seek(struct ref_iterator *ref_iterator,
+		      const char *prefix)
+{
+	return ref_iterator->vtable->seek(ref_iterator, prefix);
+}
+
 int ref_iterator_peel(struct ref_iterator *ref_iterator,
 		      struct object_id *peeled)
 {
@@ -50,6 +56,12 @@ static int empty_ref_iterator_advance(struct ref_iterator *ref_iterator UNUSED)
 	return ITER_DONE;
 }
 
+static int empty_ref_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
+				   const char *prefix UNUSED)
+{
+	return 0;
+}
+
 static int empty_ref_iterator_peel(struct ref_iterator *ref_iterator UNUSED,
 				   struct object_id *peeled UNUSED)
 {
@@ -62,6 +74,7 @@ static void empty_ref_iterator_release(struct ref_iterator *ref_iterator UNUSED)
 
 static struct ref_iterator_vtable empty_ref_iterator_vtable = {
 	.advance = empty_ref_iterator_advance,
+	.seek = empty_ref_iterator_seek,
 	.peel = empty_ref_iterator_peel,
 	.release = empty_ref_iterator_release,
 };
@@ -373,6 +386,16 @@ static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	return ok;
 }
 
+static int prefix_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				    const char *prefix)
+{
+	struct prefix_ref_iterator *iter =
+		(struct prefix_ref_iterator *)ref_iterator;
+	free(iter->prefix);
+	iter->prefix = xstrdup_or_null(prefix);
+	return ref_iterator_seek(iter->iter0, prefix);
+}
+
 static int prefix_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				    struct object_id *peeled)
 {
@@ -392,6 +415,7 @@ static void prefix_ref_iterator_release(struct ref_iterator *ref_iterator)
 
 static struct ref_iterator_vtable prefix_ref_iterator_vtable = {
 	.advance = prefix_ref_iterator_advance,
+	.seek = prefix_ref_iterator_seek,
 	.peel = prefix_ref_iterator_peel,
 	.release = prefix_ref_iterator_release,
 };
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 74e2c03cef1..8f18274a165 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -327,6 +327,22 @@ struct ref_iterator {
  */
 int ref_iterator_advance(struct ref_iterator *ref_iterator);
 
+/*
+ * Seek the iterator to the first reference with the given prefix.
+ * The prefix is matched as a literal string, without regard for path
+ * separators. If prefix is NULL or the empty string, seek the iterator to the
+ * first reference again.
+ *
+ * This function is expected to behave as if a new ref iterator with the same
+ * prefix had been created, but allows reuse of iterators and thus may allow
+ * the backend to optimize. Parameters other than the prefix that have been
+ * passed when creating the iterator will remain unchanged.
+ *
+ * Returns 0 on success, a negative error code otherwise.
+ */
+int ref_iterator_seek(struct ref_iterator *ref_iterator,
+		      const char *prefix);
+
 /*
  * If possible, peel the reference currently being viewed by the
  * iterator. Return 0 on success.
@@ -445,6 +461,13 @@ void base_ref_iterator_init(struct ref_iterator *iter,
  */
 typedef int ref_iterator_advance_fn(struct ref_iterator *ref_iterator);
 
+/*
+ * Seek the iterator to the first reference matching the given prefix. Should
+ * behave the same as if a new iterator was created with the same prefix.
+ */
+typedef int ref_iterator_seek_fn(struct ref_iterator *ref_iterator,
+				 const char *prefix);
+
 /*
  * Peels the current ref, returning 0 for success or -1 for failure.
  */
@@ -459,6 +482,7 @@ typedef void ref_iterator_release_fn(struct ref_iterator *ref_iterator);
 
 struct ref_iterator_vtable {
 	ref_iterator_advance_fn *advance;
+	ref_iterator_seek_fn *seek;
 	ref_iterator_peel_fn *peel;
 	ref_iterator_release_fn *release;
 };

-- 
2.49.0.rc2.394.gf6994c5077.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v6 11/16] refs/iterator: implement seeking for merged iterators
  2025-03-12 15:56 ` [PATCH v6 " Patrick Steinhardt
                     ` (9 preceding siblings ...)
  2025-03-12 15:56   ` [PATCH v6 10/16] refs/iterator: provide infrastructure to re-seek iterators Patrick Steinhardt
@ 2025-03-12 15:56   ` Patrick Steinhardt
  2025-03-12 15:56   ` [PATCH v6 12/16] refs/iterator: implement seeking for reftable iterators Patrick Steinhardt
                     ` (5 subsequent siblings)
  16 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-03-12 15:56 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Implement seeking on merged iterators. The implementation is rather
straightforward, with the one exception that we must not deallocate
the underlying iterators once they have been exhausted.
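The "must not deallocate" requirement leads to the owned-vs-working
pointer split visible in the diff below. The following is an
illustrative sketch (not Git's code; the `child`/`child_owned` names
mimic the patch's `iter0`/`iter0_owned`) of that pattern: exhaustion
clears only the working pointer, seeking restores it from the owned
pointer, and release frees the owned pointer exactly once.

```c
#include <stdlib.h>

struct child_iter {
	int exhausted;
};

struct merged_iter {
	struct child_iter *child;       /* working pointer, NULL once exhausted */
	struct child_iter *child_owned; /* always owns the allocation */
};

/* Revive the (possibly exhausted) sub-iterator for a new iteration. */
static void merged_iter_seek(struct merged_iter *it)
{
	it->child = it->child_owned;
	it->child->exhausted = 0;
}

/* Mark the sub-iterator as done, but keep its allocation alive. */
static void merged_iter_mark_exhausted(struct merged_iter *it)
{
	it->child->exhausted = 1;
	it->child = NULL;
}

/* Free the allocation exactly once, regardless of the working pointer. */
static void merged_iter_release(struct merged_iter *it)
{
	free(it->child_owned);
	it->child_owned = NULL;
}
```

Freeing through the owned pointer in `release` is what makes it safe
for `advance` to NULL the working pointer without leaking.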

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/iterator.c | 38 +++++++++++++++++++++++++++++---------
 1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/refs/iterator.c b/refs/iterator.c
index ea4db59481d..766d96e795c 100644
--- a/refs/iterator.c
+++ b/refs/iterator.c
@@ -96,7 +96,8 @@ int is_empty_ref_iterator(struct ref_iterator *ref_iterator)
 struct merge_ref_iterator {
 	struct ref_iterator base;
 
-	struct ref_iterator *iter0, *iter1;
+	struct ref_iterator *iter0, *iter0_owned;
+	struct ref_iterator *iter1, *iter1_owned;
 
 	ref_iterator_select_fn *select;
 	void *cb_data;
@@ -160,13 +161,11 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	if (!iter->current) {
 		/* Initialize: advance both iterators to their first entries */
 		if ((ok = ref_iterator_advance(iter->iter0)) != ITER_OK) {
-			ref_iterator_free(iter->iter0);
 			iter->iter0 = NULL;
 			if (ok == ITER_ERROR)
 				goto error;
 		}
 		if ((ok = ref_iterator_advance(iter->iter1)) != ITER_OK) {
-			ref_iterator_free(iter->iter1);
 			iter->iter1 = NULL;
 			if (ok == ITER_ERROR)
 				goto error;
@@ -177,7 +176,6 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		 * entry:
 		 */
 		if ((ok = ref_iterator_advance(*iter->current)) != ITER_OK) {
-			ref_iterator_free(*iter->current);
 			*iter->current = NULL;
 			if (ok == ITER_ERROR)
 				goto error;
@@ -206,7 +204,6 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 
 		if (selection & ITER_SKIP_SECONDARY) {
 			if ((ok = ref_iterator_advance(*secondary)) != ITER_OK) {
-				ref_iterator_free(*secondary);
 				*secondary = NULL;
 				if (ok == ITER_ERROR)
 					goto error;
@@ -226,6 +223,28 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	return ITER_ERROR;
 }
 
+static int merge_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				   const char *prefix)
+{
+	struct merge_ref_iterator *iter =
+		(struct merge_ref_iterator *)ref_iterator;
+	int ret;
+
+	iter->current = NULL;
+	iter->iter0 = iter->iter0_owned;
+	iter->iter1 = iter->iter1_owned;
+
+	ret = ref_iterator_seek(iter->iter0, prefix);
+	if (ret < 0)
+		return ret;
+
+	ret = ref_iterator_seek(iter->iter1, prefix);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
 static int merge_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				   struct object_id *peeled)
 {
@@ -242,12 +261,13 @@ static void merge_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct merge_ref_iterator *iter =
 		(struct merge_ref_iterator *)ref_iterator;
-	ref_iterator_free(iter->iter0);
-	ref_iterator_free(iter->iter1);
+	ref_iterator_free(iter->iter0_owned);
+	ref_iterator_free(iter->iter1_owned);
 }
 
 static struct ref_iterator_vtable merge_ref_iterator_vtable = {
 	.advance = merge_ref_iterator_advance,
+	.seek = merge_ref_iterator_seek,
 	.peel = merge_ref_iterator_peel,
 	.release = merge_ref_iterator_release,
 };
@@ -268,8 +288,8 @@ struct ref_iterator *merge_ref_iterator_begin(
 	 */
 
 	base_ref_iterator_init(ref_iterator, &merge_ref_iterator_vtable);
-	iter->iter0 = iter0;
-	iter->iter1 = iter1;
+	iter->iter0 = iter->iter0_owned = iter0;
+	iter->iter1 = iter->iter1_owned = iter1;
 	iter->select = select;
 	iter->cb_data = cb_data;
 	iter->current = NULL;

-- 
2.49.0.rc2.394.gf6994c5077.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v6 12/16] refs/iterator: implement seeking for reftable iterators
  2025-03-12 15:56 ` [PATCH v6 " Patrick Steinhardt
                     ` (10 preceding siblings ...)
  2025-03-12 15:56   ` [PATCH v6 11/16] refs/iterator: implement seeking for merged iterators Patrick Steinhardt
@ 2025-03-12 15:56   ` Patrick Steinhardt
  2025-03-12 15:56   ` [PATCH v6 13/16] refs/iterator: implement seeking for ref-cache iterators Patrick Steinhardt
                     ` (4 subsequent siblings)
  16 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-03-12 15:56 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Implement seeking of reftable iterators. As the low-level reftable
iterators already support seeking, this change is straightforward. Two
notes though:

  - We do not support seeking on reflog iterators. It is unclear what
    seeking would even look like in this context, as one would typically
    want to seek to a specific entry in the reflog of a specific ref.
    There is currently no use case for this, but if one arises in the
    future, we can still implement seeking at that point.

  - We start to check whether `reftable_stack_init_ref_iterator()` is
    successful.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/reftable-backend.c | 35 ++++++++++++++++++++++++++++++-----
 1 file changed, 30 insertions(+), 5 deletions(-)

diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 57d8512fe80..6a60b26d1b9 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -547,7 +547,7 @@ struct reftable_ref_iterator {
 	struct reftable_ref_record ref;
 	struct object_id oid;
 
-	const char *prefix;
+	char *prefix;
 	size_t prefix_len;
 	char **exclude_patterns;
 	size_t exclude_patterns_index;
@@ -718,6 +718,20 @@ static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	return ITER_OK;
 }
 
+static int reftable_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				      const char *prefix)
+{
+	struct reftable_ref_iterator *iter =
+		(struct reftable_ref_iterator *)ref_iterator;
+
+	free(iter->prefix);
+	iter->prefix = xstrdup_or_null(prefix);
+	iter->prefix_len = prefix ? strlen(prefix) : 0;
+	iter->err = reftable_iterator_seek_ref(&iter->iter, prefix);
+
+	return iter->err;
+}
+
 static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				      struct object_id *peeled)
 {
@@ -744,10 +758,12 @@ static void reftable_ref_iterator_release(struct ref_iterator *ref_iterator)
 			free(iter->exclude_patterns[i]);
 		free(iter->exclude_patterns);
 	}
+	free(iter->prefix);
 }
 
 static struct ref_iterator_vtable reftable_ref_iterator_vtable = {
 	.advance = reftable_ref_iterator_advance,
+	.seek = reftable_ref_iterator_seek,
 	.peel = reftable_ref_iterator_peel,
 	.release = reftable_ref_iterator_release,
 };
@@ -806,8 +822,6 @@ static struct reftable_ref_iterator *ref_iterator_for_stack(struct reftable_ref_
 
 	iter = xcalloc(1, sizeof(*iter));
 	base_ref_iterator_init(&iter->base, &reftable_ref_iterator_vtable);
-	iter->prefix = prefix;
-	iter->prefix_len = prefix ? strlen(prefix) : 0;
 	iter->base.oid = &iter->oid;
 	iter->flags = flags;
 	iter->refs = refs;
@@ -821,8 +835,11 @@ static struct reftable_ref_iterator *ref_iterator_for_stack(struct reftable_ref_
 	if (ret)
 		goto done;
 
-	reftable_stack_init_ref_iterator(stack, &iter->iter);
-	ret = reftable_iterator_seek_ref(&iter->iter, prefix);
+	ret = reftable_stack_init_ref_iterator(stack, &iter->iter);
+	if (ret)
+		goto done;
+
+	ret = reftable_ref_iterator_seek(&iter->base, prefix);
 	if (ret)
 		goto done;
 
@@ -2014,6 +2031,13 @@ static int reftable_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 	return ITER_OK;
 }
 
+static int reftable_reflog_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
+					 const char *prefix UNUSED)
+{
+	BUG("reftable reflog iterator cannot be seeked");
+	return -1;
+}
+
 static int reftable_reflog_iterator_peel(struct ref_iterator *ref_iterator UNUSED,
 					 struct object_id *peeled UNUSED)
 {
@@ -2032,6 +2056,7 @@ static void reftable_reflog_iterator_release(struct ref_iterator *ref_iterator)
 
 static struct ref_iterator_vtable reftable_reflog_iterator_vtable = {
 	.advance = reftable_reflog_iterator_advance,
+	.seek = reftable_reflog_iterator_seek,
 	.peel = reftable_reflog_iterator_peel,
 	.release = reftable_reflog_iterator_release,
 };

-- 
2.49.0.rc2.394.gf6994c5077.dirty



^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v6 13/16] refs/iterator: implement seeking for ref-cache iterators
  2025-03-12 15:56 ` [PATCH v6 " Patrick Steinhardt
                     ` (11 preceding siblings ...)
  2025-03-12 15:56   ` [PATCH v6 12/16] refs/iterator: implement seeking for reftable iterators Patrick Steinhardt
@ 2025-03-12 15:56   ` Patrick Steinhardt
  2025-03-12 15:56   ` [PATCH v6 14/16] refs/iterator: implement seeking for packed-ref iterators Patrick Steinhardt
                     ` (3 subsequent siblings)
  16 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-03-12 15:56 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Implement seeking of ref-cache iterators. This is done by splitting most
of the logic to seek iterators out of `cache_ref_iterator_begin()` and
putting it into `cache_ref_iterator_seek()` so that we can reuse the
logic.

Note that we can no longer use the optimization where we return an
empty ref iterator when there aren't any references, as otherwise it
wouldn't be possible to reseek the iterator to a different prefix that
may exist. This shouldn't be much of a performance concern though, as
`advance()` now bails out early once it sees that there are no more
directories to be searched.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/ref-cache.c | 79 ++++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 51 insertions(+), 28 deletions(-)

diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index 6457e02c1ea..c1f1bab1d50 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -362,9 +362,7 @@ struct cache_ref_iterator {
 	struct ref_iterator base;
 
 	/*
-	 * The number of levels currently on the stack. This is always
-	 * at least 1, because when it becomes zero the iteration is
-	 * ended and this struct is freed.
+	 * The number of levels currently on the stack.
 	 */
 	size_t levels_nr;
 
@@ -376,7 +374,7 @@ struct cache_ref_iterator {
 	 * The prefix is matched textually, without regard for path
 	 * component boundaries.
 	 */
-	const char *prefix;
+	char *prefix;
 
 	/*
 	 * A stack of levels. levels[0] is the uppermost level that is
@@ -389,6 +387,9 @@ struct cache_ref_iterator {
 	struct cache_ref_iterator_level *levels;
 
 	struct repository *repo;
+	struct ref_cache *cache;
+
+	int prime_dir;
 };
 
 static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
@@ -396,6 +397,9 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	struct cache_ref_iterator *iter =
 		(struct cache_ref_iterator *)ref_iterator;
 
+	if (!iter->levels_nr)
+		return ITER_DONE;
+
 	while (1) {
 		struct cache_ref_iterator_level *level =
 			&iter->levels[iter->levels_nr - 1];
@@ -444,6 +448,41 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	}
 }
 
+static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				   const char *prefix)
+{
+	struct cache_ref_iterator *iter =
+		(struct cache_ref_iterator *)ref_iterator;
+	struct cache_ref_iterator_level *level;
+	struct ref_dir *dir;
+
+	dir = get_ref_dir(iter->cache->root);
+	if (prefix && *prefix)
+		dir = find_containing_dir(dir, prefix);
+	if (!dir) {
+		iter->levels_nr = 0;
+		return 0;
+	}
+
+	if (iter->prime_dir)
+		prime_ref_dir(dir, prefix);
+	iter->levels_nr = 1;
+	level = &iter->levels[0];
+	level->index = -1;
+	level->dir = dir;
+
+	if (prefix && *prefix) {
+		free(iter->prefix);
+		iter->prefix = xstrdup(prefix);
+		level->prefix_state = PREFIX_WITHIN_DIR;
+	} else {
+		FREE_AND_NULL(iter->prefix);
+		level->prefix_state = PREFIX_CONTAINS_DIR;
+	}
+
+	return 0;
+}
+
 static int cache_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				   struct object_id *peeled)
 {
@@ -456,12 +495,13 @@ static void cache_ref_iterator_release(struct ref_iterator *ref_iterator)
 {
 	struct cache_ref_iterator *iter =
 		(struct cache_ref_iterator *)ref_iterator;
-	free((char *)iter->prefix);
+	free(iter->prefix);
 	free(iter->levels);
 }
 
 static struct ref_iterator_vtable cache_ref_iterator_vtable = {
 	.advance = cache_ref_iterator_advance,
+	.seek = cache_ref_iterator_seek,
 	.peel = cache_ref_iterator_peel,
 	.release = cache_ref_iterator_release,
 };
@@ -471,39 +511,22 @@ struct ref_iterator *cache_ref_iterator_begin(struct ref_cache *cache,
 					      struct repository *repo,
 					      int prime_dir)
 {
-	struct ref_dir *dir;
 	struct cache_ref_iterator *iter;
 	struct ref_iterator *ref_iterator;
-	struct cache_ref_iterator_level *level;
-
-	dir = get_ref_dir(cache->root);
-	if (prefix && *prefix)
-		dir = find_containing_dir(dir, prefix);
-	if (!dir)
-		/* There's nothing to iterate over. */
-		return empty_ref_iterator_begin();
-
-	if (prime_dir)
-		prime_ref_dir(dir, prefix);
 
 	CALLOC_ARRAY(iter, 1);
 	ref_iterator = &iter->base;
 	base_ref_iterator_init(ref_iterator, &cache_ref_iterator_vtable);
 	ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
 
-	iter->levels_nr = 1;
-	level = &iter->levels[0];
-	level->index = -1;
-	level->dir = dir;
+	iter->repo = repo;
+	iter->cache = cache;
+	iter->prime_dir = prime_dir;
 
-	if (prefix && *prefix) {
-		iter->prefix = xstrdup(prefix);
-		level->prefix_state = PREFIX_WITHIN_DIR;
-	} else {
-		level->prefix_state = PREFIX_CONTAINS_DIR;
+	if (cache_ref_iterator_seek(&iter->base, prefix) < 0) {
+		ref_iterator_free(&iter->base);
+		return NULL;
 	}
 
-	iter->repo = repo;
-
 	return ref_iterator;
 }

-- 
2.49.0.rc2.394.gf6994c5077.dirty




* [PATCH v6 14/16] refs/iterator: implement seeking for packed-ref iterators
  2025-03-12 15:56 ` [PATCH v6 " Patrick Steinhardt
                     ` (12 preceding siblings ...)
  2025-03-12 15:56   ` [PATCH v6 13/16] refs/iterator: implement seeking for ref-cache iterators Patrick Steinhardt
@ 2025-03-12 15:56   ` Patrick Steinhardt
  2025-03-12 15:56   ` [PATCH v6 15/16] refs/iterator: implement seeking for files iterators Patrick Steinhardt
                     ` (2 subsequent siblings)
  16 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-03-12 15:56 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Implement seeking of `packed-ref` iterators. The implementation is again
straightforward, except that we cannot continue to use the prefix
iterator, as we would otherwise not be able to reseek the iterator in
case one first asks for an empty and then for a non-empty prefix.
Instead, we open-code the logic in `advance()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/packed-backend.c | 65 ++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 43 insertions(+), 22 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 38a1956d1a8..f4c82ba2c7d 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -819,6 +819,8 @@ struct packed_ref_iterator {
 
 	struct snapshot *snapshot;
 
+	char *prefix;
+
 	/* The current position in the snapshot's buffer: */
 	const char *pos;
 
@@ -841,11 +843,9 @@ struct packed_ref_iterator {
 };
 
 /*
- * Move the iterator to the next record in the snapshot, without
- * respect for whether the record is actually required by the current
- * iteration. Adjust the fields in `iter` and return `ITER_OK` or
- * `ITER_DONE`. This function does not free the iterator in the case
- * of `ITER_DONE`.
+ * Move the iterator to the next record in the snapshot. Adjust the fields in
+ * `iter` and return `ITER_OK` or `ITER_DONE`. This function does not free the
+ * iterator in the case of `ITER_DONE`.
  */
 static int next_record(struct packed_ref_iterator *iter)
 {
@@ -942,6 +942,9 @@ static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	int ok;
 
 	while ((ok = next_record(iter)) == ITER_OK) {
+		const char *refname = iter->base.refname;
+		const char *prefix = iter->prefix;
+
 		if (iter->flags & DO_FOR_EACH_PER_WORKTREE_ONLY &&
 		    !is_per_worktree_ref(iter->base.refname))
 			continue;
@@ -951,12 +954,41 @@ static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator)
 					    &iter->oid, iter->flags))
 			continue;
 
+		while (prefix && *prefix) {
+			if (*refname < *prefix)
+				BUG("packed-refs backend yielded reference preceding its prefix");
+			else if (*refname > *prefix)
+				return ITER_DONE;
+			prefix++;
+			refname++;
+		}
+
 		return ITER_OK;
 	}
 
 	return ok;
 }
 
+static int packed_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				    const char *prefix)
+{
+	struct packed_ref_iterator *iter =
+		(struct packed_ref_iterator *)ref_iterator;
+	const char *start;
+
+	if (prefix && *prefix)
+		start = find_reference_location(iter->snapshot, prefix, 0);
+	else
+		start = iter->snapshot->start;
+
+	free(iter->prefix);
+	iter->prefix = xstrdup_or_null(prefix);
+	iter->pos = start;
+	iter->eof = iter->snapshot->eof;
+
+	return 0;
+}
+
 static int packed_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				   struct object_id *peeled)
 {
@@ -979,11 +1011,13 @@ static void packed_ref_iterator_release(struct ref_iterator *ref_iterator)
 		(struct packed_ref_iterator *)ref_iterator;
 	strbuf_release(&iter->refname_buf);
 	free(iter->jump);
+	free(iter->prefix);
 	release_snapshot(iter->snapshot);
 }
 
 static struct ref_iterator_vtable packed_ref_iterator_vtable = {
 	.advance = packed_ref_iterator_advance,
+	.seek = packed_ref_iterator_seek,
 	.peel = packed_ref_iterator_peel,
 	.release = packed_ref_iterator_release,
 };
@@ -1097,7 +1131,6 @@ static struct ref_iterator *packed_ref_iterator_begin(
 {
 	struct packed_ref_store *refs;
 	struct snapshot *snapshot;
-	const char *start;
 	struct packed_ref_iterator *iter;
 	struct ref_iterator *ref_iterator;
 	unsigned int required_flags = REF_STORE_READ;
@@ -1113,14 +1146,6 @@ static struct ref_iterator *packed_ref_iterator_begin(
 	 */
 	snapshot = get_snapshot(refs);
 
-	if (prefix && *prefix)
-		start = find_reference_location(snapshot, prefix, 0);
-	else
-		start = snapshot->start;
-
-	if (start == snapshot->eof)
-		return empty_ref_iterator_begin();
-
 	CALLOC_ARRAY(iter, 1);
 	ref_iterator = &iter->base;
 	base_ref_iterator_init(ref_iterator, &packed_ref_iterator_vtable);
@@ -1130,19 +1155,15 @@ static struct ref_iterator *packed_ref_iterator_begin(
 
 	iter->snapshot = snapshot;
 	acquire_snapshot(snapshot);
-
-	iter->pos = start;
-	iter->eof = snapshot->eof;
 	strbuf_init(&iter->refname_buf, 0);
-
 	iter->base.oid = &iter->oid;
-
 	iter->repo = ref_store->repo;
 	iter->flags = flags;
 
-	if (prefix && *prefix)
-		/* Stop iteration after we've gone *past* prefix: */
-		ref_iterator = prefix_ref_iterator_begin(ref_iterator, prefix, 0);
+	if (packed_ref_iterator_seek(&iter->base, prefix) < 0) {
+		ref_iterator_free(&iter->base);
+		return NULL;
+	}
 
 	return ref_iterator;
 }

-- 
2.49.0.rc2.394.gf6994c5077.dirty




* [PATCH v6 15/16] refs/iterator: implement seeking for files iterators
  2025-03-12 15:56 ` [PATCH v6 " Patrick Steinhardt
                     ` (13 preceding siblings ...)
  2025-03-12 15:56   ` [PATCH v6 14/16] refs/iterator: implement seeking for packed-ref iterators Patrick Steinhardt
@ 2025-03-12 15:56   ` Patrick Steinhardt
  2025-03-12 15:56   ` [PATCH v6 16/16] refs: reuse iterators when determining refname availability Patrick Steinhardt
  2025-03-13  2:57   ` [PATCH v6 00/16] refs: batch refname availability checks shejialuo
  16 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-03-12 15:56 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

Implement seeking for "files" iterators. As we simply use a ref-cache
iterator under the hood, the implementation is straightforward. Note
that we do not implement seeking on reflog iterators, same as with the
"reftable" backend.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/files-backend.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index e97a267ad65..5f921e85eb4 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -918,6 +918,14 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	return ok;
 }
 
+static int files_ref_iterator_seek(struct ref_iterator *ref_iterator,
+				   const char *prefix)
+{
+	struct files_ref_iterator *iter =
+		(struct files_ref_iterator *)ref_iterator;
+	return ref_iterator_seek(iter->iter0, prefix);
+}
+
 static int files_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				   struct object_id *peeled)
 {
@@ -936,6 +944,7 @@ static void files_ref_iterator_release(struct ref_iterator *ref_iterator)
 
 static struct ref_iterator_vtable files_ref_iterator_vtable = {
 	.advance = files_ref_iterator_advance,
+	.seek = files_ref_iterator_seek,
 	.peel = files_ref_iterator_peel,
 	.release = files_ref_iterator_release,
 };
@@ -2294,6 +2303,12 @@ static int files_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 	return ok;
 }
 
+static int files_reflog_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
+				      const char *prefix UNUSED)
+{
+	BUG("ref_iterator_seek() called for reflog_iterator");
+}
+
 static int files_reflog_iterator_peel(struct ref_iterator *ref_iterator UNUSED,
 				      struct object_id *peeled UNUSED)
 {
@@ -2309,6 +2324,7 @@ static void files_reflog_iterator_release(struct ref_iterator *ref_iterator)
 
 static struct ref_iterator_vtable files_reflog_iterator_vtable = {
 	.advance = files_reflog_iterator_advance,
+	.seek = files_reflog_iterator_seek,
 	.peel = files_reflog_iterator_peel,
 	.release = files_reflog_iterator_release,
 };

-- 
2.49.0.rc2.394.gf6994c5077.dirty




* [PATCH v6 16/16] refs: reuse iterators when determining refname availability
  2025-03-12 15:56 ` [PATCH v6 " Patrick Steinhardt
                     ` (14 preceding siblings ...)
  2025-03-12 15:56   ` [PATCH v6 15/16] refs/iterator: implement seeking for files iterators Patrick Steinhardt
@ 2025-03-12 15:56   ` Patrick Steinhardt
  2025-03-13  2:57   ` [PATCH v6 00/16] refs: batch refname availability checks shejialuo
  16 siblings, 0 replies; 163+ messages in thread
From: Patrick Steinhardt @ 2025-03-12 15:56 UTC (permalink / raw)
  To: git
  Cc: Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	shejialuo, Christian Couder

When verifying whether refnames are available we have to verify whether
any reference exists that is nested under the current reference. E.g.
given a reference "refs/heads/foo", we must make sure that there is no
other reference "refs/heads/foo/*".

This check is performed using a ref iterator with the prefix set to the
nested reference namespace. Until now it was not possible to reseek
iterators, so we always had to reallocate the iterator for every single
reference we're about to check. This keeps us from reusing state that
the iterator may have and that may make it work more efficiently.

Refactor the logic to reseek iterators. This leads to a sizeable speedup
with the "reftable" backend:

    Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
      Time (mean ± σ):      39.8 ms ±   0.9 ms    [User: 29.7 ms, System: 9.8 ms]
      Range (min … max):    38.4 ms …  42.0 ms    62 runs

    Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):      31.9 ms ±   1.1 ms    [User: 27.0 ms, System: 4.5 ms]
      Range (min … max):    29.8 ms …  34.3 ms    74 runs

    Summary
      update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
        1.25 ± 0.05 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)

The "files" backend doesn't really show a huge impact:

    Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
      Time (mean ± σ):     392.3 ms ±   7.1 ms    [User: 59.7 ms, System: 328.8 ms]
      Range (min … max):   384.6 ms … 404.5 ms    10 runs

    Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):     387.7 ms ±   7.4 ms    [User: 54.6 ms, System: 329.6 ms]
      Range (min … max):   377.0 ms … 397.7 ms    10 runs

    Summary
      update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
        1.01 ± 0.03 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)

This is mostly because the "files" backend is way slower to begin with,
as it has to create a separate file for each new reference, so the
milliseconds we shave off by reseeking the iterator don't really
translate into a significant relative improvement.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/refs.c b/refs.c
index 3e65ccad7ac..5f0dbee3aff 100644
--- a/refs.c
+++ b/refs.c
@@ -2556,8 +2556,13 @@ int refs_verify_refnames_available(struct ref_store *refs,
 		if (!initial_transaction) {
 			int ok;
 
-			iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
-						       DO_FOR_EACH_INCLUDE_BROKEN);
+			if (!iter) {
+				iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
+							       DO_FOR_EACH_INCLUDE_BROKEN);
+			} else if (ref_iterator_seek(iter, dirname.buf) < 0) {
+				goto cleanup;
+			}
+
 			while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
 				if (skip &&
 				    string_list_has_string(skip, iter->refname))
@@ -2570,9 +2575,6 @@ int refs_verify_refnames_available(struct ref_store *refs,
 
 			if (ok != ITER_DONE)
 				BUG("error while iterating over references");
-
-			ref_iterator_free(iter);
-			iter = NULL;
 		}
 
 		extra_refname = find_descendant_ref(dirname.buf, extras, skip);

-- 
2.49.0.rc2.394.gf6994c5077.dirty




* Re: [PATCH v6 00/16] refs: batch refname availability checks
  2025-03-12 15:56 ` [PATCH v6 " Patrick Steinhardt
                     ` (15 preceding siblings ...)
  2025-03-12 15:56   ` [PATCH v6 16/16] refs: reuse iterators when determining refname availability Patrick Steinhardt
@ 2025-03-13  2:57   ` shejialuo
  16 siblings, 0 replies; 163+ messages in thread
From: shejialuo @ 2025-03-13  2:57 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Karthik Nayak, brian m. carlson, Jeff King, Junio C Hamano,
	Christian Couder

On Wed, Mar 12, 2025 at 04:56:06PM +0100, Patrick Steinhardt wrote:
> Changes in v6:
>   - Use `for_each_string_list()` instead of manually iterating through
>     the string list.
>   - Stop sorting refs passed to `refs_verify_refnames_available()`.
>   - Revive a comment that has been deleted during one of the
>     refactorings.
>   - Link to v5: https://lore.kernel.org/r/20250306-pks-update-ref-optimization-v5-0-dcb2ee037e97@pks.im
> 
> Thanks!
> 
> Patrick

The range-diff looks good to me. Thanks for your efforts.

Thanks,
Jialuo



end of thread, other threads:[~2025-03-13  2:57 UTC | newest]

Thread overview: 163+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-17 15:50 [PATCH 00/14] refs: batch refname availability checks Patrick Steinhardt
2025-02-17 15:50 ` [PATCH 01/14] object-name: introduce `repo_get_oid_with_flags()` Patrick Steinhardt
2025-02-17 15:50 ` [PATCH 02/14] object-name: allow skipping ambiguity checks in `get_oid()` family Patrick Steinhardt
2025-02-17 15:50 ` [PATCH 03/14] builtin/update-ref: skip ambiguity checks when parsing object IDs Patrick Steinhardt
2025-02-18 16:04   ` Karthik Nayak
2025-02-17 15:50 ` [PATCH 04/14] refs: introduce function to batch refname availability checks Patrick Steinhardt
2025-02-17 15:50 ` [PATCH 05/14] refs/reftable: start using `refs_verify_refnames_available()` Patrick Steinhardt
2025-02-17 15:50 ` [PATCH 06/14] refs: stop re-verifying common prefixes for availability Patrick Steinhardt
2025-02-18 16:12   ` Karthik Nayak
2025-02-19 11:52     ` Patrick Steinhardt
2025-02-17 15:50 ` [PATCH 07/14] refs/iterator: separate lifecycle from iteration Patrick Steinhardt
2025-02-18 16:52   ` shejialuo
2025-02-19 11:52     ` Patrick Steinhardt
2025-02-19 12:41       ` shejialuo
2025-02-19 12:59         ` Patrick Steinhardt
2025-02-19 13:06           ` shejialuo
2025-02-19 13:17             ` Patrick Steinhardt
2025-02-19 13:20               ` Patrick Steinhardt
2025-02-19 13:23                 ` shejialuo
2025-02-18 17:13   ` Karthik Nayak
2025-02-19 11:52     ` Patrick Steinhardt
2025-02-17 15:50 ` [PATCH 08/14] refs/iterator: provide infrastructure to re-seek iterators Patrick Steinhardt
2025-02-17 15:50 ` [PATCH 09/14] refs/iterator: implement seeking for merged iterators Patrick Steinhardt
2025-02-19 20:10   ` Karthik Nayak
2025-02-17 15:50 ` [PATCH 10/14] refs/iterator: implement seeking for reftable iterators Patrick Steinhardt
2025-02-19 20:13   ` Karthik Nayak
2025-02-17 15:50 ` [PATCH 11/14] refs/iterator: implement seeking for ref-cache iterators Patrick Steinhardt
2025-02-17 15:50 ` [PATCH 12/14] refs/iterator: implement seeking for `packed-ref` iterators Patrick Steinhardt
2025-02-17 15:50 ` [PATCH 13/14] refs/iterator: implement seeking for "files" iterators Patrick Steinhardt
2025-02-17 15:50 ` [PATCH 14/14] refs: reuse iterators when determining refname availability Patrick Steinhardt
2025-02-18 17:10 ` [PATCH 00/14] refs: batch refname availability checks brian m. carlson
2025-02-19 13:23 ` [PATCH v2 00/16] " Patrick Steinhardt
2025-02-19 13:23   ` [PATCH v2 01/16] object-name: introduce `repo_get_oid_with_flags()` Patrick Steinhardt
2025-02-19 17:02     ` Justin Tobler
2025-02-19 13:23   ` [PATCH v2 02/16] object-name: allow skipping ambiguity checks in `get_oid()` family Patrick Steinhardt
2025-02-21  8:00     ` Jeff King
2025-02-21  8:36       ` Patrick Steinhardt
2025-02-21  9:06         ` Jeff King
2025-02-19 13:23   ` [PATCH v2 03/16] builtin/update-ref: skip ambiguity checks when parsing object IDs Patrick Steinhardt
2025-02-19 18:21     ` Justin Tobler
2025-02-20  8:05       ` Patrick Steinhardt
2025-02-19 13:23   ` [PATCH v2 04/16] refs: introduce function to batch refname availability checks Patrick Steinhardt
2025-02-19 13:23   ` [PATCH v2 05/16] refs/reftable: " Patrick Steinhardt
2025-02-19 13:23   ` [PATCH v2 06/16] refs/files: batch refname availability checks for normal transactions Patrick Steinhardt
2025-02-19 13:23   ` [PATCH v2 07/16] refs/files: batch refname availability checks for initial transactions Patrick Steinhardt
2025-02-19 13:23   ` [PATCH v2 08/16] refs: stop re-verifying common prefixes for availability Patrick Steinhardt
2025-02-19 13:23   ` [PATCH v2 09/16] refs/iterator: separate lifecycle from iteration Patrick Steinhardt
2025-02-19 13:23   ` [PATCH v2 10/16] refs/iterator: provide infrastructure to re-seek iterators Patrick Steinhardt
2025-02-24 13:08     ` shejialuo
2025-02-25  7:39       ` Patrick Steinhardt
2025-02-19 13:23   ` [PATCH v2 11/16] refs/iterator: implement seeking for merged iterators Patrick Steinhardt
2025-02-24 13:37     ` shejialuo
2025-02-25  7:39       ` Patrick Steinhardt
2025-02-19 13:23   ` [PATCH v2 12/16] refs/iterator: implement seeking for reftable iterators Patrick Steinhardt
2025-02-24 14:00     ` shejialuo
2025-02-25  7:39       ` Patrick Steinhardt
2025-02-19 13:23   ` [PATCH v2 13/16] refs/iterator: implement seeking for ref-cache iterators Patrick Steinhardt
2025-02-24 14:49     ` shejialuo
2025-02-25  7:39       ` Patrick Steinhardt
2025-02-19 13:23   ` [PATCH v2 14/16] refs/iterator: implement seeking for `packed-ref` iterators Patrick Steinhardt
2025-02-24 15:09     ` shejialuo
2025-02-25  7:39       ` Patrick Steinhardt
2025-02-25 12:07         ` shejialuo
2025-02-19 13:23   ` [PATCH v2 15/16] refs/iterator: implement seeking for "files" iterators Patrick Steinhardt
2025-02-19 13:23   ` [PATCH v2 16/16] refs: reuse iterators when determining refname availability Patrick Steinhardt
2025-02-24 15:14     ` shejialuo
2025-02-24 15:18   ` [PATCH v2 00/16] refs: batch refname availability checks shejialuo
2025-02-25  7:39     ` Patrick Steinhardt
2025-02-25  8:55 ` [PATCH v3 " Patrick Steinhardt
2025-02-25  8:55   ` [PATCH v3 01/16] object-name: introduce `repo_get_oid_with_flags()` Patrick Steinhardt
2025-02-25  8:55   ` [PATCH v3 02/16] object-name: allow skipping ambiguity checks in `get_oid()` family Patrick Steinhardt
2025-02-25  8:55   ` [PATCH v3 03/16] builtin/update-ref: skip ambiguity checks when parsing object IDs Patrick Steinhardt
2025-02-26 22:26     ` Junio C Hamano
2025-02-27 11:57       ` Patrick Steinhardt
2025-02-25  8:55   ` [PATCH v3 04/16] refs: introduce function to batch refname availability checks Patrick Steinhardt
2025-02-25  8:55   ` [PATCH v3 05/16] refs/reftable: " Patrick Steinhardt
2025-02-25  8:55   ` [PATCH v3 06/16] refs/files: batch refname availability checks for normal transactions Patrick Steinhardt
2025-02-25  8:55   ` [PATCH v3 07/16] refs/files: batch refname availability checks for initial transactions Patrick Steinhardt
2025-02-25  8:55   ` [PATCH v3 08/16] refs: stop re-verifying common prefixes for availability Patrick Steinhardt
2025-02-25  8:55   ` [PATCH v3 09/16] refs/iterator: separate lifecycle from iteration Patrick Steinhardt
2025-02-25  8:55   ` [PATCH v3 10/16] refs/iterator: provide infrastructure to re-seek iterators Patrick Steinhardt
2025-02-25  8:55   ` [PATCH v3 11/16] refs/iterator: implement seeking for merged iterators Patrick Steinhardt
2025-02-25  8:55   ` [PATCH v3 12/16] refs/iterator: implement seeking for reftable iterators Patrick Steinhardt
2025-02-25  8:55   ` [PATCH v3 13/16] refs/iterator: implement seeking for ref-cache iterators Patrick Steinhardt
2025-02-25  8:56   ` [PATCH v3 14/16] refs/iterator: implement seeking for packed-ref iterators Patrick Steinhardt
2025-02-25  8:56   ` [PATCH v3 15/16] refs/iterator: implement seeking for files iterators Patrick Steinhardt
2025-02-25  8:56   ` [PATCH v3 16/16] refs: reuse iterators when determining refname availability Patrick Steinhardt
2025-02-28  9:26 ` [PATCH v4 00/16] refs: batch refname availability checks Patrick Steinhardt
2025-02-28  9:26   ` [PATCH v4 01/16] object-name: introduce `repo_get_oid_with_flags()` Patrick Steinhardt
2025-02-28  9:26   ` [PATCH v4 02/16] object-name: allow skipping ambiguity checks in `get_oid()` family Patrick Steinhardt
2025-03-06 13:21     ` Karthik Nayak
2025-02-28  9:26   ` [PATCH v4 03/16] builtin/update-ref: skip ambiguity checks when parsing object IDs Patrick Steinhardt
2025-02-28  9:26   ` [PATCH v4 04/16] refs: introduce function to batch refname availability checks Patrick Steinhardt
2025-03-06 13:47     ` Karthik Nayak
2025-02-28  9:26   ` [PATCH v4 05/16] refs/reftable: " Patrick Steinhardt
2025-03-06 14:00     ` Karthik Nayak
2025-03-06 14:12       ` Karthik Nayak
2025-03-06 15:13         ` Patrick Steinhardt
2025-02-28  9:26   ` [PATCH v4 06/16] refs/files: batch refname availability checks for normal transactions Patrick Steinhardt
2025-02-28  9:26   ` [PATCH v4 07/16] refs/files: batch refname availability checks for initial transactions Patrick Steinhardt
2025-03-06 14:10     ` Karthik Nayak
2025-02-28  9:26   ` [PATCH v4 08/16] refs: stop re-verifying common prefixes for availability Patrick Steinhardt
2025-02-28  9:26   ` [PATCH v4 09/16] refs/iterator: separate lifecycle from iteration Patrick Steinhardt
2025-02-28  9:26   ` [PATCH v4 10/16] refs/iterator: provide infrastructure to re-seek iterators Patrick Steinhardt
2025-02-28  9:26   ` [PATCH v4 11/16] refs/iterator: implement seeking for merged iterators Patrick Steinhardt
2025-02-28  9:26   ` [PATCH v4 12/16] refs/iterator: implement seeking for reftable iterators Patrick Steinhardt
2025-03-06 14:16     ` Karthik Nayak
2025-02-28  9:26   ` [PATCH v4 13/16] refs/iterator: implement seeking for ref-cache iterators Patrick Steinhardt
2025-02-28  9:26   ` [PATCH v4 14/16] refs/iterator: implement seeking for packed-ref iterators Patrick Steinhardt
2025-02-28  9:26   ` [PATCH v4 15/16] refs/iterator: implement seeking for files iterators Patrick Steinhardt
2025-02-28  9:26   ` [PATCH v4 16/16] refs: reuse iterators when determining refname availability Patrick Steinhardt
2025-03-06 14:20   ` [PATCH v4 00/16] refs: batch refname availability checks Karthik Nayak
2025-03-06 15:08 ` [PATCH v5 " Patrick Steinhardt
2025-03-06 15:08   ` [PATCH v5 01/16] object-name: introduce `repo_get_oid_with_flags()` Patrick Steinhardt
2025-03-06 15:08   ` [PATCH v5 02/16] object-name: allow skipping ambiguity checks in `get_oid()` family Patrick Steinhardt
2025-03-12 12:12     ` shejialuo
2025-03-06 15:08   ` [PATCH v5 03/16] builtin/update-ref: skip ambiguity checks when parsing object IDs Patrick Steinhardt
2025-03-06 15:08   ` [PATCH v5 04/16] refs: introduce function to batch refname availability checks Patrick Steinhardt
2025-03-12 12:36     ` shejialuo
2025-03-12 12:44       ` shejialuo
2025-03-12 15:36       ` Patrick Steinhardt
2025-03-06 15:08   ` [PATCH v5 05/16] refs/reftable: " Patrick Steinhardt
2025-03-12 12:54     ` shejialuo
2025-03-12 15:36       ` Patrick Steinhardt
2025-03-06 15:08   ` [PATCH v5 06/16] refs/files: batch refname availability checks for normal transactions Patrick Steinhardt
2025-03-12 12:58     ` shejialuo
2025-03-12 15:36       ` Patrick Steinhardt
2025-03-06 15:08   ` [PATCH v5 07/16] refs/files: batch refname availability checks for initial transactions Patrick Steinhardt
2025-03-12 13:06     ` shejialuo
2025-03-12 15:36       ` Patrick Steinhardt
2025-03-06 15:08   ` [PATCH v5 08/16] refs: stop re-verifying common prefixes for availability Patrick Steinhardt
2025-03-12 13:22     ` shejialuo
2025-03-12 15:36       ` Patrick Steinhardt
2025-03-06 15:08   ` [PATCH v5 09/16] refs/iterator: separate lifecycle from iteration Patrick Steinhardt
2025-03-12 13:45     ` shejialuo
2025-03-12 15:36       ` Patrick Steinhardt
2025-03-06 15:08   ` [PATCH v5 10/16] refs/iterator: provide infrastructure to re-seek iterators Patrick Steinhardt
2025-03-06 15:08   ` [PATCH v5 11/16] refs/iterator: implement seeking for merged iterators Patrick Steinhardt
2025-03-06 15:08   ` [PATCH v5 12/16] refs/iterator: implement seeking for reftable iterators Patrick Steinhardt
2025-03-06 15:08   ` [PATCH v5 13/16] refs/iterator: implement seeking for ref-cache iterators Patrick Steinhardt
2025-03-06 15:08   ` [PATCH v5 14/16] refs/iterator: implement seeking for packed-ref iterators Patrick Steinhardt
2025-03-06 15:08   ` [PATCH v5 15/16] refs/iterator: implement seeking for files iterators Patrick Steinhardt
2025-03-06 15:08   ` [PATCH v5 16/16] refs: reuse iterators when determining refname availability Patrick Steinhardt
2025-03-06 15:32   ` [PATCH v5 00/16] refs: batch refname availability checks Karthik Nayak
2025-03-12 14:03   ` shejialuo
2025-03-12 15:56 ` [PATCH v6 " Patrick Steinhardt
2025-03-12 15:56   ` [PATCH v6 01/16] object-name: introduce `repo_get_oid_with_flags()` Patrick Steinhardt
2025-03-12 15:56   ` [PATCH v6 02/16] object-name: allow skipping ambiguity checks in `get_oid()` family Patrick Steinhardt
2025-03-12 15:56   ` [PATCH v6 03/16] builtin/update-ref: skip ambiguity checks when parsing object IDs Patrick Steinhardt
2025-03-12 15:56   ` [PATCH v6 04/16] refs: introduce function to batch refname availability checks Patrick Steinhardt
2025-03-12 15:56   ` [PATCH v6 05/16] refs/reftable: " Patrick Steinhardt
2025-03-12 15:56   ` [PATCH v6 06/16] refs/files: batch refname availability checks for normal transactions Patrick Steinhardt
2025-03-12 15:56   ` [PATCH v6 07/16] refs/files: batch refname availability checks for initial transactions Patrick Steinhardt
2025-03-12 15:56   ` [PATCH v6 08/16] refs: stop re-verifying common prefixes for availability Patrick Steinhardt
2025-03-12 15:56   ` [PATCH v6 09/16] refs/iterator: separate lifecycle from iteration Patrick Steinhardt
2025-03-12 15:56   ` [PATCH v6 10/16] refs/iterator: provide infrastructure to re-seek iterators Patrick Steinhardt
2025-03-12 15:56   ` [PATCH v6 11/16] refs/iterator: implement seeking for merged iterators Patrick Steinhardt
2025-03-12 15:56   ` [PATCH v6 12/16] refs/iterator: implement seeking for reftable iterators Patrick Steinhardt
2025-03-12 15:56   ` [PATCH v6 13/16] refs/iterator: implement seeking for ref-cache iterators Patrick Steinhardt
2025-03-12 15:56   ` [PATCH v6 14/16] refs/iterator: implement seeking for packed-ref iterators Patrick Steinhardt
2025-03-12 15:56   ` [PATCH v6 15/16] refs/iterator: implement seeking for files iterators Patrick Steinhardt
2025-03-12 15:56   ` [PATCH v6 16/16] refs: reuse iterators when determining refname availability Patrick Steinhardt
2025-03-13  2:57   ` [PATCH v6 00/16] refs: batch refname availability checks shejialuo