[PATCH 0/6] fetch: improve atomicity of `--atomic` flag

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

* [PATCH 0/6] fetch: improve atomicity of `--atomic` flag
@ 2022-02-11  7:46 Patrick Steinhardt
  2022-02-11  7:46 ` [PATCH 1/6] fetch: increase test coverage of fetches Patrick Steinhardt
                   ` (5 more replies)
  0 siblings, 6 replies; 27+ messages in thread
From: Patrick Steinhardt @ 2022-02-11  7:46 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 2269 bytes --]

Hi,

in c7b190dabd (fetch: implement support for atomic reference updates,
2021-01-12), I have added a new `--atomic` flag to git-fetch(1) which is
supposed to make it an all-or-nothing operation: either all refs are
updated, or none is. I have recently discovered though that there were
two oversights: neither pruning of refs via `--prune` nor the tag
backfill mechanism are currently covered by this flag, which breaks the
promise.

This patch series extends coverage of the `--atomic` flag to fix my
oversights. It applies on top of ps/avoid-unnecessary-hook-invocation-with-packed-refs
merged to v2.35.1 due to a semantic conflict in tests: I use the reftx
hook to verify some things work as expected, and above branch causes us
to not execute the hook for packed-refs anymore.

It is structured as follows:

    - Patch 1 adds multiple tests which demonstrate issues with
      backfilling and the lacking atomicity.

    - Patch 2 and 3 do some preliminary refactorings which set the stage
      for improved atomicity.

    - Patch 4 fixes a bug with backfilling tags: we don't return an
      error code to the user if the backfill fails.

    - Patch 5 and 6 then finally extend the atomicity guarantees.

I'm not yet all that happy with patch 5: it currently has to reach into
the `ref_transaction` struct to be able to filter out any updates which
have already been queued such that we don't accidentally queue the same
tag update twice. I first wanted to get some feedback on this patch
series though, and in case it is agreed that it fixes a real issue I may
introduce a new non-internal API which allows iterating over queued
updates.

Patrick

Patrick Steinhardt (6):
  fetch: increase test coverage of fetches
  fetch: backfill tags before setting upstream
  fetch: control lifecycle of FETCH_HEAD in a single place
  fetch: report errors when backfilling tags fails
  fetch: make `--atomic` flag cover backfilling of tags
  fetch: make `--atomic` flag cover pruning of refs

 builtin/fetch.c      | 190 +++++++++++++++++++++++++++++--------------
 t/t5503-tagfollow.sh |  78 ++++++++++++++++++
 t/t5510-fetch.sh     |  31 +++++++
 3 files changed, 236 insertions(+), 63 deletions(-)

-- 
2.35.1

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/6] fetch: increase test coverage of fetches
  2022-02-11  7:46 [PATCH 0/6] fetch: improve atomicity of `--atomic` flag Patrick Steinhardt
@ 2022-02-11  7:46 ` Patrick Steinhardt
  2022-02-15  6:19   ` Christian Couder
  2022-02-11  7:46 ` [PATCH 2/6] fetch: backfill tags before setting upstream Patrick Steinhardt
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 27+ messages in thread
From: Patrick Steinhardt @ 2022-02-11  7:46 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 5103 bytes --]

The `--atomic` flag is missing test coverage for pruning of deleted
references and backfilling of tags, and of course both aren't covered
correctly by this flag. Furthermore, we don't have tests demonstrating
error cases for backfilling tags.

Add tests to cover those testing gaps.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 t/t5503-tagfollow.sh | 85 ++++++++++++++++++++++++++++++++++++++++++++
 t/t5510-fetch.sh     | 33 +++++++++++++++++
 2 files changed, 118 insertions(+)

diff --git a/t/t5503-tagfollow.sh b/t/t5503-tagfollow.sh
index 195fc64dd4..888305ad4d 100755
--- a/t/t5503-tagfollow.sh
+++ b/t/t5503-tagfollow.sh
@@ -160,4 +160,89 @@ test_expect_success 'new clone fetch main and tags' '
 	test_cmp expect actual
 '
 
+test_expect_success 'atomic fetch with failing backfill' '
+	git init clone3 &&
+
+	# We want to test whether a failure when backfilling tags correctly
+	# aborts the complete transaction when `--atomic` is passed: we should
+	# neither create the branch nor should we create the tag when either
+	# one of both fails to update correctly.
+	#
+	# To trigger failure we simply abort when backfilling a tag.
+	write_script clone3/.git/hooks/reference-transaction <<-\EOF &&
+		#!/bin/sh
+
+		while read oldrev newrev reference
+		do
+			if test "$reference" = refs/tags/tag1
+			then
+				exit 1
+			fi
+		done
+	EOF
+
+	test_must_fail git -C clone3 fetch --atomic .. $B:refs/heads/something &&
+
+	# Creation of the tag has failed, so ideally refs/heads/something
+	# should not exist. The fact that it does is demonstrates that there is
+	# missing coverage in the `--atomic` flag.
+	test $B = "$(git -C clone3 rev-parse --verify refs/heads/something)"
+'
+
+test_expect_success 'atomic fetch with backfill should use single transaction' '
+	git init clone4 &&
+
+	# Fetching with the `--atomic` flag should update all references in a
+	# single transaction, including backfilled tags. We thus expect to see
+	# a single reference transaction for the created branch and tags.
+	cat >expected <<-EOF &&
+		prepared
+		$ZERO_OID $B refs/heads/something
+		$ZERO_OID $S refs/tags/tag2
+		committed
+		$ZERO_OID $B refs/heads/something
+		$ZERO_OID $S refs/tags/tag2
+		prepared
+		$ZERO_OID $T refs/tags/tag1
+		committed
+		$ZERO_OID $T refs/tags/tag1
+	EOF
+
+	write_script clone4/.git/hooks/reference-transaction <<-\EOF &&
+		( echo "$*" && cat ) >>actual
+	EOF
+
+	git -C clone4 fetch --atomic .. $B:refs/heads/something &&
+	test_cmp expected clone4/actual
+'
+
+test_expect_success 'backfill failure causes command to fail' '
+	git init clone5 &&
+
+	write_script clone5/.git/hooks/reference-transaction <<-EOF &&
+		#!/bin/sh
+
+		while read oldrev newrev reference
+		do
+			if test "\$reference" = refs/tags/tag1
+			then
+				# Create a nested tag below the actual tag we
+				# wanted to write, which causes a D/F conflict
+				# later when we want to commit refs/tags/tag1.
+				# We cannot just `exit 1` here given that this
+				# would cause us to die immediately.
+				git update-ref refs/tags/tag1/nested $B
+				exit \$!
+			fi
+		done
+	EOF
+
+	# Even though we fail to create refs/tags/tag1 the below command
+	# unexpectedly succeeds.
+	git -C clone5 fetch .. $B:refs/heads/something &&
+	test $B = $(git -C clone5 rev-parse --verify refs/heads/something) &&
+	test $S = $(git -C clone5 rev-parse --verify tag2) &&
+	test_must_fail git -C clone5 rev-parse --verify tag1
+'
+
 test_done
diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
index 20f7110ec1..93a0db3c68 100755
--- a/t/t5510-fetch.sh
+++ b/t/t5510-fetch.sh
@@ -332,6 +332,39 @@ test_expect_success 'fetch --atomic --append appends to FETCH_HEAD' '
 	test_cmp expected atomic/.git/FETCH_HEAD
 '
 
+test_expect_success 'fetch --atomic --prune executes a single reference transaction only' '
+	test_when_finished "rm -rf \"$D\"/atomic" &&
+
+	cd "$D" &&
+	git branch scheduled-for-deletion &&
+	git clone . atomic &&
+	git branch -D scheduled-for-deletion &&
+	git branch new-branch &&
+	head_oid=$(git rev-parse HEAD) &&
+
+	# Fetching with the `--atomic` flag should update all references in a
+	# single transaction. It is currently missing coverage of pruned
+	# references though, and as a result those may be committed to disk
+	# even if updating references fails later.
+	cat >expected <<-EOF &&
+		prepared
+		$ZERO_OID $ZERO_OID refs/remotes/origin/scheduled-for-deletion
+		committed
+		$ZERO_OID $ZERO_OID refs/remotes/origin/scheduled-for-deletion
+		prepared
+		$ZERO_OID $head_oid refs/remotes/origin/new-branch
+		committed
+		$ZERO_OID $head_oid refs/remotes/origin/new-branch
+	EOF
+
+	write_script atomic/.git/hooks/reference-transaction <<-\EOF &&
+		( echo "$*" && cat ) >>actual
+	EOF
+
+	git -C atomic fetch --atomic --prune origin &&
+	test_cmp expected atomic/actual
+'
+
 test_expect_success '--refmap="" ignores configured refspec' '
 	cd "$TRASH_DIRECTORY" &&
 	git clone "$D" remote-refs &&
-- 
2.35.1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 2/6] fetch: backfill tags before setting upstream
  2022-02-11  7:46 [PATCH 0/6] fetch: improve atomicity of `--atomic` flag Patrick Steinhardt
  2022-02-11  7:46 ` [PATCH 1/6] fetch: increase test coverage of fetches Patrick Steinhardt
@ 2022-02-11  7:46 ` Patrick Steinhardt
  2022-02-15  6:43   ` Christian Couder
  2022-02-11  7:46 ` [PATCH 3/6] fetch: control lifecycle of FETCH_HEAD in a single place Patrick Steinhardt
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 27+ messages in thread
From: Patrick Steinhardt @ 2022-02-11  7:46 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 3422 bytes --]

The fetch code flow is a bit hard to understand right now:

    1. We optionally prune all references which have vanished on the
       remote side.
    2. We fetch and update all other references locally.
    3. We update the upstream branch in the gitconfig.
    4. We backfill tags pointing into the history we have just fetched.

It is quite confusing that we fetch objects and update references in
both (2) and (4), which is further stressed by the point that we require
a `skip` label which jumps from (3) to (4) in case we fail to update the
gitconfig as expected.

Reorder the code to first update all local references, and only after we
have done so update the upstream branch information. This improves the
code flow and furthermore makes it easier to refactor the way we update
references together.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/fetch.c | 33 ++++++++++++++++-----------------
 1 file changed, 16 insertions(+), 17 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 5b3b18a72f..9c7e4f12cd 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1536,7 +1536,7 @@ static void backfill_tags(struct transport *transport, struct ref *ref_map,
 static int do_fetch(struct transport *transport,
 		    struct refspec *rs)
 {
-	struct ref *ref_map;
+	struct ref *ref_map = NULL;
 	int autotags = (transport->remote->fetch_tags == 1);
 	int retcode = 0;
 	const struct ref *remote_refs;
@@ -1618,11 +1618,22 @@ static int do_fetch(struct transport *transport,
 		}
 	}
 	if (fetch_and_consume_refs(transport, ref_map, worktrees)) {
-		free_refs(ref_map);
 		retcode = 1;
 		goto cleanup;
 	}
 
+	/* if neither --no-tags nor --tags was specified, do automated tag
+	 * following ... */
+	if (tags == TAGS_DEFAULT && autotags) {
+		struct ref *tags_ref_map = NULL, **tail = &tags_ref_map;
+
+		find_non_local_tags(remote_refs, &tags_ref_map, &tail);
+		if (tags_ref_map)
+			backfill_tags(transport, tags_ref_map, worktrees);
+
+		free_refs(tags_ref_map);
+	}
+
 	if (set_upstream) {
 		struct branch *branch = branch_get("HEAD");
 		struct ref *rm;
@@ -1642,7 +1653,7 @@ static int do_fetch(struct transport *transport,
 			if (!rm->peer_ref) {
 				if (source_ref) {
 					warning(_("multiple branches detected, incompatible with --set-upstream"));
-					goto skip;
+					goto cleanup;
 				} else {
 					source_ref = rm;
 				}
@@ -1656,7 +1667,7 @@ static int do_fetch(struct transport *transport,
 				warning(_("could not set upstream of HEAD to '%s' from '%s' when "
 					  "it does not point to any branch."),
 					shortname, transport->remote->name);
-				goto skip;
+				goto cleanup;
 			}
 
 			if (!strcmp(source_ref->name, "HEAD") ||
@@ -1676,21 +1687,9 @@ static int do_fetch(struct transport *transport,
 				  "you need to specify exactly one branch with the --set-upstream option"));
 		}
 	}
-skip:
-	free_refs(ref_map);
-
-	/* if neither --no-tags nor --tags was specified, do automated tag
-	 * following ... */
-	if (tags == TAGS_DEFAULT && autotags) {
-		struct ref **tail = &ref_map;
-		ref_map = NULL;
-		find_non_local_tags(remote_refs, &ref_map, &tail);
-		if (ref_map)
-			backfill_tags(transport, ref_map, worktrees);
-		free_refs(ref_map);
-	}
 
 cleanup:
+	free_refs(ref_map);
 	free_worktrees(worktrees);
 	return retcode;
 }
-- 
2.35.1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 3/6] fetch: control lifecycle of FETCH_HEAD in a single place
  2022-02-11  7:46 [PATCH 0/6] fetch: improve atomicity of `--atomic` flag Patrick Steinhardt
  2022-02-11  7:46 ` [PATCH 1/6] fetch: increase test coverage of fetches Patrick Steinhardt
  2022-02-11  7:46 ` [PATCH 2/6] fetch: backfill tags before setting upstream Patrick Steinhardt
@ 2022-02-11  7:46 ` Patrick Steinhardt
  2022-02-15  7:32   ` Christian Couder
  2022-02-11  7:46 ` [PATCH 4/6] fetch: report errors when backfilling tags fails Patrick Steinhardt
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 27+ messages in thread
From: Patrick Steinhardt @ 2022-02-11  7:46 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 5622 bytes --]

There are two different locations where we're appending to FETCH_HEAD:
first when storing updated references, and second when backfilling tags.
Both times we open the file, append to it and then commit it into place,
which is essentially duplicate work.

Improve the lifecycle of updating FETCH_HEAD by opening and committing
it once in `do_fetch()`, where we pass the structure down to code which
wants to append to it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/fetch.c | 35 +++++++++++++++++++----------------
 1 file changed, 19 insertions(+), 16 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 9c7e4f12cd..627847e2f8 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1084,9 +1084,8 @@ N_("it took %.2f seconds to check forced updates; you can use\n"
 
 static int store_updated_refs(const char *raw_url, const char *remote_name,
 			      int connectivity_checked, struct ref *ref_map,
-			      struct worktree **worktrees)
+			      struct fetch_head *fetch_head, struct worktree **worktrees)
 {
-	struct fetch_head fetch_head;
 	int url_len, i, rc = 0;
 	struct strbuf note = STRBUF_INIT, err = STRBUF_INIT;
 	struct ref_transaction *transaction = NULL;
@@ -1096,10 +1095,6 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 	int want_status;
 	int summary_width = transport_summary_width(ref_map);
 
-	rc = open_fetch_head(&fetch_head);
-	if (rc)
-		return -1;
-
 	if (raw_url)
 		url = transport_anonymize_url(raw_url);
 	else
@@ -1206,7 +1201,7 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 				strbuf_addf(&note, "'%s' of ", what);
 			}
 
-			append_fetch_head(&fetch_head, &rm->old_oid,
+			append_fetch_head(fetch_head, &rm->old_oid,
 					  rm->fetch_head_status,
 					  note.buf, url, url_len);
 
@@ -1246,9 +1241,6 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 		}
 	}
 
-	if (!rc)
-		commit_fetch_head(&fetch_head);
-
 	if (rc & STORE_REF_ERROR_DF_CONFLICT)
 		error(_("some local refs could not be updated; try running\n"
 		      " 'git remote prune %s' to remove any old, conflicting "
@@ -1268,7 +1260,6 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 	strbuf_release(&err);
 	ref_transaction_free(transaction);
 	free(url);
-	close_fetch_head(&fetch_head);
 	return rc;
 }
 
@@ -1309,6 +1300,7 @@ static int check_exist_and_connected(struct ref *ref_map)
 
 static int fetch_and_consume_refs(struct transport *transport,
 				  struct ref *ref_map,
+				  struct fetch_head *fetch_head,
 				  struct worktree **worktrees)
 {
 	int connectivity_checked = 1;
@@ -1331,7 +1323,7 @@ static int fetch_and_consume_refs(struct transport *transport,
 
 	trace2_region_enter("fetch", "consume_refs", the_repository);
 	ret = store_updated_refs(transport->url, transport->remote->name,
-				 connectivity_checked, ref_map, worktrees);
+				 connectivity_checked, ref_map, fetch_head, worktrees);
 	trace2_region_leave("fetch", "consume_refs", the_repository);
 
 out:
@@ -1503,7 +1495,9 @@ static struct transport *prepare_transport(struct remote *remote, int deepen)
 	return transport;
 }
 
-static void backfill_tags(struct transport *transport, struct ref *ref_map,
+static void backfill_tags(struct transport *transport,
+			  struct ref *ref_map,
+			  struct fetch_head *fetch_head,
 			  struct worktree **worktrees)
 {
 	int cannot_reuse;
@@ -1525,7 +1519,7 @@ static void backfill_tags(struct transport *transport, struct ref *ref_map,
 	transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, NULL);
 	transport_set_option(transport, TRANS_OPT_DEPTH, "0");
 	transport_set_option(transport, TRANS_OPT_DEEPEN_RELATIVE, NULL);
-	fetch_and_consume_refs(transport, ref_map, worktrees);
+	fetch_and_consume_refs(transport, ref_map, fetch_head, worktrees);
 
 	if (gsecondary) {
 		transport_disconnect(gsecondary);
@@ -1544,6 +1538,7 @@ static int do_fetch(struct transport *transport,
 		TRANSPORT_LS_REFS_OPTIONS_INIT;
 	int must_list_refs = 1;
 	struct worktree **worktrees = get_worktrees();
+	struct fetch_head fetch_head = { 0 };
 
 	if (tags == TAGS_DEFAULT) {
 		if (transport->remote->fetch_tags == 2)
@@ -1601,6 +1596,10 @@ static int do_fetch(struct transport *transport,
 	if (!update_head_ok)
 		check_not_current_branch(ref_map, worktrees);
 
+	retcode = open_fetch_head(&fetch_head);
+	if (retcode)
+		goto cleanup;
+
 	if (tags == TAGS_DEFAULT && autotags)
 		transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, "1");
 	if (prune) {
@@ -1617,7 +1616,8 @@ static int do_fetch(struct transport *transport,
 				   transport->url);
 		}
 	}
-	if (fetch_and_consume_refs(transport, ref_map, worktrees)) {
+
+	if (fetch_and_consume_refs(transport, ref_map, &fetch_head, worktrees)) {
 		retcode = 1;
 		goto cleanup;
 	}
@@ -1629,11 +1629,13 @@ static int do_fetch(struct transport *transport,
 
 		find_non_local_tags(remote_refs, &tags_ref_map, &tail);
 		if (tags_ref_map)
-			backfill_tags(transport, tags_ref_map, worktrees);
+			backfill_tags(transport, tags_ref_map, &fetch_head, worktrees);
 
 		free_refs(tags_ref_map);
 	}
 
+	commit_fetch_head(&fetch_head);
+
 	if (set_upstream) {
 		struct branch *branch = branch_get("HEAD");
 		struct ref *rm;
@@ -1689,6 +1691,7 @@ static int do_fetch(struct transport *transport,
 	}
 
 cleanup:
+	close_fetch_head(&fetch_head);
 	free_refs(ref_map);
 	free_worktrees(worktrees);
 	return retcode;
-- 
2.35.1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 4/6] fetch: report errors when backfilling tags fails
  2022-02-11  7:46 [PATCH 0/6] fetch: improve atomicity of `--atomic` flag Patrick Steinhardt
                   ` (2 preceding siblings ...)
  2022-02-11  7:46 ` [PATCH 3/6] fetch: control lifecycle of FETCH_HEAD in a single place Patrick Steinhardt
@ 2022-02-11  7:46 ` Patrick Steinhardt
  2022-02-15  7:52   ` Christian Couder
                     ` (2 more replies)
  2022-02-11  7:47 ` [PATCH 5/6] fetch: make `--atomic` flag cover backfilling of tags Patrick Steinhardt
  2022-02-11  7:47 ` [PATCH 6/6] fetch: make `--atomic` flag cover pruning of refs Patrick Steinhardt
  5 siblings, 3 replies; 27+ messages in thread
From: Patrick Steinhardt @ 2022-02-11  7:46 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 3430 bytes --]

When the backfilling of tags fails we do not report this error to the
caller, but only report it implicitly at a later point when reporting
updated references. This leaves callers unable to act upon the
information of whether the backfilling succeeded or not.

Refactor the function to return an error code and pass it up the
callstack. This causes us to correctly propagate the error back to the
user of git-fetch(1).

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/fetch.c      | 29 +++++++++++++++++++++--------
 t/t5503-tagfollow.sh |  4 +---
 2 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 627847e2f8..1eda0b68ff 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1495,12 +1495,12 @@ static struct transport *prepare_transport(struct remote *remote, int deepen)
 	return transport;
 }
 
-static void backfill_tags(struct transport *transport,
-			  struct ref *ref_map,
-			  struct fetch_head *fetch_head,
-			  struct worktree **worktrees)
+static int backfill_tags(struct transport *transport,
+			 struct ref *ref_map,
+			 struct fetch_head *fetch_head,
+			 struct worktree **worktrees)
 {
-	int cannot_reuse;
+	int retcode, cannot_reuse;
 
 	/*
 	 * Once we have set TRANS_OPT_DEEPEN_SINCE, we can't unset it
@@ -1519,12 +1519,14 @@ static void backfill_tags(struct transport *transport,
 	transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, NULL);
 	transport_set_option(transport, TRANS_OPT_DEPTH, "0");
 	transport_set_option(transport, TRANS_OPT_DEEPEN_RELATIVE, NULL);
-	fetch_and_consume_refs(transport, ref_map, fetch_head, worktrees);
+	retcode = fetch_and_consume_refs(transport, ref_map, fetch_head, worktrees);
 
 	if (gsecondary) {
 		transport_disconnect(gsecondary);
 		gsecondary = NULL;
 	}
+
+	return retcode;
 }
 
 static int do_fetch(struct transport *transport,
@@ -1628,8 +1630,19 @@ static int do_fetch(struct transport *transport,
 		struct ref *tags_ref_map = NULL, **tail = &tags_ref_map;
 
 		find_non_local_tags(remote_refs, &tags_ref_map, &tail);
-		if (tags_ref_map)
-			backfill_tags(transport, tags_ref_map, &fetch_head, worktrees);
+		if (tags_ref_map) {
+			/*
+			 * If backfilling tags succeeds we used to not return
+			 * an error code to the user at all. Instead, we
+			 * silently swallowed that error and updated the local
+			 * state of the repository. We now notify the user of
+			 * any such errors, but we continue to make sure that
+			 * FETCH_HEAD and the upstream branch are configured as
+			 * expected.
+			 */
+			if (backfill_tags(transport, tags_ref_map, &fetch_head, worktrees))
+				retcode = 1;
+		}
 
 		free_refs(tags_ref_map);
 	}
diff --git a/t/t5503-tagfollow.sh b/t/t5503-tagfollow.sh
index 888305ad4d..549f908b90 100755
--- a/t/t5503-tagfollow.sh
+++ b/t/t5503-tagfollow.sh
@@ -237,9 +237,7 @@ test_expect_success 'backfill failure causes command to fail' '
 		done
 	EOF
 
-	# Even though we fail to create refs/tags/tag1 the below command
-	# unexpectedly succeeds.
-	git -C clone5 fetch .. $B:refs/heads/something &&
+	test_must_fail git -C clone5 fetch .. $B:refs/heads/something &&
 	test $B = $(git -C clone5 rev-parse --verify refs/heads/something) &&
 	test $S = $(git -C clone5 rev-parse --verify tag2) &&
 	test_must_fail git -C clone5 rev-parse --verify tag1
-- 
2.35.1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 5/6] fetch: make `--atomic` flag cover backfilling of tags
  2022-02-11  7:46 [PATCH 0/6] fetch: improve atomicity of `--atomic` flag Patrick Steinhardt
                   ` (3 preceding siblings ...)
  2022-02-11  7:46 ` [PATCH 4/6] fetch: report errors when backfilling tags fails Patrick Steinhardt
@ 2022-02-11  7:47 ` Patrick Steinhardt
  2022-02-15  8:11   ` Christian Couder
  2022-02-17  1:34   ` Elijah Newren
  2022-02-11  7:47 ` [PATCH 6/6] fetch: make `--atomic` flag cover pruning of refs Patrick Steinhardt
  5 siblings, 2 replies; 27+ messages in thread
From: Patrick Steinhardt @ 2022-02-11  7:47 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 11375 bytes --]

When fetching references from a remote we by default also fetch all tags
which point into the history we have fetched. This is a separate step
performed after updating local references because it requires us to walk
over the history on the client-side to determine whether the remote has
announced any tags which point to one of the fetched commits.

This backfilling of tags isn't covered by the `--atomic` flag: right
now, it only applies to the step where we update our local references.
This is an oversight at the time the flag was introduced: its purpose is
to either update all references or none, but right now we happily update
local references even in the case where backfilling failed.

Fix this by pulling up creation of the reference transaction such that
we can pass the same transaction to both the code which updates local
references and to the code which backfills tags. This allows us to only
commit the transaction in case both actions succeed.

Note that we also have to start passing the transaction into
`find_non_local_tags()`: this function is responsible for finding all
tags which we need to backfill. Right now, it will happily return tags
which we have already been updated with our local references. But when
we use a single transaction for both local references and backfilling
then it may happen that we try to queue the same reference update twice
to the transaction, which consequentially triggers a bug. We thus have
to skip over any tags which have already been queued. Unfortunately,
this requires us to reach into internals of the reference transaction to
access queued updates, but there is no non-internal interface right now
which would allow us to access this information.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/fetch.c      | 89 ++++++++++++++++++++++++++++++--------------
 t/t5503-tagfollow.sh | 11 ++----
 2 files changed, 65 insertions(+), 35 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 1eda0b68ff..348e64cf2c 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -4,7 +4,7 @@
 #include "cache.h"
 #include "config.h"
 #include "repository.h"
-#include "refs.h"
+#include "refs/refs-internal.h"
 #include "refspec.h"
 #include "object-store.h"
 #include "oidset.h"
@@ -350,6 +350,7 @@ static void clear_item(struct refname_hash_entry *item)
 }
 
 static void find_non_local_tags(const struct ref *refs,
+				struct ref_transaction *transaction,
 				struct ref **head,
 				struct ref ***tail)
 {
@@ -361,12 +362,28 @@ static void find_non_local_tags(const struct ref *refs,
 	const struct ref *ref;
 	struct refname_hash_entry *item = NULL;
 	const int quick_flags = OBJECT_INFO_QUICK | OBJECT_INFO_SKIP_FETCH_OBJECT;
+	int i;
 
 	refname_hash_init(&existing_refs);
 	refname_hash_init(&remote_refs);
 	create_fetch_oidset(head, &fetch_oids);
 
 	for_each_ref(add_one_refname, &existing_refs);
+
+	/*
+	 * If we already have a transaction, then we need to filter out all
+	 * tags which have already been queued up.
+	 */
+	for (i = 0; transaction && i < transaction->nr; i++) {
+		if (!starts_with(transaction->updates[i]->refname, "refs/tags/") ||
+		    !(transaction->updates[i]->flags & REF_HAVE_NEW))
+			continue;
+		(void) refname_hash_add(&existing_refs,
+					transaction->updates[i]->refname,
+					&transaction->updates[i]->new_oid);
+	}
+
+
 	for (ref = refs; ref; ref = ref->next) {
 		if (!starts_with(ref->name, "refs/tags/"))
 			continue;
@@ -600,7 +617,7 @@ static struct ref *get_ref_map(struct remote *remote,
 		/* also fetch all tags */
 		get_fetch_map(remote_refs, tag_refspec, &tail, 0);
 	else if (tags == TAGS_DEFAULT && *autotags)
-		find_non_local_tags(remote_refs, &ref_map, &tail);
+		find_non_local_tags(remote_refs, NULL, &ref_map, &tail);
 
 	/* Now append any refs to be updated opportunistically: */
 	*tail = orefs;
@@ -1083,12 +1100,12 @@ N_("it took %.2f seconds to check forced updates; you can use\n"
    "to avoid this check\n");
 
 static int store_updated_refs(const char *raw_url, const char *remote_name,
-			      int connectivity_checked, struct ref *ref_map,
+			      int connectivity_checked,
+			      struct ref_transaction *transaction, struct ref *ref_map,
 			      struct fetch_head *fetch_head, struct worktree **worktrees)
 {
 	int url_len, i, rc = 0;
 	struct strbuf note = STRBUF_INIT, err = STRBUF_INIT;
-	struct ref_transaction *transaction = NULL;
 	const char *what, *kind;
 	struct ref *rm;
 	char *url;
@@ -1110,14 +1127,6 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 		}
 	}
 
-	if (atomic_fetch) {
-		transaction = ref_transaction_begin(&err);
-		if (!transaction) {
-			error("%s", err.buf);
-			goto abort;
-		}
-	}
-
 	prepare_format_display(ref_map);
 
 	/*
@@ -1233,14 +1242,6 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 		}
 	}
 
-	if (!rc && transaction) {
-		rc = ref_transaction_commit(transaction, &err);
-		if (rc) {
-			error("%s", err.buf);
-			goto abort;
-		}
-	}
-
 	if (rc & STORE_REF_ERROR_DF_CONFLICT)
 		error(_("some local refs could not be updated; try running\n"
 		      " 'git remote prune %s' to remove any old, conflicting "
@@ -1258,7 +1259,6 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
  abort:
 	strbuf_release(&note);
 	strbuf_release(&err);
-	ref_transaction_free(transaction);
 	free(url);
 	return rc;
 }
@@ -1299,6 +1299,7 @@ static int check_exist_and_connected(struct ref *ref_map)
 }
 
 static int fetch_and_consume_refs(struct transport *transport,
+				  struct ref_transaction *transaction,
 				  struct ref *ref_map,
 				  struct fetch_head *fetch_head,
 				  struct worktree **worktrees)
@@ -1323,7 +1324,8 @@ static int fetch_and_consume_refs(struct transport *transport,
 
 	trace2_region_enter("fetch", "consume_refs", the_repository);
 	ret = store_updated_refs(transport->url, transport->remote->name,
-				 connectivity_checked, ref_map, fetch_head, worktrees);
+				 connectivity_checked, transaction, ref_map,
+				 fetch_head, worktrees);
 	trace2_region_leave("fetch", "consume_refs", the_repository);
 
 out:
@@ -1496,6 +1498,7 @@ static struct transport *prepare_transport(struct remote *remote, int deepen)
 }
 
 static int backfill_tags(struct transport *transport,
+			 struct ref_transaction *transaction,
 			 struct ref *ref_map,
 			 struct fetch_head *fetch_head,
 			 struct worktree **worktrees)
@@ -1519,7 +1522,7 @@ static int backfill_tags(struct transport *transport,
 	transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, NULL);
 	transport_set_option(transport, TRANS_OPT_DEPTH, "0");
 	transport_set_option(transport, TRANS_OPT_DEEPEN_RELATIVE, NULL);
-	retcode = fetch_and_consume_refs(transport, ref_map, fetch_head, worktrees);
+	retcode = fetch_and_consume_refs(transport, transaction, ref_map, fetch_head, worktrees);
 
 	if (gsecondary) {
 		transport_disconnect(gsecondary);
@@ -1532,6 +1535,7 @@ static int backfill_tags(struct transport *transport,
 static int do_fetch(struct transport *transport,
 		    struct refspec *rs)
 {
+	struct ref_transaction *transaction = NULL;
 	struct ref *ref_map = NULL;
 	int autotags = (transport->remote->fetch_tags == 1);
 	int retcode = 0;
@@ -1541,6 +1545,7 @@ static int do_fetch(struct transport *transport,
 	int must_list_refs = 1;
 	struct worktree **worktrees = get_worktrees();
 	struct fetch_head fetch_head = { 0 };
+	struct strbuf err = STRBUF_INIT;
 
 	if (tags == TAGS_DEFAULT) {
 		if (transport->remote->fetch_tags == 2)
@@ -1602,6 +1607,14 @@ static int do_fetch(struct transport *transport,
 	if (retcode)
 		goto cleanup;
 
+	if (atomic_fetch) {
+		transaction = ref_transaction_begin(&err);
+		if (!transaction) {
+			retcode = error("%s", err.buf);
+			goto cleanup;
+		}
+	}
+
 	if (tags == TAGS_DEFAULT && autotags)
 		transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, "1");
 	if (prune) {
@@ -1619,7 +1632,7 @@ static int do_fetch(struct transport *transport,
 		}
 	}
 
-	if (fetch_and_consume_refs(transport, ref_map, &fetch_head, worktrees)) {
+	if (fetch_and_consume_refs(transport, transaction, ref_map, &fetch_head, worktrees)) {
 		retcode = 1;
 		goto cleanup;
 	}
@@ -1629,7 +1642,7 @@ static int do_fetch(struct transport *transport,
 	if (tags == TAGS_DEFAULT && autotags) {
 		struct ref *tags_ref_map = NULL, **tail = &tags_ref_map;
 
-		find_non_local_tags(remote_refs, &tags_ref_map, &tail);
+		find_non_local_tags(remote_refs, transaction, &tags_ref_map, &tail);
 		if (tags_ref_map) {
 			/*
 			 * If backfilling tags succeeds we used to not return
@@ -1638,15 +1651,31 @@ static int do_fetch(struct transport *transport,
 			 * state of the repository. We now notify the user of
 			 * any such errors, but we continue to make sure that
 			 * FETCH_HEAD and the upstream branch are configured as
-			 * expected.
+			 * expected. The exception though is when `--atomic`
+			 * is passed: in that case we'll abort the transaction
+			 * and don't commit anything.
 			 */
-			if (backfill_tags(transport, tags_ref_map, &fetch_head, worktrees))
+			if (backfill_tags(transport, transaction, tags_ref_map,
+					  &fetch_head, worktrees))
 				retcode = 1;
 		}
 
 		free_refs(tags_ref_map);
 	}
 
+	if (transaction) {
+		if (retcode)
+			goto cleanup;
+
+		retcode = ref_transaction_commit(transaction, &err);
+		if (retcode) {
+			error("%s", err.buf);
+			ref_transaction_free(transaction);
+			transaction = NULL;
+			goto cleanup;
+		}
+	}
+
 	commit_fetch_head(&fetch_head);
 
 	if (set_upstream) {
@@ -1704,7 +1733,13 @@ static int do_fetch(struct transport *transport,
 	}
 
 cleanup:
+	if (retcode && transaction) {
+		ref_transaction_abort(transaction, &err);
+		error("%s", err.buf);
+	}
+
 	close_fetch_head(&fetch_head);
+	strbuf_release(&err);
 	free_refs(ref_map);
 	free_worktrees(worktrees);
 	return retcode;
diff --git a/t/t5503-tagfollow.sh b/t/t5503-tagfollow.sh
index 549f908b90..4a8e63aa16 100755
--- a/t/t5503-tagfollow.sh
+++ b/t/t5503-tagfollow.sh
@@ -182,11 +182,8 @@ test_expect_success 'atomic fetch with failing backfill' '
 	EOF
 
 	test_must_fail git -C clone3 fetch --atomic .. $B:refs/heads/something &&
-
-	# Creation of the tag has failed, so ideally refs/heads/something
-	# should not exist. The fact that it does is demonstrates that there is
-	# missing coverage in the `--atomic` flag.
-	test $B = "$(git -C clone3 rev-parse --verify refs/heads/something)"
+	test_must_fail git -C clone3 rev-parse --verify refs/heads/something &&
+	test_must_fail git -C clone3 rev-parse --verify refs/tags/tag2
 '
 
 test_expect_success 'atomic fetch with backfill should use single transaction' '
@@ -199,12 +196,10 @@ test_expect_success 'atomic fetch with backfill should use single transaction' '
 		prepared
 		$ZERO_OID $B refs/heads/something
 		$ZERO_OID $S refs/tags/tag2
+		$ZERO_OID $T refs/tags/tag1
 		committed
 		$ZERO_OID $B refs/heads/something
 		$ZERO_OID $S refs/tags/tag2
-		prepared
-		$ZERO_OID $T refs/tags/tag1
-		committed
 		$ZERO_OID $T refs/tags/tag1
 	EOF
 
-- 
2.35.1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 6/6] fetch: make `--atomic` flag cover pruning of refs
  2022-02-11  7:46 [PATCH 0/6] fetch: improve atomicity of `--atomic` flag Patrick Steinhardt
                   ` (4 preceding siblings ...)
  2022-02-11  7:47 ` [PATCH 5/6] fetch: make `--atomic` flag cover backfilling of tags Patrick Steinhardt
@ 2022-02-11  7:47 ` Patrick Steinhardt
  2022-02-15  9:12   ` Christian Couder
                     ` (2 more replies)
  5 siblings, 3 replies; 27+ messages in thread
From: Patrick Steinhardt @ 2022-02-11  7:47 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 4787 bytes --]

When fetching with the `--prune` flag we will delete any local
references matching the fetch refspec which have disappeared on the
remote. This step is not currently covered by the `--atomic` flag: we
delete branches even though updating of local references has failed,
which means that the fetch is not an all-or-nothing operation.

Fix this bug by passing in the global transaction into `prune_refs()`:
if one is given, then we'll only queue up deletions and not commit them
right away.

This change also improves performance when pruning many branches in a
repository with a big packed-refs file: every references is pruned in
its own transaction, which means that we potentially have to rewrite
the packed-refs files for every single reference we're about to prune.

The following benchmark demonstrates this: it performs a pruning fetch
from a repository with a single reference into a repository with 100k
references, which causes us to prune all but one reference. This is of
course a very artificial setup, but serves to demonstrate the impact of
only having to write the packed-refs file once:

    Benchmark 1: git fetch --prune --atomic +refs/*:refs/* (HEAD~)
      Time (mean ± σ):      2.366 s ±  0.021 s    [User: 0.858 s, System: 1.508 s]
      Range (min … max):    2.328 s …  2.407 s    10 runs

    Benchmark 2: git fetch --prune --atomic +refs/*:refs/* (HEAD)
      Time (mean ± σ):      1.369 s ±  0.017 s    [User: 0.715 s, System: 0.641 s]
      Range (min … max):    1.346 s …  1.400 s    10 runs

    Summary
      'git fetch --prune --atomic +refs/*:refs/* (HEAD)' ran
        1.73 ± 0.03 times faster than 'git fetch --prune --atomic +refs/*:refs/* (HEAD~)'

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/fetch.c  | 30 ++++++++++++++++++++++--------
 t/t5510-fetch.sh |  4 +---
 2 files changed, 23 insertions(+), 11 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 348e64cf2c..75e791a4b4 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1333,11 +1333,14 @@ static int fetch_and_consume_refs(struct transport *transport,
 	return ret;
 }
 
-static int prune_refs(struct refspec *rs, struct ref *ref_map,
+static int prune_refs(struct refspec *rs,
+		      struct ref_transaction *transaction,
+		      struct ref *ref_map,
 		      const char *raw_url)
 {
 	int url_len, i, result = 0;
 	struct ref *ref, *stale_refs = get_stale_heads(rs, ref_map);
+	struct strbuf err = STRBUF_INIT;
 	char *url;
 	int summary_width = transport_summary_width(stale_refs);
 	const char *dangling_msg = dry_run
@@ -1358,13 +1361,22 @@ static int prune_refs(struct refspec *rs, struct ref *ref_map,
 		url_len = i - 3;
 
 	if (!dry_run) {
-		struct string_list refnames = STRING_LIST_INIT_NODUP;
+		if (transaction) {
+			for (ref = stale_refs; ref; ref = ref->next) {
+				result = ref_transaction_delete(transaction, ref->name, NULL, 0,
+								"fetch: prune", &err);
+				if (result)
+					goto cleanup;
+			}
+		} else {
+			struct string_list refnames = STRING_LIST_INIT_NODUP;
 
-		for (ref = stale_refs; ref; ref = ref->next)
-			string_list_append(&refnames, ref->name);
+			for (ref = stale_refs; ref; ref = ref->next)
+				string_list_append(&refnames, ref->name);
 
-		result = delete_refs("fetch: prune", &refnames, 0);
-		string_list_clear(&refnames, 0);
+			result = delete_refs("fetch: prune", &refnames, 0);
+			string_list_clear(&refnames, 0);
+		}
 	}
 
 	if (verbosity >= 0) {
@@ -1383,6 +1395,8 @@ static int prune_refs(struct refspec *rs, struct ref *ref_map,
 		}
 	}
 
+cleanup:
+	strbuf_release(&err);
 	free(url);
 	free_refs(stale_refs);
 	return result;
@@ -1624,10 +1638,10 @@ static int do_fetch(struct transport *transport,
 		 * don't care whether --tags was specified.
 		 */
 		if (rs->nr) {
-			prune_refs(rs, ref_map, transport->url);
+			prune_refs(rs, transaction, ref_map, transport->url);
 		} else {
 			prune_refs(&transport->remote->fetch,
-				   ref_map,
+				   transaction, ref_map,
 				   transport->url);
 		}
 	}
diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
index 93a0db3c68..afa6bf9f7d 100755
--- a/t/t5510-fetch.sh
+++ b/t/t5510-fetch.sh
@@ -349,11 +349,9 @@ test_expect_success 'fetch --atomic --prune executes a single reference transact
 	cat >expected <<-EOF &&
 		prepared
 		$ZERO_OID $ZERO_OID refs/remotes/origin/scheduled-for-deletion
-		committed
-		$ZERO_OID $ZERO_OID refs/remotes/origin/scheduled-for-deletion
-		prepared
 		$ZERO_OID $head_oid refs/remotes/origin/new-branch
 		committed
+		$ZERO_OID $ZERO_OID refs/remotes/origin/scheduled-for-deletion
 		$ZERO_OID $head_oid refs/remotes/origin/new-branch
 	EOF
 
-- 
2.35.1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/6] fetch: increase test coverage of fetches
  2022-02-11  7:46 ` [PATCH 1/6] fetch: increase test coverage of fetches Patrick Steinhardt
@ 2022-02-15  6:19   ` Christian Couder
  2022-02-17 11:13     ` Patrick Steinhardt
  0 siblings, 1 reply; 27+ messages in thread
From: Christian Couder @ 2022-02-15  6:19 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

On Fri, Feb 11, 2022 at 8:38 PM Patrick Steinhardt <ps@pks.im> wrote:
>
> The `--atomic` flag is missing test coverage for pruning of deleted
> references and backfilling of tags, and of course both aren't covered
> correctly by this flag.

It's not clear to me what "both aren't covered correctly by this flag"
actually means here. If it means that pruning of deleted references
and backfilling of tags don't work correctly when --atomic is used,
then it could be stated more clearly. Otherwise this seems to just be
repeating the first part of the sentence.

> Furthermore, we don't have tests demonstrating
> error cases for backfilling tags.
>
> Add tests to cover those testing gaps.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>

> +test_expect_success 'atomic fetch with failing backfill' '
> +       git init clone3 &&
> +
> +       # We want to test whether a failure when backfilling tags correctly
> +       # aborts the complete transaction when `--atomic` is passed: we should
> +       # neither create the branch nor should we create the tag when either
> +       # one of both fails to update correctly.
> +       #
> +       # To trigger failure we simply abort when backfilling a tag.
> +       write_script clone3/.git/hooks/reference-transaction <<-\EOF &&
> +               #!/bin/sh
> +
> +               while read oldrev newrev reference
> +               do
> +                       if test "$reference" = refs/tags/tag1
> +                       then
> +                               exit 1
> +                       fi
> +               done
> +       EOF
> +
> +       test_must_fail git -C clone3 fetch --atomic .. $B:refs/heads/something &&
> +
> +       # Creation of the tag has failed, so ideally refs/heads/something
> +       # should not exist. The fact that it does is demonstrates that there is

s/The fact that it does is demonstrates/The fact that it does demonstrates/

> +       # missing coverage in the `--atomic` flag.

Maybe s/missing coverage/a bug/ would make things clearer.

> +       test $B = "$(git -C clone3 rev-parse --verify refs/heads/something)"
> +'

As this patch series is about fixing buggy parts of the behavior with
--atomic, I think it would make more sense to use test_expect_failure,
instead of test_expect_success, in this test, and to check that we
have the correct behavior, instead of checking that we have the buggy
behavior.

Of course when later in this patch series the buggy behavior is fixed,
then test_expect_failure should be replaced with test_expect_success.

> +test_expect_success 'atomic fetch with backfill should use single transaction' '
> +       git init clone4 &&
> +
> +       # Fetching with the `--atomic` flag should update all references in a
> +       # single transaction, including backfilled tags. We thus expect to see
> +       # a single reference transaction for the created branch and tags.
> +       cat >expected <<-EOF &&
> +               prepared
> +               $ZERO_OID $B refs/heads/something
> +               $ZERO_OID $S refs/tags/tag2
> +               committed
> +               $ZERO_OID $B refs/heads/something
> +               $ZERO_OID $S refs/tags/tag2
> +               prepared
> +               $ZERO_OID $T refs/tags/tag1
> +               committed
> +               $ZERO_OID $T refs/tags/tag1
> +       EOF

The comment says that we expect to see a single reference transaction,
but the expected file we create seems to show 2 transactions. So I
think here too, we should use test_expect_failure, instead of
test_expect_success, and check that we have the correct behavior
instead of a buggy one.

> +       write_script clone4/.git/hooks/reference-transaction <<-\EOF &&

Here there is no #!/bin/sh while other uses of write_script in your
patch have it. If it's not necessary, it could be removed in the other
uses.

> +               ( echo "$*" && cat ) >>actual
> +       EOF
> +
> +       git -C clone4 fetch --atomic .. $B:refs/heads/something &&
> +       test_cmp expected clone4/actual
> +'

I took a quick look at the 2 other tests after this one, and I think
test_expect_failure should be used there too, instead of
test_expect_success.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/6] fetch: backfill tags before setting upstream
  2022-02-11  7:46 ` [PATCH 2/6] fetch: backfill tags before setting upstream Patrick Steinhardt
@ 2022-02-15  6:43   ` Christian Couder
  0 siblings, 0 replies; 27+ messages in thread
From: Christian Couder @ 2022-02-15  6:43 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

On Fri, Feb 11, 2022 at 5:12 PM Patrick Steinhardt <ps@pks.im> wrote:
>
> The fetch code flow is a bit hard to understand right now:
>
>     1. We optionally prune all references which have vanished on the
>        remote side.
>     2. We fetch and update all other references locally.
>     3. We update the upstream branch in the gitconfig.
>     4. We backfill tags pointing into the history we have just fetched.
>
> It is quite confusing that we fetch objects and update references in
> both (2) and (4), which is further stressed by the point that we require

s/require/use/

> a `skip` label which jumps from (3) to (4) in case we fail to update the

s/`skip` label/`skip` goto label/
s/which jumps/to jump/

> gitconfig as expected.
>
> Reorder the code to first update all local references, and only after we
> have done so update the upstream branch information. This improves the
> code flow and furthermore makes it easier to refactor the way we update
> references together.

> @@ -1536,7 +1536,7 @@ static void backfill_tags(struct transport *transport, struct ref *ref_map,
>  static int do_fetch(struct transport *transport,
>                     struct refspec *rs)
>  {
> -       struct ref *ref_map;
> +       struct ref *ref_map = NULL;
>         int autotags = (transport->remote->fetch_tags == 1);
>         int retcode = 0;
>         const struct ref *remote_refs;
> @@ -1618,11 +1618,22 @@ static int do_fetch(struct transport *transport,
>                 }
>         }
>         if (fetch_and_consume_refs(transport, ref_map, worktrees)) {
> -               free_refs(ref_map);
>                 retcode = 1;
>                 goto cleanup;
>         }
>
> +       /* if neither --no-tags nor --tags was specified, do automated tag
> +        * following ... */

Maybe while at it this could be changed to use our usual style for
multi-line comments:

       /*
        * If neither --no-tags nor --tags was specified, do automated tag
        * following...
        */

> +       if (tags == TAGS_DEFAULT && autotags) {
> +               struct ref *tags_ref_map = NULL, **tail = &tags_ref_map;
> +
> +               find_non_local_tags(remote_refs, &tags_ref_map, &tail);
> +               if (tags_ref_map)
> +                       backfill_tags(transport, tags_ref_map, worktrees);
> +
> +               free_refs(tags_ref_map);
> +       }

> @@ -1676,21 +1687,9 @@ static int do_fetch(struct transport *transport,
>                                   "you need to specify exactly one branch with the --set-upstream option"));
>                 }
>         }
> -skip:

I like that it's removing one goto label and making the code simpler.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/6] fetch: control lifecycle of FETCH_HEAD in a single place
  2022-02-11  7:46 ` [PATCH 3/6] fetch: control lifecycle of FETCH_HEAD in a single place Patrick Steinhardt
@ 2022-02-15  7:32   ` Christian Couder
  2022-02-17 11:18     ` Patrick Steinhardt
  0 siblings, 1 reply; 27+ messages in thread
From: Christian Couder @ 2022-02-15  7:32 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

On Sat, Feb 12, 2022 at 5:49 PM Patrick Steinhardt <ps@pks.im> wrote:
>
> There are two different locations where we're appending to FETCH_HEAD:
> first when storing updated references, and second when backfilling tags.
> Both times we open the file, append to it and then commit it into place,
> which is essentially duplicate work.
>
> Improve the lifecycle of updating FETCH_HEAD by opening and committing
> it once in `do_fetch()`, where we pass the structure down to code which

s/down to code/down to the code/

> wants to append to it.

> @@ -1601,6 +1596,10 @@ static int do_fetch(struct transport *transport,
>         if (!update_head_ok)
>                 check_not_current_branch(ref_map, worktrees);
>
> +       retcode = open_fetch_head(&fetch_head);
> +       if (retcode)
> +               goto cleanup;
> +
>         if (tags == TAGS_DEFAULT && autotags)
>                 transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, "1");
>         if (prune) {
> @@ -1617,7 +1616,8 @@ static int do_fetch(struct transport *transport,
>                                    transport->url);
>                 }
>         }
> -       if (fetch_and_consume_refs(transport, ref_map, worktrees)) {
> +
> +       if (fetch_and_consume_refs(transport, ref_map, &fetch_head, worktrees)) {
>                 retcode = 1;
>                 goto cleanup;
>         }

I think the following (if it works) would be more consistent with
what's done for open_fetch_head() above:

retcode = fetch_and_consume_refs(transport, ref_map, &fetch_head, worktrees)
if (retcode)
      goto cleanup;

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/6] fetch: report errors when backfilling tags fails
  2022-02-11  7:46 ` [PATCH 4/6] fetch: report errors when backfilling tags fails Patrick Steinhardt
@ 2022-02-15  7:52   ` Christian Couder
  2022-02-17 11:27     ` Patrick Steinhardt
  2022-02-16 23:35   ` Jonathan Tan
  2022-02-17  1:31   ` Elijah Newren
  2 siblings, 1 reply; 27+ messages in thread
From: Christian Couder @ 2022-02-15  7:52 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

On Fri, Feb 11, 2022 at 9:03 PM Patrick Steinhardt <ps@pks.im> wrote:
>
> When the backfilling of tags fails we do not report this error to the
> caller, but only report it implicitly at a later point when reporting
> updated references.

Probably stupid question: are we sure that it's a bug and not a feature?

> This leaves callers unable to act upon the
> information of whether the backfilling succeeded or not.
>
> Refactor the function to return an error code and pass it up the
> callstack. This causes us to correctly propagate the error back to the
> user of git-fetch(1).

Even if it would have been the right behavior when backfilling tags
was implemented to return an error when backfilling tags fails, I
think it's interesting to ask ourselves if this change could be seen
as a regression by some users.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/6] fetch: make `--atomic` flag cover backfilling of tags
  2022-02-11  7:47 ` [PATCH 5/6] fetch: make `--atomic` flag cover backfilling of tags Patrick Steinhardt
@ 2022-02-15  8:11   ` Christian Couder
  2022-02-16 23:41     ` Jonathan Tan
  2022-02-17 11:37     ` Patrick Steinhardt
  2022-02-17  1:34   ` Elijah Newren
  1 sibling, 2 replies; 27+ messages in thread
From: Christian Couder @ 2022-02-15  8:11 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

On Mon, Feb 14, 2022 at 10:13 PM Patrick Steinhardt <ps@pks.im> wrote:
>
> When fetching references from a remote we by default also fetch all tags
> which point into the history we have fetched. This is a separate step
> performed after updating local references because it requires us to walk
> over the history on the client-side to determine whether the remote has
> announced any tags which point to one of the fetched commits.
>
> This backfilling of tags isn't covered by the `--atomic` flag: right
> now, it only applies to the step where we update our local references.
> This is an oversight at the time the flag was introduced: its purpose is
> to either update all references or none, but right now we happily update
> local references even in the case where backfilling failed.

Also it looks like the backfilling of tags itself isn't atomic, right?
Some tags could be backfilled while others aren't.

> Fix this by pulling up creation of the reference transaction such that
> we can pass the same transaction to both the code which updates local
> references and to the code which backfills tags. This allows us to only
> commit the transaction in case both actions succeed.

Maybe this could be seen as a regression by users who are mostly
interested in the local references though.

> Note that we also have to start passing the transaction into
> `find_non_local_tags()`: this function is responsible for finding all
> tags which we need to backfill. Right now, it will happily return tags
> which we have already been updated with our local references. But when

s/we have/have/

> we use a single transaction for both local references and backfilling
> then it may happen that we try to queue the same reference update twice
> to the transaction, which consequentially triggers a bug. We thus have

s/consequentially/consequently/

> to skip over any tags which have already been queued. Unfortunately,
> this requires us to reach into internals of the reference transaction to
> access queued updates, but there is no non-internal interface right now
> which would allow us to access this information.

This makes me wonder if such a non-internal interface should be
implemented first. Or if some function to queue a reference update
could check if the same reference update has already been queued.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 6/6] fetch: make `--atomic` flag cover pruning of refs
  2022-02-11  7:47 ` [PATCH 6/6] fetch: make `--atomic` flag cover pruning of refs Patrick Steinhardt
@ 2022-02-15  9:12   ` Christian Couder
  2022-02-17 12:03     ` Patrick Steinhardt
  2022-02-16 23:39   ` Jonathan Tan
  2022-02-17  1:40   ` Elijah Newren
  2 siblings, 1 reply; 27+ messages in thread
From: Christian Couder @ 2022-02-15  9:12 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

On Fri, Feb 11, 2022 at 9:25 AM Patrick Steinhardt <ps@pks.im> wrote:
>
> When fetching with the `--prune` flag we will delete any local
> references matching the fetch refspec which have disappeared on the
> remote. This step is not currently covered by the `--atomic` flag: we
> delete branches even though updating of local references has failed,
> which means that the fetch is not an all-or-nothing operation.

It could perhaps be seen as a regression by some users though, if
updating of local references doesn't work anymore when deleting a
local reference matching the fetch refspec fails.

> Fix this bug by passing in the global transaction into `prune_refs()`:
> if one is given, then we'll only queue up deletions and not commit them
> right away.
>
> This change also improves performance when pruning many branches in a
> repository with a big packed-refs file: every references is pruned in
> its own transaction, which means that we potentially have to rewrite
> the packed-refs files for every single reference we're about to prune.

Yeah, I wonder if there could be a performance improvement in the
previous patch too as it looks like tag backfilling wasn't atomic too.

> The following benchmark demonstrates this: it performs a pruning fetch
> from a repository with a single reference into a repository with 100k
> references, which causes us to prune all but one reference. This is of
> course a very artificial setup, but serves to demonstrate the impact of
> only having to write the packed-refs file once:
>
>     Benchmark 1: git fetch --prune --atomic +refs/*:refs/* (HEAD~)
>       Time (mean ± σ):      2.366 s ±  0.021 s    [User: 0.858 s, System: 1.508 s]
>       Range (min … max):    2.328 s …  2.407 s    10 runs
>
>     Benchmark 2: git fetch --prune --atomic +refs/*:refs/* (HEAD)
>       Time (mean ± σ):      1.369 s ±  0.017 s    [User: 0.715 s, System: 0.641 s]
>       Range (min … max):    1.346 s …  1.400 s    10 runs
>
>     Summary
>       'git fetch --prune --atomic +refs/*:refs/* (HEAD)' ran
>         1.73 ± 0.03 times faster than 'git fetch --prune --atomic +refs/*:refs/* (HEAD~)'

Nice!

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/6] fetch: report errors when backfilling tags fails
  2022-02-11  7:46 ` [PATCH 4/6] fetch: report errors when backfilling tags fails Patrick Steinhardt
  2022-02-15  7:52   ` Christian Couder
@ 2022-02-16 23:35   ` Jonathan Tan
  2022-02-17  1:31   ` Elijah Newren
  2 siblings, 0 replies; 27+ messages in thread
From: Jonathan Tan @ 2022-02-16 23:35 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Jonathan Tan, git

Patrick Steinhardt <ps@pks.im> writes:
> @@ -1628,8 +1630,19 @@ static int do_fetch(struct transport *transport,
>  		struct ref *tags_ref_map = NULL, **tail = &tags_ref_map;
>  
>  		find_non_local_tags(remote_refs, &tags_ref_map, &tail);
> -		if (tags_ref_map)
> -			backfill_tags(transport, tags_ref_map, &fetch_head, worktrees);
> +		if (tags_ref_map) {
> +			/*
> +			 * If backfilling tags succeeds we used to not return
> +			 * an error code to the user at all. Instead, we
> +			 * silently swallowed that error and updated the local
> +			 * state of the repository. We now notify the user of
> +			 * any such errors, but we continue to make sure that
> +			 * FETCH_HEAD and the upstream branch are configured as
> +			 * expected.
> +			 */
> +			if (backfill_tags(transport, tags_ref_map, &fetch_head, worktrees))
> +				retcode = 1;

I think s/succeeds/fails/

Having said that, I don't think that we need to describe the past here.
Notifying the user if something has failed is expected and doesn't
require an explanation about how we didn't do that in the past.

Everything up to this patch looks good to me.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 6/6] fetch: make `--atomic` flag cover pruning of refs
  2022-02-11  7:47 ` [PATCH 6/6] fetch: make `--atomic` flag cover pruning of refs Patrick Steinhardt
  2022-02-15  9:12   ` Christian Couder
@ 2022-02-16 23:39   ` Jonathan Tan
  2022-02-17  1:40   ` Elijah Newren
  2 siblings, 0 replies; 27+ messages in thread
From: Jonathan Tan @ 2022-02-16 23:39 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Jonathan Tan, git

Patrick Steinhardt <ps@pks.im> writes:
> diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
> index 93a0db3c68..afa6bf9f7d 100755
> --- a/t/t5510-fetch.sh
> +++ b/t/t5510-fetch.sh
> @@ -349,11 +349,9 @@ test_expect_success 'fetch --atomic --prune executes a single reference transact
>  	cat >expected <<-EOF &&
>  		prepared
>  		$ZERO_OID $ZERO_OID refs/remotes/origin/scheduled-for-deletion
> -		committed
> -		$ZERO_OID $ZERO_OID refs/remotes/origin/scheduled-for-deletion
> -		prepared
>  		$ZERO_OID $head_oid refs/remotes/origin/new-branch
>  		committed
> +		$ZERO_OID $ZERO_OID refs/remotes/origin/scheduled-for-deletion
>  		$ZERO_OID $head_oid refs/remotes/origin/new-branch
>  	EOF

I think that there is a comment above this change that needs to be
deleted too.

Overall, I think that this patch set is a good change, and I have no
further comments. (Regarding patch 5, I can't think of a better way to
know what refs are duplicates.)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/6] fetch: make `--atomic` flag cover backfilling of tags
  2022-02-15  8:11   ` Christian Couder
@ 2022-02-16 23:41     ` Jonathan Tan
  2022-02-17 11:37     ` Patrick Steinhardt
  1 sibling, 0 replies; 27+ messages in thread
From: Jonathan Tan @ 2022-02-16 23:41 UTC (permalink / raw)
  To: Christian Couder; +Cc: Jonathan Tan, Patrick Steinhardt, git

Christian Couder <christian.couder@gmail.com> writes:
> > Fix this by pulling up creation of the reference transaction such that
> > we can pass the same transaction to both the code which updates local
> > references and to the code which backfills tags. This allows us to only
> > commit the transaction in case both actions succeed.
> 
> Maybe this could be seen as a regression by users who are mostly
> interested in the local references though.

I don't think we need to worry about this - as far as I know,
backfilling of tags only happens when the server doesn't support
include-tag, which was introduced in 2009 (and in my experience, they
all do). And in the rare case that backfilling happens, I think the user
would want to have atomicity in the also rare case that the first fetch
succeeds and the second fails.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/6] fetch: report errors when backfilling tags fails
  2022-02-11  7:46 ` [PATCH 4/6] fetch: report errors when backfilling tags fails Patrick Steinhardt
  2022-02-15  7:52   ` Christian Couder
  2022-02-16 23:35   ` Jonathan Tan
@ 2022-02-17  1:31   ` Elijah Newren
  2 siblings, 0 replies; 27+ messages in thread
From: Elijah Newren @ 2022-02-17  1:31 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Git Mailing List

On Fri, Feb 11, 2022 at 12:03 PM Patrick Steinhardt <ps@pks.im> wrote:
>
> When the backfilling of tags fails we do not report this error to the
> caller, but only report it implicitly at a later point when reporting
> updated references. This leaves callers unable to act upon the
> information of whether the backfilling succeeded or not.
>
> Refactor the function to return an error code and pass it up the
> callstack. This causes us to correctly propagate the error back to the
> user of git-fetch(1).
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  builtin/fetch.c      | 29 +++++++++++++++++++++--------
>  t/t5503-tagfollow.sh |  4 +---
>  2 files changed, 22 insertions(+), 11 deletions(-)
>
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index 627847e2f8..1eda0b68ff 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -1495,12 +1495,12 @@ static struct transport *prepare_transport(struct remote *remote, int deepen)
>         return transport;
>  }
>
> -static void backfill_tags(struct transport *transport,
> -                         struct ref *ref_map,
> -                         struct fetch_head *fetch_head,
> -                         struct worktree **worktrees)
> +static int backfill_tags(struct transport *transport,
> +                        struct ref *ref_map,
> +                        struct fetch_head *fetch_head,
> +                        struct worktree **worktrees)
>  {
> -       int cannot_reuse;
> +       int retcode, cannot_reuse;
>
>         /*
>          * Once we have set TRANS_OPT_DEEPEN_SINCE, we can't unset it
> @@ -1519,12 +1519,14 @@ static void backfill_tags(struct transport *transport,
>         transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, NULL);
>         transport_set_option(transport, TRANS_OPT_DEPTH, "0");
>         transport_set_option(transport, TRANS_OPT_DEEPEN_RELATIVE, NULL);
> -       fetch_and_consume_refs(transport, ref_map, fetch_head, worktrees);
> +       retcode = fetch_and_consume_refs(transport, ref_map, fetch_head, worktrees);
>
>         if (gsecondary) {
>                 transport_disconnect(gsecondary);
>                 gsecondary = NULL;
>         }
> +
> +       return retcode;
>  }
>
>  static int do_fetch(struct transport *transport,
> @@ -1628,8 +1630,19 @@ static int do_fetch(struct transport *transport,
>                 struct ref *tags_ref_map = NULL, **tail = &tags_ref_map;
>
>                 find_non_local_tags(remote_refs, &tags_ref_map, &tail);
> -               if (tags_ref_map)
> -                       backfill_tags(transport, tags_ref_map, &fetch_head, worktrees);
> +               if (tags_ref_map) {
> +                       /*
> +                        * If backfilling tags succeeds we used to not return
> +                        * an error code to the user at all. Instead, we
> +                        * silently swallowed that error and updated the local
> +                        * state of the repository. We now notify the user of
> +                        * any such errors, but we continue to make sure that
> +                        * FETCH_HEAD and the upstream branch are configured as
> +                        * expected.
> +                        */

nit: I'd prefer to see code comments explain what we currently do, and
move history lessons to the commit message.


> +                       if (backfill_tags(transport, tags_ref_map, &fetch_head, worktrees))
> +                               retcode = 1;
> +               }
>
>                 free_refs(tags_ref_map);
>         }
> diff --git a/t/t5503-tagfollow.sh b/t/t5503-tagfollow.sh
> index 888305ad4d..549f908b90 100755
> --- a/t/t5503-tagfollow.sh
> +++ b/t/t5503-tagfollow.sh
> @@ -237,9 +237,7 @@ test_expect_success 'backfill failure causes command to fail' '
>                 done
>         EOF
>
> -       # Even though we fail to create refs/tags/tag1 the below command
> -       # unexpectedly succeeds.
> -       git -C clone5 fetch .. $B:refs/heads/something &&
> +       test_must_fail git -C clone5 fetch .. $B:refs/heads/something &&
>         test $B = $(git -C clone5 rev-parse --verify refs/heads/something) &&
>         test $S = $(git -C clone5 rev-parse --verify tag2) &&
>         test_must_fail git -C clone5 rev-parse --verify tag1
> --
> 2.35.1
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/6] fetch: make `--atomic` flag cover backfilling of tags
  2022-02-11  7:47 ` [PATCH 5/6] fetch: make `--atomic` flag cover backfilling of tags Patrick Steinhardt
  2022-02-15  8:11   ` Christian Couder
@ 2022-02-17  1:34   ` Elijah Newren
  2022-02-17 11:58     ` Patrick Steinhardt
  1 sibling, 1 reply; 27+ messages in thread
From: Elijah Newren @ 2022-02-17  1:34 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Git Mailing List

On Mon, Feb 14, 2022 at 1:32 AM Patrick Steinhardt <ps@pks.im> wrote:
>
> When fetching references from a remote we by default also fetch all tags
> which point into the history we have fetched. This is a separate step
> performed after updating local references because it requires us to walk
> over the history on the client-side to determine whether the remote has
> announced any tags which point to one of the fetched commits.
>
> This backfilling of tags isn't covered by the `--atomic` flag: right
> now, it only applies to the step where we update our local references.
> This is an oversight at the time the flag was introduced: its purpose is
> to either update all references or none, but right now we happily update
> local references even in the case where backfilling failed.
>
> Fix this by pulling up creation of the reference transaction such that
> we can pass the same transaction to both the code which updates local
> references and to the code which backfills tags. This allows us to only
> commit the transaction in case both actions succeed.
>
> Note that we also have to start passing the transaction into
> `find_non_local_tags()`: this function is responsible for finding all
> tags which we need to backfill. Right now, it will happily return tags
> which we have already been updated with our local references. But when
> we use a single transaction for both local references and backfilling
> then it may happen that we try to queue the same reference update twice
> to the transaction, which consequentially triggers a bug. We thus have
> to skip over any tags which have already been queued. Unfortunately,
> this requires us to reach into internals of the reference transaction to
> access queued updates, but there is no non-internal interface right now
> which would allow us to access this information.

I like the changes you are making here in general, but I do agree that
reaching into refs-internal feels a bit icky.  I'm not familiar with
the refs API nor the fetching code, so feel free to ignore these
ideas, but I'm just throwing them out there as possibilities to avoid
reaching into refs-internal:

  - you are trying to check for existing transactions to avoid
duplicates, but those existing transactions came from elsewhere in the
same code we control.  Could we store a strset or strmap of the items
being updated (in addition to storing them in the transaction), and
then use the strset/strmap to filter out which tags we need to
backfill?  Or would that require plumbing an extra variable through an
awful lot of callers to get the information into the right places?

  - would it make sense to add a flag to the transaction API to allow
duplicates if both updates update the ref to the same value?  (I'm
guessing you're updating to the same value, right?)

  - should we just add something to the refs API along the lines of
"transaction_includes_update_for()" or something like that?

[...]
> @@ -361,12 +362,28 @@ static void find_non_local_tags(const struct ref *refs,
>         const struct ref *ref;
>         struct refname_hash_entry *item = NULL;
>         const int quick_flags = OBJECT_INFO_QUICK | OBJECT_INFO_SKIP_FETCH_OBJECT;
> +       int i;
>
>         refname_hash_init(&existing_refs);
>         refname_hash_init(&remote_refs);
>         create_fetch_oidset(head, &fetch_oids);
>
>         for_each_ref(add_one_refname, &existing_refs);
> +
> +       /*
> +        * If we already have a transaction, then we need to filter out all
> +        * tags which have already been queued up.
> +        */
> +       for (i = 0; transaction && i < transaction->nr; i++) {
> +               if (!starts_with(transaction->updates[i]->refname, "refs/tags/") ||
> +                   !(transaction->updates[i]->flags & REF_HAVE_NEW))
> +                       continue;
> +               (void) refname_hash_add(&existing_refs,
> +                                       transaction->updates[i]->refname,
> +                                       &transaction->updates[i]->new_oid);

Why the typecast here?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 6/6] fetch: make `--atomic` flag cover pruning of refs
  2022-02-11  7:47 ` [PATCH 6/6] fetch: make `--atomic` flag cover pruning of refs Patrick Steinhardt
  2022-02-15  9:12   ` Christian Couder
  2022-02-16 23:39   ` Jonathan Tan
@ 2022-02-17  1:40   ` Elijah Newren
  2022-02-17 12:06     ` Patrick Steinhardt
  2 siblings, 1 reply; 27+ messages in thread
From: Elijah Newren @ 2022-02-17  1:40 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Git Mailing List

On Fri, Feb 11, 2022 at 12:25 AM Patrick Steinhardt <ps@pks.im> wrote:
>
> When fetching with the `--prune` flag we will delete any local
> references matching the fetch refspec which have disappeared on the
> remote. This step is not currently covered by the `--atomic` flag: we
> delete branches even though updating of local references has failed,
> which means that the fetch is not an all-or-nothing operation.
>
> Fix this bug by passing in the global transaction into `prune_refs()`:
> if one is given, then we'll only queue up deletions and not commit them
> right away.
>
> This change also improves performance when pruning many branches in a
> repository with a big packed-refs file: every references is pruned in
> its own transaction, which means that we potentially have to rewrite
> the packed-refs files for every single reference we're about to prune.
>
> The following benchmark demonstrates this: it performs a pruning fetch
> from a repository with a single reference into a repository with 100k
> references, which causes us to prune all but one reference. This is of
> course a very artificial setup, but serves to demonstrate the impact of
> only having to write the packed-refs file once:
>
>     Benchmark 1: git fetch --prune --atomic +refs/*:refs/* (HEAD~)
>       Time (mean ± σ):      2.366 s ±  0.021 s    [User: 0.858 s, System: 1.508 s]
>       Range (min … max):    2.328 s …  2.407 s    10 runs
>
>     Benchmark 2: git fetch --prune --atomic +refs/*:refs/* (HEAD)
>       Time (mean ± σ):      1.369 s ±  0.017 s    [User: 0.715 s, System: 0.641 s]
>       Range (min … max):    1.346 s …  1.400 s    10 runs
>
>     Summary
>       'git fetch --prune --atomic +refs/*:refs/* (HEAD)' ran
>         1.73 ± 0.03 times faster than 'git fetch --prune --atomic +refs/*:refs/* (HEAD~)'

Very nice!

[...]
> diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
> index 93a0db3c68..afa6bf9f7d 100755
> --- a/t/t5510-fetch.sh
> +++ b/t/t5510-fetch.sh
> @@ -349,11 +349,9 @@ test_expect_success 'fetch --atomic --prune executes a single reference transact
>         cat >expected <<-EOF &&
>                 prepared
>                 $ZERO_OID $ZERO_OID refs/remotes/origin/scheduled-for-deletion
> -               committed
> -               $ZERO_OID $ZERO_OID refs/remotes/origin/scheduled-for-deletion
> -               prepared
>                 $ZERO_OID $head_oid refs/remotes/origin/new-branch
>                 committed

Up to here this is just what I expected.

> +               $ZERO_OID $ZERO_OID refs/remotes/origin/scheduled-for-deletion
>                 $ZERO_OID $head_oid refs/remotes/origin/new-branch

But now scheduled-for-deletion and new-branch are both listed again
even with your fixes?  Is this some peculiarity of the reference
transaction hook that it lists the refs again after the
prepared...committed block?  (It may well be; I'm not that familiar
with this area of the code.)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/6] fetch: increase test coverage of fetches
  2022-02-15  6:19   ` Christian Couder
@ 2022-02-17 11:13     ` Patrick Steinhardt
  0 siblings, 0 replies; 27+ messages in thread
From: Patrick Steinhardt @ 2022-02-17 11:13 UTC (permalink / raw)
  To: Christian Couder; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 5303 bytes --]

On Tue, Feb 15, 2022 at 07:19:19AM +0100, Christian Couder wrote:
> On Fri, Feb 11, 2022 at 8:38 PM Patrick Steinhardt <ps@pks.im> wrote:
> >
> > The `--atomic` flag is missing test coverage for pruning of deleted
> > references and backfilling of tags, and of course both aren't covered
> > correctly by this flag.
> 
> It's not clear to me what "both aren't covered correctly by this flag"
> actually means here. If it means that pruning of deleted references
> and backfilling of tags don't work correctly when --atomic is used,
> then it could be stated more clearly. Otherwise this seems to just be
> repeating the first part of the sentence.

Yeah, the commit message was a bit lazy to be honest. I've reworded it
to hopefully make clearer what one is looking at.

> > Furthermore, we don't have tests demonstrating
> > error cases for backfilling tags.
> >
> > Add tests to cover those testing gaps.
> >
> > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> 
> > +test_expect_success 'atomic fetch with failing backfill' '
> > +       git init clone3 &&
> > +
> > +       # We want to test whether a failure when backfilling tags correctly
> > +       # aborts the complete transaction when `--atomic` is passed: we should
> > +       # neither create the branch nor should we create the tag when either
> > +       # one of both fails to update correctly.
> > +       #
> > +       # To trigger failure we simply abort when backfilling a tag.
> > +       write_script clone3/.git/hooks/reference-transaction <<-\EOF &&
> > +               #!/bin/sh
> > +
> > +               while read oldrev newrev reference
> > +               do
> > +                       if test "$reference" = refs/tags/tag1
> > +                       then
> > +                               exit 1
> > +                       fi
> > +               done
> > +       EOF
> > +
> > +       test_must_fail git -C clone3 fetch --atomic .. $B:refs/heads/something &&
> > +
> > +       # Creation of the tag has failed, so ideally refs/heads/something
> > +       # should not exist. The fact that it does is demonstrates that there is
> 
> s/The fact that it does is demonstrates/The fact that it does demonstrates/
> 
> > +       # missing coverage in the `--atomic` flag.
> 
> Maybe s/missing coverage/a bug/ would make things clearer.
> 
> > +       test $B = "$(git -C clone3 rev-parse --verify refs/heads/something)"
> > +'
> 
> As this patch series is about fixing buggy parts of the behavior with
> --atomic, I think it would make more sense to use test_expect_failure,
> instead of test_expect_success, in this test, and to check that we
> have the correct behavior, instead of checking that we have the buggy
> behavior.
> 
> Of course when later in this patch series the buggy behavior is fixed,
> then test_expect_failure should be replaced with test_expect_success.

The downside of using `test_expect_failure` is that one cannot easily
see what exactly is broken in the testcase. Yes, you can document it,
but when using `test_expect_success` the huge advantage is that you can
see exactly what behaviour is changing in subsequent commits by just having a look
at the diff of the test which adapts it from its initially-broken state
to the newly-fixed behaviour.

> > +test_expect_success 'atomic fetch with backfill should use single transaction' '
> > +       git init clone4 &&
> > +
> > +       # Fetching with the `--atomic` flag should update all references in a
> > +       # single transaction, including backfilled tags. We thus expect to see
> > +       # a single reference transaction for the created branch and tags.
> > +       cat >expected <<-EOF &&
> > +               prepared
> > +               $ZERO_OID $B refs/heads/something
> > +               $ZERO_OID $S refs/tags/tag2
> > +               committed
> > +               $ZERO_OID $B refs/heads/something
> > +               $ZERO_OID $S refs/tags/tag2
> > +               prepared
> > +               $ZERO_OID $T refs/tags/tag1
> > +               committed
> > +               $ZERO_OID $T refs/tags/tag1
> > +       EOF
> 
> The comment says that we expect to see a single reference transaction,
> but the expected file we create seems to show 2 transactions. So I
> think here too, we should use test_expect_failure, instead of
> test_expect_success, and check that we have the correct behavior
> instead of a buggy one.

Same comment as above. I've also amended the commit message to say why
we're introducing the tests like this.

> > +       write_script clone4/.git/hooks/reference-transaction <<-\EOF &&
> 
> Here there is no #!/bin/sh while other uses of write_script in your
> patch have it. If it's not necessary, it could be removed in the other
> uses.

Good point, I always forget that the shebang is added automatically by
this helper.

> > +               ( echo "$*" && cat ) >>actual
> > +       EOF
> > +
> > +       git -C clone4 fetch --atomic .. $B:refs/heads/something &&
> > +       test_cmp expected clone4/actual
> > +'
> 
> I took a quick look at the 2 other tests after this one, and I think
> test_expect_failure should be used there too, instead of
> test_expect_success.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/6] fetch: control lifecycle of FETCH_HEAD in a single place
  2022-02-15  7:32   ` Christian Couder
@ 2022-02-17 11:18     ` Patrick Steinhardt
  0 siblings, 0 replies; 27+ messages in thread
From: Patrick Steinhardt @ 2022-02-17 11:18 UTC (permalink / raw)
  To: Christian Couder; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 2205 bytes --]

On Tue, Feb 15, 2022 at 08:32:56AM +0100, Christian Couder wrote:
> On Sat, Feb 12, 2022 at 5:49 PM Patrick Steinhardt <ps@pks.im> wrote:
> >
> > There are two different locations where we're appending to FETCH_HEAD:
> > first when storing updated references, and second when backfilling tags.
> > Both times we open the file, append to it and then commit it into place,
> > which is essentially duplicate work.
> >
> > Improve the lifecycle of updating FETCH_HEAD by opening and committing
> > it once in `do_fetch()`, where we pass the structure down to code which
> 
> s/down to code/down to the code/
> 
> > wants to append to it.
> 
> > @@ -1601,6 +1596,10 @@ static int do_fetch(struct transport *transport,
> >         if (!update_head_ok)
> >                 check_not_current_branch(ref_map, worktrees);
> >
> > +       retcode = open_fetch_head(&fetch_head);
> > +       if (retcode)
> > +               goto cleanup;
> > +
> >         if (tags == TAGS_DEFAULT && autotags)
> >                 transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, "1");
> >         if (prune) {
> > @@ -1617,7 +1616,8 @@ static int do_fetch(struct transport *transport,
> >                                    transport->url);
> >                 }
> >         }
> > -       if (fetch_and_consume_refs(transport, ref_map, worktrees)) {
> > +
> > +       if (fetch_and_consume_refs(transport, ref_map, &fetch_head, worktrees)) {
> >                 retcode = 1;
> >                 goto cleanup;
> >         }
> 
> I think the following (if it works) would be more consistent with
> what's done for open_fetch_head() above:
> 
> retcode = fetch_and_consume_refs(transport, ref_map, &fetch_head, worktrees)
> if (retcode)
>       goto cleanup;

The code here is really quite fragile, and I'm not much of a fan how we
need to retain `retcode` across the calls. But we can't change it like
you propose because the preceding call to `prune_refs()` may have
already set `retcode = 1`, and we must not clobber that value in case
the fetch succeeds. The effect is thus that we have `retcode == 1` if
either `prune_refs()` or `fetch_and_consume_refs()` fails.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/6] fetch: report errors when backfilling tags fails
  2022-02-15  7:52   ` Christian Couder
@ 2022-02-17 11:27     ` Patrick Steinhardt
  2022-02-17 12:47       ` Patrick Steinhardt
  0 siblings, 1 reply; 27+ messages in thread
From: Patrick Steinhardt @ 2022-02-17 11:27 UTC (permalink / raw)
  To: Christian Couder; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1685 bytes --]

On Tue, Feb 15, 2022 at 08:52:14AM +0100, Christian Couder wrote:
> On Fri, Feb 11, 2022 at 9:03 PM Patrick Steinhardt <ps@pks.im> wrote:
> >
> > When the backfilling of tags fails we do not report this error to the
> > caller, but only report it implicitly at a later point when reporting
> > updated references.
> 
> Probably stupid question: are we sure that it's a bug and not a feature?

Good question, and I don't have a definitive answer for it. But to me it
very much smells like a bug: if I ask for a fetch and the fetch fails to
populate some of the data I have asked for, then I want to get a
notification on that failure.

> > This leaves callers unable to act upon the
> > information of whether the backfilling succeeded or not.
> >
> > Refactor the function to return an error code and pass it up the
> > callstack. This causes us to correctly propagate the error back to the
> > user of git-fetch(1).
> 
> Even if it would have been the right behavior when backfilling tags
> was implemented to return an error when backfilling tags fails, I
> think it's interesting to ask ourselves if this change could be seen
> as a regression by some users.

Yeah, it's not all that clear-cut because auto-following of tags is a
bit obscure. But our default behaviour is to fetch tags pointing into
the history, and if a user didn't want that they should've passed
`--no-tags` to git-fetch(1). So conversely, we should assume that a user
is asking for auto-filling of tags if we're not told otherwise, which
also means that it is a failure if this fails.

At least that's my take, but I'm happy to hear arguments against my
viewpoint.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/6] fetch: make `--atomic` flag cover backfilling of tags
  2022-02-15  8:11   ` Christian Couder
  2022-02-16 23:41     ` Jonathan Tan
@ 2022-02-17 11:37     ` Patrick Steinhardt
  1 sibling, 0 replies; 27+ messages in thread
From: Patrick Steinhardt @ 2022-02-17 11:37 UTC (permalink / raw)
  To: Christian Couder; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 3185 bytes --]

On Tue, Feb 15, 2022 at 09:11:55AM +0100, Christian Couder wrote:
> On Mon, Feb 14, 2022 at 10:13 PM Patrick Steinhardt <ps@pks.im> wrote:
> >
> > When fetching references from a remote we by default also fetch all tags
> > which point into the history we have fetched. This is a separate step
> > performed after updating local references because it requires us to walk
> > over the history on the client-side to determine whether the remote has
> > announced any tags which point to one of the fetched commits.
> >
> > This backfilling of tags isn't covered by the `--atomic` flag: right
> > now, it only applies to the step where we update our local references.
> > This is an oversight at the time the flag was introduced: its purpose is
> > to either update all references or none, but right now we happily update
> > local references even in the case where backfilling failed.
> 
> Also it looks like the backfilling of tags itself isn't atomic, right?
> Some tags could be backfilled while others aren't.

Right.

> > Fix this by pulling up creation of the reference transaction such that
> > we can pass the same transaction to both the code which updates local
> > references and to the code which backfills tags. This allows us to only
> > commit the transaction in case both actions succeed.
> 
> Maybe this could be seen as a regression by users who are mostly
> interested in the local references though.

Even though the commit message discern "local references" and
"backfilled tags", ultimately they're the same. Both are references that
end up in your local refdb, so from the point of the user there is no
real difference here. Documentation of the `--atomic` flag only says
that "either all refs are updared, or on error, no refs are updated". I
think that the current behaviour does not fit the description.

> > Note that we also have to start passing the transaction into
> > `find_non_local_tags()`: this function is responsible for finding all
> > tags which we need to backfill. Right now, it will happily return tags
> > which we have already been updated with our local references. But when
> 
> s/we have/have/
> 
> > we use a single transaction for both local references and backfilling
> > then it may happen that we try to queue the same reference update twice
> > to the transaction, which consequentially triggers a bug. We thus have
> 
> s/consequentially/consequently/
> 
> > to skip over any tags which have already been queued. Unfortunately,
> > this requires us to reach into internals of the reference transaction to
> > access queued updates, but there is no non-internal interface right now
> > which would allow us to access this information.
> 
> This makes me wonder if such a non-internal interface should be
> implemented first. Or if some function to queue a reference update
> could check if the same reference update has already been queued.

Yeah. I noted that ommission in the cover letter already, but didn't yet
want to fix that before getting some initial feedback. I'll add
something like a `for_each_queued_reference_update()` in v2 of this
series though.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/6] fetch: make `--atomic` flag cover backfilling of tags
  2022-02-17  1:34   ` Elijah Newren
@ 2022-02-17 11:58     ` Patrick Steinhardt
  0 siblings, 0 replies; 27+ messages in thread
From: Patrick Steinhardt @ 2022-02-17 11:58 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 6319 bytes --]

On Wed, Feb 16, 2022 at 05:34:23PM -0800, Elijah Newren wrote:
> On Mon, Feb 14, 2022 at 1:32 AM Patrick Steinhardt <ps@pks.im> wrote:
> >
> > When fetching references from a remote we by default also fetch all tags
> > which point into the history we have fetched. This is a separate step
> > performed after updating local references because it requires us to walk
> > over the history on the client-side to determine whether the remote has
> > announced any tags which point to one of the fetched commits.
> >
> > This backfilling of tags isn't covered by the `--atomic` flag: right
> > now, it only applies to the step where we update our local references.
> > This is an oversight at the time the flag was introduced: its purpose is
> > to either update all references or none, but right now we happily update
> > local references even in the case where backfilling failed.
> >
> > Fix this by pulling up creation of the reference transaction such that
> > we can pass the same transaction to both the code which updates local
> > references and to the code which backfills tags. This allows us to only
> > commit the transaction in case both actions succeed.
> >
> > Note that we also have to start passing the transaction into
> > `find_non_local_tags()`: this function is responsible for finding all
> > tags which we need to backfill. Right now, it will happily return tags
> > which we have already been updated with our local references. But when
> > we use a single transaction for both local references and backfilling
> > then it may happen that we try to queue the same reference update twice
> > to the transaction, which consequentially triggers a bug. We thus have
> > to skip over any tags which have already been queued. Unfortunately,
> > this requires us to reach into internals of the reference transaction to
> > access queued updates, but there is no non-internal interface right now
> > which would allow us to access this information.
> 
> I like the changes you are making here in general, but I do agree that
> reaching into refs-internal feels a bit icky.  I'm not familiar with
> the refs API nor the fetching code, so feel free to ignore these
> ideas, but I'm just throwing them out there as possibilities to avoid
> reaching into refs-internal:
> 
>   - you are trying to check for existing transactions to avoid
> duplicates, but those existing transactions came from elsewhere in the
> same code we control.  Could we store a strset or strmap of the items
> being updated (in addition to storing them in the transaction), and
> then use the strset/strmap to filter out which tags we need to
> backfill?  Or would that require plumbing an extra variable through an
> awful lot of callers to get the information into the right places?

We basically would need to plumb through the variable to most callsites
which also get the transaction as input, and that's rather deep into the
callstack. The reason I think it's preferable to instead use the
transaction is that it holds the definitive state of all updates we have
already queued, and thus we cannot accidentally forget to update another
auxiliary variable.

>   - would it make sense to add a flag to the transaction API to allow
> duplicates if both updates update the ref to the same value?  (I'm
> guessing you're updating to the same value, right?)

It should be the same value, yes. There is a race though in the context
of tag backfilling: if the initial fetch pulls in some tags, then it can
happen that second fetch used in some cases for the backfilling
mechanism pulls in the same tag references but with different target
objects. It's an unlikely thing to happen, but cannot be ruled out a
100%. As Jonathan pointed out the backfilling-fetch is only used when
the transport does not use "include-tag" though.

The result in that case would be that the transaction aborts because of
duplicate addition of the same ref with different values. And I'd say
that this is correct behaviour in case the user asked for an atomic
fetch.

>   - should we just add something to the refs API along the lines of
> "transaction_includes_update_for()" or something like that?

I think something in the spirit of this last option would be the easiest
solution. Using `includes_updates_for()` or the above solution of a flag
which avoids duplicate updates would potentially be quadratic in
behaviour though if implemented naively: we need to walk all queued
updates for each of the tags we want to queue. That's easy enough to
avoid if we just add a `for_each_queued_reference_update()` and then
continue to do the same thing like we below. It also gives us greater
flexibility compared to the other alternatives.

> [...]
> > @@ -361,12 +362,28 @@ static void find_non_local_tags(const struct ref *refs,
> >         const struct ref *ref;
> >         struct refname_hash_entry *item = NULL;
> >         const int quick_flags = OBJECT_INFO_QUICK | OBJECT_INFO_SKIP_FETCH_OBJECT;
> > +       int i;
> >
> >         refname_hash_init(&existing_refs);
> >         refname_hash_init(&remote_refs);
> >         create_fetch_oidset(head, &fetch_oids);
> >
> >         for_each_ref(add_one_refname, &existing_refs);
> > +
> > +       /*
> > +        * If we already have a transaction, then we need to filter out all
> > +        * tags which have already been queued up.
> > +        */
> > +       for (i = 0; transaction && i < transaction->nr; i++) {
> > +               if (!starts_with(transaction->updates[i]->refname, "refs/tags/") ||
> > +                   !(transaction->updates[i]->flags & REF_HAVE_NEW))
> > +                       continue;
> > +               (void) refname_hash_add(&existing_refs,
> > +                                       transaction->updates[i]->refname,
> > +                                       &transaction->updates[i]->new_oid);
> 
> Why the typecast here?

`refname_hash_add()` returns the newly added entry, and we don't care
about that. `add_one_refname()` has the same cast, potentially to
demonstrate that we don't need the return value? Compilers shouldn't
care I think, but on the other hand some static analysis sites like
Coverity use to complain about such things.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 6/6] fetch: make `--atomic` flag cover pruning of refs
  2022-02-15  9:12   ` Christian Couder
@ 2022-02-17 12:03     ` Patrick Steinhardt
  0 siblings, 0 replies; 27+ messages in thread
From: Patrick Steinhardt @ 2022-02-17 12:03 UTC (permalink / raw)
  To: Christian Couder; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 3333 bytes --]

On Tue, Feb 15, 2022 at 10:12:23AM +0100, Christian Couder wrote:
> On Fri, Feb 11, 2022 at 9:25 AM Patrick Steinhardt <ps@pks.im> wrote:
> >
> > When fetching with the `--prune` flag we will delete any local
> > references matching the fetch refspec which have disappeared on the
> > remote. This step is not currently covered by the `--atomic` flag: we
> > delete branches even though updating of local references has failed,
> > which means that the fetch is not an all-or-nothing operation.
> 
> It could perhaps be seen as a regression by some users though, if
> updating of local references doesn't work anymore when deleting a
> local reference matching the fetch refspec fails.

I guess the same comment applies here as for the other patch: the
documentation says that we either update all or no refs, and that's not
what was happening previous to this patch.

> > Fix this bug by passing in the global transaction into `prune_refs()`:
> > if one is given, then we'll only queue up deletions and not commit them
> > right away.
> >
> > This change also improves performance when pruning many branches in a
> > repository with a big packed-refs file: every references is pruned in
> > its own transaction, which means that we potentially have to rewrite
> > the packed-refs files for every single reference we're about to prune.
> 
> Yeah, I wonder if there could be a performance improvement in the
> previous patch too as it looks like tag backfilling wasn't atomic too.

I doubt it would be as measurable as it is here. The reason why we have
this speedup is that for every deleted ref, we need to rewrite the
complete contents of the packed-refs file only with that single ref
removed from it. So for 10k refs, we essentially write the file 10k
times.

For the backfilling case that doesn't happen though: we only write the
new loose refs, and that's not any faster in case we use a single
transaction. Sure, we'll likely be able to shed some of the overhead by
using a single transaction only, but it will not be as pronounced as it
is here.

This will be different though as soon as the reftable backend lands:
there we'd write all new refs in a single slice, and that's definitely
more efficient than writing one slice per backfilled ref.

Patrick

> > The following benchmark demonstrates this: it performs a pruning fetch
> > from a repository with a single reference into a repository with 100k
> > references, which causes us to prune all but one reference. This is of
> > course a very artificial setup, but serves to demonstrate the impact of
> > only having to write the packed-refs file once:
> >
> >     Benchmark 1: git fetch --prune --atomic +refs/*:refs/* (HEAD~)
> >       Time (mean ± σ):      2.366 s ±  0.021 s    [User: 0.858 s, System: 1.508 s]
> >       Range (min … max):    2.328 s …  2.407 s    10 runs
> >
> >     Benchmark 2: git fetch --prune --atomic +refs/*:refs/* (HEAD)
> >       Time (mean ± σ):      1.369 s ±  0.017 s    [User: 0.715 s, System: 0.641 s]
> >       Range (min … max):    1.346 s …  1.400 s    10 runs
> >
> >     Summary
> >       'git fetch --prune --atomic +refs/*:refs/* (HEAD)' ran
> >         1.73 ± 0.03 times faster than 'git fetch --prune --atomic +refs/*:refs/* (HEAD~)'
> 
> Nice!

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 6/6] fetch: make `--atomic` flag cover pruning of refs
  2022-02-17  1:40   ` Elijah Newren
@ 2022-02-17 12:06     ` Patrick Steinhardt
  0 siblings, 0 replies; 27+ messages in thread
From: Patrick Steinhardt @ 2022-02-17 12:06 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 3391 bytes --]

On Wed, Feb 16, 2022 at 05:40:19PM -0800, Elijah Newren wrote:
> On Fri, Feb 11, 2022 at 12:25 AM Patrick Steinhardt <ps@pks.im> wrote:
> >
> > When fetching with the `--prune` flag we will delete any local
> > references matching the fetch refspec which have disappeared on the
> > remote. This step is not currently covered by the `--atomic` flag: we
> > delete branches even though updating of local references has failed,
> > which means that the fetch is not an all-or-nothing operation.
> >
> > Fix this bug by passing in the global transaction into `prune_refs()`:
> > if one is given, then we'll only queue up deletions and not commit them
> > right away.
> >
> > This change also improves performance when pruning many branches in a
> > repository with a big packed-refs file: every references is pruned in
> > its own transaction, which means that we potentially have to rewrite
> > the packed-refs files for every single reference we're about to prune.
> >
> > The following benchmark demonstrates this: it performs a pruning fetch
> > from a repository with a single reference into a repository with 100k
> > references, which causes us to prune all but one reference. This is of
> > course a very artificial setup, but serves to demonstrate the impact of
> > only having to write the packed-refs file once:
> >
> >     Benchmark 1: git fetch --prune --atomic +refs/*:refs/* (HEAD~)
> >       Time (mean ± σ):      2.366 s ±  0.021 s    [User: 0.858 s, System: 1.508 s]
> >       Range (min … max):    2.328 s …  2.407 s    10 runs
> >
> >     Benchmark 2: git fetch --prune --atomic +refs/*:refs/* (HEAD)
> >       Time (mean ± σ):      1.369 s ±  0.017 s    [User: 0.715 s, System: 0.641 s]
> >       Range (min … max):    1.346 s …  1.400 s    10 runs
> >
> >     Summary
> >       'git fetch --prune --atomic +refs/*:refs/* (HEAD)' ran
> >         1.73 ± 0.03 times faster than 'git fetch --prune --atomic +refs/*:refs/* (HEAD~)'
> 
> Very nice!
> 
> [...]
> > diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
> > index 93a0db3c68..afa6bf9f7d 100755
> > --- a/t/t5510-fetch.sh
> > +++ b/t/t5510-fetch.sh
> > @@ -349,11 +349,9 @@ test_expect_success 'fetch --atomic --prune executes a single reference transact
> >         cat >expected <<-EOF &&
> >                 prepared
> >                 $ZERO_OID $ZERO_OID refs/remotes/origin/scheduled-for-deletion
> > -               committed
> > -               $ZERO_OID $ZERO_OID refs/remotes/origin/scheduled-for-deletion
> > -               prepared
> >                 $ZERO_OID $head_oid refs/remotes/origin/new-branch
> >                 committed
> 
> Up to here this is just what I expected.
> 
> > +               $ZERO_OID $ZERO_OID refs/remotes/origin/scheduled-for-deletion
> >                 $ZERO_OID $head_oid refs/remotes/origin/new-branch
> 
> But now scheduled-for-deletion and new-branch are both listed again
> even with your fixes?  Is this some peculiarity of the reference
> transaction hook that it lists the refs again after the
> prepared...committed block?  (It may well be; I'm not that familiar
> with this area of the code.)

Yes, the reference-transaction hook is executed for each of the
"prepared", "committed" and "aborted" phases and gets as input all the
refs that have been queued for that phase.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/6] fetch: report errors when backfilling tags fails
  2022-02-17 11:27     ` Patrick Steinhardt
@ 2022-02-17 12:47       ` Patrick Steinhardt
  0 siblings, 0 replies; 27+ messages in thread
From: Patrick Steinhardt @ 2022-02-17 12:47 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, Thomas Gummerer

[-- Attachment #1: Type: text/plain, Size: 2184 bytes --]

On Thu, Feb 17, 2022 at 12:27:15PM +0100, Patrick Steinhardt wrote:
> On Tue, Feb 15, 2022 at 08:52:14AM +0100, Christian Couder wrote:
> > On Fri, Feb 11, 2022 at 9:03 PM Patrick Steinhardt <ps@pks.im> wrote:
> > >
> > > When the backfilling of tags fails we do not report this error to the
> > > caller, but only report it implicitly at a later point when reporting
> > > updated references.
> > 
> > Probably stupid question: are we sure that it's a bug and not a feature?
> 
> Good question, and I don't have a definitive answer for it. But to me it
> very much smells like a bug: if I ask for a fetch and the fetch fails to
> populate some of the data I have asked for, then I want to get a
> notification on that failure.
> 
> > > This leaves callers unable to act upon the
> > > information of whether the backfilling succeeded or not.
> > >
> > > Refactor the function to return an error code and pass it up the
> > > callstack. This causes us to correctly propagate the error back to the
> > > user of git-fetch(1).
> > 
> > Even if it would have been the right behavior when backfilling tags
> > was implemented to return an error when backfilling tags fails, I
> > think it's interesting to ask ourselves if this change could be seen
> > as a regression by some users.
> 
> Yeah, it's not all that clear-cut because auto-following of tags is a
> bit obscure. But our default behaviour is to fetch tags pointing into
> the history, and if a user didn't want that they should've passed
> `--no-tags` to git-fetch(1). So conversely, we should assume that a user
> is asking for auto-filling of tags if we're not told otherwise, which
> also means that it is a failure if this fails.
> 
> At least that's my take, but I'm happy to hear arguments against my
> viewpoint.
> 
> Patrick

I just noticed that we have in fact landed a change in the exact same
spirit on `main` via c9e04d905e (fetch --prune: exit with error if
pruning fails, 2022-01-31). So there is precedent that we fix up these
missing error codes, and that gives me more confidence that doing the
same fixup for the tag-backfill is the correct thing to do.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2022-02-17 12:48 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-11  7:46 [PATCH 0/6] fetch: improve atomicity of `--atomic` flag Patrick Steinhardt
2022-02-11  7:46 ` [PATCH 1/6] fetch: increase test coverage of fetches Patrick Steinhardt
2022-02-15  6:19   ` Christian Couder
2022-02-17 11:13     ` Patrick Steinhardt
2022-02-11  7:46 ` [PATCH 2/6] fetch: backfill tags before setting upstream Patrick Steinhardt
2022-02-15  6:43   ` Christian Couder
2022-02-11  7:46 ` [PATCH 3/6] fetch: control lifecycle of FETCH_HEAD in a single place Patrick Steinhardt
2022-02-15  7:32   ` Christian Couder
2022-02-17 11:18     ` Patrick Steinhardt
2022-02-11  7:46 ` [PATCH 4/6] fetch: report errors when backfilling tags fails Patrick Steinhardt
2022-02-15  7:52   ` Christian Couder
2022-02-17 11:27     ` Patrick Steinhardt
2022-02-17 12:47       ` Patrick Steinhardt
2022-02-16 23:35   ` Jonathan Tan
2022-02-17  1:31   ` Elijah Newren
2022-02-11  7:47 ` [PATCH 5/6] fetch: make `--atomic` flag cover backfilling of tags Patrick Steinhardt
2022-02-15  8:11   ` Christian Couder
2022-02-16 23:41     ` Jonathan Tan
2022-02-17 11:37     ` Patrick Steinhardt
2022-02-17  1:34   ` Elijah Newren
2022-02-17 11:58     ` Patrick Steinhardt
2022-02-11  7:47 ` [PATCH 6/6] fetch: make `--atomic` flag cover pruning of refs Patrick Steinhardt
2022-02-15  9:12   ` Christian Couder
2022-02-17 12:03     ` Patrick Steinhardt
2022-02-16 23:39   ` Jonathan Tan
2022-02-17  1:40   ` Elijah Newren
2022-02-17 12:06     ` Patrick Steinhardt

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).