git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / Atom feed
* [PATCH 0/2] fetch: implement support for atomic reference updates
@ 2021-01-07 13:51 Patrick Steinhardt
  2021-01-07 13:51 ` [PATCH 1/2] fetch: allow passing a transaction to `s_update_ref()` Patrick Steinhardt
                   ` (4 more replies)
  0 siblings, 5 replies; 41+ messages in thread
From: Patrick Steinhardt @ 2021-01-07 13:51 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 688 bytes --]

Hi,

this is a short patch series to implement support for atomic reference
updates for git-fetch(1). It's similar to `git push --atomic`, only that
it applies to the local side. That is the fetch will either succeed and
update all remote references or it will fail and update none.

Patrick

Patrick Steinhardt (2):
  fetch: allow passing a transaction to `s_update_ref()`
  fetch: implement support for atomic reference updates

 Documentation/fetch-options.txt |   4 +
 builtin/fetch.c                 |  72 ++++++++++++-----
 t/t5510-fetch.sh                | 139 ++++++++++++++++++++++++++++++++
 3 files changed, 197 insertions(+), 18 deletions(-)

-- 
2.30.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 1/2] fetch: allow passing a transaction to `s_update_ref()`
  2021-01-07 13:51 [PATCH 0/2] fetch: implement support for atomic reference updates Patrick Steinhardt
@ 2021-01-07 13:51 ` Patrick Steinhardt
  2021-01-07 22:59   ` Junio C Hamano
  2021-01-07 13:51 ` [PATCH 2/2] fetch: implement support for atomic reference updates Patrick Steinhardt
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 41+ messages in thread
From: Patrick Steinhardt @ 2021-01-07 13:51 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 5670 bytes --]

The handling of ref updates is completely handled by `s_update_ref()`,
which will manage the complete lifecycle of the reference transaction.
This is fine right now given that git-fetch(1) does not support atomic
fetches, so each reference gets its own transaction. It is quite
inflexible though, as `s_update_ref()` only knows about a single
reference update at a time, so it doesn't allow us to alter the
strategy.

This commit prepares `s_update_ref()` and its only caller
`update_local_ref()` to allow passing an external transaction. If none
is given, then the existing behaviour is triggered which creates a new
transaction and directly commits it. Otherwise, if the caller provides a
transaction, then we only queue the update but don't commit it. This
optionally allows the caller to manage when a transaction will be
committed.

Given that `update_local_ref()` is always called with a `NULL`
transaction for now, no change in behaviour is expected from this
change.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/fetch.c | 48 +++++++++++++++++++++++++++++++-----------------
 1 file changed, 31 insertions(+), 17 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index ecf8537605..020a977bc7 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -583,13 +583,14 @@ static struct ref *get_ref_map(struct remote *remote,
 
 static int s_update_ref(const char *action,
 			struct ref *ref,
+			struct ref_transaction *transaction,
 			int check_old)
 {
 	char *msg;
 	char *rla = getenv("GIT_REFLOG_ACTION");
-	struct ref_transaction *transaction;
+	struct ref_transaction *transaction_to_free = NULL;
 	struct strbuf err = STRBUF_INIT;
-	int ret, df_conflict = 0;
+	int ret, df_conflict = 0, commit = 0;
 
 	if (dry_run)
 		return 0;
@@ -597,26 +598,38 @@ static int s_update_ref(const char *action,
 		rla = default_rla.buf;
 	msg = xstrfmt("%s: %s", rla, action);
 
-	transaction = ref_transaction_begin(&err);
-	if (!transaction ||
-	    ref_transaction_update(transaction, ref->name,
+	/*
+	 * If no transaction was passed to us, we manage the transaction
+	 * ourselves. Otherwise, we trust the caller to handle the transaction
+	 * lifecycle.
+	 */
+	if (!transaction) {
+		transaction = transaction_to_free = ref_transaction_begin(&err);
+		if (!transaction)
+			goto fail;
+		commit = 1;
+	}
+
+	if (ref_transaction_update(transaction, ref->name,
 				   &ref->new_oid,
 				   check_old ? &ref->old_oid : NULL,
 				   0, msg, &err))
 		goto fail;
 
-	ret = ref_transaction_commit(transaction, &err);
-	if (ret) {
-		df_conflict = (ret == TRANSACTION_NAME_CONFLICT);
-		goto fail;
+	if (commit) {
+		ret = ref_transaction_commit(transaction, &err);
+		if (ret) {
+			df_conflict = (ret == TRANSACTION_NAME_CONFLICT);
+			goto fail;
+		}
 	}
 
-	ref_transaction_free(transaction);
+	ref_transaction_free(transaction_to_free);
 	strbuf_release(&err);
 	free(msg);
 	return 0;
 fail:
-	ref_transaction_free(transaction);
+	ref_transaction_free(transaction_to_free);
 	error("%s", err.buf);
 	strbuf_release(&err);
 	free(msg);
@@ -759,6 +772,7 @@ static void format_display(struct strbuf *display, char code,
 }
 
 static int update_local_ref(struct ref *ref,
+			    struct ref_transaction *transaction,
 			    const char *remote,
 			    const struct ref *remote_ref,
 			    struct strbuf *display,
@@ -799,7 +813,7 @@ static int update_local_ref(struct ref *ref,
 	    starts_with(ref->name, "refs/tags/")) {
 		if (force || ref->force) {
 			int r;
-			r = s_update_ref("updating tag", ref, 0);
+			r = s_update_ref("updating tag", ref, transaction, 0);
 			format_display(display, r ? '!' : 't', _("[tag update]"),
 				       r ? _("unable to update local ref") : NULL,
 				       remote, pretty_ref, summary_width);
@@ -836,7 +850,7 @@ static int update_local_ref(struct ref *ref,
 			what = _("[new ref]");
 		}
 
-		r = s_update_ref(msg, ref, 0);
+		r = s_update_ref(msg, ref, transaction, 0);
 		format_display(display, r ? '!' : '*', what,
 			       r ? _("unable to update local ref") : NULL,
 			       remote, pretty_ref, summary_width);
@@ -858,7 +872,7 @@ static int update_local_ref(struct ref *ref,
 		strbuf_add_unique_abbrev(&quickref, &current->object.oid, DEFAULT_ABBREV);
 		strbuf_addstr(&quickref, "..");
 		strbuf_add_unique_abbrev(&quickref, &ref->new_oid, DEFAULT_ABBREV);
-		r = s_update_ref("fast-forward", ref, 1);
+		r = s_update_ref("fast-forward", ref, transaction, 1);
 		format_display(display, r ? '!' : ' ', quickref.buf,
 			       r ? _("unable to update local ref") : NULL,
 			       remote, pretty_ref, summary_width);
@@ -870,7 +884,7 @@ static int update_local_ref(struct ref *ref,
 		strbuf_add_unique_abbrev(&quickref, &current->object.oid, DEFAULT_ABBREV);
 		strbuf_addstr(&quickref, "...");
 		strbuf_add_unique_abbrev(&quickref, &ref->new_oid, DEFAULT_ABBREV);
-		r = s_update_ref("forced-update", ref, 1);
+		r = s_update_ref("forced-update", ref, transaction, 1);
 		format_display(display, r ? '!' : '+', quickref.buf,
 			       r ? _("unable to update local ref") : _("forced update"),
 			       remote, pretty_ref, summary_width);
@@ -1034,8 +1048,8 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 
 			strbuf_reset(&note);
 			if (ref) {
-				rc |= update_local_ref(ref, what, rm, &note,
-						       summary_width);
+				rc |= update_local_ref(ref, NULL, what,
+						       rm, &note, summary_width);
 				free(ref);
 			} else if (write_fetch_head || dry_run) {
 				/*
-- 
2.30.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 2/2] fetch: implement support for atomic reference updates
  2021-01-07 13:51 [PATCH 0/2] fetch: implement support for atomic reference updates Patrick Steinhardt
  2021-01-07 13:51 ` [PATCH 1/2] fetch: allow passing a transaction to `s_update_ref()` Patrick Steinhardt
@ 2021-01-07 13:51 ` Patrick Steinhardt
  2021-01-08  0:19   ` Junio C Hamano
  2021-01-08 12:11 ` [PATCH v2 0/4] " Patrick Steinhardt
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 41+ messages in thread
From: Patrick Steinhardt @ 2021-01-07 13:51 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 10632 bytes --]

When executing a fetch, then git will currently allocate one reference
transaction per reference update and directly commit it. This means that
fetches are non-atomic: even if some of the reference updates fail,
others may still succeed and modify local references.

This is fine in many scenarios, but this strategy has its downsides.

- The view of remote references may be inconsistent and may show a
  bastardized state of the remote repository.

- Batching together updates may improve performance in certain
  scenarios. While the impact probably isn't as pronounced with loose
  references, the upcoming reftable backend may benefit as it needs to
  write less files in case the update is batched.

- The reference-update hook is currently being executed twice per
  updated reference. While this doesn't matter when there is no such
  hook, we have seen severe performance regressions when doing a
  git-fetch(1) with reference-transaction hook when the remote
  repository has hundreds of thousands of references.

Similar to `git push --atomic`, this commit thus introduces atomic
fetches. Instead of allocating one reference transaction per updated
reference, it causes us to only allocate a single transaction and commit
it as soon as all updates were received. If locking of any reference
fails, then we abort the complete transaction and don't update any
reference, which gives us an all-or-nothing fetch.

Note that this may not completely fix the first of above downsides, as
the consistent view also depends on the server-side. If the server
doesn't have a consistent view of its own references during the
reference negotiation phase, then the client would get the same
inconsistent view the server has. This is a separate problem though and,
if it actually exists, can be fixed at a later point.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/fetch-options.txt |   4 +
 builtin/fetch.c                 |  26 +++++-
 t/t5510-fetch.sh                | 139 ++++++++++++++++++++++++++++++++
 3 files changed, 167 insertions(+), 2 deletions(-)

diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
index 2bf77b46fd..07783deee3 100644
--- a/Documentation/fetch-options.txt
+++ b/Documentation/fetch-options.txt
@@ -7,6 +7,10 @@
 	existing contents of `.git/FETCH_HEAD`.  Without this
 	option old data in `.git/FETCH_HEAD` will be overwritten.
 
+--atomic::
+	Use an atomic transaction to update local refs. Either all refs are
+	updated, or on error, no refs are updated.
+
 --depth=<depth>::
 	Limit fetching to the specified number of commits from the tip of
 	each remote branch history. If fetching to a 'shallow' repository
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 020a977bc7..5675ae4ec3 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -63,6 +63,7 @@ static int enable_auto_gc = 1;
 static int tags = TAGS_DEFAULT, unshallow, update_shallow, deepen;
 static int max_jobs = -1, submodule_fetch_jobs_config = -1;
 static int fetch_parallel_config = 1;
+static int atomic_fetch;
 static enum transport_family family;
 static const char *depth;
 static const char *deepen_since;
@@ -144,6 +145,8 @@ static struct option builtin_fetch_options[] = {
 		 N_("set upstream for git pull/fetch")),
 	OPT_BOOL('a', "append", &append,
 		 N_("append to .git/FETCH_HEAD instead of overwriting")),
+	OPT_BOOL(0, "atomic", &atomic_fetch,
+		 N_("use atomic transaction to update references")),
 	OPT_STRING(0, "upload-pack", &upload_pack, N_("path"),
 		   N_("path to upload pack on remote end")),
 	OPT__FORCE(&force, N_("force overwrite of local reference"), 0),
@@ -926,7 +929,8 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 	FILE *fp;
 	struct commit *commit;
 	int url_len, i, rc = 0;
-	struct strbuf note = STRBUF_INIT;
+	struct strbuf note = STRBUF_INIT, err = STRBUF_INIT;
+	struct ref_transaction *transaction = NULL;
 	const char *what, *kind;
 	struct ref *rm;
 	char *url;
@@ -955,6 +959,14 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 		}
 	}
 
+	if (atomic_fetch) {
+		transaction = ref_transaction_begin(&err);
+		if (!transaction) {
+			error("%s", err.buf);
+			goto abort;
+		}
+	}
+
 	prepare_format_display(ref_map);
 
 	/*
@@ -1048,7 +1060,7 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 
 			strbuf_reset(&note);
 			if (ref) {
-				rc |= update_local_ref(ref, NULL, what,
+				rc |= update_local_ref(ref, transaction, what,
 						       rm, &note, summary_width);
 				free(ref);
 			} else if (write_fetch_head || dry_run) {
@@ -1074,6 +1086,14 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 		}
 	}
 
+	if (!rc && transaction) {
+		rc = ref_transaction_commit(transaction, &err);
+		if (rc) {
+			error("%s", err.buf);
+			goto abort;
+		}
+	}
+
 	if (rc & STORE_REF_ERROR_DF_CONFLICT)
 		error(_("some local refs could not be updated; try running\n"
 		      " 'git remote prune %s' to remove any old, conflicting "
@@ -1090,6 +1110,8 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 
  abort:
 	strbuf_release(&note);
+	strbuf_release(&err);
+	ref_transaction_free(transaction);
 	free(url);
 	fclose(fp);
 	return rc;
diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
index 2013051a64..57da3ab2b3 100755
--- a/t/t5510-fetch.sh
+++ b/t/t5510-fetch.sh
@@ -176,6 +176,145 @@ test_expect_success 'fetch --prune --tags with refspec prunes based on refspec'
 	git rev-parse sometag
 '
 
+test_expect_success 'fetch --atomic works with a single branch' '
+	test_when_finished "rm -rf \"$D\"/atomic" &&
+
+	cd "$D" &&
+	git clone . atomic &&
+	git branch atomic-branch &&
+	git rev-parse atomic-branch >expected &&
+
+	git -C atomic fetch --atomic origin &&
+	git -C atomic rev-parse origin/atomic-branch >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'fetch --atomic works with multiple branches' '
+	test_when_finished "rm -rf \"$D\"/atomic" &&
+
+	cd "$D" &&
+	git clone . atomic &&
+	git branch atomic-branch-1 &&
+	git branch atomic-branch-2 &&
+	git branch atomic-branch-3 &&
+	git rev-parse refs/heads/atomic-branch-1 refs/heads/atomic-branch-2 refs/heads/atomic-branch-3 >actual &&
+
+	git -C atomic fetch --atomic origin &&
+	git -C atomic rev-parse refs/remotes/origin/atomic-branch-1 refs/remotes/origin/atomic-branch-2 refs/remotes/origin/atomic-branch-3 >expected &&
+	test_cmp expected actual
+'
+
+test_expect_success 'fetch --atomic works with mixed branches and tags' '
+	test_when_finished "rm -rf \"$D\"/atomic" &&
+
+	cd "$D" &&
+	git clone . atomic &&
+	git branch atomic-mixed-branch &&
+	git tag atomic-mixed-tag &&
+	git rev-parse refs/heads/atomic-mixed-branch refs/tags/atomic-mixed-tag >actual &&
+
+	git -C atomic fetch --tags --atomic origin &&
+	git -C atomic rev-parse refs/remotes/origin/atomic-mixed-branch refs/tags/atomic-mixed-tag >expected &&
+	test_cmp expected actual
+'
+
+test_expect_success 'fetch --atomic prunes references' '
+	test_when_finished "rm -rf \"$D\"/atomic" &&
+
+	cd "$D" &&
+	git branch atomic-prune-delete &&
+	git clone . atomic &&
+	git branch --delete atomic-prune-delete &&
+	git branch atomic-prune-create &&
+	git rev-parse refs/heads/atomic-prune-create >actual &&
+
+	git -C atomic fetch --prune --atomic origin &&
+	test_must_fail git -C atomic rev-parse refs/remotes/origin/atomic-prune-delete &&
+	git -C atomic rev-parse refs/remotes/origin/atomic-prune-create >expected &&
+	test_cmp expected actual
+'
+
+test_expect_success 'fetch --atomic aborts with non-fast-forward update' '
+	test_when_finished "rm -rf \"$D\"/atomic" &&
+
+	cd "$D" &&
+	git branch atomic-non-ff &&
+	git clone . atomic &&
+	git rev-parse HEAD >actual &&
+
+	git branch atomic-new-branch &&
+	parent_commit=$(git rev-parse atomic-non-ff~) &&
+	git update-ref refs/heads/atomic-non-ff $parent_commit &&
+
+	test_must_fail git -C atomic fetch --atomic origin refs/heads/*:refs/remotes/origin/* &&
+	test_must_fail git -C atomic rev-parse refs/remotes/origin/atomic-new-branch &&
+	git -C atomic rev-parse refs/remotes/origin/atomic-non-ff >expected &&
+	test_cmp expected actual
+'
+
+test_expect_success 'fetch --atomic executes a single reference transaction only' '
+	test_when_finished "rm -rf \"$D\"/atomic" &&
+
+	cd "$D" &&
+	git clone . atomic &&
+	git branch atomic-hooks-1 &&
+	git branch atomic-hooks-2 &&
+	head_oid=$(git rev-parse HEAD) &&
+
+	cat >expected <<-EOF &&
+		prepared
+		$ZERO_OID $head_oid refs/remotes/origin/atomic-hooks-1
+		$ZERO_OID $head_oid refs/remotes/origin/atomic-hooks-2
+		committed
+		$ZERO_OID $head_oid refs/remotes/origin/atomic-hooks-1
+		$ZERO_OID $head_oid refs/remotes/origin/atomic-hooks-2
+	EOF
+
+	rm -f atomic/actual &&
+	write_script atomic/.git/hooks/reference-transaction <<-\EOF &&
+		( echo "$*" && cat ) >>actual
+	EOF
+
+	git -C atomic fetch --atomic origin &&
+	test_cmp expected atomic/actual
+'
+
+test_expect_success 'fetch --atomic aborts all reference updates if hook aborts' '
+	test_when_finished "rm -rf \"$D\"/atomic" &&
+
+	cd "$D" &&
+	git clone . atomic &&
+	git branch atomic-hooks-abort-1 &&
+	git branch atomic-hooks-abort-2 &&
+	git branch atomic-hooks-abort-3 &&
+	git tag atomic-hooks-abort &&
+	head_oid=$(git rev-parse HEAD) &&
+
+	cat >expected <<-EOF &&
+		prepared
+		$ZERO_OID $head_oid refs/remotes/origin/atomic-hooks-abort-1
+		$ZERO_OID $head_oid refs/remotes/origin/atomic-hooks-abort-2
+		$ZERO_OID $head_oid refs/remotes/origin/atomic-hooks-abort-3
+		$ZERO_OID $head_oid refs/tags/atomic-hooks-abort
+		aborted
+		$ZERO_OID $head_oid refs/remotes/origin/atomic-hooks-abort-1
+		$ZERO_OID $head_oid refs/remotes/origin/atomic-hooks-abort-2
+		$ZERO_OID $head_oid refs/remotes/origin/atomic-hooks-abort-3
+		$ZERO_OID $head_oid refs/tags/atomic-hooks-abort
+	EOF
+
+	rm -f atomic/actual &&
+	write_script atomic/.git/hooks/reference-transaction <<-\EOF &&
+		( echo "$*" && cat ) >>actual
+		exit 1
+	EOF
+
+	git -C atomic for-each-ref >expected-refs &&
+	test_must_fail git -C atomic fetch --tags --atomic origin &&
+	git -C atomic for-each-ref >actual-refs &&
+	test_cmp expected-refs actual-refs
+'
+
 test_expect_success '--refmap="" ignores configured refspec' '
 	cd "$TRASH_DIRECTORY" &&
 	git clone "$D" remote-refs &&
-- 
2.30.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/2] fetch: allow passing a transaction to `s_update_ref()`
  2021-01-07 13:51 ` [PATCH 1/2] fetch: allow passing a transaction to `s_update_ref()` Patrick Steinhardt
@ 2021-01-07 22:59   ` Junio C Hamano
  2021-01-08  0:45     ` Christian Couder
  0 siblings, 1 reply; 41+ messages in thread
From: Junio C Hamano @ 2021-01-07 22:59 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

Patrick Steinhardt <ps@pks.im> writes:

>  static int s_update_ref(const char *action,
>  			struct ref *ref,
> +			struct ref_transaction *transaction,
>  			int check_old)
>  {
>  	char *msg;
>  	char *rla = getenv("GIT_REFLOG_ACTION");
> -	struct ref_transaction *transaction;
> +	struct ref_transaction *transaction_to_free = NULL;
>  	struct strbuf err = STRBUF_INIT;
> -	int ret, df_conflict = 0;
> +	int ret, df_conflict = 0, commit = 0;

> @@ -597,26 +598,38 @@ static int s_update_ref(const char *action,
>  		rla = default_rla.buf;
>  	msg = xstrfmt("%s: %s", rla, action);
>  
> -	transaction = ref_transaction_begin(&err);
> -	if (!transaction ||
> -	    ref_transaction_update(transaction, ref->name,
> +	/*
> +	 * If no transaction was passed to us, we manage the transaction
> +	 * ourselves. Otherwise, we trust the caller to handle the transaction
> +	 * lifecycle.
> +	 */
> +	if (!transaction) {
> +		transaction = transaction_to_free = ref_transaction_begin(&err);
> +		if (!transaction)
> +			goto fail;
> +		commit = 1;
> +	}

The idea is that we still allow the caller to pass NULL in which
case we begin and commit a transaction ourselves, but if the caller
is to pass an already initialized transaction to us, we obviously
do not have to "begin" but also we won't commit.

Which makes sense, but then do we still need the "commit" variable
that reminds us "we are responsible for finishing the transaction we
started"?

If we rename "transaction_to_free" to a name that makes it more
clear that it is a transaction that we created and that we are
responsible for seeing through to its end (after all, "to-free"
captures only one half of that, i.e. before returning, we are
responsible for calling ref_transation_free() on it), then we do not
need such an extra variable that can go out of sync by mistake, no?
The block that protects the call to ref_transaction_commit() can
just check if the transaction_to_free variable (with a better name)
is non-NULL, instead of looking at the commit variable.

Just my attempt to come up with an alternative name for
transaction_to_free:

 - "our_transaction" because it is ours?

 - "auto_transaction" because it is automatically started and
   committed without bothering the caller?


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/2] fetch: implement support for atomic reference updates
  2021-01-07 13:51 ` [PATCH 2/2] fetch: implement support for atomic reference updates Patrick Steinhardt
@ 2021-01-08  0:19   ` Junio C Hamano
  0 siblings, 0 replies; 41+ messages in thread
From: Junio C Hamano @ 2021-01-08  0:19 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

Patrick Steinhardt <ps@pks.im> writes:

> Similar to `git push --atomic`, this commit thus introduces atomic
> fetches. Instead of allocating one reference transaction per updated
> reference, it causes us to only allocate a single transaction and commit
> it as soon as all updates were received. If locking of any reference
> fails, then we abort the complete transaction and don't update any
> reference, which gives us an all-or-nothing fetch.
>
> Note that this may not completely fix the first of above downsides, as
> the consistent view also depends on the server-side. If the server
> doesn't have a consistent view of its own references during the
> reference negotiation phase, then the client would get the same
> inconsistent view the server has. This is a separate problem though and,
> if it actually exists, can be fixed at a later point.

With the help of the previous step, it is trivial to see that the
single transaction around store_updated_refs() would be sufficient
to achieve the atomicity.

But what about FETCH_HEAD?  Do we refrain from writing and/or appending
to it when there is any failure?

Thanks.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/2] fetch: allow passing a transaction to `s_update_ref()`
  2021-01-07 22:59   ` Junio C Hamano
@ 2021-01-08  0:45     ` Christian Couder
  2021-01-08  7:18       ` Junio C Hamano
  0 siblings, 1 reply; 41+ messages in thread
From: Christian Couder @ 2021-01-08  0:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Patrick Steinhardt, git

On Fri, Jan 8, 2021 at 12:04 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Patrick Steinhardt <ps@pks.im> writes:
>
> >  static int s_update_ref(const char *action,
> >                       struct ref *ref,
> > +                     struct ref_transaction *transaction,
> >                       int check_old)
> >  {
> >       char *msg;
> >       char *rla = getenv("GIT_REFLOG_ACTION");
> > -     struct ref_transaction *transaction;
> > +     struct ref_transaction *transaction_to_free = NULL;
> >       struct strbuf err = STRBUF_INIT;
> > -     int ret, df_conflict = 0;
> > +     int ret, df_conflict = 0, commit = 0;
>
> > @@ -597,26 +598,38 @@ static int s_update_ref(const char *action,
> >               rla = default_rla.buf;
> >       msg = xstrfmt("%s: %s", rla, action);
> >
> > -     transaction = ref_transaction_begin(&err);
> > -     if (!transaction ||
> > -         ref_transaction_update(transaction, ref->name,
> > +     /*
> > +      * If no transaction was passed to us, we manage the transaction
> > +      * ourselves. Otherwise, we trust the caller to handle the transaction
> > +      * lifecycle.
> > +      */
> > +     if (!transaction) {
> > +             transaction = transaction_to_free = ref_transaction_begin(&err);
> > +             if (!transaction)
> > +                     goto fail;
> > +             commit = 1;
> > +     }
>
> The idea is that we still allow the caller to pass NULL in which
> case we begin and commit a transaction ourselves, but if the caller
> is to pass an already initialized transaction to us, we obviously
> do not have to "begin" but also we won't commit.
>
> Which makes sense, but then do we still need the "commit" variable
> that reminds us "we are responsible for finishing the transaction we
> started"?

Yeah, I think we also don't need the df_conflict variable, and I don't
like the duplication of the following clean up code:

        ref_transaction_free(transaction_to_free);
        strbuf_release(&err);
        free(msg);

Patrick's patch didn't introduce them, but we might still want to get
rid of them (see below).

> If we rename "transaction_to_free" to a name that makes it more
> clear that it is a transaction that we created and that we are
> responsible for seeing through to its end (after all, "to-free"
> captures only one half of that, i.e. before returning, we are
> responsible for calling ref_transation_free() on it), then we do not
> need such an extra variable that can go out of sync by mistake, no?
> The block that protects the call to ref_transaction_commit() can
> just check if the transaction_to_free variable (with a better name)
> is non-NULL, instead of looking at the commit variable.
>
> Just my attempt to come up with an alternative name for
> transaction_to_free:
>
>  - "our_transaction" because it is ours?
>
>  - "auto_transaction" because it is automatically started and
>    committed without bothering the caller?

"our_transaction" looks fine.

Here is a suggested refactoring patch that could come before Patrick's patch:

-------------------------------------------------------------------------
diff --git a/builtin/fetch.c b/builtin/fetch.c
index ecf8537605..8017fc19da 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -581,6 +581,22 @@ static struct ref *get_ref_map(struct remote *remote,
#define STORE_REF_ERROR_OTHER 1
#define STORE_REF_ERROR_DF_CONFLICT 2

+static int do_s_update_ref(struct ref_transaction *transaction,
+                          struct ref *ref,
+                          int check_old,
+                          struct strbuf *err,
+                          char *msg)
+{
+       if (!transaction ||
+           ref_transaction_update(transaction, ref->name,
+                                  &ref->new_oid,
+                                  check_old ? &ref->old_oid : NULL,
+                                  0, msg, err))
+               return TRANSACTION_GENERIC_ERROR;
+
+       return ref_transaction_commit(transaction, err);
+}
+
static int s_update_ref(const char *action,
                       struct ref *ref,
                       int check_old)
@@ -589,7 +605,7 @@ static int s_update_ref(const char *action,
       char *rla = getenv("GIT_REFLOG_ACTION");
       struct ref_transaction *transaction;
       struct strbuf err = STRBUF_INIT;
-       int ret, df_conflict = 0;
+       int ret = 0;

       if (dry_run)
               return 0;
@@ -598,30 +614,19 @@ static int s_update_ref(const char *action,
       msg = xstrfmt("%s: %s", rla, action);

       transaction = ref_transaction_begin(&err);
-       if (!transaction ||
-           ref_transaction_update(transaction, ref->name,
-                                  &ref->new_oid,
-                                  check_old ? &ref->old_oid : NULL,
-                                  0, msg, &err))
-               goto fail;

-       ret = ref_transaction_commit(transaction, &err);
+       ret = do_s_update_ref(transaction, ref, check_old, &err, msg);
       if (ret) {
-               df_conflict = (ret == TRANSACTION_NAME_CONFLICT);
-               goto fail;
+               error("%s", err.buf);
+               ret = (ret == TRANSACTION_NAME_CONFLICT)
+                       ? STORE_REF_ERROR_DF_CONFLICT
+                       : STORE_REF_ERROR_OTHER;
       }

       ref_transaction_free(transaction);
       strbuf_release(&err);
       free(msg);
-       return 0;
-fail:
-       ref_transaction_free(transaction);
-       error("%s", err.buf);
-       strbuf_release(&err);
-       free(msg);
-       return df_conflict ? STORE_REF_ERROR_DF_CONFLICT
-                          : STORE_REF_ERROR_OTHER;
+       return ret;
}

static int refcol_width = 10;
-------------------------------------------------------------------------

After the above patch, Patrick's patch would become:

-------------------------------------------------------------------------
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 8017fc19da..d58805c03d 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -584,6 +584,7 @@ static struct ref *get_ref_map(struct remote *remote,
static int do_s_update_ref(struct ref_transaction *transaction,
                          struct ref *ref,
                          int check_old,
+                          int commit,
                          struct strbuf *err,
                          char *msg)
{
@@ -594,16 +595,17 @@ static int do_s_update_ref(struct
ref_transaction *transaction,
                                  0, msg, err))
               return TRANSACTION_GENERIC_ERROR;

-       return ref_transaction_commit(transaction, err);
+       return (commit) ? ref_transaction_commit(transaction, err) : 0;
}

static int s_update_ref(const char *action,
+                       struct ref_transaction *transaction,
                       struct ref *ref,
                       int check_old)
{
       char *msg;
       char *rla = getenv("GIT_REFLOG_ACTION");
-       struct ref_transaction *transaction;
+       struct ref_transaction *our_transaction = NULL;
       struct strbuf err = STRBUF_INIT;
       int ret = 0;

@@ -613,9 +615,16 @@ static int s_update_ref(const char *action,
               rla = default_rla.buf;
       msg = xstrfmt("%s: %s", rla, action);

-       transaction = ref_transaction_begin(&err);
+       /*
+        * If no transaction was passed to us, we manage the
+        * transaction ourselves. Otherwise, we trust the caller to
+        * handle the transaction lifecycle.
+        */
+       if (!transaction)
+               transaction = our_transaction = ref_transaction_begin(&err);

-       ret = do_s_update_ref(transaction, ref, check_old, &err, msg);
+       ret = do_s_update_ref(transaction, ref, check_old, !!our_transaction,
+                             &err, msg);
       if (ret) {
               error("%s", err.buf);
               ret = (ret == TRANSACTION_NAME_CONFLICT)
@@ -623,7 +632,7 @@ static int s_update_ref(const char *action,
                       : STORE_REF_ERROR_OTHER;
       }

-       ref_transaction_free(transaction);
+       ref_transaction_free(our_transaction);
       strbuf_release(&err);
       free(msg);
       return ret;
...
-------------------------------------------------------------------------

You can find these patches, with Patrick as the author, there:

https://gitlab.com/gitlab-org/gitlab-git/-/commits/cc-improve-s-update-ref/

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/2] fetch: allow passing a transaction to `s_update_ref()`
  2021-01-08  0:45     ` Christian Couder
@ 2021-01-08  7:18       ` Junio C Hamano
  0 siblings, 0 replies; 41+ messages in thread
From: Junio C Hamano @ 2021-01-08  7:18 UTC (permalink / raw)
  To: Christian Couder; +Cc: Patrick Steinhardt, git

Christian Couder <christian.couder@gmail.com> writes:

> Yeah, I think we also don't need the df_conflict variable, and I don't
> like the duplication of the following clean up code:
>
>         ref_transaction_free(transaction_to_free);
>         strbuf_release(&err);
>         free(msg);
>
> Patrick's patch didn't introduce them, but we might still want to get
> rid of them (see below).



> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index ecf8537605..8017fc19da 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -581,6 +581,22 @@ static struct ref *get_ref_map(struct remote *remote,
> #define STORE_REF_ERROR_OTHER 1
> #define STORE_REF_ERROR_DF_CONFLICT 2
>
> +static int do_s_update_ref(struct ref_transaction *transaction,
> ...
> }
>
> static int refcol_width = 10;
> -------------------------------------------------------------------------
>
> After the above patch, Patrick's patch would become:

OK, I think the above as a preliminary clean-up would not hurt, but
the "if the caller gave us NULL as the transaction, we are
responsible for handling that bogosity" bit feels a bit too magical
to my taste.  I do understand that your intention is that it
releaves the caller ...

> @@ -613,9 +615,16 @@ static int s_update_ref(const char *action,
>                rla = default_rla.buf;
>        msg = xstrfmt("%s: %s", rla, action);
>
> -       transaction = ref_transaction_begin(&err);
> +       /*
> +        * If no transaction was passed to us, we manage the
> +        * transaction ourselves. Otherwise, we trust the caller to
> +        * handle the transaction lifecycle.
> +        */
> +       if (!transaction)
> +               transaction = our_transaction = ref_transaction_begin(&err);

... here from having to deal with NULL from ref_transaction_begin(),
but I somehow do not see it as a code structure with a good taste.

I dunno.

Without that "if (!transaction ||" bit, I think your do_s_update_ref()
is a good clean-up, though.

Thanks.

> -       ret = do_s_update_ref(transaction, ref, check_old, &err, msg);
> +       ret = do_s_update_ref(transaction, ref, check_old, !!our_transaction,
> +                             &err, msg);
>        if (ret) {
>                error("%s", err.buf);
>                ret = (ret == TRANSACTION_NAME_CONFLICT)
> @@ -623,7 +632,7 @@ static int s_update_ref(const char *action,
>                        : STORE_REF_ERROR_OTHER;
>        }
>
> -       ref_transaction_free(transaction);
> +       ref_transaction_free(our_transaction);
>        strbuf_release(&err);
>        free(msg);
>        return ret;
> ...
> -------------------------------------------------------------------------
>
> You can find these patches, with Patrick as the author, there:
>
> https://gitlab.com/gitlab-org/gitlab-git/-/commits/cc-improve-s-update-ref/

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 0/4] fetch: implement support for atomic reference updates
  2021-01-07 13:51 [PATCH 0/2] fetch: implement support for atomic reference updates Patrick Steinhardt
  2021-01-07 13:51 ` [PATCH 1/2] fetch: allow passing a transaction to `s_update_ref()` Patrick Steinhardt
  2021-01-07 13:51 ` [PATCH 2/2] fetch: implement support for atomic reference updates Patrick Steinhardt
@ 2021-01-08 12:11 ` Patrick Steinhardt
  2021-01-08 12:11   ` [PATCH v2 1/4] fetch: extract writing to FETCH_HEAD Patrick Steinhardt
                     ` (3 more replies)
  2021-01-11 11:05 ` [PATCH v3 0/5] " Patrick Steinhardt
  2021-01-12 12:27 ` [PATCH v4 " Patrick Steinhardt
  4 siblings, 4 replies; 41+ messages in thread
From: Patrick Steinhardt @ 2021-01-08 12:11 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder

[-- Attachment #1: Type: text/plain, Size: 13320 bytes --]

Hi,

this is the second version of my patch series to implement support for
atomic reference updates for git-fetch(1). It's similar to `git push
--atomic`, only that it applies to the local side. That is the fetch
will either succeed and update all remote references or it will fail and
update none.

Changes compared to v1:

    - In v1, we still wrote to FETCH_HEAD even if the fetch failed. I've
      fixed this now by pulling out logic to write to FETCH_HEAD in 1/4
      so that we can now easily buffer updates and commit them only if
      the reference transaction succeeds. There's some additional tests
      in 4/4 to test it works as expected.

    - As suggested by Christian, I've unified the exit path in
      `s_update_ref()` in 2/4.

    - I've dropped the `commit` variable and renamed
      `transaction_to_free` to `our_transaction` as suggested by Junio.

Patrick

Patrick Steinhardt (4):
  fetch: extract writing to FETCH_HEAD
  fetch: refactor `s_update_ref` to use common exit path
  fetch: allow passing a transaction to `s_update_ref()`
  fetch: implement support for atomic reference updates

 Documentation/fetch-options.txt |   4 +
 builtin/fetch.c                 | 198 ++++++++++++++++++++++++--------
 t/t5510-fetch.sh                | 168 +++++++++++++++++++++++++++
 3 files changed, 322 insertions(+), 48 deletions(-)

Range-diff against v1:
-:  ---------- > 1:  d80dbc5a9c fetch: extract writing to FETCH_HEAD
-:  ---------- > 2:  718a8bf5d7 fetch: refactor `s_update_ref` to use common exit path
1:  e627e729e5 ! 3:  4162d10fcb fetch: allow passing a transaction to `s_update_ref()`
    @@ builtin/fetch.c: static struct ref *get_ref_map(struct remote *remote,
      	char *msg;
      	char *rla = getenv("GIT_REFLOG_ACTION");
     -	struct ref_transaction *transaction;
    -+	struct ref_transaction *transaction_to_free = NULL;
    ++	struct ref_transaction *our_transaction = NULL;
      	struct strbuf err = STRBUF_INIT;
    --	int ret, df_conflict = 0;
    -+	int ret, df_conflict = 0, commit = 0;
    + 	int ret;
      
    - 	if (dry_run)
    - 		return 0;
     @@ builtin/fetch.c: static int s_update_ref(const char *action,
      		rla = default_rla.buf;
      	msg = xstrfmt("%s: %s", rla, action);
      
     -	transaction = ref_transaction_begin(&err);
    --	if (!transaction ||
    --	    ref_transaction_update(transaction, ref->name,
     +	/*
     +	 * If no transaction was passed to us, we manage the transaction
     +	 * ourselves. Otherwise, we trust the caller to handle the transaction
     +	 * lifecycle.
     +	 */
    -+	if (!transaction) {
    -+		transaction = transaction_to_free = ref_transaction_begin(&err);
    -+		if (!transaction)
    -+			goto fail;
    -+		commit = 1;
    -+	}
    -+
    -+	if (ref_transaction_update(transaction, ref->name,
    - 				   &ref->new_oid,
    - 				   check_old ? &ref->old_oid : NULL,
    - 				   0, msg, &err))
    - 		goto fail;
    - 
    --	ret = ref_transaction_commit(transaction, &err);
    --	if (ret) {
    --		df_conflict = (ret == TRANSACTION_NAME_CONFLICT);
    --		goto fail;
    -+	if (commit) {
    -+		ret = ref_transaction_commit(transaction, &err);
    -+		if (ret) {
    -+			df_conflict = (ret == TRANSACTION_NAME_CONFLICT);
    -+			goto fail;
    + 	if (!transaction) {
    +-		ret = STORE_REF_ERROR_OTHER;
    +-		goto out;
    ++		transaction = our_transaction = ref_transaction_begin(&err);
    ++		if (!transaction) {
    ++			ret = STORE_REF_ERROR_OTHER;
    ++			goto out;
     +		}
      	}
      
    + 	ret = ref_transaction_update(transaction, ref->name, &ref->new_oid,
    +@@ builtin/fetch.c: static int s_update_ref(const char *action,
    + 		goto out;
    + 	}
    + 
    +-	ret = ref_transaction_commit(transaction, &err);
    +-	if (ret) {
    +-		ret = (ret == TRANSACTION_NAME_CONFLICT) ? STORE_REF_ERROR_DF_CONFLICT
    +-							 : STORE_REF_ERROR_OTHER;
    +-		goto out;
    ++	if (our_transaction) {
    ++		ret = ref_transaction_commit(our_transaction, &err);
    ++		if (ret) {
    ++			ret = (ret == TRANSACTION_NAME_CONFLICT) ? STORE_REF_ERROR_DF_CONFLICT
    ++								 : STORE_REF_ERROR_OTHER;
    ++			goto out;
    ++		}
    + 	}
    + 
    + out:
     -	ref_transaction_free(transaction);
    -+	ref_transaction_free(transaction_to_free);
    ++	ref_transaction_free(our_transaction);
    + 	if (ret)
    + 		error("%s", err.buf);
      	strbuf_release(&err);
    - 	free(msg);
    - 	return 0;
    - fail:
    --	ref_transaction_free(transaction);
    -+	ref_transaction_free(transaction_to_free);
    - 	error("%s", err.buf);
    - 	strbuf_release(&err);
    - 	free(msg);
     @@ builtin/fetch.c: static void format_display(struct strbuf *display, char code,
      }
      
2:  4807344e92 ! 4:  53705281b6 fetch: implement support for atomic reference updates
    @@ Commit message
         inconsistent view the server has. This is a separate problem though and,
         if it actually exists, can be fixed at a later point.
     
    +    This commit also changes the way we write FETCH_HEAD in case `--atomic`
    +    is passed. Instead of writing changes as we go, we need to accumulate
    +    all changes first and only commit them at the end when we know that all
    +    reference updates succeeded. Ideally, we'd just do so via a temporary
    +    file so that we don't need to carry all updates in-memory. This isn't
    +    trivially doable though considering the `--append` mode, where we do not
    +    truncate the file but simply append to it. And given that we support
    +    concurrent processes appending to FETCH_HEAD at the same time without
    +    any loss of data, seeding the temporary file with current contents of
    +    FETCH_HEAD initially and then doing a rename wouldn't work either. So
    +    this commit implements the simple strategy of buffering all changes and
    +    appending them to the file on commit.
    +
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
      ## Documentation/fetch-options.txt ##
    @@ builtin/fetch.c: static struct option builtin_fetch_options[] = {
      	OPT_STRING(0, "upload-pack", &upload_pack, N_("path"),
      		   N_("path to upload pack on remote end")),
      	OPT__FORCE(&force, N_("force overwrite of local reference"), 0),
    -@@ builtin/fetch.c: static int store_updated_refs(const char *raw_url, const char *remote_name,
    +@@ builtin/fetch.c: static int iterate_ref_map(void *cb_data, struct object_id *oid)
    + 
    + struct fetch_head {
      	FILE *fp;
    ++	struct strbuf buf;
    + };
    + 
    + static int open_fetch_head(struct fetch_head *fetch_head)
    +@@ builtin/fetch.c: static int open_fetch_head(struct fetch_head *fetch_head)
    + 	if (!write_fetch_head)
    + 		return 0;
    + 
    ++	strbuf_init(&fetch_head->buf, 0);
    + 	fetch_head->fp = fopen(filename, "a");
    + 	if (!fetch_head->fp)
    + 		return error_errno(_("cannot open %s"), filename);
    +@@ builtin/fetch.c: static void append_fetch_head(struct fetch_head *fetch_head, const char *old_oid
    + 	if (!write_fetch_head)
    + 		return;
    + 
    +-	fprintf(fetch_head->fp, "%s\t%s\t%s",
    +-		old_oid, merge_status_marker, note);
    +-	for (i = 0; i < url_len; ++i)
    +-		if ('\n' == url[i])
    +-			fputs("\\n", fetch_head->fp);
    +-		else
    +-			fputc(url[i], fetch_head->fp);
    +-	fputc('\n', fetch_head->fp);
    ++	/*
    ++	 * When using an atomic fetch, we do not want to update FETCH_HEAD if
    ++	 * any of the reference updates fails. We thus have to write all
    ++	 * updates to a buffer first and only commit it as soon as all
    ++	 * references have been successfully updated.
    ++	 */
    ++	if (atomic_fetch) {
    ++		strbuf_addf(&fetch_head->buf, "%s\t%s\t%s",
    ++			    old_oid, merge_status_marker, note);
    ++		strbuf_add(&fetch_head->buf, url, url_len);
    ++		strbuf_addch(&fetch_head->buf, '\n');
    ++	} else {
    ++		fprintf(fetch_head->fp, "%s\t%s\t%s",
    ++			old_oid, merge_status_marker, note);
    ++		for (i = 0; i < url_len; ++i)
    ++			if ('\n' == url[i])
    ++				fputs("\\n", fetch_head->fp);
    ++			else
    ++				fputc(url[i], fetch_head->fp);
    ++		fputc('\n', fetch_head->fp);
    ++	}
    + }
    + 
    + static void commit_fetch_head(struct fetch_head *fetch_head)
    + {
    +-	/* Nothing to commit yet. */
    ++	if (!write_fetch_head || !atomic_fetch)
    ++		return;
    ++	strbuf_write(&fetch_head->buf, fetch_head->fp);
    + }
    + 
    + static void close_fetch_head(struct fetch_head *fetch_head)
    +@@ builtin/fetch.c: static void close_fetch_head(struct fetch_head *fetch_head)
    + 		return;
    + 
    + 	fclose(fetch_head->fp);
    ++	strbuf_release(&fetch_head->buf);
    + }
    + 
    + static const char warn_show_forced_updates[] =
    +@@ builtin/fetch.c: static int store_updated_refs(const char *raw_url, const char *remote_name,
    + 	struct fetch_head fetch_head;
      	struct commit *commit;
      	int url_len, i, rc = 0;
     -	struct strbuf note = STRBUF_INIT;
    @@ builtin/fetch.c: static int store_updated_refs(const char *raw_url, const char *
     +		}
     +	}
     +
    - 	if (rc & STORE_REF_ERROR_DF_CONFLICT)
    - 		error(_("some local refs could not be updated; try running\n"
    - 		      " 'git remote prune %s' to remove any old, conflicting "
    + 	if (!rc)
    + 		commit_fetch_head(&fetch_head);
    + 
     @@ builtin/fetch.c: static int store_updated_refs(const char *raw_url, const char *remote_name,
      
       abort:
    @@ builtin/fetch.c: static int store_updated_refs(const char *raw_url, const char *
     +	strbuf_release(&err);
     +	ref_transaction_free(transaction);
      	free(url);
    - 	fclose(fp);
    + 	close_fetch_head(&fetch_head);
      	return rc;
    +@@ builtin/fetch.c: int cmd_fetch(int argc, const char **argv, const char *prefix)
    + 			die(_("--filter can only be used with the remote "
    + 			      "configured in extensions.partialclone"));
    + 
    ++		if (atomic_fetch)
    ++			die(_("--atomic can only be used when fetching "
    ++			      "from one remote"));
    ++
    + 		if (stdin_refspecs)
    + 			die(_("--stdin can only be used when fetching "
    + 			      "from one remote"));
     
      ## t/t5510-fetch.sh ##
     @@ t/t5510-fetch.sh: test_expect_success 'fetch --prune --tags with refspec prunes based on refspec'
    @@ t/t5510-fetch.sh: test_expect_success 'fetch --prune --tags with refspec prunes
     +	cd "$D" &&
     +	git clone . atomic &&
     +	git branch atomic-branch &&
    -+	git rev-parse atomic-branch >expected &&
    ++	oid=$(git rev-parse atomic-branch) &&
    ++	echo "$oid" >expected &&
     +
     +	git -C atomic fetch --atomic origin &&
     +	git -C atomic rev-parse origin/atomic-branch >actual &&
    -+	test_cmp expected actual
    ++	test_cmp expected actual &&
    ++	test $oid = "$(git -C atomic rev-parse --verify FETCH_HEAD)"
     +'
     +
     +test_expect_success 'fetch --atomic works with multiple branches' '
    @@ t/t5510-fetch.sh: test_expect_success 'fetch --prune --tags with refspec prunes
     +	test_must_fail git -C atomic fetch --atomic origin refs/heads/*:refs/remotes/origin/* &&
     +	test_must_fail git -C atomic rev-parse refs/remotes/origin/atomic-new-branch &&
     +	git -C atomic rev-parse refs/remotes/origin/atomic-non-ff >expected &&
    -+	test_cmp expected actual
    ++	test_cmp expected actual &&
    ++	test_must_be_empty atomic/.git/FETCH_HEAD
     +'
     +
     +test_expect_success 'fetch --atomic executes a single reference transaction only' '
    @@ t/t5510-fetch.sh: test_expect_success 'fetch --prune --tags with refspec prunes
     +	git -C atomic for-each-ref >expected-refs &&
     +	test_must_fail git -C atomic fetch --tags --atomic origin &&
     +	git -C atomic for-each-ref >actual-refs &&
    -+	test_cmp expected-refs actual-refs
    ++	test_cmp expected-refs actual-refs &&
    ++	test_must_be_empty atomic/.git/FETCH_HEAD
    ++'
    ++
    ++test_expect_success 'fetch --atomic --append appends to FETCH_HEAD' '
    ++	test_when_finished "rm -rf \"$D\"/atomic" &&
    ++
    ++	cd "$D" &&
    ++	git clone . atomic &&
    ++	oid=$(git rev-parse HEAD) &&
    ++
    ++	git branch atomic-fetch-head-1 &&
    ++	git -C atomic fetch --atomic origin atomic-fetch-head-1 &&
    ++	test_line_count = 1 atomic/.git/FETCH_HEAD &&
    ++
    ++	git branch atomic-fetch-head-2 &&
    ++	git -C atomic fetch --atomic --append origin atomic-fetch-head-2 &&
    ++	test_line_count = 2 atomic/.git/FETCH_HEAD &&
    ++	cp atomic/.git/FETCH_HEAD expected &&
    ++
    ++	write_script atomic/.git/hooks/reference-transaction <<-\EOF &&
    ++		exit 1
    ++	EOF
    ++
    ++	git branch atomic-fetch-head-3 &&
    ++	test_must_fail git -C atomic fetch --atomic --append origin atomic-fetch-head-3 &&
    ++	test_cmp expected atomic/.git/FETCH_HEAD
     +'
     +
      test_expect_success '--refmap="" ignores configured refspec' '
-- 
2.30.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 1/4] fetch: extract writing to FETCH_HEAD
  2021-01-08 12:11 ` [PATCH v2 0/4] " Patrick Steinhardt
@ 2021-01-08 12:11   ` Patrick Steinhardt
  2021-01-08 23:40     ` Junio C Hamano
  2021-01-08 12:11   ` [PATCH v2 2/4] fetch: refactor `s_update_ref` to use common exit path Patrick Steinhardt
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 41+ messages in thread
From: Patrick Steinhardt @ 2021-01-08 12:11 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder

[-- Attachment #1: Type: text/plain, Size: 5169 bytes --]

When performing a fetch with the default `--write-fetch-head` option, we
write all updated references to FETCH_HEAD while the updates are
performed. Given that updates are not performed atomically, it means
that we we write to FETCH_HEAD even if some or all of the reference
updates fail.

Given that we simply update FETCH_HEAD ad-hoc with each reference, the
logic is completely contained in `store_update_refs` and thus quite hard
to extend. This can already be seen by the way we skip writing to the
FETCH_HEAD: instead of having a conditional which simply skips writing,
we instead open "/dev/null" and needlessly write all updates there.

We are about to extend git-fetch(1) to accept an `--atomic` flag which
will make the fetch an all-or-nothing operation with regards to the
reference updates. This will also require us to make the updates to
FETCH_HEAD an all-or-nothing operation, but as explained doing so is not
easy with the current layout. This commit thus refactors the wa we write
to FETCH_HEAD and pulls out the logic to open, append to, commit and
close the file. While this may seem rather over-the top at first,
pulling out this logic will make it a lot easier to update the code in a
subsequent commit. It also allows us to easily skip writing completely
in case `--no-write-fetch-head` was passed.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/fetch.c | 80 ++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 62 insertions(+), 18 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index ecf8537605..557ec130db 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -897,6 +897,56 @@ static int iterate_ref_map(void *cb_data, struct object_id *oid)
 	return 0;
 }
 
+struct fetch_head {
+	FILE *fp;
+};
+
+static int open_fetch_head(struct fetch_head *fetch_head)
+{
+	const char *filename = git_path_fetch_head(the_repository);
+
+	if (!write_fetch_head)
+		return 0;
+
+	fetch_head->fp = fopen(filename, "a");
+	if (!fetch_head->fp)
+		return error_errno(_("cannot open %s"), filename);
+
+	return 0;
+}
+
+static void append_fetch_head(struct fetch_head *fetch_head, const char *old_oid,
+			      const char *merge_status_marker, const char *note,
+			      const char *url, size_t url_len)
+{
+	size_t i;
+
+	if (!write_fetch_head)
+		return;
+
+	fprintf(fetch_head->fp, "%s\t%s\t%s",
+		old_oid, merge_status_marker, note);
+	for (i = 0; i < url_len; ++i)
+		if ('\n' == url[i])
+			fputs("\\n", fetch_head->fp);
+		else
+			fputc(url[i], fetch_head->fp);
+	fputc('\n', fetch_head->fp);
+}
+
+static void commit_fetch_head(struct fetch_head *fetch_head)
+{
+	/* Nothing to commit yet. */
+}
+
+static void close_fetch_head(struct fetch_head *fetch_head)
+{
+	if (!write_fetch_head)
+		return;
+
+	fclose(fetch_head->fp);
+}
+
 static const char warn_show_forced_updates[] =
 N_("Fetch normally indicates which branches had a forced update,\n"
    "but that check has been disabled. To re-enable, use '--show-forced-updates'\n"
@@ -909,22 +959,19 @@ N_("It took %.2f seconds to check forced updates. You can use\n"
 static int store_updated_refs(const char *raw_url, const char *remote_name,
 			      int connectivity_checked, struct ref *ref_map)
 {
-	FILE *fp;
+	struct fetch_head fetch_head;
 	struct commit *commit;
 	int url_len, i, rc = 0;
 	struct strbuf note = STRBUF_INIT;
 	const char *what, *kind;
 	struct ref *rm;
 	char *url;
-	const char *filename = (!write_fetch_head
-				? "/dev/null"
-				: git_path_fetch_head(the_repository));
 	int want_status;
 	int summary_width = transport_summary_width(ref_map);
 
-	fp = fopen(filename, "a");
-	if (!fp)
-		return error_errno(_("cannot open %s"), filename);
+	rc = open_fetch_head(&fetch_head);
+	if (rc)
+		return -1;
 
 	if (raw_url)
 		url = transport_anonymize_url(raw_url);
@@ -1016,16 +1063,10 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 				merge_status_marker = "not-for-merge";
 				/* fall-through */
 			case FETCH_HEAD_MERGE:
-				fprintf(fp, "%s\t%s\t%s",
-					oid_to_hex(&rm->old_oid),
-					merge_status_marker,
-					note.buf);
-				for (i = 0; i < url_len; ++i)
-					if ('\n' == url[i])
-						fputs("\\n", fp);
-					else
-						fputc(url[i], fp);
-				fputc('\n', fp);
+				append_fetch_head(&fetch_head,
+						  oid_to_hex(&rm->old_oid),
+						  merge_status_marker,
+						  note.buf, url, url_len);
 				break;
 			default:
 				/* do not write anything to FETCH_HEAD */
@@ -1060,6 +1101,9 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 		}
 	}
 
+	if (!rc)
+		commit_fetch_head(&fetch_head);
+
 	if (rc & STORE_REF_ERROR_DF_CONFLICT)
 		error(_("some local refs could not be updated; try running\n"
 		      " 'git remote prune %s' to remove any old, conflicting "
@@ -1077,7 +1121,7 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
  abort:
 	strbuf_release(&note);
 	free(url);
-	fclose(fp);
+	close_fetch_head(&fetch_head);
 	return rc;
 }
 
-- 
2.30.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 2/4] fetch: refactor `s_update_ref` to use common exit path
  2021-01-08 12:11 ` [PATCH v2 0/4] " Patrick Steinhardt
  2021-01-08 12:11   ` [PATCH v2 1/4] fetch: extract writing to FETCH_HEAD Patrick Steinhardt
@ 2021-01-08 12:11   ` Patrick Steinhardt
  2021-01-08 23:50     ` Junio C Hamano
  2021-01-08 12:11   ` [PATCH v2 3/4] fetch: allow passing a transaction to `s_update_ref()` Patrick Steinhardt
  2021-01-08 12:11   ` [PATCH v2 4/4] fetch: implement support for atomic reference updates Patrick Steinhardt
  3 siblings, 1 reply; 41+ messages in thread
From: Patrick Steinhardt @ 2021-01-08 12:11 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder

[-- Attachment #1: Type: text/plain, Size: 2128 bytes --]

The cleanup code in `s_update_ref()` is currently duplicated for both
succesful and erroneous exit paths. This commit refactors the function
to have a shared exit path for both cases to remove the duplication.

Suggested-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/fetch.c | 37 ++++++++++++++++++++-----------------
 1 file changed, 20 insertions(+), 17 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 557ec130db..5221a9dbed 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -589,7 +589,7 @@ static int s_update_ref(const char *action,
 	char *rla = getenv("GIT_REFLOG_ACTION");
 	struct ref_transaction *transaction;
 	struct strbuf err = STRBUF_INIT;
-	int ret, df_conflict = 0;
+	int ret;
 
 	if (dry_run)
 		return 0;
@@ -598,30 +598,33 @@ static int s_update_ref(const char *action,
 	msg = xstrfmt("%s: %s", rla, action);
 
 	transaction = ref_transaction_begin(&err);
-	if (!transaction ||
-	    ref_transaction_update(transaction, ref->name,
-				   &ref->new_oid,
-				   check_old ? &ref->old_oid : NULL,
-				   0, msg, &err))
-		goto fail;
+	if (!transaction) {
+		ret = STORE_REF_ERROR_OTHER;
+		goto out;
+	}
+
+	ret = ref_transaction_update(transaction, ref->name, &ref->new_oid,
+				     check_old ? &ref->old_oid : NULL,
+				     0, msg, &err);
+	if (ret) {
+		ret = STORE_REF_ERROR_OTHER;
+		goto out;
+	}
 
 	ret = ref_transaction_commit(transaction, &err);
 	if (ret) {
-		df_conflict = (ret == TRANSACTION_NAME_CONFLICT);
-		goto fail;
+		ret = (ret == TRANSACTION_NAME_CONFLICT) ? STORE_REF_ERROR_DF_CONFLICT
+							 : STORE_REF_ERROR_OTHER;
+		goto out;
 	}
 
+out:
 	ref_transaction_free(transaction);
+	if (ret)
+		error("%s", err.buf);
 	strbuf_release(&err);
 	free(msg);
-	return 0;
-fail:
-	ref_transaction_free(transaction);
-	error("%s", err.buf);
-	strbuf_release(&err);
-	free(msg);
-	return df_conflict ? STORE_REF_ERROR_DF_CONFLICT
-			   : STORE_REF_ERROR_OTHER;
+	return ret;
 }
 
 static int refcol_width = 10;
-- 
2.30.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 3/4] fetch: allow passing a transaction to `s_update_ref()`
  2021-01-08 12:11 ` [PATCH v2 0/4] " Patrick Steinhardt
  2021-01-08 12:11   ` [PATCH v2 1/4] fetch: extract writing to FETCH_HEAD Patrick Steinhardt
  2021-01-08 12:11   ` [PATCH v2 2/4] fetch: refactor `s_update_ref` to use common exit path Patrick Steinhardt
@ 2021-01-08 12:11   ` Patrick Steinhardt
  2021-01-08 23:53     ` Junio C Hamano
  2021-01-08 12:11   ` [PATCH v2 4/4] fetch: implement support for atomic reference updates Patrick Steinhardt
  3 siblings, 1 reply; 41+ messages in thread
From: Patrick Steinhardt @ 2021-01-08 12:11 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder

[-- Attachment #1: Type: text/plain, Size: 5543 bytes --]

The handling of ref updates is completely handled by `s_update_ref()`,
which will manage the complete lifecycle of the reference transaction.
This is fine right now given that git-fetch(1) does not support atomic
fetches, so each reference gets its own transaction. It is quite
inflexible though, as `s_update_ref()` only knows about a single
reference update at a time, so it doesn't allow us to alter the
strategy.

This commit prepares `s_update_ref()` and its only caller
`update_local_ref()` to allow passing an external transaction. If none
is given, then the existing behaviour is triggered which creates a new
transaction and directly commits it. Otherwise, if the caller provides a
transaction, then we only queue the update but don't commit it. This
optionally allows the caller to manage when a transaction will be
committed.

Given that `update_local_ref()` is always called with a `NULL`
transaction for now, no change in behaviour is expected from this
change.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/fetch.c | 43 +++++++++++++++++++++++++++----------------
 1 file changed, 27 insertions(+), 16 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 5221a9dbed..654e0bb520 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -583,11 +583,12 @@ static struct ref *get_ref_map(struct remote *remote,
 
 static int s_update_ref(const char *action,
 			struct ref *ref,
+			struct ref_transaction *transaction,
 			int check_old)
 {
 	char *msg;
 	char *rla = getenv("GIT_REFLOG_ACTION");
-	struct ref_transaction *transaction;
+	struct ref_transaction *our_transaction = NULL;
 	struct strbuf err = STRBUF_INIT;
 	int ret;
 
@@ -597,10 +598,17 @@ static int s_update_ref(const char *action,
 		rla = default_rla.buf;
 	msg = xstrfmt("%s: %s", rla, action);
 
-	transaction = ref_transaction_begin(&err);
+	/*
+	 * If no transaction was passed to us, we manage the transaction
+	 * ourselves. Otherwise, we trust the caller to handle the transaction
+	 * lifecycle.
+	 */
 	if (!transaction) {
-		ret = STORE_REF_ERROR_OTHER;
-		goto out;
+		transaction = our_transaction = ref_transaction_begin(&err);
+		if (!transaction) {
+			ret = STORE_REF_ERROR_OTHER;
+			goto out;
+		}
 	}
 
 	ret = ref_transaction_update(transaction, ref->name, &ref->new_oid,
@@ -611,15 +619,17 @@ static int s_update_ref(const char *action,
 		goto out;
 	}
 
-	ret = ref_transaction_commit(transaction, &err);
-	if (ret) {
-		ret = (ret == TRANSACTION_NAME_CONFLICT) ? STORE_REF_ERROR_DF_CONFLICT
-							 : STORE_REF_ERROR_OTHER;
-		goto out;
+	if (our_transaction) {
+		ret = ref_transaction_commit(our_transaction, &err);
+		if (ret) {
+			ret = (ret == TRANSACTION_NAME_CONFLICT) ? STORE_REF_ERROR_DF_CONFLICT
+								 : STORE_REF_ERROR_OTHER;
+			goto out;
+		}
 	}
 
 out:
-	ref_transaction_free(transaction);
+	ref_transaction_free(our_transaction);
 	if (ret)
 		error("%s", err.buf);
 	strbuf_release(&err);
@@ -762,6 +772,7 @@ static void format_display(struct strbuf *display, char code,
 }
 
 static int update_local_ref(struct ref *ref,
+			    struct ref_transaction *transaction,
 			    const char *remote,
 			    const struct ref *remote_ref,
 			    struct strbuf *display,
@@ -802,7 +813,7 @@ static int update_local_ref(struct ref *ref,
 	    starts_with(ref->name, "refs/tags/")) {
 		if (force || ref->force) {
 			int r;
-			r = s_update_ref("updating tag", ref, 0);
+			r = s_update_ref("updating tag", ref, transaction, 0);
 			format_display(display, r ? '!' : 't', _("[tag update]"),
 				       r ? _("unable to update local ref") : NULL,
 				       remote, pretty_ref, summary_width);
@@ -839,7 +850,7 @@ static int update_local_ref(struct ref *ref,
 			what = _("[new ref]");
 		}
 
-		r = s_update_ref(msg, ref, 0);
+		r = s_update_ref(msg, ref, transaction, 0);
 		format_display(display, r ? '!' : '*', what,
 			       r ? _("unable to update local ref") : NULL,
 			       remote, pretty_ref, summary_width);
@@ -861,7 +872,7 @@ static int update_local_ref(struct ref *ref,
 		strbuf_add_unique_abbrev(&quickref, &current->object.oid, DEFAULT_ABBREV);
 		strbuf_addstr(&quickref, "..");
 		strbuf_add_unique_abbrev(&quickref, &ref->new_oid, DEFAULT_ABBREV);
-		r = s_update_ref("fast-forward", ref, 1);
+		r = s_update_ref("fast-forward", ref, transaction, 1);
 		format_display(display, r ? '!' : ' ', quickref.buf,
 			       r ? _("unable to update local ref") : NULL,
 			       remote, pretty_ref, summary_width);
@@ -873,7 +884,7 @@ static int update_local_ref(struct ref *ref,
 		strbuf_add_unique_abbrev(&quickref, &current->object.oid, DEFAULT_ABBREV);
 		strbuf_addstr(&quickref, "...");
 		strbuf_add_unique_abbrev(&quickref, &ref->new_oid, DEFAULT_ABBREV);
-		r = s_update_ref("forced-update", ref, 1);
+		r = s_update_ref("forced-update", ref, transaction, 1);
 		format_display(display, r ? '!' : '+', quickref.buf,
 			       r ? _("unable to update local ref") : _("forced update"),
 			       remote, pretty_ref, summary_width);
@@ -1078,8 +1089,8 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 
 			strbuf_reset(&note);
 			if (ref) {
-				rc |= update_local_ref(ref, what, rm, &note,
-						       summary_width);
+				rc |= update_local_ref(ref, NULL, what,
+						       rm, &note, summary_width);
 				free(ref);
 			} else if (write_fetch_head || dry_run) {
 				/*
-- 
2.30.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 4/4] fetch: implement support for atomic reference updates
  2021-01-08 12:11 ` [PATCH v2 0/4] " Patrick Steinhardt
                     ` (2 preceding siblings ...)
  2021-01-08 12:11   ` [PATCH v2 3/4] fetch: allow passing a transaction to `s_update_ref()` Patrick Steinhardt
@ 2021-01-08 12:11   ` Patrick Steinhardt
  2021-01-09  0:05     ` Junio C Hamano
  3 siblings, 1 reply; 41+ messages in thread
From: Patrick Steinhardt @ 2021-01-08 12:11 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder

[-- Attachment #1: Type: text/plain, Size: 14946 bytes --]

When executing a fetch, then git will currently allocate one reference
transaction per reference update and directly commit it. This means that
fetches are non-atomic: even if some of the reference updates fail,
others may still succeed and modify local references.

This is fine in many scenarios, but this strategy has its downsides.

- The view of remote references may be inconsistent and may show a
  bastardized state of the remote repository.

- Batching together updates may improve performance in certain
  scenarios. While the impact probably isn't as pronounced with loose
  references, the upcoming reftable backend may benefit as it needs to
  write less files in case the update is batched.

- The reference-update hook is currently being executed twice per
  updated reference. While this doesn't matter when there is no such
  hook, we have seen severe performance regressions when doing a
  git-fetch(1) with reference-transaction hook when the remote
  repository has hundreds of thousands of references.

Similar to `git push --atomic`, this commit thus introduces atomic
fetches. Instead of allocating one reference transaction per updated
reference, it causes us to only allocate a single transaction and commit
it as soon as all updates were received. If locking of any reference
fails, then we abort the complete transaction and don't update any
reference, which gives us an all-or-nothing fetch.

Note that this may not completely fix the first of above downsides, as
the consistent view also depends on the server-side. If the server
doesn't have a consistent view of its own references during the
reference negotiation phase, then the client would get the same
inconsistent view the server has. This is a separate problem though and,
if it actually exists, can be fixed at a later point.

This commit also changes the way we write FETCH_HEAD in case `--atomic`
is passed. Instead of writing changes as we go, we need to accumulate
all changes first and only commit them at the end when we know that all
reference updates succeeded. Ideally, we'd just do so via a temporary
file so that we don't need to carry all updates in-memory. This isn't
trivially doable though considering the `--append` mode, where we do not
truncate the file but simply append to it. And given that we support
concurrent processes appending to FETCH_HEAD at the same time without
any loss of data, seeding the temporary file with current contents of
FETCH_HEAD initially and then doing a rename wouldn't work either. So
this commit implements the simple strategy of buffering all changes and
appending them to the file on commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/fetch-options.txt |   4 +
 builtin/fetch.c                 |  66 ++++++++++---
 t/t5510-fetch.sh                | 168 ++++++++++++++++++++++++++++++++
 3 files changed, 227 insertions(+), 11 deletions(-)

diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
index 2bf77b46fd..07783deee3 100644
--- a/Documentation/fetch-options.txt
+++ b/Documentation/fetch-options.txt
@@ -7,6 +7,10 @@
 	existing contents of `.git/FETCH_HEAD`.  Without this
 	option old data in `.git/FETCH_HEAD` will be overwritten.
 
+--atomic::
+	Use an atomic transaction to update local refs. Either all refs are
+	updated, or on error, no refs are updated.
+
 --depth=<depth>::
 	Limit fetching to the specified number of commits from the tip of
 	each remote branch history. If fetching to a 'shallow' repository
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 654e0bb520..e3446fc493 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -63,6 +63,7 @@ static int enable_auto_gc = 1;
 static int tags = TAGS_DEFAULT, unshallow, update_shallow, deepen;
 static int max_jobs = -1, submodule_fetch_jobs_config = -1;
 static int fetch_parallel_config = 1;
+static int atomic_fetch;
 static enum transport_family family;
 static const char *depth;
 static const char *deepen_since;
@@ -144,6 +145,8 @@ static struct option builtin_fetch_options[] = {
 		 N_("set upstream for git pull/fetch")),
 	OPT_BOOL('a', "append", &append,
 		 N_("append to .git/FETCH_HEAD instead of overwriting")),
+	OPT_BOOL(0, "atomic", &atomic_fetch,
+		 N_("use atomic transaction to update references")),
 	OPT_STRING(0, "upload-pack", &upload_pack, N_("path"),
 		   N_("path to upload pack on remote end")),
 	OPT__FORCE(&force, N_("force overwrite of local reference"), 0),
@@ -913,6 +916,7 @@ static int iterate_ref_map(void *cb_data, struct object_id *oid)
 
 struct fetch_head {
 	FILE *fp;
+	struct strbuf buf;
 };
 
 static int open_fetch_head(struct fetch_head *fetch_head)
@@ -922,6 +926,7 @@ static int open_fetch_head(struct fetch_head *fetch_head)
 	if (!write_fetch_head)
 		return 0;
 
+	strbuf_init(&fetch_head->buf, 0);
 	fetch_head->fp = fopen(filename, "a");
 	if (!fetch_head->fp)
 		return error_errno(_("cannot open %s"), filename);
@@ -938,19 +943,34 @@ static void append_fetch_head(struct fetch_head *fetch_head, const char *old_oid
 	if (!write_fetch_head)
 		return;
 
-	fprintf(fetch_head->fp, "%s\t%s\t%s",
-		old_oid, merge_status_marker, note);
-	for (i = 0; i < url_len; ++i)
-		if ('\n' == url[i])
-			fputs("\\n", fetch_head->fp);
-		else
-			fputc(url[i], fetch_head->fp);
-	fputc('\n', fetch_head->fp);
+	/*
+	 * When using an atomic fetch, we do not want to update FETCH_HEAD if
+	 * any of the reference updates fails. We thus have to write all
+	 * updates to a buffer first and only commit it as soon as all
+	 * references have been successfully updated.
+	 */
+	if (atomic_fetch) {
+		strbuf_addf(&fetch_head->buf, "%s\t%s\t%s",
+			    old_oid, merge_status_marker, note);
+		strbuf_add(&fetch_head->buf, url, url_len);
+		strbuf_addch(&fetch_head->buf, '\n');
+	} else {
+		fprintf(fetch_head->fp, "%s\t%s\t%s",
+			old_oid, merge_status_marker, note);
+		for (i = 0; i < url_len; ++i)
+			if ('\n' == url[i])
+				fputs("\\n", fetch_head->fp);
+			else
+				fputc(url[i], fetch_head->fp);
+		fputc('\n', fetch_head->fp);
+	}
 }
 
 static void commit_fetch_head(struct fetch_head *fetch_head)
 {
-	/* Nothing to commit yet. */
+	if (!write_fetch_head || !atomic_fetch)
+		return;
+	strbuf_write(&fetch_head->buf, fetch_head->fp);
 }
 
 static void close_fetch_head(struct fetch_head *fetch_head)
@@ -959,6 +979,7 @@ static void close_fetch_head(struct fetch_head *fetch_head)
 		return;
 
 	fclose(fetch_head->fp);
+	strbuf_release(&fetch_head->buf);
 }
 
 static const char warn_show_forced_updates[] =
@@ -976,7 +997,8 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 	struct fetch_head fetch_head;
 	struct commit *commit;
 	int url_len, i, rc = 0;
-	struct strbuf note = STRBUF_INIT;
+	struct strbuf note = STRBUF_INIT, err = STRBUF_INIT;
+	struct ref_transaction *transaction = NULL;
 	const char *what, *kind;
 	struct ref *rm;
 	char *url;
@@ -1002,6 +1024,14 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 		}
 	}
 
+	if (atomic_fetch) {
+		transaction = ref_transaction_begin(&err);
+		if (!transaction) {
+			error("%s", err.buf);
+			goto abort;
+		}
+	}
+
 	prepare_format_display(ref_map);
 
 	/*
@@ -1089,7 +1119,7 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 
 			strbuf_reset(&note);
 			if (ref) {
-				rc |= update_local_ref(ref, NULL, what,
+				rc |= update_local_ref(ref, transaction, what,
 						       rm, &note, summary_width);
 				free(ref);
 			} else if (write_fetch_head || dry_run) {
@@ -1115,6 +1145,14 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 		}
 	}
 
+	if (!rc && transaction) {
+		rc = ref_transaction_commit(transaction, &err);
+		if (rc) {
+			error("%s", err.buf);
+			goto abort;
+		}
+	}
+
 	if (!rc)
 		commit_fetch_head(&fetch_head);
 
@@ -1134,6 +1172,8 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 
  abort:
 	strbuf_release(&note);
+	strbuf_release(&err);
+	ref_transaction_free(transaction);
 	free(url);
 	close_fetch_head(&fetch_head);
 	return rc;
@@ -1945,6 +1985,10 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 			die(_("--filter can only be used with the remote "
 			      "configured in extensions.partialclone"));
 
+		if (atomic_fetch)
+			die(_("--atomic can only be used when fetching "
+			      "from one remote"));
+
 		if (stdin_refspecs)
 			die(_("--stdin can only be used when fetching "
 			      "from one remote"));
diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
index 2013051a64..109d15be98 100755
--- a/t/t5510-fetch.sh
+++ b/t/t5510-fetch.sh
@@ -176,6 +176,174 @@ test_expect_success 'fetch --prune --tags with refspec prunes based on refspec'
 	git rev-parse sometag
 '
 
+test_expect_success 'fetch --atomic works with a single branch' '
+	test_when_finished "rm -rf \"$D\"/atomic" &&
+
+	cd "$D" &&
+	git clone . atomic &&
+	git branch atomic-branch &&
+	oid=$(git rev-parse atomic-branch) &&
+	echo "$oid" >expected &&
+
+	git -C atomic fetch --atomic origin &&
+	git -C atomic rev-parse origin/atomic-branch >actual &&
+	test_cmp expected actual &&
+	test $oid = "$(git -C atomic rev-parse --verify FETCH_HEAD)"
+'
+
+test_expect_success 'fetch --atomic works with multiple branches' '
+	test_when_finished "rm -rf \"$D\"/atomic" &&
+
+	cd "$D" &&
+	git clone . atomic &&
+	git branch atomic-branch-1 &&
+	git branch atomic-branch-2 &&
+	git branch atomic-branch-3 &&
+	git rev-parse refs/heads/atomic-branch-1 refs/heads/atomic-branch-2 refs/heads/atomic-branch-3 >actual &&
+
+	git -C atomic fetch --atomic origin &&
+	git -C atomic rev-parse refs/remotes/origin/atomic-branch-1 refs/remotes/origin/atomic-branch-2 refs/remotes/origin/atomic-branch-3 >expected &&
+	test_cmp expected actual
+'
+
+test_expect_success 'fetch --atomic works with mixed branches and tags' '
+	test_when_finished "rm -rf \"$D\"/atomic" &&
+
+	cd "$D" &&
+	git clone . atomic &&
+	git branch atomic-mixed-branch &&
+	git tag atomic-mixed-tag &&
+	git rev-parse refs/heads/atomic-mixed-branch refs/tags/atomic-mixed-tag >actual &&
+
+	git -C atomic fetch --tags --atomic origin &&
+	git -C atomic rev-parse refs/remotes/origin/atomic-mixed-branch refs/tags/atomic-mixed-tag >expected &&
+	test_cmp expected actual
+'
+
+test_expect_success 'fetch --atomic prunes references' '
+	test_when_finished "rm -rf \"$D\"/atomic" &&
+
+	cd "$D" &&
+	git branch atomic-prune-delete &&
+	git clone . atomic &&
+	git branch --delete atomic-prune-delete &&
+	git branch atomic-prune-create &&
+	git rev-parse refs/heads/atomic-prune-create >actual &&
+
+	git -C atomic fetch --prune --atomic origin &&
+	test_must_fail git -C atomic rev-parse refs/remotes/origin/atomic-prune-delete &&
+	git -C atomic rev-parse refs/remotes/origin/atomic-prune-create >expected &&
+	test_cmp expected actual
+'
+
+test_expect_success 'fetch --atomic aborts with non-fast-forward update' '
+	test_when_finished "rm -rf \"$D\"/atomic" &&
+
+	cd "$D" &&
+	git branch atomic-non-ff &&
+	git clone . atomic &&
+	git rev-parse HEAD >actual &&
+
+	git branch atomic-new-branch &&
+	parent_commit=$(git rev-parse atomic-non-ff~) &&
+	git update-ref refs/heads/atomic-non-ff $parent_commit &&
+
+	test_must_fail git -C atomic fetch --atomic origin refs/heads/*:refs/remotes/origin/* &&
+	test_must_fail git -C atomic rev-parse refs/remotes/origin/atomic-new-branch &&
+	git -C atomic rev-parse refs/remotes/origin/atomic-non-ff >expected &&
+	test_cmp expected actual &&
+	test_must_be_empty atomic/.git/FETCH_HEAD
+'
+
+test_expect_success 'fetch --atomic executes a single reference transaction only' '
+	test_when_finished "rm -rf \"$D\"/atomic" &&
+
+	cd "$D" &&
+	git clone . atomic &&
+	git branch atomic-hooks-1 &&
+	git branch atomic-hooks-2 &&
+	head_oid=$(git rev-parse HEAD) &&
+
+	cat >expected <<-EOF &&
+		prepared
+		$ZERO_OID $head_oid refs/remotes/origin/atomic-hooks-1
+		$ZERO_OID $head_oid refs/remotes/origin/atomic-hooks-2
+		committed
+		$ZERO_OID $head_oid refs/remotes/origin/atomic-hooks-1
+		$ZERO_OID $head_oid refs/remotes/origin/atomic-hooks-2
+	EOF
+
+	rm -f atomic/actual &&
+	write_script atomic/.git/hooks/reference-transaction <<-\EOF &&
+		( echo "$*" && cat ) >>actual
+	EOF
+
+	git -C atomic fetch --atomic origin &&
+	test_cmp expected atomic/actual
+'
+
+test_expect_success 'fetch --atomic aborts all reference updates if hook aborts' '
+	test_when_finished "rm -rf \"$D\"/atomic" &&
+
+	cd "$D" &&
+	git clone . atomic &&
+	git branch atomic-hooks-abort-1 &&
+	git branch atomic-hooks-abort-2 &&
+	git branch atomic-hooks-abort-3 &&
+	git tag atomic-hooks-abort &&
+	head_oid=$(git rev-parse HEAD) &&
+
+	cat >expected <<-EOF &&
+		prepared
+		$ZERO_OID $head_oid refs/remotes/origin/atomic-hooks-abort-1
+		$ZERO_OID $head_oid refs/remotes/origin/atomic-hooks-abort-2
+		$ZERO_OID $head_oid refs/remotes/origin/atomic-hooks-abort-3
+		$ZERO_OID $head_oid refs/tags/atomic-hooks-abort
+		aborted
+		$ZERO_OID $head_oid refs/remotes/origin/atomic-hooks-abort-1
+		$ZERO_OID $head_oid refs/remotes/origin/atomic-hooks-abort-2
+		$ZERO_OID $head_oid refs/remotes/origin/atomic-hooks-abort-3
+		$ZERO_OID $head_oid refs/tags/atomic-hooks-abort
+	EOF
+
+	rm -f atomic/actual &&
+	write_script atomic/.git/hooks/reference-transaction <<-\EOF &&
+		( echo "$*" && cat ) >>actual
+		exit 1
+	EOF
+
+	git -C atomic for-each-ref >expected-refs &&
+	test_must_fail git -C atomic fetch --tags --atomic origin &&
+	git -C atomic for-each-ref >actual-refs &&
+	test_cmp expected-refs actual-refs &&
+	test_must_be_empty atomic/.git/FETCH_HEAD
+'
+
+test_expect_success 'fetch --atomic --append appends to FETCH_HEAD' '
+	test_when_finished "rm -rf \"$D\"/atomic" &&
+
+	cd "$D" &&
+	git clone . atomic &&
+	oid=$(git rev-parse HEAD) &&
+
+	git branch atomic-fetch-head-1 &&
+	git -C atomic fetch --atomic origin atomic-fetch-head-1 &&
+	test_line_count = 1 atomic/.git/FETCH_HEAD &&
+
+	git branch atomic-fetch-head-2 &&
+	git -C atomic fetch --atomic --append origin atomic-fetch-head-2 &&
+	test_line_count = 2 atomic/.git/FETCH_HEAD &&
+	cp atomic/.git/FETCH_HEAD expected &&
+
+	write_script atomic/.git/hooks/reference-transaction <<-\EOF &&
+		exit 1
+	EOF
+
+	git branch atomic-fetch-head-3 &&
+	test_must_fail git -C atomic fetch --atomic --append origin atomic-fetch-head-3 &&
+	test_cmp expected atomic/.git/FETCH_HEAD
+'
+
 test_expect_success '--refmap="" ignores configured refspec' '
 	cd "$TRASH_DIRECTORY" &&
 	git clone "$D" remote-refs &&
-- 
2.30.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 1/4] fetch: extract writing to FETCH_HEAD
  2021-01-08 12:11   ` [PATCH v2 1/4] fetch: extract writing to FETCH_HEAD Patrick Steinhardt
@ 2021-01-08 23:40     ` Junio C Hamano
  2021-01-11 10:26       ` Patrick Steinhardt
  0 siblings, 1 reply; 41+ messages in thread
From: Junio C Hamano @ 2021-01-08 23:40 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Christian Couder

Patrick Steinhardt <ps@pks.im> writes:

> +static int open_fetch_head(struct fetch_head *fetch_head)
> +{
> +	const char *filename = git_path_fetch_head(the_repository);
> +
> +	if (!write_fetch_head)
> +		return 0;
> +
> +	fetch_head->fp = fopen(filename, "a");
> +	if (!fetch_head->fp)
> +		return error_errno(_("cannot open %s"), filename);
> +
> +	return 0;
> +}

So the difference from the original, which used to have a writable
filehandle to /dev/null in the dry-run mode, is that fetch_head->fp
is left as-is (not even NULLed out).

> +static void append_fetch_head(struct fetch_head *fetch_head, const char *old_oid,

It is clear from the type these days but variable names like
"old_oid" hint the readers that they are not a hexadecimal object
name string but either an array of uchar[40] or a struct object_id;
perhaps "old_oid_hex" would be less misleading.

If the caller does have struct object_id, then it would be even
better to take it as-is as a parameter and use oid_to_hex_r() on it in
this function when it is given to fprintf().  [Nit #1]

> +			      const char *merge_status_marker, const char *note,
> +			      const char *url, size_t url_len)
> +{
> +	size_t i;
> +
> +	if (!write_fetch_head)
> +		return;

Presumably, this check is what makes sure that fetch_head->fp that
is left uninitialized will never gets used.

> +	fprintf(fetch_head->fp, "%s\t%s\t%s",
> +		old_oid, merge_status_marker, note);
> +	for (i = 0; i < url_len; ++i)
> +		if ('\n' == url[i])
> +			fputs("\\n", fetch_head->fp);
> +		else
> +			fputc(url[i], fetch_head->fp);
> +	fputc('\n', fetch_head->fp);
> +}

OK.  This is the "case FETCH_HEAD_NOT_FOR_MERGE" and "case
FETCH_HEAD_MERGE" parts in the original.

As an abstraction, it may be better to make the caller pass a
boolean "is this for merge?" and keep the knowledge of what exact
string is used for merge_status_marker to this function, instead of
letting the caller passing it as a parameter in the string form.
After all, we never allow anything other than an empty string or a
fixed "not-for-merge" string in that place in the file format.
[Nit #2]

> +static void commit_fetch_head(struct fetch_head *fetch_head)
> +{
> +	/* Nothing to commit yet. */
> +}
> +
> +static void close_fetch_head(struct fetch_head *fetch_head)
> +{
> +	if (!write_fetch_head)
> +		return;

So is this check a protection against uninitialized fetch_head->fp.
Both changes make sense.

> +	fclose(fetch_head->fp);
> +}

> @@ -909,22 +959,19 @@ N_("It took %.2f seconds to check forced updates. You can use\n"
>  static int store_updated_refs(const char *raw_url, const char *remote_name,
>  			      int connectivity_checked, struct ref *ref_map)
>  {
> -	FILE *fp;
> +	struct fetch_head fetch_head;

And that is why this variable is left uninitialised on stack and it
is OK.  An alternative design would be to initialize fetch_head.fp
to NULL, and return early with "if (!fetch_head->fp)" in the two
functions that take fetch_head struct.  That way, we have less
reliance on the global variable write_fetch_head.

>  	struct commit *commit;
>  	int url_len, i, rc = 0;
>  	struct strbuf note = STRBUF_INIT;
>  	const char *what, *kind;
>  	struct ref *rm;
>  	char *url;
> -	const char *filename = (!write_fetch_head
> -				? "/dev/null"
> -				: git_path_fetch_head(the_repository));
>  	int want_status;
>  	int summary_width = transport_summary_width(ref_map);
>  
> -	fp = fopen(filename, "a");
> -	if (!fp)
> -		return error_errno(_("cannot open %s"), filename);
> +	rc = open_fetch_head(&fetch_head);
> +	if (rc)
> +		return -1;

OK, we've already said "cannot open" in the open_fetch_head()
function, so we just return an error silently.

> @@ -1016,16 +1063,10 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
>  				merge_status_marker = "not-for-merge";
>  				/* fall-through */
>  			case FETCH_HEAD_MERGE:
> -				fprintf(fp, "%s\t%s\t%s",
> -					oid_to_hex(&rm->old_oid),
> -					merge_status_marker,
> -					note.buf);
> -				for (i = 0; i < url_len; ++i)
> -					if ('\n' == url[i])
> -						fputs("\\n", fp);
> -					else
> -						fputc(url[i], fp);
> -				fputc('\n', fp);
> +				append_fetch_head(&fetch_head,
> +						  oid_to_hex(&rm->old_oid),
> +						  merge_status_marker,
> +						  note.buf, url, url_len);

Here, we can lose merge_status_marker variable from this caller, and
then this caller becomes:

		switch (rm->fetch_head_status) {
		case FETCH_HEAD_NOT_FOR_MERGE:
		case FETCH_HEAD_MERGE:
			append_fetch_head(&fetch_head, &rm->old_oid,
				rm->fetch_head_status == FETCH_HEAD_MERGE,
                                note.buf, url, url_len);

Note that I am passing "is this a ref to be merged?" boolean to keep
the exact string of "not-for-merge" in the callee.

> @@ -1060,6 +1101,9 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
>  		}
>  	}
>  
> +	if (!rc)
> +		commit_fetch_head(&fetch_head);
> +
>  	if (rc & STORE_REF_ERROR_DF_CONFLICT)
>  		error(_("some local refs could not be updated; try running\n"
>  		      " 'git remote prune %s' to remove any old, conflicting "
> @@ -1077,7 +1121,7 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
>   abort:
>  	strbuf_release(&note);
>  	free(url);
> -	fclose(fp);
> +	close_fetch_head(&fetch_head);
>  	return rc;
>  }

Other than the above two nits, this step looks good to me.

Thanks.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 2/4] fetch: refactor `s_update_ref` to use common exit path
  2021-01-08 12:11   ` [PATCH v2 2/4] fetch: refactor `s_update_ref` to use common exit path Patrick Steinhardt
@ 2021-01-08 23:50     ` Junio C Hamano
  2021-01-11 10:28       ` Patrick Steinhardt
  0 siblings, 1 reply; 41+ messages in thread
From: Junio C Hamano @ 2021-01-08 23:50 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Christian Couder

Patrick Steinhardt <ps@pks.im> writes:

> @@ -598,30 +598,33 @@ static int s_update_ref(const char *action,
>  	msg = xstrfmt("%s: %s", rla, action);
>  
>  	transaction = ref_transaction_begin(&err);
> -	if (!transaction ||
> -	    ref_transaction_update(transaction, ref->name,
> -				   &ref->new_oid,
> -				   check_old ? &ref->old_oid : NULL,
> -				   0, msg, &err))
> -		goto fail;
> +	if (!transaction) {
> +		ret = STORE_REF_ERROR_OTHER;
> +		goto out;
> +	}
> +
> +	ret = ref_transaction_update(transaction, ref->name, &ref->new_oid,
> +				     check_old ? &ref->old_oid : NULL,
> +				     0, msg, &err);
> +	if (ret) {
> +		ret = STORE_REF_ERROR_OTHER;
> +		goto out;
> +	}

The above certainly got cleaner thanks to Christian's suggestion,
but I wonder why

	transaction = ref_transaction_begin(&err);
	if (!transaction ||
	    ref_transaction_update(transaction, ref->name,
				   &ref->new_oid,
				   check_old ? &ref->old_oid : NULL,
				   0, msg, &err)) {
		ret = STORE_REF_ERROR_OTHER;
		goto out;
	}

shouldn't be sufficient.

>  	ret = ref_transaction_commit(transaction, &err);
>  	if (ret) {
> -		df_conflict = (ret == TRANSACTION_NAME_CONFLICT);
> -		goto fail;
> +		ret = (ret == TRANSACTION_NAME_CONFLICT) ? STORE_REF_ERROR_DF_CONFLICT
> +							 : STORE_REF_ERROR_OTHER;
> +		goto out;
>  	}
> 
> +out:
>  	ref_transaction_free(transaction);

It is a bit funny to see a goto that jumps to the label without
having anything else in between, but we know we will be adding more
code just before the "out:" label, so it is a good preliminary
preparation.

I think a variant that is much easier to follow would be to write
like this instead:

	switch (ref_transaction_commit(transaction, &err)) {
        case 0: /* happy */
		break;
	case TRANSACTION_NAME_CONFLICT:
		ret = STORE_REF_ERROR_DF_CONFLICT;
		goto out;
	default:
		ret = STORE_REF_ERROR_OTHER;
		goto out;
	}

> +	if (ret)
> +		error("%s", err.buf);

OK.

>  	strbuf_release(&err);
>  	free(msg);
> -	return 0;
> -fail:
> -	ref_transaction_free(transaction);
> -	error("%s", err.buf);
> -	strbuf_release(&err);
> -	free(msg);
> -	return df_conflict ? STORE_REF_ERROR_DF_CONFLICT
> -			   : STORE_REF_ERROR_OTHER;
> +	return ret;
>  }
>  
>  static int refcol_width = 10;

THanks.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 3/4] fetch: allow passing a transaction to `s_update_ref()`
  2021-01-08 12:11   ` [PATCH v2 3/4] fetch: allow passing a transaction to `s_update_ref()` Patrick Steinhardt
@ 2021-01-08 23:53     ` Junio C Hamano
  0 siblings, 0 replies; 41+ messages in thread
From: Junio C Hamano @ 2021-01-08 23:53 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Christian Couder

Patrick Steinhardt <ps@pks.im> writes:

> @@ -597,10 +598,17 @@ static int s_update_ref(const char *action,
>  		rla = default_rla.buf;
>  	msg = xstrfmt("%s: %s", rla, action);
>  
> -	transaction = ref_transaction_begin(&err);
> +	/*
> +	 * If no transaction was passed to us, we manage the transaction
> +	 * ourselves. Otherwise, we trust the caller to handle the transaction
> +	 * lifecycle.
> +	 */
>  	if (!transaction) {
> -		ret = STORE_REF_ERROR_OTHER;
> -		goto out;
> +		transaction = our_transaction = ref_transaction_begin(&err);
> +		if (!transaction) {
> +			ret = STORE_REF_ERROR_OTHER;
> +			goto out;
> +		}
>  	}

OK, this answers the question I posed in the review of the previous
step.  We need to separate out "if (!transaction || ...)" into two
anyway with this step, so it is easier to see what changed in this
step if we separated in the previous preparatory clean-up step.

> @@ -611,15 +619,17 @@ static int s_update_ref(const char *action,
>  		goto out;
>  	}
>  
> -	ret = ref_transaction_commit(transaction, &err);
> -	if (ret) {
> -		ret = (ret == TRANSACTION_NAME_CONFLICT) ? STORE_REF_ERROR_DF_CONFLICT
> -							 : STORE_REF_ERROR_OTHER;
> -		goto out;
> +	if (our_transaction) {
> +		ret = ref_transaction_commit(our_transaction, &err);
> +		if (ret) {
> +			ret = (ret == TRANSACTION_NAME_CONFLICT) ? STORE_REF_ERROR_DF_CONFLICT
> +								 : STORE_REF_ERROR_OTHER;
> +			goto out;
> +		}

The switch statement suggested earlier would shine when the
constants involved have such long names.

>  	}
>  
>  out:
> -	ref_transaction_free(transaction);
> +	ref_transaction_free(our_transaction);
>  	if (ret)
>  		error("%s", err.buf);
>  	strbuf_release(&err);

Makes sense.  Thanks.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 4/4] fetch: implement support for atomic reference updates
  2021-01-08 12:11   ` [PATCH v2 4/4] fetch: implement support for atomic reference updates Patrick Steinhardt
@ 2021-01-09  0:05     ` Junio C Hamano
  2021-01-11 10:42       ` Patrick Steinhardt
  0 siblings, 1 reply; 41+ messages in thread
From: Junio C Hamano @ 2021-01-09  0:05 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Christian Couder

Patrick Steinhardt <ps@pks.im> writes:

> +	/*
> +	 * When using an atomic fetch, we do not want to update FETCH_HEAD if
> +	 * any of the reference updates fails. We thus have to write all
> +	 * updates to a buffer first and only commit it as soon as all
> +	 * references have been successfully updated.
> +	 */
> +	if (atomic_fetch) {
> +		strbuf_addf(&fetch_head->buf, "%s\t%s\t%s",
> +			    old_oid, merge_status_marker, note);
> +		strbuf_add(&fetch_head->buf, url, url_len);
> +		strbuf_addch(&fetch_head->buf, '\n');
> +	} else {
> +		fprintf(fetch_head->fp, "%s\t%s\t%s",
> +			old_oid, merge_status_marker, note);
> +		for (i = 0; i < url_len; ++i)
> +			if ('\n' == url[i])
> +				fputs("\\n", fetch_head->fp);
> +			else
> +				fputc(url[i], fetch_head->fp);
> +		fputc('\n', fetch_head->fp);
> +	}

I do not want to see it done like this; formating what ought to come
out identical with two separate mechanisms will lead to bugs under
the road.

Also what is the deal about "\n" vs "\\n"?  Do we already have
behaviour differences between two codepaths from the get-go?

It would be much more preferrable to see this series go like so:

    [1/4] create append_fetch_head() that writes out to
          fetch_head->fp

    [1.5/4] convert append_fetch_head() to ALWAYS accumulate in
            fetch_head->buf, and ALWAYS flushes what is accumulated
            at the end.

After these two patches are applied, there shouldn't be any
behaviour change or change in the format in generated FETCH_HEAD.

    [2/4] and [3/4] stay the same

    [4/4] this step does not touch append_fetch_head() at all.  It
    just changes the way how fetch_head->buf is flushed at the end
    (namely, under atomic mode and after seeing any failure, the
    accumulated output gets discarded without being written).

I also thought briefly about an alternative approach to rewind and
truncate the output to its original length upon seeing a failure
under the atomic mode, but rejected it because the code gets hairy.
I think "accumulate until we know we want to actually write them out"
is a good approach to this issue.

Thanks.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 1/4] fetch: extract writing to FETCH_HEAD
  2021-01-08 23:40     ` Junio C Hamano
@ 2021-01-11 10:26       ` Patrick Steinhardt
  2021-01-11 19:24         ` Junio C Hamano
  0 siblings, 1 reply; 41+ messages in thread
From: Patrick Steinhardt @ 2021-01-11 10:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Christian Couder

[-- Attachment #1: Type: text/plain, Size: 3810 bytes --]

On Fri, Jan 08, 2021 at 03:40:17PM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> > +static void append_fetch_head(struct fetch_head *fetch_head, const char *old_oid,
> 
> It is clear from the type these days but variable names like
> "old_oid" hint the readers that they are not a hexadecimal object
> name string but either an array of uchar[40] or a struct object_id;
> perhaps "old_oid_hex" would be less misleading.
> 
> If the caller does have struct object_id, then it would be even
> better to take it as-is as a parameter and use oid_to_hex_r() on it in
> this function when it is given to fprintf().  [Nit #1]

Agreed.

[snip]
> > +	fprintf(fetch_head->fp, "%s\t%s\t%s",
> > +		old_oid, merge_status_marker, note);
> > +	for (i = 0; i < url_len; ++i)
> > +		if ('\n' == url[i])
> > +			fputs("\\n", fetch_head->fp);
> > +		else
> > +			fputc(url[i], fetch_head->fp);
> > +	fputc('\n', fetch_head->fp);
> > +}
> 
> OK.  This is the "case FETCH_HEAD_NOT_FOR_MERGE" and "case
> FETCH_HEAD_MERGE" parts in the original.
> 
> As an abstraction, it may be better to make the caller pass a
> boolean "is this for merge?" and keep the knowledge of what exact
> string is used for merge_status_marker to this function, instead of
> letting the caller passing it as a parameter in the string form.
> After all, we never allow anything other than an empty string or a
> fixed "not-for-merge" string in that place in the file format.
> [Nit #2]

I think it's even nicer to just pass in `rm->fetch_head_status`
directly, which allows us to move below switch into `append_fetch_head`.

[snip]
> > +	fclose(fetch_head->fp);
> > +}
> 
> > @@ -909,22 +959,19 @@ N_("It took %.2f seconds to check forced updates. You can use\n"
> >  static int store_updated_refs(const char *raw_url, const char *remote_name,
> >  			      int connectivity_checked, struct ref *ref_map)
> >  {
> > -	FILE *fp;
> > +	struct fetch_head fetch_head;
> 
> And that is why this variable is left uninitialised on stack and it
> is OK.  An alternative design would be to initialize fetch_head.fp
> to NULL, and return early with "if (!fetch_head->fp)" in the two
> functions that take fetch_head struct.  That way, we have less
> reliance on the global variable write_fetch_head.

I like your proposal more. I wasn't quite happy with leaving `fp`
uninitialized, and using it as a sentinel for whether to write something
or not feels natural to me.

[snip]
> > @@ -1016,16 +1063,10 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
> >  				merge_status_marker = "not-for-merge";
> >  				/* fall-through */
> >  			case FETCH_HEAD_MERGE:
> > -				fprintf(fp, "%s\t%s\t%s",
> > -					oid_to_hex(&rm->old_oid),
> > -					merge_status_marker,
> > -					note.buf);
> > -				for (i = 0; i < url_len; ++i)
> > -					if ('\n' == url[i])
> > -						fputs("\\n", fp);
> > -					else
> > -						fputc(url[i], fp);
> > -				fputc('\n', fp);
> > +				append_fetch_head(&fetch_head,
> > +						  oid_to_hex(&rm->old_oid),
> > +						  merge_status_marker,
> > +						  note.buf, url, url_len);
> 
> Here, we can lose merge_status_marker variable from this caller, and
> then this caller becomes:
> 
> 		switch (rm->fetch_head_status) {
> 		case FETCH_HEAD_NOT_FOR_MERGE:
> 		case FETCH_HEAD_MERGE:
> 			append_fetch_head(&fetch_head, &rm->old_oid,
> 				rm->fetch_head_status == FETCH_HEAD_MERGE,
>                                 note.buf, url, url_len);
> 
> Note that I am passing "is this a ref to be merged?" boolean to keep
> the exact string of "not-for-merge" in the callee.

As said above, I'm just moving this complete switch into
`append_fetch_head` and pass `rm->fetch_head_status`.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 2/4] fetch: refactor `s_update_ref` to use common exit path
  2021-01-08 23:50     ` Junio C Hamano
@ 2021-01-11 10:28       ` Patrick Steinhardt
  0 siblings, 0 replies; 41+ messages in thread
From: Patrick Steinhardt @ 2021-01-11 10:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Christian Couder

[-- Attachment #1: Type: text/plain, Size: 1106 bytes --]

On Fri, Jan 08, 2021 at 03:50:00PM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
[snip]
> >  	ret = ref_transaction_commit(transaction, &err);
> >  	if (ret) {
> > -		df_conflict = (ret == TRANSACTION_NAME_CONFLICT);
> > -		goto fail;
> > +		ret = (ret == TRANSACTION_NAME_CONFLICT) ? STORE_REF_ERROR_DF_CONFLICT
> > +							 : STORE_REF_ERROR_OTHER;
> > +		goto out;
> >  	}
> > 
> > +out:
> >  	ref_transaction_free(transaction);
> 
> It is a bit funny to see a goto that jumps to the label without
> having anything else in between, but we know we will be adding more
> code just before the "out:" label, so it is a good preliminary
> preparation.
> 
> I think a variant that is much easier to follow would be to write
> like this instead:
> 
> 	switch (ref_transaction_commit(transaction, &err)) {
>         case 0: /* happy */
> 		break;
> 	case TRANSACTION_NAME_CONFLICT:
> 		ret = STORE_REF_ERROR_DF_CONFLICT;
> 		goto out;
> 	default:
> 		ret = STORE_REF_ERROR_OTHER;
> 		goto out;
> 	}

Agreed, that is easier to read. Thanks!

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 4/4] fetch: implement support for atomic reference updates
  2021-01-09  0:05     ` Junio C Hamano
@ 2021-01-11 10:42       ` Patrick Steinhardt
  2021-01-11 19:47         ` Junio C Hamano
  0 siblings, 1 reply; 41+ messages in thread
From: Patrick Steinhardt @ 2021-01-11 10:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Christian Couder

[-- Attachment #1: Type: text/plain, Size: 3270 bytes --]

On Fri, Jan 08, 2021 at 04:05:53PM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > +	/*
> > +	 * When using an atomic fetch, we do not want to update FETCH_HEAD if
> > +	 * any of the reference updates fails. We thus have to write all
> > +	 * updates to a buffer first and only commit it as soon as all
> > +	 * references have been successfully updated.
> > +	 */
> > +	if (atomic_fetch) {
> > +		strbuf_addf(&fetch_head->buf, "%s\t%s\t%s",
> > +			    old_oid, merge_status_marker, note);
> > +		strbuf_add(&fetch_head->buf, url, url_len);
> > +		strbuf_addch(&fetch_head->buf, '\n');
> > +	} else {
> > +		fprintf(fetch_head->fp, "%s\t%s\t%s",
> > +			old_oid, merge_status_marker, note);
> > +		for (i = 0; i < url_len; ++i)
> > +			if ('\n' == url[i])
> > +				fputs("\\n", fetch_head->fp);
> > +			else
> > +				fputc(url[i], fetch_head->fp);
> > +		fputc('\n', fetch_head->fp);
> > +	}
> 
> I do not want to see it done like this; formating what ought to come
> out identical with two separate mechanisms will lead to bugs under
> the road.
> 
> Also what is the deal about "\n" vs "\\n"?  Do we already have
> behaviour differences between two codepaths from the get-go?

Good point. I'll unify those code paths.

> It would be much more preferrable to see this series go like so:
> 
>     [1/4] create append_fetch_head() that writes out to
>           fetch_head->fp
> 
>     [1.5/4] convert append_fetch_head() to ALWAYS accumulate in
>             fetch_head->buf, and ALWAYS flushes what is accumulated
>             at the end.

This is a change I'm hesitant to make. The thing is that FETCH_HEAD is
quite weird as the current design allows different `git fetch --append`
processes to write to FETCH_HEAD at the same time. If we change to
always accumulate first and flush once we're done, then we essentially
have a change in behaviour as FETCH_HEADs aren't written in an
interleaving fashion anymore, but only at the end. Also, if there is any
unexpected error in between updating references which causes us to die,
then we wouldn't have written the already updated references to
FETCH_HEAD now.

So I'm all in when it comes to merging formatting directives, even more
so as I have missed the "\\n" weirdness. But for now, I'll keep the
current flushing semantics in the non-atomic case. Please let me know if
you're not convinced and I'll have another look for v4.

> After these two patches are applied, there shouldn't be any
> behaviour change or change in the format in generated FETCH_HEAD.
> 
>     [2/4] and [3/4] stay the same
> 
>     [4/4] this step does not touch append_fetch_head() at all.  It
>     just changes the way how fetch_head->buf is flushed at the end
>     (namely, under atomic mode and after seeing any failure, the
>     accumulated output gets discarded without being written).
> 
> I also thought briefly about an alternative approach to rewind and
> truncate the output to its original length upon seeing a failure
> under the atomic mode, but rejected it because the code gets hairy.
> I think "accumulate until we know we want to actually write them out"
> is a good approach to this issue.
> 
> Thanks.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v3 0/5] fetch: implement support for atomic reference updates
  2021-01-07 13:51 [PATCH 0/2] fetch: implement support for atomic reference updates Patrick Steinhardt
                   ` (2 preceding siblings ...)
  2021-01-08 12:11 ` [PATCH v2 0/4] " Patrick Steinhardt
@ 2021-01-11 11:05 ` Patrick Steinhardt
  2021-01-11 11:05   ` [PATCH v3 1/5] fetch: extract writing to FETCH_HEAD Patrick Steinhardt
                     ` (5 more replies)
  2021-01-12 12:27 ` [PATCH v4 " Patrick Steinhardt
  4 siblings, 6 replies; 41+ messages in thread
From: Patrick Steinhardt @ 2021-01-11 11:05 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder

[-- Attachment #1: Type: text/plain, Size: 12479 bytes --]

Hi,

this is the third version of my patch series to implement support for
atomic reference updates for git-fetch(1). It's similar to `git push
--atomic`, only that it applies to the local side. That is the fetch
will either succeed and update all remote references or it will fail and
update none.

Changes compared to v2:

    - `append_fetch_head` now takes a `struct object_id *old_oid`
      instead of a `char *old_oid`.

    - Handling of the merge status marker was moved into
      `append_fetch_head`.

    - I've unified code paths to format the line we're about to write to
      FETCH_HEAD in atomic and non-atomic cases, which is the new 2/5
      patch.

    - We now always initialize `fetch_head->fp` and use it to derive
      whether anything should be written at all instead of using the
      global `write_fetch_head` variable.

I think this should address all of Junio's feedback. Thanks for your
input!

Patrick

Patrick Steinhardt (5):
  fetch: extract writing to FETCH_HEAD
  fetch: use strbuf to format FETCH_HEAD updates
  fetch: refactor `s_update_ref` to use common exit path
  fetch: allow passing a transaction to `s_update_ref()`
  fetch: implement support for atomic reference updates

 Documentation/fetch-options.txt |   4 +
 builtin/fetch.c                 | 228 +++++++++++++++++++++++---------
 remote.h                        |   2 +-
 t/t5510-fetch.sh                | 168 +++++++++++++++++++++++
 4 files changed, 342 insertions(+), 60 deletions(-)

Range-diff against v2:
1:  d80dbc5a9c ! 1:  61dc19a1ca fetch: extract writing to FETCH_HEAD
    @@ builtin/fetch.c: static int iterate_ref_map(void *cb_data, struct object_id *oid
     +{
     +	const char *filename = git_path_fetch_head(the_repository);
     +
    -+	if (!write_fetch_head)
    -+		return 0;
    -+
    -+	fetch_head->fp = fopen(filename, "a");
    -+	if (!fetch_head->fp)
    -+		return error_errno(_("cannot open %s"), filename);
    ++	if (write_fetch_head) {
    ++		fetch_head->fp = fopen(filename, "a");
    ++		if (!fetch_head->fp)
    ++			return error_errno(_("cannot open %s"), filename);
    ++	} else {
    ++		fetch_head->fp = NULL;
    ++	}
     +
     +	return 0;
     +}
     +
    -+static void append_fetch_head(struct fetch_head *fetch_head, const char *old_oid,
    -+			      const char *merge_status_marker, const char *note,
    ++static void append_fetch_head(struct fetch_head *fetch_head,
    ++			      const struct object_id *old_oid,
    ++			      enum fetch_head_status fetch_head_status,
    ++			      const char *note,
     +			      const char *url, size_t url_len)
     +{
    ++	char old_oid_hex[GIT_MAX_HEXSZ + 1];
    ++	const char *merge_status_marker;
     +	size_t i;
     +
    -+	if (!write_fetch_head)
    ++	if (!fetch_head->fp)
     +		return;
     +
    ++	switch (fetch_head_status) {
    ++		case FETCH_HEAD_NOT_FOR_MERGE:
    ++			merge_status_marker = "not-for-merge";
    ++			break;
    ++		case FETCH_HEAD_MERGE:
    ++			merge_status_marker = "";
    ++			break;
    ++		default:
    ++			/* do not write anything to FETCH_HEAD */
    ++			return;
    ++	}
    ++
     +	fprintf(fetch_head->fp, "%s\t%s\t%s",
    -+		old_oid, merge_status_marker, note);
    ++		oid_to_hex_r(old_oid_hex, old_oid), merge_status_marker, note);
     +	for (i = 0; i < url_len; ++i)
     +		if ('\n' == url[i])
     +			fputs("\\n", fetch_head->fp);
    @@ builtin/fetch.c: static int iterate_ref_map(void *cb_data, struct object_id *oid
     +
     +static void close_fetch_head(struct fetch_head *fetch_head)
     +{
    -+	if (!write_fetch_head)
    ++	if (!fetch_head->fp)
     +		return;
     +
     +	fclose(fetch_head->fp);
    @@ builtin/fetch.c: N_("It took %.2f seconds to check forced updates. You can use\n
      	if (raw_url)
      		url = transport_anonymize_url(raw_url);
     @@ builtin/fetch.c: static int store_updated_refs(const char *raw_url, const char *remote_name,
    - 				merge_status_marker = "not-for-merge";
    - 				/* fall-through */
    - 			case FETCH_HEAD_MERGE:
    + 	     want_status++) {
    + 		for (rm = ref_map; rm; rm = rm->next) {
    + 			struct ref *ref = NULL;
    +-			const char *merge_status_marker = "";
    + 
    + 			if (rm->status == REF_STATUS_REJECT_SHALLOW) {
    + 				if (want_status == FETCH_HEAD_MERGE)
    +@@ builtin/fetch.c: static int store_updated_refs(const char *raw_url, const char *remote_name,
    + 					strbuf_addf(&note, "%s ", kind);
    + 				strbuf_addf(&note, "'%s' of ", what);
    + 			}
    +-			switch (rm->fetch_head_status) {
    +-			case FETCH_HEAD_NOT_FOR_MERGE:
    +-				merge_status_marker = "not-for-merge";
    +-				/* fall-through */
    +-			case FETCH_HEAD_MERGE:
     -				fprintf(fp, "%s\t%s\t%s",
     -					oid_to_hex(&rm->old_oid),
     -					merge_status_marker,
    @@ builtin/fetch.c: static int store_updated_refs(const char *raw_url, const char *
     -					else
     -						fputc(url[i], fp);
     -				fputc('\n', fp);
    -+				append_fetch_head(&fetch_head,
    -+						  oid_to_hex(&rm->old_oid),
    -+						  merge_status_marker,
    -+						  note.buf, url, url_len);
    - 				break;
    - 			default:
    - 				/* do not write anything to FETCH_HEAD */
    +-				break;
    +-			default:
    +-				/* do not write anything to FETCH_HEAD */
    +-				break;
    +-			}
    ++
    ++			append_fetch_head(&fetch_head, &rm->old_oid,
    ++					  rm->fetch_head_status,
    ++					  note.buf, url, url_len);
    + 
    + 			strbuf_reset(&note);
    + 			if (ref) {
     @@ builtin/fetch.c: static int store_updated_refs(const char *raw_url, const char *remote_name,
      		}
      	}
    @@ builtin/fetch.c: static int store_updated_refs(const char *raw_url, const char *
      	return rc;
      }
      
    +
    + ## remote.h ##
    +@@ remote.h: struct ref {
    + 	 * should be 0, so that xcalloc'd structures get it
    + 	 * by default.
    + 	 */
    +-	enum {
    ++	enum fetch_head_status {
    + 		FETCH_HEAD_MERGE = -1,
    + 		FETCH_HEAD_NOT_FOR_MERGE = 0,
    + 		FETCH_HEAD_IGNORE = 1
-:  ---------- > 2:  a19762690e fetch: use strbuf to format FETCH_HEAD updates
2:  718a8bf5d7 ! 3:  c411f30e09 fetch: refactor `s_update_ref` to use common exit path
    @@ builtin/fetch.c: static int s_update_ref(const char *action,
     -				   check_old ? &ref->old_oid : NULL,
     -				   0, msg, &err))
     -		goto fail;
    +-
    +-	ret = ref_transaction_commit(transaction, &err);
    +-	if (ret) {
    +-		df_conflict = (ret == TRANSACTION_NAME_CONFLICT);
    +-		goto fail;
     +	if (!transaction) {
     +		ret = STORE_REF_ERROR_OTHER;
     +		goto out;
    -+	}
    -+
    + 	}
    + 
     +	ret = ref_transaction_update(transaction, ref->name, &ref->new_oid,
     +				     check_old ? &ref->old_oid : NULL,
     +				     0, msg, &err);
    @@ builtin/fetch.c: static int s_update_ref(const char *action,
     +		ret = STORE_REF_ERROR_OTHER;
     +		goto out;
     +	}
    - 
    - 	ret = ref_transaction_commit(transaction, &err);
    - 	if (ret) {
    --		df_conflict = (ret == TRANSACTION_NAME_CONFLICT);
    --		goto fail;
    -+		ret = (ret == TRANSACTION_NAME_CONFLICT) ? STORE_REF_ERROR_DF_CONFLICT
    -+							 : STORE_REF_ERROR_OTHER;
    ++
    ++	switch (ref_transaction_commit(transaction, &err)) {
    ++	case 0:
    ++		break;
    ++	case TRANSACTION_NAME_CONFLICT:
    ++		ret = STORE_REF_ERROR_DF_CONFLICT;
     +		goto out;
    - 	}
    - 
    ++	default:
    ++		ret = STORE_REF_ERROR_OTHER;
    ++		goto out;
    ++	}
    ++
     +out:
      	ref_transaction_free(transaction);
     +	if (ret)
3:  4162d10fcb ! 4:  865d357ba7 fetch: allow passing a transaction to `s_update_ref()`
    @@ builtin/fetch.c: static int s_update_ref(const char *action,
      		goto out;
      	}
      
    --	ret = ref_transaction_commit(transaction, &err);
    --	if (ret) {
    --		ret = (ret == TRANSACTION_NAME_CONFLICT) ? STORE_REF_ERROR_DF_CONFLICT
    --							 : STORE_REF_ERROR_OTHER;
    +-	switch (ref_transaction_commit(transaction, &err)) {
    +-	case 0:
    +-		break;
    +-	case TRANSACTION_NAME_CONFLICT:
    +-		ret = STORE_REF_ERROR_DF_CONFLICT;
    +-		goto out;
    +-	default:
    +-		ret = STORE_REF_ERROR_OTHER;
     -		goto out;
     +	if (our_transaction) {
    -+		ret = ref_transaction_commit(our_transaction, &err);
    -+		if (ret) {
    -+			ret = (ret == TRANSACTION_NAME_CONFLICT) ? STORE_REF_ERROR_DF_CONFLICT
    -+								 : STORE_REF_ERROR_OTHER;
    ++		switch (ref_transaction_commit(our_transaction, &err)) {
    ++		case 0:
    ++			break;
    ++		case TRANSACTION_NAME_CONFLICT:
    ++			ret = STORE_REF_ERROR_DF_CONFLICT;
    ++			goto out;
    ++		default:
    ++			ret = STORE_REF_ERROR_OTHER;
     +			goto out;
     +		}
      	}
4:  53705281b6 ! 5:  6a79e7adcc fetch: implement support for atomic reference updates
    @@ builtin/fetch.c: static struct option builtin_fetch_options[] = {
      	OPT_STRING(0, "upload-pack", &upload_pack, N_("path"),
      		   N_("path to upload pack on remote end")),
      	OPT__FORCE(&force, N_("force overwrite of local reference"), 0),
    -@@ builtin/fetch.c: static int iterate_ref_map(void *cb_data, struct object_id *oid)
    +@@ builtin/fetch.c: static void append_fetch_head(struct fetch_head *fetch_head,
    + 			strbuf_addch(&fetch_head->buf, url[i]);
    + 	strbuf_addch(&fetch_head->buf, '\n');
      
    - struct fetch_head {
    - 	FILE *fp;
    -+	struct strbuf buf;
    - };
    - 
    - static int open_fetch_head(struct fetch_head *fetch_head)
    -@@ builtin/fetch.c: static int open_fetch_head(struct fetch_head *fetch_head)
    - 	if (!write_fetch_head)
    - 		return 0;
    - 
    -+	strbuf_init(&fetch_head->buf, 0);
    - 	fetch_head->fp = fopen(filename, "a");
    - 	if (!fetch_head->fp)
    - 		return error_errno(_("cannot open %s"), filename);
    -@@ builtin/fetch.c: static void append_fetch_head(struct fetch_head *fetch_head, const char *old_oid
    - 	if (!write_fetch_head)
    - 		return;
    - 
    --	fprintf(fetch_head->fp, "%s\t%s\t%s",
    --		old_oid, merge_status_marker, note);
    --	for (i = 0; i < url_len; ++i)
    --		if ('\n' == url[i])
    --			fputs("\\n", fetch_head->fp);
    --		else
    --			fputc(url[i], fetch_head->fp);
    --	fputc('\n', fetch_head->fp);
    +-	strbuf_write(&fetch_head->buf, fetch_head->fp);
    +-	strbuf_reset(&fetch_head->buf);
     +	/*
     +	 * When using an atomic fetch, we do not want to update FETCH_HEAD if
     +	 * any of the reference updates fails. We thus have to write all
     +	 * updates to a buffer first and only commit it as soon as all
     +	 * references have been successfully updated.
     +	 */
    -+	if (atomic_fetch) {
    -+		strbuf_addf(&fetch_head->buf, "%s\t%s\t%s",
    -+			    old_oid, merge_status_marker, note);
    -+		strbuf_add(&fetch_head->buf, url, url_len);
    -+		strbuf_addch(&fetch_head->buf, '\n');
    -+	} else {
    -+		fprintf(fetch_head->fp, "%s\t%s\t%s",
    -+			old_oid, merge_status_marker, note);
    -+		for (i = 0; i < url_len; ++i)
    -+			if ('\n' == url[i])
    -+				fputs("\\n", fetch_head->fp);
    -+			else
    -+				fputc(url[i], fetch_head->fp);
    -+		fputc('\n', fetch_head->fp);
    ++	if (!atomic_fetch) {
    ++		strbuf_write(&fetch_head->buf, fetch_head->fp);
    ++		strbuf_reset(&fetch_head->buf);
     +	}
      }
      
      static void commit_fetch_head(struct fetch_head *fetch_head)
      {
     -	/* Nothing to commit yet. */
    -+	if (!write_fetch_head || !atomic_fetch)
    ++	if (!fetch_head->fp || !atomic_fetch)
     +		return;
     +	strbuf_write(&fetch_head->buf, fetch_head->fp);
      }
      
      static void close_fetch_head(struct fetch_head *fetch_head)
    -@@ builtin/fetch.c: static void close_fetch_head(struct fetch_head *fetch_head)
    - 		return;
    - 
    - 	fclose(fetch_head->fp);
    -+	strbuf_release(&fetch_head->buf);
    - }
    - 
    - static const char warn_show_forced_updates[] =
     @@ builtin/fetch.c: static int store_updated_refs(const char *raw_url, const char *remote_name,
      	struct fetch_head fetch_head;
      	struct commit *commit;
-- 
2.30.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v3 1/5] fetch: extract writing to FETCH_HEAD
  2021-01-11 11:05 ` [PATCH v3 0/5] " Patrick Steinhardt
@ 2021-01-11 11:05   ` Patrick Steinhardt
  2021-01-11 23:16     ` Junio C Hamano
  2021-01-11 11:05   ` [PATCH v3 2/5] fetch: use strbuf to format FETCH_HEAD updates Patrick Steinhardt
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 41+ messages in thread
From: Patrick Steinhardt @ 2021-01-11 11:05 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder

[-- Attachment #1: Type: text/plain, Size: 6533 bytes --]

When performing a fetch with the default `--write-fetch-head` option, we
write all updated references to FETCH_HEAD while the updates are
performed. Given that updates are not performed atomically, it means
that we we write to FETCH_HEAD even if some or all of the reference
updates fail.

Given that we simply update FETCH_HEAD ad-hoc with each reference, the
logic is completely contained in `store_update_refs` and thus quite hard
to extend. This can already be seen by the way we skip writing to the
FETCH_HEAD: instead of having a conditional which simply skips writing,
we instead open "/dev/null" and needlessly write all updates there.

We are about to extend git-fetch(1) to accept an `--atomic` flag which
will make the fetch an all-or-nothing operation with regards to the
reference updates. This will also require us to make the updates to
FETCH_HEAD an all-or-nothing operation, but as explained doing so is not
easy with the current layout. This commit thus refactors the wa we write
to FETCH_HEAD and pulls out the logic to open, append to, commit and
close the file. While this may seem rather over-the top at first,
pulling out this logic will make it a lot easier to update the code in a
subsequent commit. It also allows us to easily skip writing completely
in case `--no-write-fetch-head` was passed.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/fetch.c | 108 +++++++++++++++++++++++++++++++++++-------------
 remote.h        |   2 +-
 2 files changed, 80 insertions(+), 30 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index ecf8537605..50f0306a92 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -897,6 +897,73 @@ static int iterate_ref_map(void *cb_data, struct object_id *oid)
 	return 0;
 }
 
+struct fetch_head {
+	FILE *fp;
+};
+
+static int open_fetch_head(struct fetch_head *fetch_head)
+{
+	const char *filename = git_path_fetch_head(the_repository);
+
+	if (write_fetch_head) {
+		fetch_head->fp = fopen(filename, "a");
+		if (!fetch_head->fp)
+			return error_errno(_("cannot open %s"), filename);
+	} else {
+		fetch_head->fp = NULL;
+	}
+
+	return 0;
+}
+
+static void append_fetch_head(struct fetch_head *fetch_head,
+			      const struct object_id *old_oid,
+			      enum fetch_head_status fetch_head_status,
+			      const char *note,
+			      const char *url, size_t url_len)
+{
+	char old_oid_hex[GIT_MAX_HEXSZ + 1];
+	const char *merge_status_marker;
+	size_t i;
+
+	if (!fetch_head->fp)
+		return;
+
+	switch (fetch_head_status) {
+		case FETCH_HEAD_NOT_FOR_MERGE:
+			merge_status_marker = "not-for-merge";
+			break;
+		case FETCH_HEAD_MERGE:
+			merge_status_marker = "";
+			break;
+		default:
+			/* do not write anything to FETCH_HEAD */
+			return;
+	}
+
+	fprintf(fetch_head->fp, "%s\t%s\t%s",
+		oid_to_hex_r(old_oid_hex, old_oid), merge_status_marker, note);
+	for (i = 0; i < url_len; ++i)
+		if ('\n' == url[i])
+			fputs("\\n", fetch_head->fp);
+		else
+			fputc(url[i], fetch_head->fp);
+	fputc('\n', fetch_head->fp);
+}
+
+static void commit_fetch_head(struct fetch_head *fetch_head)
+{
+	/* Nothing to commit yet. */
+}
+
+static void close_fetch_head(struct fetch_head *fetch_head)
+{
+	if (!fetch_head->fp)
+		return;
+
+	fclose(fetch_head->fp);
+}
+
 static const char warn_show_forced_updates[] =
 N_("Fetch normally indicates which branches had a forced update,\n"
    "but that check has been disabled. To re-enable, use '--show-forced-updates'\n"
@@ -909,22 +976,19 @@ N_("It took %.2f seconds to check forced updates. You can use\n"
 static int store_updated_refs(const char *raw_url, const char *remote_name,
 			      int connectivity_checked, struct ref *ref_map)
 {
-	FILE *fp;
+	struct fetch_head fetch_head;
 	struct commit *commit;
 	int url_len, i, rc = 0;
 	struct strbuf note = STRBUF_INIT;
 	const char *what, *kind;
 	struct ref *rm;
 	char *url;
-	const char *filename = (!write_fetch_head
-				? "/dev/null"
-				: git_path_fetch_head(the_repository));
 	int want_status;
 	int summary_width = transport_summary_width(ref_map);
 
-	fp = fopen(filename, "a");
-	if (!fp)
-		return error_errno(_("cannot open %s"), filename);
+	rc = open_fetch_head(&fetch_head);
+	if (rc)
+		return -1;
 
 	if (raw_url)
 		url = transport_anonymize_url(raw_url);
@@ -953,7 +1017,6 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 	     want_status++) {
 		for (rm = ref_map; rm; rm = rm->next) {
 			struct ref *ref = NULL;
-			const char *merge_status_marker = "";
 
 			if (rm->status == REF_STATUS_REJECT_SHALLOW) {
 				if (want_status == FETCH_HEAD_MERGE)
@@ -1011,26 +1074,10 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 					strbuf_addf(&note, "%s ", kind);
 				strbuf_addf(&note, "'%s' of ", what);
 			}
-			switch (rm->fetch_head_status) {
-			case FETCH_HEAD_NOT_FOR_MERGE:
-				merge_status_marker = "not-for-merge";
-				/* fall-through */
-			case FETCH_HEAD_MERGE:
-				fprintf(fp, "%s\t%s\t%s",
-					oid_to_hex(&rm->old_oid),
-					merge_status_marker,
-					note.buf);
-				for (i = 0; i < url_len; ++i)
-					if ('\n' == url[i])
-						fputs("\\n", fp);
-					else
-						fputc(url[i], fp);
-				fputc('\n', fp);
-				break;
-			default:
-				/* do not write anything to FETCH_HEAD */
-				break;
-			}
+
+			append_fetch_head(&fetch_head, &rm->old_oid,
+					  rm->fetch_head_status,
+					  note.buf, url, url_len);
 
 			strbuf_reset(&note);
 			if (ref) {
@@ -1060,6 +1107,9 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 		}
 	}
 
+	if (!rc)
+		commit_fetch_head(&fetch_head);
+
 	if (rc & STORE_REF_ERROR_DF_CONFLICT)
 		error(_("some local refs could not be updated; try running\n"
 		      " 'git remote prune %s' to remove any old, conflicting "
@@ -1077,7 +1127,7 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
  abort:
 	strbuf_release(&note);
 	free(url);
-	fclose(fp);
+	close_fetch_head(&fetch_head);
 	return rc;
 }
 
diff --git a/remote.h b/remote.h
index 3211abdf05..aad1a0f080 100644
--- a/remote.h
+++ b/remote.h
@@ -134,7 +134,7 @@ struct ref {
 	 * should be 0, so that xcalloc'd structures get it
 	 * by default.
 	 */
-	enum {
+	enum fetch_head_status {
 		FETCH_HEAD_MERGE = -1,
 		FETCH_HEAD_NOT_FOR_MERGE = 0,
 		FETCH_HEAD_IGNORE = 1
-- 
2.30.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v3 2/5] fetch: use strbuf to format FETCH_HEAD updates
  2021-01-11 11:05 ` [PATCH v3 0/5] " Patrick Steinhardt
  2021-01-11 11:05   ` [PATCH v3 1/5] fetch: extract writing to FETCH_HEAD Patrick Steinhardt
@ 2021-01-11 11:05   ` Patrick Steinhardt
  2021-01-11 11:10     ` Patrick Steinhardt
                       ` (2 more replies)
  2021-01-11 11:05   ` [PATCH v3 3/5] fetch: refactor `s_update_ref` to use common exit path Patrick Steinhardt
                     ` (3 subsequent siblings)
  5 siblings, 3 replies; 41+ messages in thread
From: Patrick Steinhardt @ 2021-01-11 11:05 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder

[-- Attachment #1: Type: text/plain, Size: 2195 bytes --]

This commit refactors `append_fetch_head()` to use a `struct strbuf` for
formatting the update which we're about to append to the FETCH_HEAD
file. While the refactoring doesn't have much of a benefit right now, it
servers as a preparatory step to implement atomic fetches where we need
to buffer all updates to FETCH_HEAD and only flush them out if all
reference updates succeeded.

No change in behaviour is expected from this commit.
---
 builtin/fetch.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 50f0306a92..1252f37493 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -899,6 +899,7 @@ static int iterate_ref_map(void *cb_data, struct object_id *oid)
 
 struct fetch_head {
 	FILE *fp;
+	struct strbuf buf;
 };
 
 static int open_fetch_head(struct fetch_head *fetch_head)
@@ -909,6 +910,7 @@ static int open_fetch_head(struct fetch_head *fetch_head)
 		fetch_head->fp = fopen(filename, "a");
 		if (!fetch_head->fp)
 			return error_errno(_("cannot open %s"), filename);
+		strbuf_init(&fetch_head->buf, 0);
 	} else {
 		fetch_head->fp = NULL;
 	}
@@ -941,14 +943,17 @@ static void append_fetch_head(struct fetch_head *fetch_head,
 			return;
 	}
 
-	fprintf(fetch_head->fp, "%s\t%s\t%s",
-		oid_to_hex_r(old_oid_hex, old_oid), merge_status_marker, note);
+	strbuf_addf(&fetch_head->buf, "%s\t%s\t%s",
+		    oid_to_hex_r(old_oid_hex, old_oid), merge_status_marker, note);
 	for (i = 0; i < url_len; ++i)
 		if ('\n' == url[i])
-			fputs("\\n", fetch_head->fp);
+			strbuf_addstr(&fetch_head->buf, "\\n");
 		else
-			fputc(url[i], fetch_head->fp);
-	fputc('\n', fetch_head->fp);
+			strbuf_addch(&fetch_head->buf, url[i]);
+	strbuf_addch(&fetch_head->buf, '\n');
+
+	strbuf_write(&fetch_head->buf, fetch_head->fp);
+	strbuf_reset(&fetch_head->buf);
 }
 
 static void commit_fetch_head(struct fetch_head *fetch_head)
@@ -962,6 +967,7 @@ static void close_fetch_head(struct fetch_head *fetch_head)
 		return;
 
 	fclose(fetch_head->fp);
+	strbuf_release(&fetch_head->buf);
 }
 
 static const char warn_show_forced_updates[] =
-- 
2.30.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v3 3/5] fetch: refactor `s_update_ref` to use common exit path
  2021-01-11 11:05 ` [PATCH v3 0/5] " Patrick Steinhardt
  2021-01-11 11:05   ` [PATCH v3 1/5] fetch: extract writing to FETCH_HEAD Patrick Steinhardt
  2021-01-11 11:05   ` [PATCH v3 2/5] fetch: use strbuf to format FETCH_HEAD updates Patrick Steinhardt
@ 2021-01-11 11:05   ` Patrick Steinhardt
  2021-01-11 11:05   ` [PATCH v3 4/5] fetch: allow passing a transaction to `s_update_ref()` Patrick Steinhardt
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 41+ messages in thread
From: Patrick Steinhardt @ 2021-01-11 11:05 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder

[-- Attachment #1: Type: text/plain, Size: 2242 bytes --]

The cleanup code in `s_update_ref()` is currently duplicated for both
succesful and erroneous exit paths. This commit refactors the function
to have a shared exit path for both cases to remove the duplication.

Suggested-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/fetch.c | 47 +++++++++++++++++++++++++++--------------------
 1 file changed, 27 insertions(+), 20 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 1252f37493..991771f8eb 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -589,7 +589,7 @@ static int s_update_ref(const char *action,
 	char *rla = getenv("GIT_REFLOG_ACTION");
 	struct ref_transaction *transaction;
 	struct strbuf err = STRBUF_INIT;
-	int ret, df_conflict = 0;
+	int ret;
 
 	if (dry_run)
 		return 0;
@@ -598,30 +598,37 @@ static int s_update_ref(const char *action,
 	msg = xstrfmt("%s: %s", rla, action);
 
 	transaction = ref_transaction_begin(&err);
-	if (!transaction ||
-	    ref_transaction_update(transaction, ref->name,
-				   &ref->new_oid,
-				   check_old ? &ref->old_oid : NULL,
-				   0, msg, &err))
-		goto fail;
-
-	ret = ref_transaction_commit(transaction, &err);
-	if (ret) {
-		df_conflict = (ret == TRANSACTION_NAME_CONFLICT);
-		goto fail;
+	if (!transaction) {
+		ret = STORE_REF_ERROR_OTHER;
+		goto out;
 	}
 
+	ret = ref_transaction_update(transaction, ref->name, &ref->new_oid,
+				     check_old ? &ref->old_oid : NULL,
+				     0, msg, &err);
+	if (ret) {
+		ret = STORE_REF_ERROR_OTHER;
+		goto out;
+	}
+
+	switch (ref_transaction_commit(transaction, &err)) {
+	case 0:
+		break;
+	case TRANSACTION_NAME_CONFLICT:
+		ret = STORE_REF_ERROR_DF_CONFLICT;
+		goto out;
+	default:
+		ret = STORE_REF_ERROR_OTHER;
+		goto out;
+	}
+
+out:
 	ref_transaction_free(transaction);
+	if (ret)
+		error("%s", err.buf);
 	strbuf_release(&err);
 	free(msg);
-	return 0;
-fail:
-	ref_transaction_free(transaction);
-	error("%s", err.buf);
-	strbuf_release(&err);
-	free(msg);
-	return df_conflict ? STORE_REF_ERROR_DF_CONFLICT
-			   : STORE_REF_ERROR_OTHER;
+	return ret;
 }
 
 static int refcol_width = 10;
-- 
2.30.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v3 4/5] fetch: allow passing a transaction to `s_update_ref()`
  2021-01-11 11:05 ` [PATCH v3 0/5] " Patrick Steinhardt
                     ` (2 preceding siblings ...)
  2021-01-11 11:05   ` [PATCH v3 3/5] fetch: refactor `s_update_ref` to use common exit path Patrick Steinhardt
@ 2021-01-11 11:05   ` Patrick Steinhardt
  2021-01-11 11:05   ` [PATCH v3 5/5] fetch: implement support for atomic reference updates Patrick Steinhardt
  2021-01-12  0:04   ` [PATCH v3 0/5] " Junio C Hamano
  5 siblings, 0 replies; 41+ messages in thread
From: Patrick Steinhardt @ 2021-01-11 11:05 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder

[-- Attachment #1: Type: text/plain, Size: 5623 bytes --]

The handling of ref updates is completely handled by `s_update_ref()`,
which will manage the complete lifecycle of the reference transaction.
This is fine right now given that git-fetch(1) does not support atomic
fetches, so each reference gets its own transaction. It is quite
inflexible though, as `s_update_ref()` only knows about a single
reference update at a time, so it doesn't allow us to alter the
strategy.

This commit prepares `s_update_ref()` and its only caller
`update_local_ref()` to allow passing an external transaction. If none
is given, then the existing behaviour is triggered which creates a new
transaction and directly commits it. Otherwise, if the caller provides a
transaction, then we only queue the update but don't commit it. This
optionally allows the caller to manage when a transaction will be
committed.

Given that `update_local_ref()` is always called with a `NULL`
transaction for now, no change in behaviour is expected from this
change.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/fetch.c | 51 ++++++++++++++++++++++++++++++-------------------
 1 file changed, 31 insertions(+), 20 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 991771f8eb..654c7a7eed 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -583,11 +583,12 @@ static struct ref *get_ref_map(struct remote *remote,
 
 static int s_update_ref(const char *action,
 			struct ref *ref,
+			struct ref_transaction *transaction,
 			int check_old)
 {
 	char *msg;
 	char *rla = getenv("GIT_REFLOG_ACTION");
-	struct ref_transaction *transaction;
+	struct ref_transaction *our_transaction = NULL;
 	struct strbuf err = STRBUF_INIT;
 	int ret;
 
@@ -597,10 +598,17 @@ static int s_update_ref(const char *action,
 		rla = default_rla.buf;
 	msg = xstrfmt("%s: %s", rla, action);
 
-	transaction = ref_transaction_begin(&err);
+	/*
+	 * If no transaction was passed to us, we manage the transaction
+	 * ourselves. Otherwise, we trust the caller to handle the transaction
+	 * lifecycle.
+	 */
 	if (!transaction) {
-		ret = STORE_REF_ERROR_OTHER;
-		goto out;
+		transaction = our_transaction = ref_transaction_begin(&err);
+		if (!transaction) {
+			ret = STORE_REF_ERROR_OTHER;
+			goto out;
+		}
 	}
 
 	ret = ref_transaction_update(transaction, ref->name, &ref->new_oid,
@@ -611,19 +619,21 @@ static int s_update_ref(const char *action,
 		goto out;
 	}
 
-	switch (ref_transaction_commit(transaction, &err)) {
-	case 0:
-		break;
-	case TRANSACTION_NAME_CONFLICT:
-		ret = STORE_REF_ERROR_DF_CONFLICT;
-		goto out;
-	default:
-		ret = STORE_REF_ERROR_OTHER;
-		goto out;
+	if (our_transaction) {
+		switch (ref_transaction_commit(our_transaction, &err)) {
+		case 0:
+			break;
+		case TRANSACTION_NAME_CONFLICT:
+			ret = STORE_REF_ERROR_DF_CONFLICT;
+			goto out;
+		default:
+			ret = STORE_REF_ERROR_OTHER;
+			goto out;
+		}
 	}
 
 out:
-	ref_transaction_free(transaction);
+	ref_transaction_free(our_transaction);
 	if (ret)
 		error("%s", err.buf);
 	strbuf_release(&err);
@@ -766,6 +776,7 @@ static void format_display(struct strbuf *display, char code,
 }
 
 static int update_local_ref(struct ref *ref,
+			    struct ref_transaction *transaction,
 			    const char *remote,
 			    const struct ref *remote_ref,
 			    struct strbuf *display,
@@ -806,7 +817,7 @@ static int update_local_ref(struct ref *ref,
 	    starts_with(ref->name, "refs/tags/")) {
 		if (force || ref->force) {
 			int r;
-			r = s_update_ref("updating tag", ref, 0);
+			r = s_update_ref("updating tag", ref, transaction, 0);
 			format_display(display, r ? '!' : 't', _("[tag update]"),
 				       r ? _("unable to update local ref") : NULL,
 				       remote, pretty_ref, summary_width);
@@ -843,7 +854,7 @@ static int update_local_ref(struct ref *ref,
 			what = _("[new ref]");
 		}
 
-		r = s_update_ref(msg, ref, 0);
+		r = s_update_ref(msg, ref, transaction, 0);
 		format_display(display, r ? '!' : '*', what,
 			       r ? _("unable to update local ref") : NULL,
 			       remote, pretty_ref, summary_width);
@@ -865,7 +876,7 @@ static int update_local_ref(struct ref *ref,
 		strbuf_add_unique_abbrev(&quickref, &current->object.oid, DEFAULT_ABBREV);
 		strbuf_addstr(&quickref, "..");
 		strbuf_add_unique_abbrev(&quickref, &ref->new_oid, DEFAULT_ABBREV);
-		r = s_update_ref("fast-forward", ref, 1);
+		r = s_update_ref("fast-forward", ref, transaction, 1);
 		format_display(display, r ? '!' : ' ', quickref.buf,
 			       r ? _("unable to update local ref") : NULL,
 			       remote, pretty_ref, summary_width);
@@ -877,7 +888,7 @@ static int update_local_ref(struct ref *ref,
 		strbuf_add_unique_abbrev(&quickref, &current->object.oid, DEFAULT_ABBREV);
 		strbuf_addstr(&quickref, "...");
 		strbuf_add_unique_abbrev(&quickref, &ref->new_oid, DEFAULT_ABBREV);
-		r = s_update_ref("forced-update", ref, 1);
+		r = s_update_ref("forced-update", ref, transaction, 1);
 		format_display(display, r ? '!' : '+', quickref.buf,
 			       r ? _("unable to update local ref") : _("forced update"),
 			       remote, pretty_ref, summary_width);
@@ -1094,8 +1105,8 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 
 			strbuf_reset(&note);
 			if (ref) {
-				rc |= update_local_ref(ref, what, rm, &note,
-						       summary_width);
+				rc |= update_local_ref(ref, NULL, what,
+						       rm, &note, summary_width);
 				free(ref);
 			} else if (write_fetch_head || dry_run) {
 				/*
-- 
2.30.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v3 5/5] fetch: implement support for atomic reference updates
  2021-01-11 11:05 ` [PATCH v3 0/5] " Patrick Steinhardt
                     ` (3 preceding siblings ...)
  2021-01-11 11:05   ` [PATCH v3 4/5] fetch: allow passing a transaction to `s_update_ref()` Patrick Steinhardt
@ 2021-01-11 11:05   ` Patrick Steinhardt
  2021-01-12  0:04   ` [PATCH v3 0/5] " Junio C Hamano
  5 siblings, 0 replies; 41+ messages in thread
From: Patrick Steinhardt @ 2021-01-11 11:05 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder

[-- Attachment #1: Type: text/plain, Size: 13728 bytes --]

When executing a fetch, then git will currently allocate one reference
transaction per reference update and directly commit it. This means that
fetches are non-atomic: even if some of the reference updates fail,
others may still succeed and modify local references.

This is fine in many scenarios, but this strategy has its downsides.

- The view of remote references may be inconsistent and may show a
  bastardized state of the remote repository.

- Batching together updates may improve performance in certain
  scenarios. While the impact probably isn't as pronounced with loose
  references, the upcoming reftable backend may benefit as it needs to
  write less files in case the update is batched.

- The reference-update hook is currently being executed twice per
  updated reference. While this doesn't matter when there is no such
  hook, we have seen severe performance regressions when doing a
  git-fetch(1) with reference-transaction hook when the remote
  repository has hundreds of thousands of references.

Similar to `git push --atomic`, this commit thus introduces atomic
fetches. Instead of allocating one reference transaction per updated
reference, it causes us to only allocate a single transaction and commit
it as soon as all updates were received. If locking of any reference
fails, then we abort the complete transaction and don't update any
reference, which gives us an all-or-nothing fetch.

Note that this may not completely fix the first of above downsides, as
the consistent view also depends on the server-side. If the server
doesn't have a consistent view of its own references during the
reference negotiation phase, then the client would get the same
inconsistent view the server has. This is a separate problem though and,
if it actually exists, can be fixed at a later point.

This commit also changes the way we write FETCH_HEAD in case `--atomic`
is passed. Instead of writing changes as we go, we need to accumulate
all changes first and only commit them at the end when we know that all
reference updates succeeded. Ideally, we'd just do so via a temporary
file so that we don't need to carry all updates in-memory. This isn't
trivially doable though considering the `--append` mode, where we do not
truncate the file but simply append to it. And given that we support
concurrent processes appending to FETCH_HEAD at the same time without
any loss of data, seeding the temporary file with current contents of
FETCH_HEAD initially and then doing a rename wouldn't work either. So
this commit implements the simple strategy of buffering all changes and
appending them to the file on commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/fetch-options.txt |   4 +
 builtin/fetch.c                 |  46 ++++++++-
 t/t5510-fetch.sh                | 168 ++++++++++++++++++++++++++++++++
 3 files changed, 213 insertions(+), 5 deletions(-)

diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
index 2bf77b46fd..07783deee3 100644
--- a/Documentation/fetch-options.txt
+++ b/Documentation/fetch-options.txt
@@ -7,6 +7,10 @@
 	existing contents of `.git/FETCH_HEAD`.  Without this
 	option old data in `.git/FETCH_HEAD` will be overwritten.
 
+--atomic::
+	Use an atomic transaction to update local refs. Either all refs are
+	updated, or on error, no refs are updated.
+
 --depth=<depth>::
 	Limit fetching to the specified number of commits from the tip of
 	each remote branch history. If fetching to a 'shallow' repository
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 654c7a7eed..51c9930d06 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -63,6 +63,7 @@ static int enable_auto_gc = 1;
 static int tags = TAGS_DEFAULT, unshallow, update_shallow, deepen;
 static int max_jobs = -1, submodule_fetch_jobs_config = -1;
 static int fetch_parallel_config = 1;
+static int atomic_fetch;
 static enum transport_family family;
 static const char *depth;
 static const char *deepen_since;
@@ -144,6 +145,8 @@ static struct option builtin_fetch_options[] = {
 		 N_("set upstream for git pull/fetch")),
 	OPT_BOOL('a', "append", &append,
 		 N_("append to .git/FETCH_HEAD instead of overwriting")),
+	OPT_BOOL(0, "atomic", &atomic_fetch,
+		 N_("use atomic transaction to update references")),
 	OPT_STRING(0, "upload-pack", &upload_pack, N_("path"),
 		   N_("path to upload pack on remote end")),
 	OPT__FORCE(&force, N_("force overwrite of local reference"), 0),
@@ -970,13 +973,23 @@ static void append_fetch_head(struct fetch_head *fetch_head,
 			strbuf_addch(&fetch_head->buf, url[i]);
 	strbuf_addch(&fetch_head->buf, '\n');
 
-	strbuf_write(&fetch_head->buf, fetch_head->fp);
-	strbuf_reset(&fetch_head->buf);
+	/*
+	 * When using an atomic fetch, we do not want to update FETCH_HEAD if
+	 * any of the reference updates fails. We thus have to write all
+	 * updates to a buffer first and only commit it as soon as all
+	 * references have been successfully updated.
+	 */
+	if (!atomic_fetch) {
+		strbuf_write(&fetch_head->buf, fetch_head->fp);
+		strbuf_reset(&fetch_head->buf);
+	}
 }
 
 static void commit_fetch_head(struct fetch_head *fetch_head)
 {
-	/* Nothing to commit yet. */
+	if (!fetch_head->fp || !atomic_fetch)
+		return;
+	strbuf_write(&fetch_head->buf, fetch_head->fp);
 }
 
 static void close_fetch_head(struct fetch_head *fetch_head)
@@ -1003,7 +1016,8 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 	struct fetch_head fetch_head;
 	struct commit *commit;
 	int url_len, i, rc = 0;
-	struct strbuf note = STRBUF_INIT;
+	struct strbuf note = STRBUF_INIT, err = STRBUF_INIT;
+	struct ref_transaction *transaction = NULL;
 	const char *what, *kind;
 	struct ref *rm;
 	char *url;
@@ -1029,6 +1043,14 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 		}
 	}
 
+	if (atomic_fetch) {
+		transaction = ref_transaction_begin(&err);
+		if (!transaction) {
+			error("%s", err.buf);
+			goto abort;
+		}
+	}
+
 	prepare_format_display(ref_map);
 
 	/*
@@ -1105,7 +1127,7 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 
 			strbuf_reset(&note);
 			if (ref) {
-				rc |= update_local_ref(ref, NULL, what,
+				rc |= update_local_ref(ref, transaction, what,
 						       rm, &note, summary_width);
 				free(ref);
 			} else if (write_fetch_head || dry_run) {
@@ -1131,6 +1153,14 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 		}
 	}
 
+	if (!rc && transaction) {
+		rc = ref_transaction_commit(transaction, &err);
+		if (rc) {
+			error("%s", err.buf);
+			goto abort;
+		}
+	}
+
 	if (!rc)
 		commit_fetch_head(&fetch_head);
 
@@ -1150,6 +1180,8 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 
  abort:
 	strbuf_release(&note);
+	strbuf_release(&err);
+	ref_transaction_free(transaction);
 	free(url);
 	close_fetch_head(&fetch_head);
 	return rc;
@@ -1961,6 +1993,10 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 			die(_("--filter can only be used with the remote "
 			      "configured in extensions.partialclone"));
 
+		if (atomic_fetch)
+			die(_("--atomic can only be used when fetching "
+			      "from one remote"));
+
 		if (stdin_refspecs)
 			die(_("--stdin can only be used when fetching "
 			      "from one remote"));
diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
index 2013051a64..109d15be98 100755
--- a/t/t5510-fetch.sh
+++ b/t/t5510-fetch.sh
@@ -176,6 +176,174 @@ test_expect_success 'fetch --prune --tags with refspec prunes based on refspec'
 	git rev-parse sometag
 '
 
+test_expect_success 'fetch --atomic works with a single branch' '
+	test_when_finished "rm -rf \"$D\"/atomic" &&
+
+	cd "$D" &&
+	git clone . atomic &&
+	git branch atomic-branch &&
+	oid=$(git rev-parse atomic-branch) &&
+	echo "$oid" >expected &&
+
+	git -C atomic fetch --atomic origin &&
+	git -C atomic rev-parse origin/atomic-branch >actual &&
+	test_cmp expected actual &&
+	test $oid = "$(git -C atomic rev-parse --verify FETCH_HEAD)"
+'
+
+test_expect_success 'fetch --atomic works with multiple branches' '
+	test_when_finished "rm -rf \"$D\"/atomic" &&
+
+	cd "$D" &&
+	git clone . atomic &&
+	git branch atomic-branch-1 &&
+	git branch atomic-branch-2 &&
+	git branch atomic-branch-3 &&
+	git rev-parse refs/heads/atomic-branch-1 refs/heads/atomic-branch-2 refs/heads/atomic-branch-3 >actual &&
+
+	git -C atomic fetch --atomic origin &&
+	git -C atomic rev-parse refs/remotes/origin/atomic-branch-1 refs/remotes/origin/atomic-branch-2 refs/remotes/origin/atomic-branch-3 >expected &&
+	test_cmp expected actual
+'
+
+test_expect_success 'fetch --atomic works with mixed branches and tags' '
+	test_when_finished "rm -rf \"$D\"/atomic" &&
+
+	cd "$D" &&
+	git clone . atomic &&
+	git branch atomic-mixed-branch &&
+	git tag atomic-mixed-tag &&
+	git rev-parse refs/heads/atomic-mixed-branch refs/tags/atomic-mixed-tag >actual &&
+
+	git -C atomic fetch --tags --atomic origin &&
+	git -C atomic rev-parse refs/remotes/origin/atomic-mixed-branch refs/tags/atomic-mixed-tag >expected &&
+	test_cmp expected actual
+'
+
+test_expect_success 'fetch --atomic prunes references' '
+	test_when_finished "rm -rf \"$D\"/atomic" &&
+
+	cd "$D" &&
+	git branch atomic-prune-delete &&
+	git clone . atomic &&
+	git branch --delete atomic-prune-delete &&
+	git branch atomic-prune-create &&
+	git rev-parse refs/heads/atomic-prune-create >actual &&
+
+	git -C atomic fetch --prune --atomic origin &&
+	test_must_fail git -C atomic rev-parse refs/remotes/origin/atomic-prune-delete &&
+	git -C atomic rev-parse refs/remotes/origin/atomic-prune-create >expected &&
+	test_cmp expected actual
+'
+
+test_expect_success 'fetch --atomic aborts with non-fast-forward update' '
+	test_when_finished "rm -rf \"$D\"/atomic" &&
+
+	cd "$D" &&
+	git branch atomic-non-ff &&
+	git clone . atomic &&
+	git rev-parse HEAD >actual &&
+
+	git branch atomic-new-branch &&
+	parent_commit=$(git rev-parse atomic-non-ff~) &&
+	git update-ref refs/heads/atomic-non-ff $parent_commit &&
+
+	test_must_fail git -C atomic fetch --atomic origin refs/heads/*:refs/remotes/origin/* &&
+	test_must_fail git -C atomic rev-parse refs/remotes/origin/atomic-new-branch &&
+	git -C atomic rev-parse refs/remotes/origin/atomic-non-ff >expected &&
+	test_cmp expected actual &&
+	test_must_be_empty atomic/.git/FETCH_HEAD
+'
+
+test_expect_success 'fetch --atomic executes a single reference transaction only' '
+	test_when_finished "rm -rf \"$D\"/atomic" &&
+
+	cd "$D" &&
+	git clone . atomic &&
+	git branch atomic-hooks-1 &&
+	git branch atomic-hooks-2 &&
+	head_oid=$(git rev-parse HEAD) &&
+
+	cat >expected <<-EOF &&
+		prepared
+		$ZERO_OID $head_oid refs/remotes/origin/atomic-hooks-1
+		$ZERO_OID $head_oid refs/remotes/origin/atomic-hooks-2
+		committed
+		$ZERO_OID $head_oid refs/remotes/origin/atomic-hooks-1
+		$ZERO_OID $head_oid refs/remotes/origin/atomic-hooks-2
+	EOF
+
+	rm -f atomic/actual &&
+	write_script atomic/.git/hooks/reference-transaction <<-\EOF &&
+		( echo "$*" && cat ) >>actual
+	EOF
+
+	git -C atomic fetch --atomic origin &&
+	test_cmp expected atomic/actual
+'
+
+test_expect_success 'fetch --atomic aborts all reference updates if hook aborts' '
+	test_when_finished "rm -rf \"$D\"/atomic" &&
+
+	cd "$D" &&
+	git clone . atomic &&
+	git branch atomic-hooks-abort-1 &&
+	git branch atomic-hooks-abort-2 &&
+	git branch atomic-hooks-abort-3 &&
+	git tag atomic-hooks-abort &&
+	head_oid=$(git rev-parse HEAD) &&
+
+	cat >expected <<-EOF &&
+		prepared
+		$ZERO_OID $head_oid refs/remotes/origin/atomic-hooks-abort-1
+		$ZERO_OID $head_oid refs/remotes/origin/atomic-hooks-abort-2
+		$ZERO_OID $head_oid refs/remotes/origin/atomic-hooks-abort-3
+		$ZERO_OID $head_oid refs/tags/atomic-hooks-abort
+		aborted
+		$ZERO_OID $head_oid refs/remotes/origin/atomic-hooks-abort-1
+		$ZERO_OID $head_oid refs/remotes/origin/atomic-hooks-abort-2
+		$ZERO_OID $head_oid refs/remotes/origin/atomic-hooks-abort-3
+		$ZERO_OID $head_oid refs/tags/atomic-hooks-abort
+	EOF
+
+	rm -f atomic/actual &&
+	write_script atomic/.git/hooks/reference-transaction <<-\EOF &&
+		( echo "$*" && cat ) >>actual
+		exit 1
+	EOF
+
+	git -C atomic for-each-ref >expected-refs &&
+	test_must_fail git -C atomic fetch --tags --atomic origin &&
+	git -C atomic for-each-ref >actual-refs &&
+	test_cmp expected-refs actual-refs &&
+	test_must_be_empty atomic/.git/FETCH_HEAD
+'
+
+test_expect_success 'fetch --atomic --append appends to FETCH_HEAD' '
+	test_when_finished "rm -rf \"$D\"/atomic" &&
+
+	cd "$D" &&
+	git clone . atomic &&
+	oid=$(git rev-parse HEAD) &&
+
+	git branch atomic-fetch-head-1 &&
+	git -C atomic fetch --atomic origin atomic-fetch-head-1 &&
+	test_line_count = 1 atomic/.git/FETCH_HEAD &&
+
+	git branch atomic-fetch-head-2 &&
+	git -C atomic fetch --atomic --append origin atomic-fetch-head-2 &&
+	test_line_count = 2 atomic/.git/FETCH_HEAD &&
+	cp atomic/.git/FETCH_HEAD expected &&
+
+	write_script atomic/.git/hooks/reference-transaction <<-\EOF &&
+		exit 1
+	EOF
+
+	git branch atomic-fetch-head-3 &&
+	test_must_fail git -C atomic fetch --atomic --append origin atomic-fetch-head-3 &&
+	test_cmp expected atomic/.git/FETCH_HEAD
+'
+
 test_expect_success '--refmap="" ignores configured refspec' '
 	cd "$TRASH_DIRECTORY" &&
 	git clone "$D" remote-refs &&
-- 
2.30.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 2/5] fetch: use strbuf to format FETCH_HEAD updates
  2021-01-11 11:05   ` [PATCH v3 2/5] fetch: use strbuf to format FETCH_HEAD updates Patrick Steinhardt
@ 2021-01-11 11:10     ` Patrick Steinhardt
  2021-01-11 11:17     ` Christian Couder
  2021-01-11 23:21     ` Junio C Hamano
  2 siblings, 0 replies; 41+ messages in thread
From: Patrick Steinhardt @ 2021-01-11 11:10 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder

[-- Attachment #1: Type: text/plain, Size: 2503 bytes --]

On Mon, Jan 11, 2021 at 12:05:20PM +0100, Patrick Steinhardt wrote:
> This commit refactors `append_fetch_head()` to use a `struct strbuf` for
> formatting the update which we're about to append to the FETCH_HEAD
> file. While the refactoring doesn't have much of a benefit right now, it
> servers as a preparatory step to implement atomic fetches where we need
> to buffer all updates to FETCH_HEAD and only flush them out if all
> reference updates succeeded.
> 
> No change in behaviour is expected from this commit.

Forgot to add my

Signed-off-by: Patrick Steinhardt <ps@pks.im>

Will amend in v4.

Patrick

> ---
>  builtin/fetch.c | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index 50f0306a92..1252f37493 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -899,6 +899,7 @@ static int iterate_ref_map(void *cb_data, struct object_id *oid)
>  
>  struct fetch_head {
>  	FILE *fp;
> +	struct strbuf buf;
>  };
>  
>  static int open_fetch_head(struct fetch_head *fetch_head)
> @@ -909,6 +910,7 @@ static int open_fetch_head(struct fetch_head *fetch_head)
>  		fetch_head->fp = fopen(filename, "a");
>  		if (!fetch_head->fp)
>  			return error_errno(_("cannot open %s"), filename);
> +		strbuf_init(&fetch_head->buf, 0);
>  	} else {
>  		fetch_head->fp = NULL;
>  	}
> @@ -941,14 +943,17 @@ static void append_fetch_head(struct fetch_head *fetch_head,
>  			return;
>  	}
>  
> -	fprintf(fetch_head->fp, "%s\t%s\t%s",
> -		oid_to_hex_r(old_oid_hex, old_oid), merge_status_marker, note);
> +	strbuf_addf(&fetch_head->buf, "%s\t%s\t%s",
> +		    oid_to_hex_r(old_oid_hex, old_oid), merge_status_marker, note);
>  	for (i = 0; i < url_len; ++i)
>  		if ('\n' == url[i])
> -			fputs("\\n", fetch_head->fp);
> +			strbuf_addstr(&fetch_head->buf, "\\n");
>  		else
> -			fputc(url[i], fetch_head->fp);
> -	fputc('\n', fetch_head->fp);
> +			strbuf_addch(&fetch_head->buf, url[i]);
> +	strbuf_addch(&fetch_head->buf, '\n');
> +
> +	strbuf_write(&fetch_head->buf, fetch_head->fp);
> +	strbuf_reset(&fetch_head->buf);
>  }
>  
>  static void commit_fetch_head(struct fetch_head *fetch_head)
> @@ -962,6 +967,7 @@ static void close_fetch_head(struct fetch_head *fetch_head)
>  		return;
>  
>  	fclose(fetch_head->fp);
> +	strbuf_release(&fetch_head->buf);
>  }
>  
>  static const char warn_show_forced_updates[] =
> -- 
> 2.30.0
> 



[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 2/5] fetch: use strbuf to format FETCH_HEAD updates
  2021-01-11 11:05   ` [PATCH v3 2/5] fetch: use strbuf to format FETCH_HEAD updates Patrick Steinhardt
  2021-01-11 11:10     ` Patrick Steinhardt
@ 2021-01-11 11:17     ` Christian Couder
  2021-01-11 23:21     ` Junio C Hamano
  2 siblings, 0 replies; 41+ messages in thread
From: Christian Couder @ 2021-01-11 11:17 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Junio C Hamano

On Mon, Jan 11, 2021 at 12:05 PM Patrick Steinhardt <ps@pks.im> wrote:
>
> This commit refactors `append_fetch_head()` to use a `struct strbuf` for
> formatting the update which we're about to append to the FETCH_HEAD
> file. While the refactoring doesn't have much of a benefit right now, it
> servers as a preparatory step to implement atomic fetches where we need

s/servers/serves/

> to buffer all updates to FETCH_HEAD and only flush them out if all
> reference updates succeeded.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 1/4] fetch: extract writing to FETCH_HEAD
  2021-01-11 10:26       ` Patrick Steinhardt
@ 2021-01-11 19:24         ` Junio C Hamano
  0 siblings, 0 replies; 41+ messages in thread
From: Junio C Hamano @ 2021-01-11 19:24 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Christian Couder

Patrick Steinhardt <ps@pks.im> writes:

>> As an abstraction, it may be better to make the caller pass a
>> boolean "is this for merge?" and keep the knowledge of what exact
>> string is used for merge_status_marker to this function, instead of
>> letting the caller passing it as a parameter in the string form.
>> After all, we never allow anything other than an empty string or a
>> fixed "not-for-merge" string in that place in the file format.
>> [Nit #2]
>
> I think it's even nicer to just pass in `rm->fetch_head_status`
> directly, which allows us to move below switch into `append_fetch_head`.

OK.  That may even be better.

Thanks.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 4/4] fetch: implement support for atomic reference updates
  2021-01-11 10:42       ` Patrick Steinhardt
@ 2021-01-11 19:47         ` Junio C Hamano
  2021-01-12 12:22           ` Patrick Steinhardt
  0 siblings, 1 reply; 41+ messages in thread
From: Junio C Hamano @ 2021-01-11 19:47 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Christian Couder, Jeff King

Patrick Steinhardt <ps@pks.im> writes:

>> It would be much more preferrable to see this series go like so:
>> 
>>     [1/4] create append_fetch_head() that writes out to
>>           fetch_head->fp
>> 
>>     [1.5/4] convert append_fetch_head() to ALWAYS accumulate in
>>             fetch_head->buf, and ALWAYS flushes what is accumulated
>>             at the end.
>
> This is a change I'm hesitant to make. The thing is that FETCH_HEAD is
> quite weird as the current design allows different `git fetch --append`
> processes to write to FETCH_HEAD at the same time.

Hmph, do you mean 

	git fetch --append remote-a & git fetch --append remote-b

or something else [*1*]?

With the current write-out codepath, there is no guarantee that ...

> > +		fprintf(fetch_head->fp, "%s\t%s\t%s",
> > +			old_oid, merge_status_marker, note);
> > +		for (i = 0; i < url_len; ++i)
> > +			if ('\n' == url[i])
> > +				fputs("\\n", fetch_head->fp);
> > +			else
> > +				fputc(url[i], fetch_head->fp);
> > +		fputc('\n', fetch_head->fp);

... these stdio operations for a single record would result in a
single atomic write() that would not compete with writes by other
processes.  So I wouldn't call "the current design allows ... at the
same time."  It has been broken for years and nobody complained.

> If we change to
> always accumulate first and flush once we're done, then we essentially
> have a change in behaviour as FETCH_HEADs aren't written in an
> interleaving fashion anymore, but only at the end.

I view it a good thing, actually, for another reason [*2*], but that
would take a follow-up fix that is much more involved, so let's not
go there (yet).  At least buffering a single line's worth of output
in a buffer and writing it out in a single write() would get us much
closer to solving the "mixed up output" from multiple processes
problem the current code seems to already have.

> Also, if there is any
> unexpected error in between updating references which causes us to die,
> then we wouldn't have written the already updated references to
> FETCH_HEAD now.

That one may be a valid concern.

Thanks.


[Footnote]

*1* "git fetch --all/--multiple" was strictly serial operation to
    avoid such a mixed-up output problem, but perhaps we weren't
    careful enough when we introduced parallel fetch and broke it?

*2* When fetching from a single remote, the code makes sure that a
    record that is not marked as 'not-for-merge' is listed first by
    reordering the records, but it does not work well when fetching
    from multiple remotes.  Queuing all in the buffer before showing
    them would give us an opportunity to sort, but as I said, it
    takes coordination among the processes---instead of each process
    writing to FETCH_HEAD on their own, somebody has to collect the
    records from them, reorder as needed and write them all out.

    cf. https://lore.kernel.org/git/X8oL190Vl03B0cQ%2F@coredump.intra.peff.net/

    

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 1/5] fetch: extract writing to FETCH_HEAD
  2021-01-11 11:05   ` [PATCH v3 1/5] fetch: extract writing to FETCH_HEAD Patrick Steinhardt
@ 2021-01-11 23:16     ` Junio C Hamano
  0 siblings, 0 replies; 41+ messages in thread
From: Junio C Hamano @ 2021-01-11 23:16 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Christian Couder

Patrick Steinhardt <ps@pks.im> writes:

> +	switch (fetch_head_status) {
> +		case FETCH_HEAD_NOT_FOR_MERGE:

In our codebase, "switch", "case" and "default:" all begin at the
same column (see the original this was taken from).

Thanks.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 2/5] fetch: use strbuf to format FETCH_HEAD updates
  2021-01-11 11:05   ` [PATCH v3 2/5] fetch: use strbuf to format FETCH_HEAD updates Patrick Steinhardt
  2021-01-11 11:10     ` Patrick Steinhardt
  2021-01-11 11:17     ` Christian Couder
@ 2021-01-11 23:21     ` Junio C Hamano
  2 siblings, 0 replies; 41+ messages in thread
From: Junio C Hamano @ 2021-01-11 23:21 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Christian Couder

Patrick Steinhardt <ps@pks.im> writes:

> @@ -909,6 +910,7 @@ static int open_fetch_head(struct fetch_head *fetch_head)
>  		fetch_head->fp = fopen(filename, "a");
>  		if (!fetch_head->fp)
>  			return error_errno(_("cannot open %s"), filename);
> +		strbuf_init(&fetch_head->buf, 0);
>  	} else {
>  		fetch_head->fp = NULL;
>  	}

Leaving fetch_head->buf uninitialized is probably OK as the caller
of open_fetch_head() would (or at least should) immediately barf
upon seeing an error return?

Ah, no.  Under dry-run mode, we will return success but leave buf
uninitializesd.  It is safe because we do not use the strbuf when fp
is NULL.  OK.

> @@ -941,14 +943,17 @@ static void append_fetch_head(struct fetch_head *fetch_head,
>  			return;
>  	}
>  
> -	fprintf(fetch_head->fp, "%s\t%s\t%s",
> -		oid_to_hex_r(old_oid_hex, old_oid), merge_status_marker, note);
> +	strbuf_addf(&fetch_head->buf, "%s\t%s\t%s",
> +		    oid_to_hex_r(old_oid_hex, old_oid), merge_status_marker, note);
>  	for (i = 0; i < url_len; ++i)
>  		if ('\n' == url[i])
> -			fputs("\\n", fetch_head->fp);
> +			strbuf_addstr(&fetch_head->buf, "\\n");
>  		else
> -			fputc(url[i], fetch_head->fp);
> -	fputc('\n', fetch_head->fp);
> +			strbuf_addch(&fetch_head->buf, url[i]);
> +	strbuf_addch(&fetch_head->buf, '\n');
> +
> +	strbuf_write(&fetch_head->buf, fetch_head->fp);
> +	strbuf_reset(&fetch_head->buf);

This gets us closer to fixing the "one record can be written out
with multiple write(2) calls, allowing parallel fetches to corrupt
FETCH_HEAD file by intermixed records" problem (even though this
change alone would not solve it---for that we need to do write
ourselves, not letting stdio do the flushing).

>  }
>  
>  static void commit_fetch_head(struct fetch_head *fetch_head)
> @@ -962,6 +967,7 @@ static void close_fetch_head(struct fetch_head *fetch_head)
>  		return;
>  
>  	fclose(fetch_head->fp);
> +	strbuf_release(&fetch_head->buf);
>  }
>  
>  static const char warn_show_forced_updates[] =

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 0/5] fetch: implement support for atomic reference updates
  2021-01-11 11:05 ` [PATCH v3 0/5] " Patrick Steinhardt
                     ` (4 preceding siblings ...)
  2021-01-11 11:05   ` [PATCH v3 5/5] fetch: implement support for atomic reference updates Patrick Steinhardt
@ 2021-01-12  0:04   ` Junio C Hamano
  5 siblings, 0 replies; 41+ messages in thread
From: Junio C Hamano @ 2021-01-12  0:04 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Christian Couder

Patrick Steinhardt <ps@pks.im> writes:

> Changes compared to v2:
>
>     - `append_fetch_head` now takes a `struct object_id *old_oid`
>       instead of a `char *old_oid`.
>
>     - Handling of the merge status marker was moved into
>       `append_fetch_head`.
>
>     - I've unified code paths to format the line we're about to write to
>       FETCH_HEAD in atomic and non-atomic cases, which is the new 2/5
>       patch.
>
>     - We now always initialize `fetch_head->fp` and use it to derive
>       whether anything should be written at all instead of using the
>       global `write_fetch_head` variable.
>
> I think this should address all of Junio's feedback. Thanks for your
> input!

Looking good.  I left a few comments on remaining or newly
introduced issues, but IIRC all of them were rather minor.

Thanks, will replace.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 4/4] fetch: implement support for atomic reference updates
  2021-01-11 19:47         ` Junio C Hamano
@ 2021-01-12 12:22           ` Patrick Steinhardt
  2021-01-12 12:29             ` Patrick Steinhardt
  2021-01-12 19:19             ` Junio C Hamano
  0 siblings, 2 replies; 41+ messages in thread
From: Patrick Steinhardt @ 2021-01-12 12:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Christian Couder, Jeff King

[-- Attachment #1: Type: text/plain, Size: 3858 bytes --]

On Mon, Jan 11, 2021 at 11:47:08AM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> >> It would be much more preferrable to see this series go like so:
> >> 
> >>     [1/4] create append_fetch_head() that writes out to
> >>           fetch_head->fp
> >> 
> >>     [1.5/4] convert append_fetch_head() to ALWAYS accumulate in
> >>             fetch_head->buf, and ALWAYS flushes what is accumulated
> >>             at the end.
> >
> > This is a change I'm hesitant to make. The thing is that FETCH_HEAD is
> > quite weird as the current design allows different `git fetch --append`
> > processes to write to FETCH_HEAD at the same time.
> 
> Hmph, do you mean 
> 
> 	git fetch --append remote-a & git fetch --append remote-b
> 
> or something else [*1*]?

That's what I mean.

> With the current write-out codepath, there is no guarantee that ...
> 
> > > +		fprintf(fetch_head->fp, "%s\t%s\t%s",
> > > +			old_oid, merge_status_marker, note);
> > > +		for (i = 0; i < url_len; ++i)
> > > +			if ('\n' == url[i])
> > > +				fputs("\\n", fetch_head->fp);
> > > +			else
> > > +				fputc(url[i], fetch_head->fp);
> > > +		fputc('\n', fetch_head->fp);
> 
> ... these stdio operations for a single record would result in a
> single atomic write() that would not compete with writes by other
> processes.  So I wouldn't call "the current design allows ... at the
> same time."  It has been broken for years and nobody complained.

Fair enough. I somehow blindly assumed that `git fetch --all`, which
does invoke `git fetch --append`, did perform the fetch in parallel. If
that's not the case: all the better.

> > If we change to
> > always accumulate first and flush once we're done, then we essentially
> > have a change in behaviour as FETCH_HEADs aren't written in an
> > interleaving fashion anymore, but only at the end.
> 
> I view it a good thing, actually, for another reason [*2*], but that
> would take a follow-up fix that is much more involved, so let's not
> go there (yet).  At least buffering a single line's worth of output
> in a buffer and writing it out in a single write() would get us much
> closer to solving the "mixed up output" from multiple processes
> problem the current code seems to already have.

The buffering of a single line is exactly what we're doing now in the
non-atomic case. Previously there had been multiple writes, now we only
call `strbuf_write()` once on the buffer for each reference.

> > Also, if there is any
> > unexpected error in between updating references which causes us to die,
> > then we wouldn't have written the already updated references to
> > FETCH_HEAD now.
> 
> That one may be a valid concern.
> 
> Thanks.

Just to be explicit: are you fine with changes in this patch as-is? They
don't solve concurrent writes to FETCH_HEAD, but they should make it
easier to solve that if we ever need to.

Patrick

> 
> [Footnote]
> 
> *1* "git fetch --all/--multiple" was strictly serial operation to
>     avoid such a mixed-up output problem, but perhaps we weren't
>     careful enough when we introduced parallel fetch and broke it?
> 
> *2* When fetching from a single remote, the code makes sure that a
>     record that is not marked as 'not-for-merge' is listed first by
>     reordering the records, but it does not work well when fetching
>     from multiple remotes.  Queuing all in the buffer before showing
>     them would give us an opportunity to sort, but as I said, it
>     takes coordination among the processes---instead of each process
>     writing to FETCH_HEAD on their own, somebody has to collect the
>     records from them, reorder as needed and write them all out.
> 
>     cf. https://lore.kernel.org/git/X8oL190Vl03B0cQ%2F@coredump.intra.peff.net/
> 
>     

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v4 0/5] fetch: implement support for atomic reference updates
  2021-01-07 13:51 [PATCH 0/2] fetch: implement support for atomic reference updates Patrick Steinhardt
                   ` (3 preceding siblings ...)
  2021-01-11 11:05 ` [PATCH v3 0/5] " Patrick Steinhardt
@ 2021-01-12 12:27 ` Patrick Steinhardt
  2021-01-12 12:27   ` [PATCH v4 1/5] fetch: extract writing to FETCH_HEAD Patrick Steinhardt
                     ` (4 more replies)
  4 siblings, 5 replies; 41+ messages in thread
From: Patrick Steinhardt @ 2021-01-12 12:27 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder

[-- Attachment #1: Type: text/plain, Size: 3457 bytes --]

Hi,

this is the fourth version of my patch series to implement support for
atomic reference updates for git-fetch(1). It's similar to `git push
--atomic`, only that it applies to the local side. That is the fetch
will either succeed and update all remote references or it will fail and
update none.

Changes compared to v3:

    - Fixed indentation of the switch statement in 1/5.

    - Added my missing SOB to 2/5 and fixed a typo in the commit
      message.

Please see the attached range-diff for more details.

Patrick


Patrick Steinhardt (5):
  fetch: extract writing to FETCH_HEAD
  fetch: use strbuf to format FETCH_HEAD updates
  fetch: refactor `s_update_ref` to use common exit path
  fetch: allow passing a transaction to `s_update_ref()`
  fetch: implement support for atomic reference updates

 Documentation/fetch-options.txt |   4 +
 builtin/fetch.c                 | 228 +++++++++++++++++++++++---------
 remote.h                        |   2 +-
 t/t5510-fetch.sh                | 168 +++++++++++++++++++++++
 4 files changed, 342 insertions(+), 60 deletions(-)

Range-diff against v3:
1:  61dc19a1ca ! 1:  9fcc8b54de fetch: extract writing to FETCH_HEAD
    @@ builtin/fetch.c: static int iterate_ref_map(void *cb_data, struct object_id *oid
     +		return;
     +
     +	switch (fetch_head_status) {
    -+		case FETCH_HEAD_NOT_FOR_MERGE:
    -+			merge_status_marker = "not-for-merge";
    -+			break;
    -+		case FETCH_HEAD_MERGE:
    -+			merge_status_marker = "";
    -+			break;
    -+		default:
    -+			/* do not write anything to FETCH_HEAD */
    -+			return;
    ++	case FETCH_HEAD_NOT_FOR_MERGE:
    ++		merge_status_marker = "not-for-merge";
    ++		break;
    ++	case FETCH_HEAD_MERGE:
    ++		merge_status_marker = "";
    ++		break;
    ++	default:
    ++		/* do not write anything to FETCH_HEAD */
    ++		return;
     +	}
     +
     +	fprintf(fetch_head->fp, "%s\t%s\t%s",
2:  a19762690e ! 2:  fb8542270a fetch: use strbuf to format FETCH_HEAD updates
    @@ Commit message
         This commit refactors `append_fetch_head()` to use a `struct strbuf` for
         formatting the update which we're about to append to the FETCH_HEAD
         file. While the refactoring doesn't have much of a benefit right now, it
    -    servers as a preparatory step to implement atomic fetches where we need
    +    serves as a preparatory step to implement atomic fetches where we need
         to buffer all updates to FETCH_HEAD and only flush them out if all
         reference updates succeeded.
     
         No change in behaviour is expected from this commit.
     
    +    Signed-off-by: Patrick Steinhardt <ps@pks.im>
    +
      ## builtin/fetch.c ##
     @@ builtin/fetch.c: static int iterate_ref_map(void *cb_data, struct object_id *oid)
      
    @@ builtin/fetch.c: static int open_fetch_head(struct fetch_head *fetch_head)
      		fetch_head->fp = NULL;
      	}
     @@ builtin/fetch.c: static void append_fetch_head(struct fetch_head *fetch_head,
    - 			return;
    + 		return;
      	}
      
     -	fprintf(fetch_head->fp, "%s\t%s\t%s",
3:  c411f30e09 = 3:  ba6908aa8c fetch: refactor `s_update_ref` to use common exit path
4:  865d357ba7 = 4:  7f820f6f83 fetch: allow passing a transaction to `s_update_ref()`
5:  6a79e7adcc = 5:  0b57d7a651 fetch: implement support for atomic reference updates
-- 
2.30.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v4 1/5] fetch: extract writing to FETCH_HEAD
  2021-01-12 12:27 ` [PATCH v4 " Patrick Steinhardt
@ 2021-01-12 12:27   ` Patrick Steinhardt
  2021-01-12 12:27   ` [PATCH v4 2/5] fetch: use strbuf to format FETCH_HEAD updates Patrick Steinhardt
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 41+ messages in thread
From: Patrick Steinhardt @ 2021-01-12 12:27 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder

[-- Attachment #1: Type: text/plain, Size: 6524 bytes --]

When performing a fetch with the default `--write-fetch-head` option, we
write all updated references to FETCH_HEAD while the updates are
performed. Given that updates are not performed atomically, it means
that we we write to FETCH_HEAD even if some or all of the reference
updates fail.

Given that we simply update FETCH_HEAD ad-hoc with each reference, the
logic is completely contained in `store_update_refs` and thus quite hard
to extend. This can already be seen by the way we skip writing to the
FETCH_HEAD: instead of having a conditional which simply skips writing,
we instead open "/dev/null" and needlessly write all updates there.

We are about to extend git-fetch(1) to accept an `--atomic` flag which
will make the fetch an all-or-nothing operation with regards to the
reference updates. This will also require us to make the updates to
FETCH_HEAD an all-or-nothing operation, but as explained doing so is not
easy with the current layout. This commit thus refactors the wa we write
to FETCH_HEAD and pulls out the logic to open, append to, commit and
close the file. While this may seem rather over-the top at first,
pulling out this logic will make it a lot easier to update the code in a
subsequent commit. It also allows us to easily skip writing completely
in case `--no-write-fetch-head` was passed.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/fetch.c | 108 +++++++++++++++++++++++++++++++++++-------------
 remote.h        |   2 +-
 2 files changed, 80 insertions(+), 30 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index ecf8537605..2dd5fcb652 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -897,6 +897,73 @@ static int iterate_ref_map(void *cb_data, struct object_id *oid)
 	return 0;
 }
 
+struct fetch_head {
+	FILE *fp;
+};
+
+static int open_fetch_head(struct fetch_head *fetch_head)
+{
+	const char *filename = git_path_fetch_head(the_repository);
+
+	if (write_fetch_head) {
+		fetch_head->fp = fopen(filename, "a");
+		if (!fetch_head->fp)
+			return error_errno(_("cannot open %s"), filename);
+	} else {
+		fetch_head->fp = NULL;
+	}
+
+	return 0;
+}
+
+static void append_fetch_head(struct fetch_head *fetch_head,
+			      const struct object_id *old_oid,
+			      enum fetch_head_status fetch_head_status,
+			      const char *note,
+			      const char *url, size_t url_len)
+{
+	char old_oid_hex[GIT_MAX_HEXSZ + 1];
+	const char *merge_status_marker;
+	size_t i;
+
+	if (!fetch_head->fp)
+		return;
+
+	switch (fetch_head_status) {
+	case FETCH_HEAD_NOT_FOR_MERGE:
+		merge_status_marker = "not-for-merge";
+		break;
+	case FETCH_HEAD_MERGE:
+		merge_status_marker = "";
+		break;
+	default:
+		/* do not write anything to FETCH_HEAD */
+		return;
+	}
+
+	fprintf(fetch_head->fp, "%s\t%s\t%s",
+		oid_to_hex_r(old_oid_hex, old_oid), merge_status_marker, note);
+	for (i = 0; i < url_len; ++i)
+		if ('\n' == url[i])
+			fputs("\\n", fetch_head->fp);
+		else
+			fputc(url[i], fetch_head->fp);
+	fputc('\n', fetch_head->fp);
+}
+
+static void commit_fetch_head(struct fetch_head *fetch_head)
+{
+	/* Nothing to commit yet. */
+}
+
+static void close_fetch_head(struct fetch_head *fetch_head)
+{
+	if (!fetch_head->fp)
+		return;
+
+	fclose(fetch_head->fp);
+}
+
 static const char warn_show_forced_updates[] =
 N_("Fetch normally indicates which branches had a forced update,\n"
    "but that check has been disabled. To re-enable, use '--show-forced-updates'\n"
@@ -909,22 +976,19 @@ N_("It took %.2f seconds to check forced updates. You can use\n"
 static int store_updated_refs(const char *raw_url, const char *remote_name,
 			      int connectivity_checked, struct ref *ref_map)
 {
-	FILE *fp;
+	struct fetch_head fetch_head;
 	struct commit *commit;
 	int url_len, i, rc = 0;
 	struct strbuf note = STRBUF_INIT;
 	const char *what, *kind;
 	struct ref *rm;
 	char *url;
-	const char *filename = (!write_fetch_head
-				? "/dev/null"
-				: git_path_fetch_head(the_repository));
 	int want_status;
 	int summary_width = transport_summary_width(ref_map);
 
-	fp = fopen(filename, "a");
-	if (!fp)
-		return error_errno(_("cannot open %s"), filename);
+	rc = open_fetch_head(&fetch_head);
+	if (rc)
+		return -1;
 
 	if (raw_url)
 		url = transport_anonymize_url(raw_url);
@@ -953,7 +1017,6 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 	     want_status++) {
 		for (rm = ref_map; rm; rm = rm->next) {
 			struct ref *ref = NULL;
-			const char *merge_status_marker = "";
 
 			if (rm->status == REF_STATUS_REJECT_SHALLOW) {
 				if (want_status == FETCH_HEAD_MERGE)
@@ -1011,26 +1074,10 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 					strbuf_addf(&note, "%s ", kind);
 				strbuf_addf(&note, "'%s' of ", what);
 			}
-			switch (rm->fetch_head_status) {
-			case FETCH_HEAD_NOT_FOR_MERGE:
-				merge_status_marker = "not-for-merge";
-				/* fall-through */
-			case FETCH_HEAD_MERGE:
-				fprintf(fp, "%s\t%s\t%s",
-					oid_to_hex(&rm->old_oid),
-					merge_status_marker,
-					note.buf);
-				for (i = 0; i < url_len; ++i)
-					if ('\n' == url[i])
-						fputs("\\n", fp);
-					else
-						fputc(url[i], fp);
-				fputc('\n', fp);
-				break;
-			default:
-				/* do not write anything to FETCH_HEAD */
-				break;
-			}
+
+			append_fetch_head(&fetch_head, &rm->old_oid,
+					  rm->fetch_head_status,
+					  note.buf, url, url_len);
 
 			strbuf_reset(&note);
 			if (ref) {
@@ -1060,6 +1107,9 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 		}
 	}
 
+	if (!rc)
+		commit_fetch_head(&fetch_head);
+
 	if (rc & STORE_REF_ERROR_DF_CONFLICT)
 		error(_("some local refs could not be updated; try running\n"
 		      " 'git remote prune %s' to remove any old, conflicting "
@@ -1077,7 +1127,7 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
  abort:
 	strbuf_release(&note);
 	free(url);
-	fclose(fp);
+	close_fetch_head(&fetch_head);
 	return rc;
 }
 
diff --git a/remote.h b/remote.h
index 3211abdf05..aad1a0f080 100644
--- a/remote.h
+++ b/remote.h
@@ -134,7 +134,7 @@ struct ref {
 	 * should be 0, so that xcalloc'd structures get it
 	 * by default.
 	 */
-	enum {
+	enum fetch_head_status {
 		FETCH_HEAD_MERGE = -1,
 		FETCH_HEAD_NOT_FOR_MERGE = 0,
 		FETCH_HEAD_IGNORE = 1
-- 
2.30.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v4 2/5] fetch: use strbuf to format FETCH_HEAD updates
  2021-01-12 12:27 ` [PATCH v4 " Patrick Steinhardt
  2021-01-12 12:27   ` [PATCH v4 1/5] fetch: extract writing to FETCH_HEAD Patrick Steinhardt
@ 2021-01-12 12:27   ` Patrick Steinhardt
  2021-01-12 12:27   ` [PATCH v4 3/5] fetch: refactor `s_update_ref` to use common exit path Patrick Steinhardt
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 41+ messages in thread
From: Patrick Steinhardt @ 2021-01-12 12:27 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder

[-- Attachment #1: Type: text/plain, Size: 2242 bytes --]

This commit refactors `append_fetch_head()` to use a `struct strbuf` for
formatting the update which we're about to append to the FETCH_HEAD
file. While the refactoring doesn't have much of a benefit right now, it
serves as a preparatory step to implement atomic fetches where we need
to buffer all updates to FETCH_HEAD and only flush them out if all
reference updates succeeded.

No change in behaviour is expected from this commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/fetch.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 2dd5fcb652..e317e828cd 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -899,6 +899,7 @@ static int iterate_ref_map(void *cb_data, struct object_id *oid)
 
 struct fetch_head {
 	FILE *fp;
+	struct strbuf buf;
 };
 
 static int open_fetch_head(struct fetch_head *fetch_head)
@@ -909,6 +910,7 @@ static int open_fetch_head(struct fetch_head *fetch_head)
 		fetch_head->fp = fopen(filename, "a");
 		if (!fetch_head->fp)
 			return error_errno(_("cannot open %s"), filename);
+		strbuf_init(&fetch_head->buf, 0);
 	} else {
 		fetch_head->fp = NULL;
 	}
@@ -941,14 +943,17 @@ static void append_fetch_head(struct fetch_head *fetch_head,
 		return;
 	}
 
-	fprintf(fetch_head->fp, "%s\t%s\t%s",
-		oid_to_hex_r(old_oid_hex, old_oid), merge_status_marker, note);
+	strbuf_addf(&fetch_head->buf, "%s\t%s\t%s",
+		    oid_to_hex_r(old_oid_hex, old_oid), merge_status_marker, note);
 	for (i = 0; i < url_len; ++i)
 		if ('\n' == url[i])
-			fputs("\\n", fetch_head->fp);
+			strbuf_addstr(&fetch_head->buf, "\\n");
 		else
-			fputc(url[i], fetch_head->fp);
-	fputc('\n', fetch_head->fp);
+			strbuf_addch(&fetch_head->buf, url[i]);
+	strbuf_addch(&fetch_head->buf, '\n');
+
+	strbuf_write(&fetch_head->buf, fetch_head->fp);
+	strbuf_reset(&fetch_head->buf);
 }
 
 static void commit_fetch_head(struct fetch_head *fetch_head)
@@ -962,6 +967,7 @@ static void close_fetch_head(struct fetch_head *fetch_head)
 		return;
 
 	fclose(fetch_head->fp);
+	strbuf_release(&fetch_head->buf);
 }
 
 static const char warn_show_forced_updates[] =
-- 
2.30.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v4 3/5] fetch: refactor `s_update_ref` to use common exit path
  2021-01-12 12:27 ` [PATCH v4 " Patrick Steinhardt
  2021-01-12 12:27   ` [PATCH v4 1/5] fetch: extract writing to FETCH_HEAD Patrick Steinhardt
  2021-01-12 12:27   ` [PATCH v4 2/5] fetch: use strbuf to format FETCH_HEAD updates Patrick Steinhardt
@ 2021-01-12 12:27   ` Patrick Steinhardt
  2021-01-12 12:27   ` [PATCH v4 4/5] fetch: allow passing a transaction to `s_update_ref()` Patrick Steinhardt
  2021-01-12 12:27   ` [PATCH v4 5/5] fetch: implement support for atomic reference updates Patrick Steinhardt
  4 siblings, 0 replies; 41+ messages in thread
From: Patrick Steinhardt @ 2021-01-12 12:27 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder

[-- Attachment #1: Type: text/plain, Size: 2242 bytes --]

The cleanup code in `s_update_ref()` is currently duplicated for both
succesful and erroneous exit paths. This commit refactors the function
to have a shared exit path for both cases to remove the duplication.

Suggested-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/fetch.c | 47 +++++++++++++++++++++++++++--------------------
 1 file changed, 27 insertions(+), 20 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index e317e828cd..b24a9e09a4 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -589,7 +589,7 @@ static int s_update_ref(const char *action,
 	char *rla = getenv("GIT_REFLOG_ACTION");
 	struct ref_transaction *transaction;
 	struct strbuf err = STRBUF_INIT;
-	int ret, df_conflict = 0;
+	int ret;
 
 	if (dry_run)
 		return 0;
@@ -598,30 +598,37 @@ static int s_update_ref(const char *action,
 	msg = xstrfmt("%s: %s", rla, action);
 
 	transaction = ref_transaction_begin(&err);
-	if (!transaction ||
-	    ref_transaction_update(transaction, ref->name,
-				   &ref->new_oid,
-				   check_old ? &ref->old_oid : NULL,
-				   0, msg, &err))
-		goto fail;
-
-	ret = ref_transaction_commit(transaction, &err);
-	if (ret) {
-		df_conflict = (ret == TRANSACTION_NAME_CONFLICT);
-		goto fail;
+	if (!transaction) {
+		ret = STORE_REF_ERROR_OTHER;
+		goto out;
 	}
 
+	ret = ref_transaction_update(transaction, ref->name, &ref->new_oid,
+				     check_old ? &ref->old_oid : NULL,
+				     0, msg, &err);
+	if (ret) {
+		ret = STORE_REF_ERROR_OTHER;
+		goto out;
+	}
+
+	switch (ref_transaction_commit(transaction, &err)) {
+	case 0:
+		break;
+	case TRANSACTION_NAME_CONFLICT:
+		ret = STORE_REF_ERROR_DF_CONFLICT;
+		goto out;
+	default:
+		ret = STORE_REF_ERROR_OTHER;
+		goto out;
+	}
+
+out:
 	ref_transaction_free(transaction);
+	if (ret)
+		error("%s", err.buf);
 	strbuf_release(&err);
 	free(msg);
-	return 0;
-fail:
-	ref_transaction_free(transaction);
-	error("%s", err.buf);
-	strbuf_release(&err);
-	free(msg);
-	return df_conflict ? STORE_REF_ERROR_DF_CONFLICT
-			   : STORE_REF_ERROR_OTHER;
+	return ret;
 }
 
 static int refcol_width = 10;
-- 
2.30.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v4 4/5] fetch: allow passing a transaction to `s_update_ref()`
  2021-01-12 12:27 ` [PATCH v4 " Patrick Steinhardt
                     ` (2 preceding siblings ...)
  2021-01-12 12:27   ` [PATCH v4 3/5] fetch: refactor `s_update_ref` to use common exit path Patrick Steinhardt
@ 2021-01-12 12:27   ` Patrick Steinhardt
  2021-01-12 12:27   ` [PATCH v4 5/5] fetch: implement support for atomic reference updates Patrick Steinhardt
  4 siblings, 0 replies; 41+ messages in thread
From: Patrick Steinhardt @ 2021-01-12 12:27 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder

[-- Attachment #1: Type: text/plain, Size: 5623 bytes --]

The handling of ref updates is completely handled by `s_update_ref()`,
which will manage the complete lifecycle of the reference transaction.
This is fine right now given that git-fetch(1) does not support atomic
fetches, so each reference gets its own transaction. It is quite
inflexible though, as `s_update_ref()` only knows about a single
reference update at a time, so it doesn't allow us to alter the
strategy.

This commit prepares `s_update_ref()` and its only caller
`update_local_ref()` to allow passing an external transaction. If none
is given, then the existing behaviour is triggered which creates a new
transaction and directly commits it. Otherwise, if the caller provides a
transaction, then we only queue the update but don't commit it. This
optionally allows the caller to manage when a transaction will be
committed.

Given that `update_local_ref()` is always called with a `NULL`
transaction for now, no change in behaviour is expected from this
change.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/fetch.c | 51 ++++++++++++++++++++++++++++++-------------------
 1 file changed, 31 insertions(+), 20 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index b24a9e09a4..cada732325 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -583,11 +583,12 @@ static struct ref *get_ref_map(struct remote *remote,
 
 static int s_update_ref(const char *action,
 			struct ref *ref,
+			struct ref_transaction *transaction,
 			int check_old)
 {
 	char *msg;
 	char *rla = getenv("GIT_REFLOG_ACTION");
-	struct ref_transaction *transaction;
+	struct ref_transaction *our_transaction = NULL;
 	struct strbuf err = STRBUF_INIT;
 	int ret;
 
@@ -597,10 +598,17 @@ static int s_update_ref(const char *action,
 		rla = default_rla.buf;
 	msg = xstrfmt("%s: %s", rla, action);
 
-	transaction = ref_transaction_begin(&err);
+	/*
+	 * If no transaction was passed to us, we manage the transaction
+	 * ourselves. Otherwise, we trust the caller to handle the transaction
+	 * lifecycle.
+	 */
 	if (!transaction) {
-		ret = STORE_REF_ERROR_OTHER;
-		goto out;
+		transaction = our_transaction = ref_transaction_begin(&err);
+		if (!transaction) {
+			ret = STORE_REF_ERROR_OTHER;
+			goto out;
+		}
 	}
 
 	ret = ref_transaction_update(transaction, ref->name, &ref->new_oid,
@@ -611,19 +619,21 @@ static int s_update_ref(const char *action,
 		goto out;
 	}
 
-	switch (ref_transaction_commit(transaction, &err)) {
-	case 0:
-		break;
-	case TRANSACTION_NAME_CONFLICT:
-		ret = STORE_REF_ERROR_DF_CONFLICT;
-		goto out;
-	default:
-		ret = STORE_REF_ERROR_OTHER;
-		goto out;
+	if (our_transaction) {
+		switch (ref_transaction_commit(our_transaction, &err)) {
+		case 0:
+			break;
+		case TRANSACTION_NAME_CONFLICT:
+			ret = STORE_REF_ERROR_DF_CONFLICT;
+			goto out;
+		default:
+			ret = STORE_REF_ERROR_OTHER;
+			goto out;
+		}
 	}
 
 out:
-	ref_transaction_free(transaction);
+	ref_transaction_free(our_transaction);
 	if (ret)
 		error("%s", err.buf);
 	strbuf_release(&err);
@@ -766,6 +776,7 @@ static void format_display(struct strbuf *display, char code,
 }
 
 static int update_local_ref(struct ref *ref,
+			    struct ref_transaction *transaction,
 			    const char *remote,
 			    const struct ref *remote_ref,
 			    struct strbuf *display,
@@ -806,7 +817,7 @@ static int update_local_ref(struct ref *ref,
 	    starts_with(ref->name, "refs/tags/")) {
 		if (force || ref->force) {
 			int r;
-			r = s_update_ref("updating tag", ref, 0);
+			r = s_update_ref("updating tag", ref, transaction, 0);
 			format_display(display, r ? '!' : 't', _("[tag update]"),
 				       r ? _("unable to update local ref") : NULL,
 				       remote, pretty_ref, summary_width);
@@ -843,7 +854,7 @@ static int update_local_ref(struct ref *ref,
 			what = _("[new ref]");
 		}
 
-		r = s_update_ref(msg, ref, 0);
+		r = s_update_ref(msg, ref, transaction, 0);
 		format_display(display, r ? '!' : '*', what,
 			       r ? _("unable to update local ref") : NULL,
 			       remote, pretty_ref, summary_width);
@@ -865,7 +876,7 @@ static int update_local_ref(struct ref *ref,
 		strbuf_add_unique_abbrev(&quickref, &current->object.oid, DEFAULT_ABBREV);
 		strbuf_addstr(&quickref, "..");
 		strbuf_add_unique_abbrev(&quickref, &ref->new_oid, DEFAULT_ABBREV);
-		r = s_update_ref("fast-forward", ref, 1);
+		r = s_update_ref("fast-forward", ref, transaction, 1);
 		format_display(display, r ? '!' : ' ', quickref.buf,
 			       r ? _("unable to update local ref") : NULL,
 			       remote, pretty_ref, summary_width);
@@ -877,7 +888,7 @@ static int update_local_ref(struct ref *ref,
 		strbuf_add_unique_abbrev(&quickref, &current->object.oid, DEFAULT_ABBREV);
 		strbuf_addstr(&quickref, "...");
 		strbuf_add_unique_abbrev(&quickref, &ref->new_oid, DEFAULT_ABBREV);
-		r = s_update_ref("forced-update", ref, 1);
+		r = s_update_ref("forced-update", ref, transaction, 1);
 		format_display(display, r ? '!' : '+', quickref.buf,
 			       r ? _("unable to update local ref") : _("forced update"),
 			       remote, pretty_ref, summary_width);
@@ -1094,8 +1105,8 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 
 			strbuf_reset(&note);
 			if (ref) {
-				rc |= update_local_ref(ref, what, rm, &note,
-						       summary_width);
+				rc |= update_local_ref(ref, NULL, what,
+						       rm, &note, summary_width);
 				free(ref);
 			} else if (write_fetch_head || dry_run) {
 				/*
-- 
2.30.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v4 5/5] fetch: implement support for atomic reference updates
  2021-01-12 12:27 ` [PATCH v4 " Patrick Steinhardt
                     ` (3 preceding siblings ...)
  2021-01-12 12:27   ` [PATCH v4 4/5] fetch: allow passing a transaction to `s_update_ref()` Patrick Steinhardt
@ 2021-01-12 12:27   ` Patrick Steinhardt
  4 siblings, 0 replies; 41+ messages in thread
From: Patrick Steinhardt @ 2021-01-12 12:27 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder

[-- Attachment #1: Type: text/plain, Size: 13728 bytes --]

When executing a fetch, then git will currently allocate one reference
transaction per reference update and directly commit it. This means that
fetches are non-atomic: even if some of the reference updates fail,
others may still succeed and modify local references.

This is fine in many scenarios, but this strategy has its downsides.

- The view of remote references may be inconsistent and may show a
  bastardized state of the remote repository.

- Batching together updates may improve performance in certain
  scenarios. While the impact probably isn't as pronounced with loose
  references, the upcoming reftable backend may benefit as it needs to
  write less files in case the update is batched.

- The reference-update hook is currently being executed twice per
  updated reference. While this doesn't matter when there is no such
  hook, we have seen severe performance regressions when doing a
  git-fetch(1) with reference-transaction hook when the remote
  repository has hundreds of thousands of references.

Similar to `git push --atomic`, this commit thus introduces atomic
fetches. Instead of allocating one reference transaction per updated
reference, it causes us to only allocate a single transaction and commit
it as soon as all updates were received. If locking of any reference
fails, then we abort the complete transaction and don't update any
reference, which gives us an all-or-nothing fetch.

Note that this may not completely fix the first of above downsides, as
the consistent view also depends on the server-side. If the server
doesn't have a consistent view of its own references during the
reference negotiation phase, then the client would get the same
inconsistent view the server has. This is a separate problem though and,
if it actually exists, can be fixed at a later point.

This commit also changes the way we write FETCH_HEAD in case `--atomic`
is passed. Instead of writing changes as we go, we need to accumulate
all changes first and only commit them at the end when we know that all
reference updates succeeded. Ideally, we'd just do so via a temporary
file so that we don't need to carry all updates in-memory. This isn't
trivially doable though considering the `--append` mode, where we do not
truncate the file but simply append to it. And given that we support
concurrent processes appending to FETCH_HEAD at the same time without
any loss of data, seeding the temporary file with current contents of
FETCH_HEAD initially and then doing a rename wouldn't work either. So
this commit implements the simple strategy of buffering all changes and
appending them to the file on commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/fetch-options.txt |   4 +
 builtin/fetch.c                 |  46 ++++++++-
 t/t5510-fetch.sh                | 168 ++++++++++++++++++++++++++++++++
 3 files changed, 213 insertions(+), 5 deletions(-)

diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
index 2bf77b46fd..07783deee3 100644
--- a/Documentation/fetch-options.txt
+++ b/Documentation/fetch-options.txt
@@ -7,6 +7,10 @@
 	existing contents of `.git/FETCH_HEAD`.  Without this
 	option old data in `.git/FETCH_HEAD` will be overwritten.
 
+--atomic::
+	Use an atomic transaction to update local refs. Either all refs are
+	updated, or on error, no refs are updated.
+
 --depth=<depth>::
 	Limit fetching to the specified number of commits from the tip of
 	each remote branch history. If fetching to a 'shallow' repository
diff --git a/builtin/fetch.c b/builtin/fetch.c
index cada732325..91f3d20696 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -63,6 +63,7 @@ static int enable_auto_gc = 1;
 static int tags = TAGS_DEFAULT, unshallow, update_shallow, deepen;
 static int max_jobs = -1, submodule_fetch_jobs_config = -1;
 static int fetch_parallel_config = 1;
+static int atomic_fetch;
 static enum transport_family family;
 static const char *depth;
 static const char *deepen_since;
@@ -144,6 +145,8 @@ static struct option builtin_fetch_options[] = {
 		 N_("set upstream for git pull/fetch")),
 	OPT_BOOL('a', "append", &append,
 		 N_("append to .git/FETCH_HEAD instead of overwriting")),
+	OPT_BOOL(0, "atomic", &atomic_fetch,
+		 N_("use atomic transaction to update references")),
 	OPT_STRING(0, "upload-pack", &upload_pack, N_("path"),
 		   N_("path to upload pack on remote end")),
 	OPT__FORCE(&force, N_("force overwrite of local reference"), 0),
@@ -970,13 +973,23 @@ static void append_fetch_head(struct fetch_head *fetch_head,
 			strbuf_addch(&fetch_head->buf, url[i]);
 	strbuf_addch(&fetch_head->buf, '\n');
 
-	strbuf_write(&fetch_head->buf, fetch_head->fp);
-	strbuf_reset(&fetch_head->buf);
+	/*
+	 * When using an atomic fetch, we do not want to update FETCH_HEAD if
+	 * any of the reference updates fails. We thus have to write all
+	 * updates to a buffer first and only commit it as soon as all
+	 * references have been successfully updated.
+	 */
+	if (!atomic_fetch) {
+		strbuf_write(&fetch_head->buf, fetch_head->fp);
+		strbuf_reset(&fetch_head->buf);
+	}
 }
 
 static void commit_fetch_head(struct fetch_head *fetch_head)
 {
-	/* Nothing to commit yet. */
+	if (!fetch_head->fp || !atomic_fetch)
+		return;
+	strbuf_write(&fetch_head->buf, fetch_head->fp);
 }
 
 static void close_fetch_head(struct fetch_head *fetch_head)
@@ -1003,7 +1016,8 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 	struct fetch_head fetch_head;
 	struct commit *commit;
 	int url_len, i, rc = 0;
-	struct strbuf note = STRBUF_INIT;
+	struct strbuf note = STRBUF_INIT, err = STRBUF_INIT;
+	struct ref_transaction *transaction = NULL;
 	const char *what, *kind;
 	struct ref *rm;
 	char *url;
@@ -1029,6 +1043,14 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 		}
 	}
 
+	if (atomic_fetch) {
+		transaction = ref_transaction_begin(&err);
+		if (!transaction) {
+			error("%s", err.buf);
+			goto abort;
+		}
+	}
+
 	prepare_format_display(ref_map);
 
 	/*
@@ -1105,7 +1127,7 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 
 			strbuf_reset(&note);
 			if (ref) {
-				rc |= update_local_ref(ref, NULL, what,
+				rc |= update_local_ref(ref, transaction, what,
 						       rm, &note, summary_width);
 				free(ref);
 			} else if (write_fetch_head || dry_run) {
@@ -1131,6 +1153,14 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 		}
 	}
 
+	if (!rc && transaction) {
+		rc = ref_transaction_commit(transaction, &err);
+		if (rc) {
+			error("%s", err.buf);
+			goto abort;
+		}
+	}
+
 	if (!rc)
 		commit_fetch_head(&fetch_head);
 
@@ -1150,6 +1180,8 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 
  abort:
 	strbuf_release(&note);
+	strbuf_release(&err);
+	ref_transaction_free(transaction);
 	free(url);
 	close_fetch_head(&fetch_head);
 	return rc;
@@ -1961,6 +1993,10 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 			die(_("--filter can only be used with the remote "
 			      "configured in extensions.partialclone"));
 
+		if (atomic_fetch)
+			die(_("--atomic can only be used when fetching "
+			      "from one remote"));
+
 		if (stdin_refspecs)
 			die(_("--stdin can only be used when fetching "
 			      "from one remote"));
diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
index 2013051a64..109d15be98 100755
--- a/t/t5510-fetch.sh
+++ b/t/t5510-fetch.sh
@@ -176,6 +176,174 @@ test_expect_success 'fetch --prune --tags with refspec prunes based on refspec'
 	git rev-parse sometag
 '
 
+test_expect_success 'fetch --atomic works with a single branch' '
+	test_when_finished "rm -rf \"$D\"/atomic" &&
+
+	cd "$D" &&
+	git clone . atomic &&
+	git branch atomic-branch &&
+	oid=$(git rev-parse atomic-branch) &&
+	echo "$oid" >expected &&
+
+	git -C atomic fetch --atomic origin &&
+	git -C atomic rev-parse origin/atomic-branch >actual &&
+	test_cmp expected actual &&
+	test $oid = "$(git -C atomic rev-parse --verify FETCH_HEAD)"
+'
+
+test_expect_success 'fetch --atomic works with multiple branches' '
+	test_when_finished "rm -rf \"$D\"/atomic" &&
+
+	cd "$D" &&
+	git clone . atomic &&
+	git branch atomic-branch-1 &&
+	git branch atomic-branch-2 &&
+	git branch atomic-branch-3 &&
+	git rev-parse refs/heads/atomic-branch-1 refs/heads/atomic-branch-2 refs/heads/atomic-branch-3 >actual &&
+
+	git -C atomic fetch --atomic origin &&
+	git -C atomic rev-parse refs/remotes/origin/atomic-branch-1 refs/remotes/origin/atomic-branch-2 refs/remotes/origin/atomic-branch-3 >expected &&
+	test_cmp expected actual
+'
+
+test_expect_success 'fetch --atomic works with mixed branches and tags' '
+	test_when_finished "rm -rf \"$D\"/atomic" &&
+
+	cd "$D" &&
+	git clone . atomic &&
+	git branch atomic-mixed-branch &&
+	git tag atomic-mixed-tag &&
+	git rev-parse refs/heads/atomic-mixed-branch refs/tags/atomic-mixed-tag >actual &&
+
+	git -C atomic fetch --tags --atomic origin &&
+	git -C atomic rev-parse refs/remotes/origin/atomic-mixed-branch refs/tags/atomic-mixed-tag >expected &&
+	test_cmp expected actual
+'
+
+test_expect_success 'fetch --atomic prunes references' '
+	test_when_finished "rm -rf \"$D\"/atomic" &&
+
+	cd "$D" &&
+	git branch atomic-prune-delete &&
+	git clone . atomic &&
+	git branch --delete atomic-prune-delete &&
+	git branch atomic-prune-create &&
+	git rev-parse refs/heads/atomic-prune-create >actual &&
+
+	git -C atomic fetch --prune --atomic origin &&
+	test_must_fail git -C atomic rev-parse refs/remotes/origin/atomic-prune-delete &&
+	git -C atomic rev-parse refs/remotes/origin/atomic-prune-create >expected &&
+	test_cmp expected actual
+'
+
+test_expect_success 'fetch --atomic aborts with non-fast-forward update' '
+	test_when_finished "rm -rf \"$D\"/atomic" &&
+
+	cd "$D" &&
+	git branch atomic-non-ff &&
+	git clone . atomic &&
+	git rev-parse HEAD >actual &&
+
+	git branch atomic-new-branch &&
+	parent_commit=$(git rev-parse atomic-non-ff~) &&
+	git update-ref refs/heads/atomic-non-ff $parent_commit &&
+
+	test_must_fail git -C atomic fetch --atomic origin refs/heads/*:refs/remotes/origin/* &&
+	test_must_fail git -C atomic rev-parse refs/remotes/origin/atomic-new-branch &&
+	git -C atomic rev-parse refs/remotes/origin/atomic-non-ff >expected &&
+	test_cmp expected actual &&
+	test_must_be_empty atomic/.git/FETCH_HEAD
+'
+
+test_expect_success 'fetch --atomic executes a single reference transaction only' '
+	test_when_finished "rm -rf \"$D\"/atomic" &&
+
+	cd "$D" &&
+	git clone . atomic &&
+	git branch atomic-hooks-1 &&
+	git branch atomic-hooks-2 &&
+	head_oid=$(git rev-parse HEAD) &&
+
+	cat >expected <<-EOF &&
+		prepared
+		$ZERO_OID $head_oid refs/remotes/origin/atomic-hooks-1
+		$ZERO_OID $head_oid refs/remotes/origin/atomic-hooks-2
+		committed
+		$ZERO_OID $head_oid refs/remotes/origin/atomic-hooks-1
+		$ZERO_OID $head_oid refs/remotes/origin/atomic-hooks-2
+	EOF
+
+	rm -f atomic/actual &&
+	write_script atomic/.git/hooks/reference-transaction <<-\EOF &&
+		( echo "$*" && cat ) >>actual
+	EOF
+
+	git -C atomic fetch --atomic origin &&
+	test_cmp expected atomic/actual
+'
+
+test_expect_success 'fetch --atomic aborts all reference updates if hook aborts' '
+	test_when_finished "rm -rf \"$D\"/atomic" &&
+
+	cd "$D" &&
+	git clone . atomic &&
+	git branch atomic-hooks-abort-1 &&
+	git branch atomic-hooks-abort-2 &&
+	git branch atomic-hooks-abort-3 &&
+	git tag atomic-hooks-abort &&
+	head_oid=$(git rev-parse HEAD) &&
+
+	cat >expected <<-EOF &&
+		prepared
+		$ZERO_OID $head_oid refs/remotes/origin/atomic-hooks-abort-1
+		$ZERO_OID $head_oid refs/remotes/origin/atomic-hooks-abort-2
+		$ZERO_OID $head_oid refs/remotes/origin/atomic-hooks-abort-3
+		$ZERO_OID $head_oid refs/tags/atomic-hooks-abort
+		aborted
+		$ZERO_OID $head_oid refs/remotes/origin/atomic-hooks-abort-1
+		$ZERO_OID $head_oid refs/remotes/origin/atomic-hooks-abort-2
+		$ZERO_OID $head_oid refs/remotes/origin/atomic-hooks-abort-3
+		$ZERO_OID $head_oid refs/tags/atomic-hooks-abort
+	EOF
+
+	rm -f atomic/actual &&
+	write_script atomic/.git/hooks/reference-transaction <<-\EOF &&
+		( echo "$*" && cat ) >>actual
+		exit 1
+	EOF
+
+	git -C atomic for-each-ref >expected-refs &&
+	test_must_fail git -C atomic fetch --tags --atomic origin &&
+	git -C atomic for-each-ref >actual-refs &&
+	test_cmp expected-refs actual-refs &&
+	test_must_be_empty atomic/.git/FETCH_HEAD
+'
+
+test_expect_success 'fetch --atomic --append appends to FETCH_HEAD' '
+	test_when_finished "rm -rf \"$D\"/atomic" &&
+
+	cd "$D" &&
+	git clone . atomic &&
+	oid=$(git rev-parse HEAD) &&
+
+	git branch atomic-fetch-head-1 &&
+	git -C atomic fetch --atomic origin atomic-fetch-head-1 &&
+	test_line_count = 1 atomic/.git/FETCH_HEAD &&
+
+	git branch atomic-fetch-head-2 &&
+	git -C atomic fetch --atomic --append origin atomic-fetch-head-2 &&
+	test_line_count = 2 atomic/.git/FETCH_HEAD &&
+	cp atomic/.git/FETCH_HEAD expected &&
+
+	write_script atomic/.git/hooks/reference-transaction <<-\EOF &&
+		exit 1
+	EOF
+
+	git branch atomic-fetch-head-3 &&
+	test_must_fail git -C atomic fetch --atomic --append origin atomic-fetch-head-3 &&
+	test_cmp expected atomic/.git/FETCH_HEAD
+'
+
 test_expect_success '--refmap="" ignores configured refspec' '
 	cd "$TRASH_DIRECTORY" &&
 	git clone "$D" remote-refs &&
-- 
2.30.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 4/4] fetch: implement support for atomic reference updates
  2021-01-12 12:22           ` Patrick Steinhardt
@ 2021-01-12 12:29             ` Patrick Steinhardt
  2021-01-12 19:19             ` Junio C Hamano
  1 sibling, 0 replies; 41+ messages in thread
From: Patrick Steinhardt @ 2021-01-12 12:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Christian Couder, Jeff King

[-- Attachment #1: Type: text/plain, Size: 1622 bytes --]

On Tue, Jan 12, 2021 at 01:22:40PM +0100, Patrick Steinhardt wrote:
> On Mon, Jan 11, 2021 at 11:47:08AM -0800, Junio C Hamano wrote:
> > Patrick Steinhardt <ps@pks.im> writes:
[snip]
> > > If we change to
> > > always accumulate first and flush once we're done, then we essentially
> > > have a change in behaviour as FETCH_HEADs aren't written in an
> > > interleaving fashion anymore, but only at the end.
> > 
> > I view it a good thing, actually, for another reason [*2*], but that
> > would take a follow-up fix that is much more involved, so let's not
> > go there (yet).  At least buffering a single line's worth of output
> > in a buffer and writing it out in a single write() would get us much
> > closer to solving the "mixed up output" from multiple processes
> > problem the current code seems to already have.
> 
> The buffering of a single line is exactly what we're doing now in the
> non-atomic case. Previously there had been multiple writes, now we only
> call `strbuf_write()` once on the buffer for each reference.
> 
> > > Also, if there is any
> > > unexpected error in between updating references which causes us to die,
> > > then we wouldn't have written the already updated references to
> > > FETCH_HEAD now.
> > 
> > That one may be a valid concern.
> > 
> > Thanks.
> 
> Just to be explicit: are you fine with changes in this patch as-is? They
> don't solve concurrent writes to FETCH_HEAD, but they should make it
> easier to solve that if we ever need to.
> 
> Patrick

Never mind, I forgot that I'm still replying on v2 of this patch series.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 4/4] fetch: implement support for atomic reference updates
  2021-01-12 12:22           ` Patrick Steinhardt
  2021-01-12 12:29             ` Patrick Steinhardt
@ 2021-01-12 19:19             ` Junio C Hamano
  1 sibling, 0 replies; 41+ messages in thread
From: Junio C Hamano @ 2021-01-12 19:19 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Christian Couder, Jeff King

Patrick Steinhardt <ps@pks.im> writes:

>> 
>> > > +		fprintf(fetch_head->fp, "%s\t%s\t%s",
>> > > +			old_oid, merge_status_marker, note);
>> > > +		for (i = 0; i < url_len; ++i)
>> > > +			if ('\n' == url[i])
>> > > +				fputs("\\n", fetch_head->fp);
>> > > +			else
>> > > +				fputc(url[i], fetch_head->fp);
>> > > +		fputc('\n', fetch_head->fp);
>> 
>> ... these stdio operations for a single record would result in a
>> single atomic write() that would not compete with writes by other
>> processes.  So I wouldn't call "the current design allows ... at the
>> same time."  It has been broken for years and nobody complained.
>
> Fair enough. I somehow blindly assumed that `git fetch --all`, which
> does invoke `git fetch --append`, did perform the fetch in parallel. If
> that's not the case: all the better.

The "--all" option started as "one after another, one at a time",
but these days goes thru fetch_multiple() where we started to use
run_processes_parallel() API without giving it much thought what it
would do to the writing of FETCH_HEAD; since around d54dea77 (fetch:
let --jobs=<n> parallelize --multiple, too, 2019-10-05), this
codepath wrt FETCH_HEAD is utterly broken, I would have to say.

> The buffering of a single line is exactly what we're doing now in the
> non-atomic case. Previously there had been multiple writes, now we only
> call `strbuf_write()` once on the buffer for each reference.

Exactly.

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2021-01-12 19:22 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-07 13:51 [PATCH 0/2] fetch: implement support for atomic reference updates Patrick Steinhardt
2021-01-07 13:51 ` [PATCH 1/2] fetch: allow passing a transaction to `s_update_ref()` Patrick Steinhardt
2021-01-07 22:59   ` Junio C Hamano
2021-01-08  0:45     ` Christian Couder
2021-01-08  7:18       ` Junio C Hamano
2021-01-07 13:51 ` [PATCH 2/2] fetch: implement support for atomic reference updates Patrick Steinhardt
2021-01-08  0:19   ` Junio C Hamano
2021-01-08 12:11 ` [PATCH v2 0/4] " Patrick Steinhardt
2021-01-08 12:11   ` [PATCH v2 1/4] fetch: extract writing to FETCH_HEAD Patrick Steinhardt
2021-01-08 23:40     ` Junio C Hamano
2021-01-11 10:26       ` Patrick Steinhardt
2021-01-11 19:24         ` Junio C Hamano
2021-01-08 12:11   ` [PATCH v2 2/4] fetch: refactor `s_update_ref` to use common exit path Patrick Steinhardt
2021-01-08 23:50     ` Junio C Hamano
2021-01-11 10:28       ` Patrick Steinhardt
2021-01-08 12:11   ` [PATCH v2 3/4] fetch: allow passing a transaction to `s_update_ref()` Patrick Steinhardt
2021-01-08 23:53     ` Junio C Hamano
2021-01-08 12:11   ` [PATCH v2 4/4] fetch: implement support for atomic reference updates Patrick Steinhardt
2021-01-09  0:05     ` Junio C Hamano
2021-01-11 10:42       ` Patrick Steinhardt
2021-01-11 19:47         ` Junio C Hamano
2021-01-12 12:22           ` Patrick Steinhardt
2021-01-12 12:29             ` Patrick Steinhardt
2021-01-12 19:19             ` Junio C Hamano
2021-01-11 11:05 ` [PATCH v3 0/5] " Patrick Steinhardt
2021-01-11 11:05   ` [PATCH v3 1/5] fetch: extract writing to FETCH_HEAD Patrick Steinhardt
2021-01-11 23:16     ` Junio C Hamano
2021-01-11 11:05   ` [PATCH v3 2/5] fetch: use strbuf to format FETCH_HEAD updates Patrick Steinhardt
2021-01-11 11:10     ` Patrick Steinhardt
2021-01-11 11:17     ` Christian Couder
2021-01-11 23:21     ` Junio C Hamano
2021-01-11 11:05   ` [PATCH v3 3/5] fetch: refactor `s_update_ref` to use common exit path Patrick Steinhardt
2021-01-11 11:05   ` [PATCH v3 4/5] fetch: allow passing a transaction to `s_update_ref()` Patrick Steinhardt
2021-01-11 11:05   ` [PATCH v3 5/5] fetch: implement support for atomic reference updates Patrick Steinhardt
2021-01-12  0:04   ` [PATCH v3 0/5] " Junio C Hamano
2021-01-12 12:27 ` [PATCH v4 " Patrick Steinhardt
2021-01-12 12:27   ` [PATCH v4 1/5] fetch: extract writing to FETCH_HEAD Patrick Steinhardt
2021-01-12 12:27   ` [PATCH v4 2/5] fetch: use strbuf to format FETCH_HEAD updates Patrick Steinhardt
2021-01-12 12:27   ` [PATCH v4 3/5] fetch: refactor `s_update_ref` to use common exit path Patrick Steinhardt
2021-01-12 12:27   ` [PATCH v4 4/5] fetch: allow passing a transaction to `s_update_ref()` Patrick Steinhardt
2021-01-12 12:27   ` [PATCH v4 5/5] fetch: implement support for atomic reference updates Patrick Steinhardt

git@vger.kernel.org list mirror (unofficial, one of many)

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V1 git git/ https://public-inbox.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors.
Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.io/gmane.comp.version-control.git
 note: .onion URLs require Tor: https://www.torproject.org/

code repositories for the project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git