git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v4] fast-import: checkpoint: only write out refs and tags if we changed them
@ 2019-05-24 23:09 Eric Rannaud
  2019-05-25  2:18 ` Eric Rannaud
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Rannaud @ 2019-05-24 23:09 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeremy Serror, Elijah Newren, Eric Rannaud

At a checkpoint, fast-import resets all branches and tags on disk to the
OID that we have in memory. If --force is not given, only branch resets
that result in fast forwards with respect to the state on disk are
allowed. fast-import never updates its internal view of branches in
response to external changes.

Also, even for refs that fast-import has not been commanded to change
for a long time (or ever), at each checkpoint, we will systematically
reset them to the last state this process knows about, a state which may
have been set before the previous checkpoint. Any changes made to these
refs by a different process since the last checkpoint will be
overwritten.

1> is one process, 2> is another:

	1> $ git fast-import --force
	1> reset refs/heads/master
	1> from $A
	1> checkpoint
	2> $ git branch -f refs/heads/master $B
	1> reset refs/heads/tip  # not changing master!
	1> from $C
	1> checkpoint

At this point master points again to $A. (With this patch applied,
master points to $B.)

This problem is mitigated somewhat for branches when --force is not
specified (the default), as requiring a fast forward means in practice
that concurrent external changes are likely to be preserved. But it's
not foolproof either:

	1> $ git fast-import
	1> reset refs/heads/master
	1> from $A
	1> checkpoint
	2> $ git branch -f refs/heads/master refs/heads/master~1
	1> reset refs/heads/tip  # not changing master!
	1> from $C
	1> checkpoint

At this point, master points again to $A, not $A~1 as requested by the
second process. (With this patch applied, master points to $A~1.)

With this patch, we keep track of whether a particular branch or tag has
been changed by this fast-import process since our last checkpoint (or
startup). At the next checkpoint, only refs and tags that new commands
have changed are written to disk. To be clear, fast-import still does
not update its internal view of branches in response to external
changes, but now it avoids interfering with external changes unless
there was an explicit command to do so since the last checkpoint.

Note, therefore, that fast-import will still overwrite external changes
to some refs and tags. For example, with this patch applied, the change
made by process 2 will be overwritten:

	1> $ git fast-import --force
	1> reset refs/heads/master
	1> from $A
	1> checkpoint
	2> $ git branch -f refs/heads/master $B
	1> reset refs/heads/master
	1> from $C
	1> checkpoint

and master will point to $C.

Signed-off-by: Eric Rannaud <e@nanocritical.com>
---
 fast-import.c          |  14 ++++++
 t/t9300-fast-import.sh | 106 ++++++++++++++++++++++++++++++++++-------
 2 files changed, 104 insertions(+), 16 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index f38d04fa5851..e9c3ea23ae42 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -103,6 +103,7 @@ struct branch {
 	uintmax_t num_notes;
 	unsigned active : 1;
 	unsigned delete : 1;
+	unsigned changed : 1;
 	unsigned pack_id : PACK_ID_BITS;
 	struct object_id oid;
 };
@@ -110,6 +111,7 @@ struct branch {
 struct tag {
 	struct tag *next_tag;
 	const char *name;
+	unsigned changed : 1;
 	unsigned int pack_id;
 	struct object_id oid;
 };
@@ -581,6 +583,7 @@ static struct branch *new_branch(const char *name)
 	b->branch_tree.versions[1].mode = S_IFDIR;
 	b->num_notes = 0;
 	b->active = 0;
+	b->changed = 0;
 	b->pack_id = MAX_PACK_ID;
 	branch_table[hc] = b;
 	branch_count++;
@@ -1571,6 +1574,10 @@ static int update_branch(struct branch *b)
 	struct object_id old_oid;
 	struct strbuf err = STRBUF_INIT;
 
+	if (!b->changed)
+		return 0;
+	b->changed = 0;
+
 	if (is_null_oid(&b->oid)) {
 		if (b->delete)
 			delete_ref(NULL, b->name, NULL, 0);
@@ -1636,6 +1643,10 @@ static void dump_tags(void)
 		goto cleanup;
 	}
 	for (t = first_tag; t; t = t->next_tag) {
+		if (!t->changed)
+			continue;
+		t->changed = 0;
+
 		strbuf_reset(&ref_name);
 		strbuf_addf(&ref_name, "refs/tags/%s", t->name);
 
@@ -2679,6 +2690,7 @@ static void parse_new_commit(const char *arg)
 
 	if (!store_object(OBJ_COMMIT, &new_data, NULL, &b->oid, next_mark))
 		b->pack_id = pack_id;
+	b->changed = 1;
 	b->last_commit = object_count_by_type[OBJ_COMMIT];
 }
 
@@ -2763,6 +2775,7 @@ static void parse_new_tag(const char *arg)
 		t->pack_id = MAX_PACK_ID;
 	else
 		t->pack_id = pack_id;
+	t->changed = 1;
 }
 
 static void parse_reset_branch(const char *arg)
@@ -2783,6 +2796,7 @@ static void parse_reset_branch(const char *arg)
 		b = new_branch(arg);
 	read_next_command();
 	parse_from(b);
+	b->changed = 1;
 	if (command_buf.len > 0)
 		unread_command_buf = 1;
 }
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 3668263c4046..12abaebb22b6 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -3121,14 +3121,23 @@ test_expect_success 'U: validate root delete result' '
 ### series V (checkpoint)
 ###
 
-# The commands in input_file should not produce any output on the file
-# descriptor set with --cat-blob-fd (or stdout if unspecified).
+# Instructions can be sent from $input_file to background_import() loop via the
+# fast-import progress command.
+# 
+# 	progress do: shell
+# Parse the next progress line and invoke it as a shell command.
+# 
+# 	progress do: checkpoint and stop
+# Send checkpoint and wait for its completion.
+# 
+# 	progress do: stop
+# Internal instruction.
 #
 # To make sure you're observing the side effects of checkpoint *before*
 # fast-import terminates (and thus writes out its state), check that the
 # fast-import process is still running using background_import_still_running
 # *after* evaluating the test conditions.
-background_import_then_checkpoint () {
+background_import () {
 	options=$1
 	input_file=$2
 
@@ -3153,22 +3162,30 @@ background_import_then_checkpoint () {
 	# pipes writing sequence. We want to assume that the write below could
 	# block, e.g. if fast-import blocks writing its own output to &9
 	# because there is no reader on &9 yet.
-	(
-		cat "$input_file"
-		echo "checkpoint"
-		echo "progress checkpoint"
-	) >&8 &
+	cat "$input_file" >&8 &
 
 	error=1 ;# assume the worst
 	while read output <&9
 	do
-		if test "$output" = "progress checkpoint"
+		if test "$output" = "progress do: shell"
+		then
+			read output <&9
+			shell_cmd="$(echo $output |sed -e 's/^progress //')"
+			$shell_cmd
+		elif test "$output" = "progress do: checkpoint and stop"
+		then
+			(
+				echo "checkpoint"
+				echo "progress do: stop"
+			) >&8 &
+		elif test "$output" = "progress do: stop"
 		then
 			error=0
 			break
+		else
+			# otherwise ignore cruft
+			echo >&2 "cruft: $output"
 		fi
-		# otherwise ignore cruft
-		echo >&2 "cruft: $output"
 	done
 
 	if test $error -eq 1
@@ -3189,10 +3206,11 @@ test_expect_success PIPE 'V: checkpoint helper does not get stuck with extra out
 	cat >input <<-INPUT_END &&
 	progress foo
 	progress bar
+	progress do: checkpoint and stop
 
 	INPUT_END
 
-	background_import_then_checkpoint "" input &&
+	background_import "" input &&
 	background_import_still_running
 '
 
@@ -3201,9 +3219,11 @@ test_expect_success PIPE 'V: checkpoint updates refs after reset' '
 	reset refs/heads/V
 	from refs/heads/U
 
+	progress do: checkpoint and stop
+
 	INPUT_END
 
-	background_import_then_checkpoint "" input &&
+	background_import "" input &&
 	test "$(git rev-parse --verify V)" = "$(git rev-parse --verify U)" &&
 	background_import_still_running
 '
@@ -3216,9 +3236,11 @@ test_expect_success PIPE 'V: checkpoint updates refs and marks after commit' '
 	data 0
 	from refs/heads/U
 
+	progress do: checkpoint and stop
+
 	INPUT_END
 
-	background_import_then_checkpoint "--export-marks=marks.actual" input &&
+	background_import "--export-marks=marks.actual" input &&
 
 	echo ":1 $(git rev-parse --verify V)" >marks.expected &&
 
@@ -3237,9 +3259,11 @@ test_expect_success PIPE 'V: checkpoint updates refs and marks after commit (no
 	data 0
 	from refs/heads/U
 
+	progress do: checkpoint and stop
+
 	INPUT_END
 
-	background_import_then_checkpoint "--export-marks=marks.actual" input &&
+	background_import "--export-marks=marks.actual" input &&
 
 	echo ":2 $(git rev-parse --verify V2)" >marks.expected &&
 
@@ -3255,13 +3279,63 @@ test_expect_success PIPE 'V: checkpoint updates tags after tag' '
 	tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
 	data 0
 
+	progress do: checkpoint and stop
+
 	INPUT_END
 
-	background_import_then_checkpoint "" input &&
+	background_import "" input &&
 	git show-ref -d Vtag &&
 	background_import_still_running
 '
 
+test_expect_success PIPE 'V: checkpoint only resets changed branches' '
+	cat >input <<-INPUT_END &&
+	reset refs/heads/V3
+	from refs/heads/U
+
+	checkpoint
+
+	progress do: shell
+	progress git branch -f V3 V
+
+	reset refs/heads/V4
+	from refs/heads/U
+
+	progress do: checkpoint and stop
+
+	INPUT_END
+
+	background_import "--force" input &&
+	test "$(git rev-parse --verify V3)" = "$(git rev-parse --verify V)" &&
+	background_import_still_running
+'
+
+test_expect_success PIPE 'V: checkpoint only updates changed tags' '
+	cat >input <<-INPUT_END &&
+	tag Vtag2
+	from refs/heads/U
+	tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data 0
+
+	checkpoint
+
+	progress do: shell
+	progress git tag -f Vtag2 V
+
+	tag Vtag3
+	from refs/heads/U
+	tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data 0
+
+	progress do: checkpoint and stop
+
+	INPUT_END
+
+	background_import "" input &&
+	test "$(git show-ref -d Vtag2 |tail -1 |awk \{print\ \$1\})" = "$(git rev-parse --verify V)" &&
+	background_import_still_running
+'
+
 ###
 ### series W (get-mark and empty orphan commits)
 ###
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v4] fast-import: checkpoint: only write out refs and tags if we changed them
  2019-05-24 23:09 [PATCH v4] fast-import: checkpoint: only write out refs and tags if we changed them Eric Rannaud
@ 2019-05-25  2:18 ` Eric Rannaud
  2019-05-27 21:12   ` Elijah Newren
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Rannaud @ 2019-05-25  2:18 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano, Jeremy Serror, Elijah Newren

(The patch is identical to v3 but I've tried to clarify the commit
message and added the missing sign-off.)

On Fri, May 24, 2019 at 4:10 PM Eric Rannaud <e@nanocritical.com> wrote:
>
> At a checkpoint, fast-import resets all branches and tags on disk to the
> OID that we have in memory. If --force is not given, only branch resets
> that result in fast forwards with respect to the state on disk are
> allowed. fast-import never updates its internal view of branches in
> response to external changes.
>
> Also, even for refs that fast-import has not been commanded to change
> for a long time (or ever), at each checkpoint, we will systematically
> reset them to the last state this process knows about, a state which may
> have been set before the previous checkpoint. Any changes made to these
> refs by a different process since the last checkpoint will be
> overwritten.
>
> 1> is one process, 2> is another:
>
>         1> $ git fast-import --force
>         1> reset refs/heads/master
>         1> from $A
>         1> checkpoint
>         2> $ git branch -f refs/heads/master $B
>         1> reset refs/heads/tip  # not changing master!
>         1> from $C
>         1> checkpoint
>
> At this point master points again to $A. (With this patch applied,
> master points to $B.)
>
> This problem is mitigated somewhat for branches when --force is not
> specified (the default), as requiring a fast forward means in practice
> that concurrent external changes are likely to be preserved. But it's
> not foolproof either:
>
>         1> $ git fast-import
>         1> reset refs/heads/master
>         1> from $A
>         1> checkpoint
>         2> $ git branch -f refs/heads/master refs/heads/master~1
>         1> reset refs/heads/tip  # not changing master!
>         1> from $C
>         1> checkpoint
>
> At this point, master points again to $A, not $A~1 as requested by the
> second process. (With this patch applied, master points to $A~1.)
>
> With this patch, we keep track of whether a particular branch or tag has
> been changed by this fast-import process since our last checkpoint (or
> startup). At the next checkpoint, only refs and tags that new commands
> have changed are written to disk. To be clear, fast-import still does
> not update its internal view of branches in response to external
> changes, but now it avoids interfering with external changes unless
> there was an explicit command to do so since the last checkpoint.
>
> Note, therefore, that fast-import will still overwrite external changes
> to some refs and tags. For example, with this patch applied, the change
> made by process 2 will be overwritten:
>
>         1> $ git fast-import --force
>         1> reset refs/heads/master
>         1> from $A
>         1> checkpoint
>         2> $ git branch -f refs/heads/master $B
>         1> reset refs/heads/master
>         1> from $C
>         1> checkpoint
>
> and master will point to $C.
>
> Signed-off-by: Eric Rannaud <e@nanocritical.com>
> ---
>  fast-import.c          |  14 ++++++
>  t/t9300-fast-import.sh | 106 ++++++++++++++++++++++++++++++++++-------
>  2 files changed, 104 insertions(+), 16 deletions(-)
>
> diff --git a/fast-import.c b/fast-import.c
> index f38d04fa5851..e9c3ea23ae42 100644
> --- a/fast-import.c
> +++ b/fast-import.c
> @@ -103,6 +103,7 @@ struct branch {
>         uintmax_t num_notes;
>         unsigned active : 1;
>         unsigned delete : 1;
> +       unsigned changed : 1;
>         unsigned pack_id : PACK_ID_BITS;
>         struct object_id oid;
>  };
> @@ -110,6 +111,7 @@ struct branch {
>  struct tag {
>         struct tag *next_tag;
>         const char *name;
> +       unsigned changed : 1;
>         unsigned int pack_id;
>         struct object_id oid;
>  };
> @@ -581,6 +583,7 @@ static struct branch *new_branch(const char *name)
>         b->branch_tree.versions[1].mode = S_IFDIR;
>         b->num_notes = 0;
>         b->active = 0;
> +       b->changed = 0;
>         b->pack_id = MAX_PACK_ID;
>         branch_table[hc] = b;
>         branch_count++;
> @@ -1571,6 +1574,10 @@ static int update_branch(struct branch *b)
>         struct object_id old_oid;
>         struct strbuf err = STRBUF_INIT;
>
> +       if (!b->changed)
> +               return 0;
> +       b->changed = 0;
> +
>         if (is_null_oid(&b->oid)) {
>                 if (b->delete)
>                         delete_ref(NULL, b->name, NULL, 0);
> @@ -1636,6 +1643,10 @@ static void dump_tags(void)
>                 goto cleanup;
>         }
>         for (t = first_tag; t; t = t->next_tag) {
> +               if (!t->changed)
> +                       continue;
> +               t->changed = 0;
> +
>                 strbuf_reset(&ref_name);
>                 strbuf_addf(&ref_name, "refs/tags/%s", t->name);
>
> @@ -2679,6 +2690,7 @@ static void parse_new_commit(const char *arg)
>
>         if (!store_object(OBJ_COMMIT, &new_data, NULL, &b->oid, next_mark))
>                 b->pack_id = pack_id;
> +       b->changed = 1;
>         b->last_commit = object_count_by_type[OBJ_COMMIT];
>  }
>
> @@ -2763,6 +2775,7 @@ static void parse_new_tag(const char *arg)
>                 t->pack_id = MAX_PACK_ID;
>         else
>                 t->pack_id = pack_id;
> +       t->changed = 1;
>  }
>
>  static void parse_reset_branch(const char *arg)
> @@ -2783,6 +2796,7 @@ static void parse_reset_branch(const char *arg)
>                 b = new_branch(arg);
>         read_next_command();
>         parse_from(b);
> +       b->changed = 1;
>         if (command_buf.len > 0)
>                 unread_command_buf = 1;
>  }
> diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
> index 3668263c4046..12abaebb22b6 100755
> --- a/t/t9300-fast-import.sh
> +++ b/t/t9300-fast-import.sh
> @@ -3121,14 +3121,23 @@ test_expect_success 'U: validate root delete result' '
>  ### series V (checkpoint)
>  ###
>
> -# The commands in input_file should not produce any output on the file
> -# descriptor set with --cat-blob-fd (or stdout if unspecified).
> +# Instructions can be sent from $input_file to background_import() loop via the
> +# fast-import progress command.
> +#
> +#      progress do: shell
> +# Parse the next progress line and invoke it as a shell command.
> +#
> +#      progress do: checkpoint and stop
> +# Send checkpoint and wait for its completion.
> +#
> +#      progress do: stop
> +# Internal instruction.
>  #
>  # To make sure you're observing the side effects of checkpoint *before*
>  # fast-import terminates (and thus writes out its state), check that the
>  # fast-import process is still running using background_import_still_running
>  # *after* evaluating the test conditions.
> -background_import_then_checkpoint () {
> +background_import () {
>         options=$1
>         input_file=$2
>
> @@ -3153,22 +3162,30 @@ background_import_then_checkpoint () {
>         # pipes writing sequence. We want to assume that the write below could
>         # block, e.g. if fast-import blocks writing its own output to &9
>         # because there is no reader on &9 yet.
> -       (
> -               cat "$input_file"
> -               echo "checkpoint"
> -               echo "progress checkpoint"
> -       ) >&8 &
> +       cat "$input_file" >&8 &
>
>         error=1 ;# assume the worst
>         while read output <&9
>         do
> -               if test "$output" = "progress checkpoint"
> +               if test "$output" = "progress do: shell"
> +               then
> +                       read output <&9
> +                       shell_cmd="$(echo $output |sed -e 's/^progress //')"
> +                       $shell_cmd
> +               elif test "$output" = "progress do: checkpoint and stop"
> +               then
> +                       (
> +                               echo "checkpoint"
> +                               echo "progress do: stop"
> +                       ) >&8 &
> +               elif test "$output" = "progress do: stop"
>                 then
>                         error=0
>                         break
> +               else
> +                       # otherwise ignore cruft
> +                       echo >&2 "cruft: $output"
>                 fi
> -               # otherwise ignore cruft
> -               echo >&2 "cruft: $output"
>         done
>
>         if test $error -eq 1
> @@ -3189,10 +3206,11 @@ test_expect_success PIPE 'V: checkpoint helper does not get stuck with extra out
>         cat >input <<-INPUT_END &&
>         progress foo
>         progress bar
> +       progress do: checkpoint and stop
>
>         INPUT_END
>
> -       background_import_then_checkpoint "" input &&
> +       background_import "" input &&
>         background_import_still_running
>  '
>
> @@ -3201,9 +3219,11 @@ test_expect_success PIPE 'V: checkpoint updates refs after reset' '
>         reset refs/heads/V
>         from refs/heads/U
>
> +       progress do: checkpoint and stop
> +
>         INPUT_END
>
> -       background_import_then_checkpoint "" input &&
> +       background_import "" input &&
>         test "$(git rev-parse --verify V)" = "$(git rev-parse --verify U)" &&
>         background_import_still_running
>  '
> @@ -3216,9 +3236,11 @@ test_expect_success PIPE 'V: checkpoint updates refs and marks after commit' '
>         data 0
>         from refs/heads/U
>
> +       progress do: checkpoint and stop
> +
>         INPUT_END
>
> -       background_import_then_checkpoint "--export-marks=marks.actual" input &&
> +       background_import "--export-marks=marks.actual" input &&
>
>         echo ":1 $(git rev-parse --verify V)" >marks.expected &&
>
> @@ -3237,9 +3259,11 @@ test_expect_success PIPE 'V: checkpoint updates refs and marks after commit (no
>         data 0
>         from refs/heads/U
>
> +       progress do: checkpoint and stop
> +
>         INPUT_END
>
> -       background_import_then_checkpoint "--export-marks=marks.actual" input &&
> +       background_import "--export-marks=marks.actual" input &&
>
>         echo ":2 $(git rev-parse --verify V2)" >marks.expected &&
>
> @@ -3255,13 +3279,63 @@ test_expect_success PIPE 'V: checkpoint updates tags after tag' '
>         tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
>         data 0
>
> +       progress do: checkpoint and stop
> +
>         INPUT_END
>
> -       background_import_then_checkpoint "" input &&
> +       background_import "" input &&
>         git show-ref -d Vtag &&
>         background_import_still_running
>  '
>
> +test_expect_success PIPE 'V: checkpoint only resets changed branches' '
> +       cat >input <<-INPUT_END &&
> +       reset refs/heads/V3
> +       from refs/heads/U
> +
> +       checkpoint
> +
> +       progress do: shell
> +       progress git branch -f V3 V
> +
> +       reset refs/heads/V4
> +       from refs/heads/U
> +
> +       progress do: checkpoint and stop
> +
> +       INPUT_END
> +
> +       background_import "--force" input &&
> +       test "$(git rev-parse --verify V3)" = "$(git rev-parse --verify V)" &&
> +       background_import_still_running
> +'
> +
> +test_expect_success PIPE 'V: checkpoint only updates changed tags' '
> +       cat >input <<-INPUT_END &&
> +       tag Vtag2
> +       from refs/heads/U
> +       tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
> +       data 0
> +
> +       checkpoint
> +
> +       progress do: shell
> +       progress git tag -f Vtag2 V
> +
> +       tag Vtag3
> +       from refs/heads/U
> +       tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
> +       data 0
> +
> +       progress do: checkpoint and stop
> +
> +       INPUT_END
> +
> +       background_import "" input &&
> +       test "$(git show-ref -d Vtag2 |tail -1 |awk \{print\ \$1\})" = "$(git rev-parse --verify V)" &&
> +       background_import_still_running
> +'
> +
>  ###
>  ### series W (get-mark and empty orphan commits)
>  ###
> --
> 2.21.0
>


-- 
Eric Rannaud <e@nanocritical.com>
Nanocritical, CEO

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v4] fast-import: checkpoint: only write out refs and tags if we changed them
  2019-05-25  2:18 ` Eric Rannaud
@ 2019-05-27 21:12   ` Elijah Newren
  2019-05-27 22:45     ` Eric Rannaud
  0 siblings, 1 reply; 6+ messages in thread
From: Elijah Newren @ 2019-05-27 21:12 UTC (permalink / raw)
  To: Eric Rannaud
  Cc: Git Mailing List, Junio C Hamano, Jeremy Serror, Elijah Newren

On Fri, May 24, 2019 at 4:10 PM Eric Rannaud <e@nanocritical.com> wrote:
>
> With this patch, we keep track of whether a particular branch or tag has
> been changed by this fast-import process since our last checkpoint (or
> startup). At the next checkpoint, only refs and tags that new commands
> have changed are written to disk. To be clear, fast-import still does
> not update its internal view of branches in response to external
> changes, but now it avoids interfering with external changes unless
> there was an explicit command to do so since the last checkpoint.

This patch looks reasonable to me; you can view it one of two basic ways:
  1) an optimization (only update what needs to be updated)
  2) a way to have early access to a completed branch

In particular, if the fast-import stream is dealing with unrelated (or
very divergent) branches and it completes one branch before others,
then your change allows people to have early access to both read &
update the completed branches while the import continues working on
other branches.

If you changed your description to sell it this way, it'd be fine
as-is.  But you use --force and make multiple mentions of concurrent
updates to branches in external processes during the fast-import
process.  That kind of description makes it really clear we need to
tighten up what happens with the checkpoint command when it hits a
failure (as mentioned in the commentary on V3).  Below is a patch that
does this.

I think we should either:

  1) update your commit message to sell it without mentioning the
     concurrent rewrites, and then I'll update my patch to not
     conflict (we both add new tests at the same location to the same
     file causing a simple conflict) by building on yours, OR
     
  2) update your patch to come after mine and add a comment to your
     commit message about how checkpoint will abort if it hits an
     error, suggesting that people should only update branches
     fast-import will not be updating further or they should use
     --force and deal with their changes being overwritten by
     fast-import.  Then you can submit our two patches as a series.

Thoughts?

Elijah

-- 8< --
Subject: [PATCH] fast-import: remember to check failure flag when
 checkpointing

fast-import, when finished, will flush out the current packfile and
update branches, tags, and marks, returning whether it was successful.
The point of the 'checkpoint' command was to do all the same things, but
continue processing the stream of input commands if it was successful.
Unfortunately, the checkpoint code forgot to check the 'failure' flag to
see if there was an error in e.g. updating the branches, which meant it
would also continue if there was a failure.  Fix this oversight.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 fast-import.c          |  2 ++
 t/t9300-fast-import.sh | 42 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+)

diff --git a/fast-import.c b/fast-import.c
index f38d04fa58..d0e12b03a0 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -3033,6 +3033,8 @@ static void checkpoint(void)
 	dump_branches();
 	dump_tags();
 	dump_marks();
+	if (failure)
+		exit(1);
 }
 
 static void parse_checkpoint(void)
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 3668263c40..788a543b82 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -3262,6 +3262,48 @@ test_expect_success PIPE 'V: checkpoint updates tags after tag' '
 	background_import_still_running
 '
 
+background_fast_import_race () {
+	options=$1
+	input_file=$2
+	exit_status=$3
+	extra_cmd="$4"
+
+	mkfifo V.input
+	exec 8<>V.input
+	rm V.input
+
+	git fast-import $options <&8 &
+	echo $! >V.pid &&
+	$extra_cmd &&
+
+	cat $input_file >&8
+	wait $(cat V.pid)
+	test $? -eq $exit_status
+}
+
+test_expect_success PIPE 'V: checkpoint fails if refs updated beforehand' '
+	git checkout --orphan V3 &&
+	git commit --allow-empty -m initial &&
+	INITIAL=$(git rev-parse HEAD) &&
+
+	cat >input <<-INPUT_END &&
+	feature done
+	commit refs/heads/V3
+	mark :3
+	committer Me My <self@and.eye> 1234567890 +0123
+	data 10
+	generated
+	from $INITIAL
+
+	checkpoint
+	done
+	INPUT_END
+
+	background_fast_import_race "" input 1 "git commit --allow-empty -m conflicting" &&
+	background_fast_import_race "--force" input 0 "git commit --allow-empty -m conflicting"
+'
+
+
 ###
 ### series W (get-mark and empty orphan commits)
 ###
-- 
2.22.0.rc1.dirty


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v4] fast-import: checkpoint: only write out refs and tags if we changed them
  2019-05-27 21:12   ` Elijah Newren
@ 2019-05-27 22:45     ` Eric Rannaud
  2019-05-27 23:27       ` Elijah Newren
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Rannaud @ 2019-05-27 22:45 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Git Mailing List, Junio C Hamano, Jeremy Serror

On Mon, May 27, 2019 at 2:12 PM Elijah Newren <newren@gmail.com> wrote:
> On Fri, May 24, 2019 at 4:10 PM Eric Rannaud <e@nanocritical.com> wrote:
> >
> > With this patch, we keep track of whether a particular branch or tag has
> > been changed by this fast-import process since our last checkpoint (or
> > startup). At the next checkpoint, only refs and tags that new commands
> > have changed are written to disk. To be clear, fast-import still does
> > not update its internal view of branches in response to external
> > changes, but now it avoids interfering with external changes unless
> > there was an explicit command to do so since the last checkpoint.
>
> This patch looks reasonable to me; you can view it one of two basic ways:
>   1) an optimization (only update what needs to be updated)
>   2) a way to have early access to a completed branch
>
> In particular, if the fast-import stream is dealing with unrelated (or
> very divergent) branches and it completes one branch before others,
> then your change allows people to have early access to both read &
> update the completed branches while the import continues working on
> other branches.

That's a good way to describe how people with a more standard use case
than mine can benefit from my patch.


> If you changed your description to sell it this way, it'd be fine
> as-is.  But you use --force and make multiple mentions of concurrent
> updates to branches in external processes during the fast-import
> process.  That kind of description makes it really clear we need to
> tighten up what happens with the checkpoint command when it hits a
> failure (as mentioned in the commentary on V3).  Below is a patch that
> does this.
>
> I think we should either:
>
>   1) update your commit message to sell it without mentioning the
>      concurrent rewrites, and then I'll update my patch to not
>      conflict (we both add new tests at the same location to the same
>      file causing a simple conflict) by building on yours, OR
>
>   2) update your patch to come after mine and add a comment to your
>      commit message about how checkpoint will abort if it hits an
>      error, suggesting that people should only update branches
>      fast-import will not be updating further or they should use
>      --force and deal with their changes being overwritten by
>      fast-import.  Then you can submit our two patches as a series.
>
> Thoughts?

(2) is probably the most logical order to do things. I have a few
comments on your patch, below.



> -- 8< --
> Subject: [PATCH] fast-import: remember to check failure flag when
>  checkpointing
>
> fast-import, when finished, will flush out the current packfile and
> update branches, tags, and marks, returning whether it was successful.
> The point of the 'checkpoint' command was to do all the same things, but
> continue processing the stream of input commands if it was successful.
> Unfortunately, the checkpoint code forgot to check the 'failure' flag to
> see if there was an error in e.g. updating the branches, which meant it
> would also continue if there was a failure.  Fix this oversight.
>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  fast-import.c          |  2 ++
>  t/t9300-fast-import.sh | 42 ++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 44 insertions(+)
>
> diff --git a/fast-import.c b/fast-import.c
> index f38d04fa58..d0e12b03a0 100644
> --- a/fast-import.c
> +++ b/fast-import.c
> @@ -3033,6 +3033,8 @@ static void checkpoint(void)
>         dump_branches();
>         dump_tags();
>         dump_marks();
> +       if (failure)
> +               exit(1);
>  }
>
>  static void parse_checkpoint(void)
> diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
> index 3668263c40..788a543b82 100755
> --- a/t/t9300-fast-import.sh
> +++ b/t/t9300-fast-import.sh
> @@ -3262,6 +3262,48 @@ test_expect_success PIPE 'V: checkpoint updates tags after tag' '
>         background_import_still_running
>  '
>
> +background_fast_import_race () {
> +       options=$1
> +       input_file=$2
> +       exit_status=$3
> +       extra_cmd="$4"
> +
> +       mkfifo V.input
> +       exec 8<>V.input
> +       rm V.input
> +
> +       git fast-import $options <&8 &
> +       echo $! >V.pid &&
> +       $extra_cmd &&
> +
> +       cat $input_file >&8
> +       wait $(cat V.pid)
> +       test $? -eq $exit_status
> +}

I think the test will not fail without the patch (and therefore won't
catch a regression): while checkpoint doesn't currently check the
failure flag, it will be checked when cmd_main() returns so
exit_status will be 1.

You either want to:
- (a) add more independent commands after the checkpoint and check
that they were not run,
- or (b) run with --done, do not include a done command, and check
that fast-import does exit (but it's racy),
- or (c) you can reuse background_import to have a more explicit
sequence of events (in which case improvements to background_import
from my patch would have to be committed first).

Something like: (I did not try the following code)

cat >input <<-INPUT_END &&
commit refs/heads/V4
mark :1
committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
data 0
from refs/heads/U

commit refs/heads/V4_child
mark :1
committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
data 0
from refs/heads/V4^0

checkpoint

reset refs/heads/V5
from refs/heads/U

progress do: shell
progress git branch V5 V4

checkpoint

reset refs/heads/V5
from refs/heads/V4_child

progress do: checkpoint and stop

INPUT_END

background_import "" input &&

# V5 remains equal to V4 as set by the external "git branch" command:
#
# V5 is not updated by the second checkpoint (as resetting V5 to U isn't a
# fast-forward with respect to V4), and V5 is not updated by the reset to
# V4_child after the second checkpoint (which would be a fast-forward with
# respect to V4) as fast-import immediately exits on a failed checkpoint.
test "$(git rev-parse --verify V5)" = "$(git rev-parse --verify V4)" &&

# fast-import is not running anymore (but do kill it if it is):
! kill -9 $(cat V.pid) &> /dev/null &&
# fast-import exited with an error:
test $(wait $(cat V.pid)) -eq 1



> +test_expect_success PIPE 'V: checkpoint fails if refs updated beforehand' '
> +       git checkout --orphan V3 &&
> +       git commit --allow-empty -m initial &&
> +       INITIAL=$(git rev-parse HEAD) &&
> +
> +       cat >input <<-INPUT_END &&
> +       feature done
> +       commit refs/heads/V3
> +       mark :3
> +       committer Me My <self@and.eye> 1234567890 +0123

You likely want to use:
committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE


> +       data 10
> +       generated
> +       from $INITIAL
> +
> +       checkpoint
> +       done
> +       INPUT_END
> +
> +       background_fast_import_race "" input 1 "git commit --allow-empty -m conflicting" &&
> +       background_fast_import_race "--force" input 0 "git commit --allow-empty -m conflicting"
> +'
> +
> +
>  ###
>  ### series W (get-mark and empty orphan commits)
>  ###
> --
> 2.22.0.rc1.dirty
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v4] fast-import: checkpoint: only write out refs and tags if we changed them
  2019-05-27 22:45     ` Eric Rannaud
@ 2019-05-27 23:27       ` Elijah Newren
  2019-05-28  0:11         ` Eric Rannaud
  0 siblings, 1 reply; 6+ messages in thread
From: Elijah Newren @ 2019-05-27 23:27 UTC (permalink / raw)
  To: Eric Rannaud; +Cc: Git Mailing List, Junio C Hamano, Jeremy Serror

Hi Eric,

On Mon, May 27, 2019 at 3:46 PM Eric Rannaud <e@nanocritical.com> wrote:
> On Mon, May 27, 2019 at 2:12 PM Elijah Newren <newren@gmail.com> wrote:
> > On Fri, May 24, 2019 at 4:10 PM Eric Rannaud <e@nanocritical.com> wrote:
>
> I think the test will not fail without the patch (and therefore won't
> catch a regression): while checkpoint doesn't currently check the
> failure flag, it will be checked when cmd_main() returns so
> exit_status will be 1.

Ooh, good catch.

> You either want to:
> - (a) add more independent commands after the checkpoint and check
> that they were not run,
> - or (b) run with --done, do not include a done command, and check
> that fast-import does exit (but it's racy),
> - or (c) you can reuse background_import to have a more explicit
> sequence of events (in which case improvements to background_import
> from my patch would have to be committed first).

That sounds good...though it's taking my short patch and just about
amounts to completely rewriting it.  Would you like to take it over
including authorship, and just add either a "Original-patch-by:" or
"Based-on-patch-by:" for me in the commit message (these two tags
appear to be the two most common attribution mechanism used in
git.git's history when someone does this)?

> > +       cat >input <<-INPUT_END &&
> > +       feature done
> > +       commit refs/heads/V3
> > +       mark :3
> > +       committer Me My <self@and.eye> 1234567890 +0123
>
> You likely want to use:
> committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE

I see other tests in that testsuite using this, and using it here
certainly wouldn't hurt; I'm not opposed to it.  But I'm
curious...other than "other tests in the same testcase use it a lot" I
don't see why the choice of committer name/email/date matters at all.
Is there an actual reason for this that I just missed?

Cheers,
Elijah

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v4] fast-import: checkpoint: only write out refs and tags if we changed them
  2019-05-27 23:27       ` Elijah Newren
@ 2019-05-28  0:11         ` Eric Rannaud
  0 siblings, 0 replies; 6+ messages in thread
From: Eric Rannaud @ 2019-05-28  0:11 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Git Mailing List, Junio C Hamano, Jeremy Serror

On Mon, May 27, 2019 at 4:27 PM Elijah Newren <newren@gmail.com> wrote:
> That sounds good...though it's taking my short patch and just about
> amounts to completely rewriting it.  Would you like to take it over
> including authorship, and just add either a "Original-patch-by:" or
> "Based-on-patch-by:" for me in the commit message (these two tags
> appear to be the two most common attribution mechanism used in
> git.git's history when someone does this)?

Sure. Co-authored-by is an option too. I'll turn all of this in 3
patches: improve background_import, your patch, the rest of mine.


> > > +       cat >input <<-INPUT_END &&
> > > +       feature done
> > > +       commit refs/heads/V3
> > > +       mark :3
> > > +       committer Me My <self@and.eye> 1234567890 +0123
> >
> > You likely want to use:
> > committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
>
> I see other tests in that testsuite using this, and using it here
> certainly wouldn't hurt; I'm not opposed to it.  But I'm
> curious...other than "other tests in the same testcase use it a lot" I
> don't see why the choice of committer name/email/date matters at all.
> Is there an actual reason for this that I just missed?

There is no direct benefit here except consistency.

These vars are set the same on every test run so that the SHA1 of
generated patches is the same as in a previous run (see also
test_tick() to increment the date by one minute). So in general this
makes it easier to debug tests as hashes can be compared between runs.
But your hand-picked values do give you that property.

However, as normal git commands use the vars automatically, by using
them here, fast-import commands and direct git commands produce the
same hashes if they're doing the same work. That's the main benefit of
using them here: while the test does not currently rely on that
property, maybe it will (or someone debugging down the road will
expect that).

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2019-05-28  0:11 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-24 23:09 [PATCH v4] fast-import: checkpoint: only write out refs and tags if we changed them Eric Rannaud
2019-05-25  2:18 ` Eric Rannaud
2019-05-27 21:12   ` Elijah Newren
2019-05-27 22:45     ` Eric Rannaud
2019-05-27 23:27       ` Elijah Newren
2019-05-28  0:11         ` Eric Rannaud

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).