From: "Eric Rannaud" <e@nanocritical.com>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>,
Jeremy Serror <jeremy.serror@gmail.com>,
Eric Rannaud <e@nanocritical.com>
Subject: [PATCH v3] fast-import: checkpoint: only write out refs and tags if we changed them
Date: Fri, 10 May 2019 01:41:15 -0700 [thread overview]
Message-ID: <ad63393c24c7346c5b606b29579dfacad4d307f3.1557477247.git.e@nanocritical.com> (raw)
At a checkpoint, fast-import resets all branches and tags on disk to the
OID that we have in memory. If --force is not given, only branch resets
that result in fast forwards with respect to the state on disk are
allowed.
With this approach, even for refs that fast-import has not been
commanded to change for a long time (or ever), at each checkpoint, we
will systematically reset them to the last state this process knows
about, a state which may have been set before the previous checkpoint.
Any changes made to these refs by a different process since the last
checkpoint will be overwritten.
1> is one process, 2> is another:
1> $ git fast-import --force
1> reset refs/heads/master
1> from $A
1> checkpoint
2> $ git branch -f refs/heads/master $B
1> reset refs/heads/tip
1> from $C
1> checkpoint
At this point master points again to $A.
This problem is mitigated somewhat for branches when --force is not
specified (the default), as requiring a fast forward means in practice
that concurrent external changes are likely to be preserved. But it's
not foolproof either:
1> $ git fast-import
1> reset refs/heads/master
1> from $A
1> checkpoint
2> $ git branch -f refs/heads/master refs/heads/master~1
1> reset refs/heads/tip
1> from $C
1> checkpoint
At this point, master points again to $A, not $A~1 as requested by the
second process.
We now keep track of whether branches and tags have been changed by this
fast-import process since our last checkpoint (or startup). At the next
checkpoint, only refs and tags that new commands have changed are
written to disk.
---
fast-import.c | 14 ++++++
t/t9300-fast-import.sh | 106 ++++++++++++++++++++++++++++++++++-------
2 files changed, 104 insertions(+), 16 deletions(-)
v2 and v1 were sent in Oct 2018. Only difference since is that the new
test cases in t9300 don't use sleep to implement the asynchronous dance.
Thanks.
diff --git a/fast-import.c b/fast-import.c
index f38d04fa5851..e9c3ea23ae42 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -103,6 +103,7 @@ struct branch {
uintmax_t num_notes;
unsigned active : 1;
unsigned delete : 1;
+ unsigned changed : 1;
unsigned pack_id : PACK_ID_BITS;
struct object_id oid;
};
@@ -110,6 +111,7 @@ struct branch {
struct tag {
struct tag *next_tag;
const char *name;
+ unsigned changed : 1;
unsigned int pack_id;
struct object_id oid;
};
@@ -581,6 +583,7 @@ static struct branch *new_branch(const char *name)
b->branch_tree.versions[1].mode = S_IFDIR;
b->num_notes = 0;
b->active = 0;
+ b->changed = 0;
b->pack_id = MAX_PACK_ID;
branch_table[hc] = b;
branch_count++;
@@ -1571,6 +1574,10 @@ static int update_branch(struct branch *b)
struct object_id old_oid;
struct strbuf err = STRBUF_INIT;
+ if (!b->changed)
+ return 0;
+ b->changed = 0;
+
if (is_null_oid(&b->oid)) {
if (b->delete)
delete_ref(NULL, b->name, NULL, 0);
@@ -1636,6 +1643,10 @@ static void dump_tags(void)
goto cleanup;
}
for (t = first_tag; t; t = t->next_tag) {
+ if (!t->changed)
+ continue;
+ t->changed = 0;
+
strbuf_reset(&ref_name);
strbuf_addf(&ref_name, "refs/tags/%s", t->name);
@@ -2679,6 +2690,7 @@ static void parse_new_commit(const char *arg)
if (!store_object(OBJ_COMMIT, &new_data, NULL, &b->oid, next_mark))
b->pack_id = pack_id;
+ b->changed = 1;
b->last_commit = object_count_by_type[OBJ_COMMIT];
}
@@ -2763,6 +2775,7 @@ static void parse_new_tag(const char *arg)
t->pack_id = MAX_PACK_ID;
else
t->pack_id = pack_id;
+ t->changed = 1;
}
static void parse_reset_branch(const char *arg)
@@ -2783,6 +2796,7 @@ static void parse_reset_branch(const char *arg)
b = new_branch(arg);
read_next_command();
parse_from(b);
+ b->changed = 1;
if (command_buf.len > 0)
unread_command_buf = 1;
}
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 3668263c4046..12abaebb22b6 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -3121,14 +3121,23 @@ test_expect_success 'U: validate root delete result' '
### series V (checkpoint)
###
-# The commands in input_file should not produce any output on the file
-# descriptor set with --cat-blob-fd (or stdout if unspecified).
+# Instructions can be sent from $input_file to background_import() loop via the
+# fast-import progress command.
+#
+# progress do: shell
+# Parse the next progress line and invoke it as a shell command.
+#
+# progress do: checkpoint and stop
+# Send checkpoint and wait for its completion.
+#
+# progress do: stop
+# Internal instruction.
#
# To make sure you're observing the side effects of checkpoint *before*
# fast-import terminates (and thus writes out its state), check that the
# fast-import process is still running using background_import_still_running
# *after* evaluating the test conditions.
-background_import_then_checkpoint () {
+background_import () {
options=$1
input_file=$2
@@ -3153,22 +3162,30 @@ background_import_then_checkpoint () {
# pipes writing sequence. We want to assume that the write below could
# block, e.g. if fast-import blocks writing its own output to &9
# because there is no reader on &9 yet.
- (
- cat "$input_file"
- echo "checkpoint"
- echo "progress checkpoint"
- ) >&8 &
+ cat "$input_file" >&8 &
error=1 ;# assume the worst
while read output <&9
do
- if test "$output" = "progress checkpoint"
+ if test "$output" = "progress do: shell"
+ then
+ read output <&9
+ shell_cmd="$(echo $output |sed -e 's/^progress //')"
+ $shell_cmd
+ elif test "$output" = "progress do: checkpoint and stop"
+ then
+ (
+ echo "checkpoint"
+ echo "progress do: stop"
+ ) >&8 &
+ elif test "$output" = "progress do: stop"
then
error=0
break
+ else
+ # otherwise ignore cruft
+ echo >&2 "cruft: $output"
fi
- # otherwise ignore cruft
- echo >&2 "cruft: $output"
done
if test $error -eq 1
@@ -3189,10 +3206,11 @@ test_expect_success PIPE 'V: checkpoint helper does not get stuck with extra out
cat >input <<-INPUT_END &&
progress foo
progress bar
+ progress do: checkpoint and stop
INPUT_END
- background_import_then_checkpoint "" input &&
+ background_import "" input &&
background_import_still_running
'
@@ -3201,9 +3219,11 @@ test_expect_success PIPE 'V: checkpoint updates refs after reset' '
reset refs/heads/V
from refs/heads/U
+ progress do: checkpoint and stop
+
INPUT_END
- background_import_then_checkpoint "" input &&
+ background_import "" input &&
test "$(git rev-parse --verify V)" = "$(git rev-parse --verify U)" &&
background_import_still_running
'
@@ -3216,9 +3236,11 @@ test_expect_success PIPE 'V: checkpoint updates refs and marks after commit' '
data 0
from refs/heads/U
+ progress do: checkpoint and stop
+
INPUT_END
- background_import_then_checkpoint "--export-marks=marks.actual" input &&
+ background_import "--export-marks=marks.actual" input &&
echo ":1 $(git rev-parse --verify V)" >marks.expected &&
@@ -3237,9 +3259,11 @@ test_expect_success PIPE 'V: checkpoint updates refs and marks after commit (no
data 0
from refs/heads/U
+ progress do: checkpoint and stop
+
INPUT_END
- background_import_then_checkpoint "--export-marks=marks.actual" input &&
+ background_import "--export-marks=marks.actual" input &&
echo ":2 $(git rev-parse --verify V2)" >marks.expected &&
@@ -3255,13 +3279,63 @@ test_expect_success PIPE 'V: checkpoint updates tags after tag' '
tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
data 0
+ progress do: checkpoint and stop
+
INPUT_END
- background_import_then_checkpoint "" input &&
+ background_import "" input &&
git show-ref -d Vtag &&
background_import_still_running
'
+test_expect_success PIPE 'V: checkpoint only resets changed branches' '
+ cat >input <<-INPUT_END &&
+ reset refs/heads/V3
+ from refs/heads/U
+
+ checkpoint
+
+ progress do: shell
+ progress git branch -f V3 V
+
+ reset refs/heads/V4
+ from refs/heads/U
+
+ progress do: checkpoint and stop
+
+ INPUT_END
+
+ background_import "--force" input &&
+ test "$(git rev-parse --verify V3)" = "$(git rev-parse --verify V)" &&
+ background_import_still_running
+'
+
+test_expect_success PIPE 'V: checkpoint only updates changed tags' '
+ cat >input <<-INPUT_END &&
+ tag Vtag2
+ from refs/heads/U
+ tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+ data 0
+
+ checkpoint
+
+ progress do: shell
+ progress git tag -f Vtag2 V
+
+ tag Vtag3
+ from refs/heads/U
+ tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+ data 0
+
+ progress do: checkpoint and stop
+
+ INPUT_END
+
+ background_import "" input &&
+ test "$(git show-ref -d Vtag2 |tail -1 |awk \{print\ \$1\})" = "$(git rev-parse --verify V)" &&
+ background_import_still_running
+'
+
###
### series W (get-mark and empty orphan commits)
###
--
2.21.0
next reply other threads:[~2019-05-10 8:41 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-05-10 8:41 Eric Rannaud [this message]
2019-05-15 5:30 ` [PATCH v3] fast-import: checkpoint: only write out refs and tags if we changed them Junio C Hamano
2019-05-15 6:57 ` Eric Rannaud
2019-05-16 0:35 ` Elijah Newren
2019-05-16 5:54 ` Eric Rannaud
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ad63393c24c7346c5b606b29579dfacad4d307f3.1557477247.git.e@nanocritical.com \
--to=e@nanocritical.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jeremy.serror@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).