* [PATCH v3] fast-import: checkpoint: only write out refs and tags if we changed them @ 2019-05-10 8:41 Eric Rannaud 2019-05-15 5:30 ` Junio C Hamano 0 siblings, 1 reply; 5+ messages in thread From: Eric Rannaud @ 2019-05-10 8:41 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, Jeremy Serror, Eric Rannaud At a checkpoint, fast-import resets all branches and tags on disk to the OID that we have in memory. If --force is not given, only branch resets that result in fast forwards with respect to the state on disk are allowed. With this approach, even for refs that fast-import has not been commanded to change for a long time (or ever), at each checkpoint, we will systematically reset them to the last state this process knows about, a state which may have been set before the previous checkpoint. Any changes made to these refs by a different process since the last checkpoint will be overwritten. 1> is one process, 2> is another: 1> $ git fast-import --force 1> reset refs/heads/master 1> from $A 1> checkpoint 2> $ git branch -f refs/heads/master $B 1> reset refs/heads/tip 1> from $C 1> checkpoint At this point master points again to $A. This problem is mitigated somewhat for branches when --force is not specified (the default), as requiring a fast forward means in practice that concurrent external changes are likely to be preserved. But it's not foolproof either: 1> $ git fast-import 1> reset refs/heads/master 1> from $A 1> checkpoint 2> $ git branch -f refs/heads/master refs/heads/master~1 1> reset refs/heads/tip 1> from $C 1> checkpoint At this point, master points again to $A, not $A~1 as requested by the second process. We now keep track of whether branches and tags have been changed by this fast-import process since our last checkpoint (or startup). At the next checkpoint, only refs and tags that new commands have changed are written to disk. --- fast-import.c | 14 ++++++ t/t9300-fast-import.sh | 106 ++++++++++++++++++++++++++++++++++------- 2 files changed, 104 insertions(+), 16 deletions(-) v2 and v1 were sent in Oct 2018. Only difference since is that the new test cases in t9300 don't use sleep to implement the asynchronous dance. Thanks. diff --git a/fast-import.c b/fast-import.c index f38d04fa5851..e9c3ea23ae42 100644 --- a/fast-import.c +++ b/fast-import.c @@ -103,6 +103,7 @@ struct branch { uintmax_t num_notes; unsigned active : 1; unsigned delete : 1; + unsigned changed : 1; unsigned pack_id : PACK_ID_BITS; struct object_id oid; }; @@ -110,6 +111,7 @@ struct branch { struct tag { struct tag *next_tag; const char *name; + unsigned changed : 1; unsigned int pack_id; struct object_id oid; }; @@ -581,6 +583,7 @@ static struct branch *new_branch(const char *name) b->branch_tree.versions[1].mode = S_IFDIR; b->num_notes = 0; b->active = 0; + b->changed = 0; b->pack_id = MAX_PACK_ID; branch_table[hc] = b; branch_count++; @@ -1571,6 +1574,10 @@ static int update_branch(struct branch *b) struct object_id old_oid; struct strbuf err = STRBUF_INIT; + if (!b->changed) + return 0; + b->changed = 0; + if (is_null_oid(&b->oid)) { if (b->delete) delete_ref(NULL, b->name, NULL, 0); @@ -1636,6 +1643,10 @@ static void dump_tags(void) goto cleanup; } for (t = first_tag; t; t = t->next_tag) { + if (!t->changed) + continue; + t->changed = 0; + strbuf_reset(&ref_name); strbuf_addf(&ref_name, "refs/tags/%s", t->name); @@ -2679,6 +2690,7 @@ static void parse_new_commit(const char *arg) if (!store_object(OBJ_COMMIT, &new_data, NULL, &b->oid, next_mark)) b->pack_id = pack_id; + b->changed = 1; b->last_commit = object_count_by_type[OBJ_COMMIT]; } @@ -2763,6 +2775,7 @@ static void parse_new_tag(const char *arg) t->pack_id = MAX_PACK_ID; else t->pack_id = pack_id; + t->changed = 1; } static void parse_reset_branch(const char *arg) @@ -2783,6 +2796,7 @@ static void parse_reset_branch(const char *arg) b = new_branch(arg); read_next_command(); parse_from(b); + b->changed = 1; if (command_buf.len > 0) unread_command_buf = 1; } diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh index 3668263c4046..12abaebb22b6 100755 --- a/t/t9300-fast-import.sh +++ b/t/t9300-fast-import.sh @@ -3121,14 +3121,23 @@ test_expect_success 'U: validate root delete result' ' ### series V (checkpoint) ### -# The commands in input_file should not produce any output on the file -# descriptor set with --cat-blob-fd (or stdout if unspecified). +# Instructions can be sent from $input_file to background_import() loop via the +# fast-import progress command. +# +# progress do: shell +# Parse the next progress line and invoke it as a shell command. +# +# progress do: checkpoint and stop +# Send checkpoint and wait for its completion. +# +# progress do: stop +# Internal instruction. # # To make sure you're observing the side effects of checkpoint *before* # fast-import terminates (and thus writes out its state), check that the # fast-import process is still running using background_import_still_running # *after* evaluating the test conditions. -background_import_then_checkpoint () { +background_import () { options=$1 input_file=$2 @@ -3153,22 +3162,30 @@ background_import_then_checkpoint () { # pipes writing sequence. We want to assume that the write below could # block, e.g. if fast-import blocks writing its own output to &9 # because there is no reader on &9 yet. - ( - cat "$input_file" - echo "checkpoint" - echo "progress checkpoint" - ) >&8 & + cat "$input_file" >&8 & error=1 ;# assume the worst while read output <&9 do - if test "$output" = "progress checkpoint" + if test "$output" = "progress do: shell" + then + read output <&9 + shell_cmd="$(echo $output |sed -e 's/^progress //')" + $shell_cmd + elif test "$output" = "progress do: checkpoint and stop" + then + ( + echo "checkpoint" + echo "progress do: stop" + ) >&8 & + elif test "$output" = "progress do: stop" then error=0 break + else + # otherwise ignore cruft + echo >&2 "cruft: $output" fi - # otherwise ignore cruft - echo >&2 "cruft: $output" done if test $error -eq 1 @@ -3189,10 +3206,11 @@ test_expect_success PIPE 'V: checkpoint helper does not get stuck with extra out cat >input <<-INPUT_END && progress foo progress bar + progress do: checkpoint and stop INPUT_END - background_import_then_checkpoint "" input && + background_import "" input && background_import_still_running ' @@ -3201,9 +3219,11 @@ test_expect_success PIPE 'V: checkpoint updates refs after reset' ' reset refs/heads/V from refs/heads/U + progress do: checkpoint and stop + INPUT_END - background_import_then_checkpoint "" input && + background_import "" input && test "$(git rev-parse --verify V)" = "$(git rev-parse --verify U)" && background_import_still_running ' @@ -3216,9 +3236,11 @@ test_expect_success PIPE 'V: checkpoint updates refs and marks after commit' ' data 0 from refs/heads/U + progress do: checkpoint and stop + INPUT_END - background_import_then_checkpoint "--export-marks=marks.actual" input && + background_import "--export-marks=marks.actual" input && echo ":1 $(git rev-parse --verify V)" >marks.expected && @@ -3237,9 +3259,11 @@ test_expect_success PIPE 'V: checkpoint updates refs and marks after commit (no data 0 from refs/heads/U + progress do: checkpoint and stop + INPUT_END - background_import_then_checkpoint "--export-marks=marks.actual" input && + background_import "--export-marks=marks.actual" input && echo ":2 $(git rev-parse --verify V2)" >marks.expected && @@ -3255,13 +3279,63 @@ test_expect_success PIPE 'V: checkpoint updates tags after tag' ' tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE data 0 + progress do: checkpoint and stop + INPUT_END - background_import_then_checkpoint "" input && + background_import "" input && git show-ref -d Vtag && background_import_still_running ' +test_expect_success PIPE 'V: checkpoint only resets changed branches' ' + cat >input <<-INPUT_END && + reset refs/heads/V3 + from refs/heads/U + + checkpoint + + progress do: shell + progress git branch -f V3 V + + reset refs/heads/V4 + from refs/heads/U + + progress do: checkpoint and stop + + INPUT_END + + background_import "--force" input && + test "$(git rev-parse --verify V3)" = "$(git rev-parse --verify V)" && + background_import_still_running +' + +test_expect_success PIPE 'V: checkpoint only updates changed tags' ' + cat >input <<-INPUT_END && + tag Vtag2 + from refs/heads/U + tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE + data 0 + + checkpoint + + progress do: shell + progress git tag -f Vtag2 V + + tag Vtag3 + from refs/heads/U + tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE + data 0 + + progress do: checkpoint and stop + + INPUT_END + + background_import "" input && + test "$(git show-ref -d Vtag2 |tail -1 |awk \{print\ \$1\})" = "$(git rev-parse --verify V)" && + background_import_still_running +' + ### ### series W (get-mark and empty orphan commits) ### -- 2.21.0 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v3] fast-import: checkpoint: only write out refs and tags if we changed them 2019-05-10 8:41 [PATCH v3] fast-import: checkpoint: only write out refs and tags if we changed them Eric Rannaud @ 2019-05-15 5:30 ` Junio C Hamano 2019-05-15 6:57 ` Eric Rannaud 2019-05-16 0:35 ` Elijah Newren 0 siblings, 2 replies; 5+ messages in thread From: Junio C Hamano @ 2019-05-15 5:30 UTC (permalink / raw) To: Eric Rannaud, Elijah Newren; +Cc: git, Jeremy Serror "Eric Rannaud" <e@nanocritical.com> writes: > We now keep track of whether branches and tags have been changed by this > fast-import process since our last checkpoint (or startup). At the next > checkpoint, only refs and tags that new commands have changed are > written to disk. And when we notice that we have been competing with some other process and the ref that we wanted to update has already been updated to a different value, what happens? That should be part of the description of the new behaviour (aka "fix") written here. > --- Missing sign-off. It sounds like a worthwhile thing to do. It seems that Elijah has been active in fast-import/export area in the last 12 months, so let's borrow a second set of eyes from him. Thanks. > fast-import.c | 14 ++++++ > t/t9300-fast-import.sh | 106 ++++++++++++++++++++++++++++++++++------- > 2 files changed, 104 insertions(+), 16 deletions(-) > > > v2 and v1 were sent in Oct 2018. Only difference since is that the new > test cases in t9300 don't use sleep to implement the asynchronous dance. > > Thanks. > > > diff --git a/fast-import.c b/fast-import.c > index f38d04fa5851..e9c3ea23ae42 100644 > --- a/fast-import.c > +++ b/fast-import.c > @@ -103,6 +103,7 @@ struct branch { > uintmax_t num_notes; > unsigned active : 1; > unsigned delete : 1; > + unsigned changed : 1; > unsigned pack_id : PACK_ID_BITS; > struct object_id oid; > }; > @@ -110,6 +111,7 @@ struct branch { > struct tag { > struct tag *next_tag; > const char *name; > + unsigned changed : 1; > unsigned int pack_id; > struct object_id oid; > }; > @@ -581,6 +583,7 @@ static struct branch *new_branch(const char *name) > b->branch_tree.versions[1].mode = S_IFDIR; > b->num_notes = 0; > b->active = 0; > + b->changed = 0; > b->pack_id = MAX_PACK_ID; > branch_table[hc] = b; > branch_count++; > @@ -1571,6 +1574,10 @@ static int update_branch(struct branch *b) > struct object_id old_oid; > struct strbuf err = STRBUF_INIT; > > + if (!b->changed) > + return 0; > + b->changed = 0; > + > if (is_null_oid(&b->oid)) { > if (b->delete) > delete_ref(NULL, b->name, NULL, 0); > @@ -1636,6 +1643,10 @@ static void dump_tags(void) > goto cleanup; > } > for (t = first_tag; t; t = t->next_tag) { > + if (!t->changed) > + continue; > + t->changed = 0; > + > strbuf_reset(&ref_name); > strbuf_addf(&ref_name, "refs/tags/%s", t->name); > > @@ -2679,6 +2690,7 @@ static void parse_new_commit(const char *arg) > > if (!store_object(OBJ_COMMIT, &new_data, NULL, &b->oid, next_mark)) > b->pack_id = pack_id; > + b->changed = 1; > b->last_commit = object_count_by_type[OBJ_COMMIT]; > } > > @@ -2763,6 +2775,7 @@ static void parse_new_tag(const char *arg) > t->pack_id = MAX_PACK_ID; > else > t->pack_id = pack_id; > + t->changed = 1; > } > > static void parse_reset_branch(const char *arg) > @@ -2783,6 +2796,7 @@ static void parse_reset_branch(const char *arg) > b = new_branch(arg); > read_next_command(); > parse_from(b); > + b->changed = 1; > if (command_buf.len > 0) > unread_command_buf = 1; > } > diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh > index 3668263c4046..12abaebb22b6 100755 > --- a/t/t9300-fast-import.sh > +++ b/t/t9300-fast-import.sh > @@ -3121,14 +3121,23 @@ test_expect_success 'U: validate root delete result' ' > ### series V (checkpoint) > ### > > -# The commands in input_file should not produce any output on the file > -# descriptor set with --cat-blob-fd (or stdout if unspecified). > +# Instructions can be sent from $input_file to background_import() loop via the > +# fast-import progress command. > +# > +# progress do: shell > +# Parse the next progress line and invoke it as a shell command. > +# > +# progress do: checkpoint and stop > +# Send checkpoint and wait for its completion. > +# > +# progress do: stop > +# Internal instruction. > # > # To make sure you're observing the side effects of checkpoint *before* > # fast-import terminates (and thus writes out its state), check that the > # fast-import process is still running using background_import_still_running > # *after* evaluating the test conditions. > -background_import_then_checkpoint () { > +background_import () { > options=$1 > input_file=$2 > > @@ -3153,22 +3162,30 @@ background_import_then_checkpoint () { > # pipes writing sequence. We want to assume that the write below could > # block, e.g. if fast-import blocks writing its own output to &9 > # because there is no reader on &9 yet. > - ( > - cat "$input_file" > - echo "checkpoint" > - echo "progress checkpoint" > - ) >&8 & > + cat "$input_file" >&8 & > > error=1 ;# assume the worst > while read output <&9 > do > - if test "$output" = "progress checkpoint" > + if test "$output" = "progress do: shell" > + then > + read output <&9 > + shell_cmd="$(echo $output |sed -e 's/^progress //')" > + $shell_cmd > + elif test "$output" = "progress do: checkpoint and stop" > + then > + ( > + echo "checkpoint" > + echo "progress do: stop" > + ) >&8 & > + elif test "$output" = "progress do: stop" > then > error=0 > break > + else > + # otherwise ignore cruft > + echo >&2 "cruft: $output" > fi > - # otherwise ignore cruft > - echo >&2 "cruft: $output" > done > > if test $error -eq 1 > @@ -3189,10 +3206,11 @@ test_expect_success PIPE 'V: checkpoint helper does not get stuck with extra out > cat >input <<-INPUT_END && > progress foo > progress bar > + progress do: checkpoint and stop > > INPUT_END > > - background_import_then_checkpoint "" input && > + background_import "" input && > background_import_still_running > ' > > @@ -3201,9 +3219,11 @@ test_expect_success PIPE 'V: checkpoint updates refs after reset' ' > reset refs/heads/V > from refs/heads/U > > + progress do: checkpoint and stop > + > INPUT_END > > - background_import_then_checkpoint "" input && > + background_import "" input && > test "$(git rev-parse --verify V)" = "$(git rev-parse --verify U)" && > background_import_still_running > ' > @@ -3216,9 +3236,11 @@ test_expect_success PIPE 'V: checkpoint updates refs and marks after commit' ' > data 0 > from refs/heads/U > > + progress do: checkpoint and stop > + > INPUT_END > > - background_import_then_checkpoint "--export-marks=marks.actual" input && > + background_import "--export-marks=marks.actual" input && > > echo ":1 $(git rev-parse --verify V)" >marks.expected && > > @@ -3237,9 +3259,11 @@ test_expect_success PIPE 'V: checkpoint updates refs and marks after commit (no > data 0 > from refs/heads/U > > + progress do: checkpoint and stop > + > INPUT_END > > - background_import_then_checkpoint "--export-marks=marks.actual" input && > + background_import "--export-marks=marks.actual" input && > > echo ":2 $(git rev-parse --verify V2)" >marks.expected && > > @@ -3255,13 +3279,63 @@ test_expect_success PIPE 'V: checkpoint updates tags after tag' ' > tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE > data 0 > > + progress do: checkpoint and stop > + > INPUT_END > > - background_import_then_checkpoint "" input && > + background_import "" input && > git show-ref -d Vtag && > background_import_still_running > ' > > +test_expect_success PIPE 'V: checkpoint only resets changed branches' ' > + cat >input <<-INPUT_END && > + reset refs/heads/V3 > + from refs/heads/U > + > + checkpoint > + > + progress do: shell > + progress git branch -f V3 V > + > + reset refs/heads/V4 > + from refs/heads/U > + > + progress do: checkpoint and stop > + > + INPUT_END > + > + background_import "--force" input && > + test "$(git rev-parse --verify V3)" = "$(git rev-parse --verify V)" && > + background_import_still_running > +' > + > +test_expect_success PIPE 'V: checkpoint only updates changed tags' ' > + cat >input <<-INPUT_END && > + tag Vtag2 > + from refs/heads/U > + tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE > + data 0 > + > + checkpoint > + > + progress do: shell > + progress git tag -f Vtag2 V > + > + tag Vtag3 > + from refs/heads/U > + tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE > + data 0 > + > + progress do: checkpoint and stop > + > + INPUT_END > + > + background_import "" input && > + test "$(git show-ref -d Vtag2 |tail -1 |awk \{print\ \$1\})" = "$(git rev-parse --verify V)" && > + background_import_still_running > +' > + > ### > ### series W (get-mark and empty orphan commits) > ### ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v3] fast-import: checkpoint: only write out refs and tags if we changed them 2019-05-15 5:30 ` Junio C Hamano @ 2019-05-15 6:57 ` Eric Rannaud 2019-05-16 0:35 ` Elijah Newren 1 sibling, 0 replies; 5+ messages in thread From: Eric Rannaud @ 2019-05-15 6:57 UTC (permalink / raw) To: Junio C Hamano; +Cc: Elijah Newren, git, Jeremy Serror On Tue, May 14, 2019 at 10:30 PM Junio C Hamano <gitster@pobox.com> wrote: > > "Eric Rannaud" <e@nanocritical.com> writes: > > > We now keep track of whether branches and tags have been changed by this > > fast-import process since our last checkpoint (or startup). At the next > > checkpoint, only refs and tags that new commands have changed are > > written to disk. > > And when we notice that we have been competing with some other > process and the ref that we wanted to update has already been > updated to a different value, what happens? That should be part of > the description of the new behaviour (aka "fix") written here. This isn't addressed by this change, that behavior remains the same as before. This change does not introduce a new failure mode. That is, we still overwrite external changes if fast-import has itself been instructed to change the ref since the last checkpoint (but without --force it must be a fast forward wrt the current state on disk). With this change applied: 1> $ git fast-import --force 1> reset refs/heads/master 1> from $A 1> checkpoint 2> $ git branch -f refs/heads/master $B 1> reset refs/heads/master 1> from $C 1> checkpoint master points to $C. I can update the commit message with more details. We could provide more complete transactional semantics, I suppose, i.e. only update a ref if: - (1) fast-import has been instructed to change it (changed == 1) since the last checkpoint (this patch), - and (2) the ref on disk is identical to the one fast-import wrote at the previous checkpoint (or is identical to the one loaded from disk if we never modified it). Would we also need a command to get fast-import to read some/all refs from disk and update its internal state? But maybe this is going too far. In any case, the sequence above would then fail. But how should we fail? The same way we do for checkpoints which would result in non-FF when --force is not supplied? That is: don't update this ref, print a warning, update other refs, continue processing the input, then eventually exit with 1. One problem with the current warning is that it goes to stderr and there is no reliable way to detect on --cat-blob-fd that a checkpoint failed. It is typical to use things like: checkpoint progress checkpoint and wait to read "progress checkpoint" on --cat-blob-fd to know when a checkpoint is complete. It works fine if checkpoints cannot fail. For a long-running fast-import process, an error would not necessarily be noticed until EOF or "done", and you wouldn't easily be able to tell which checkpoint actually failed. If we want to improve things beyond what this patch does, I think that a reasonable balance would be: - (a) improve the existing FF-only transaction by providing an immediate error on --cat-blob-fd if a checkpoint fails (the error should specify the list of refs that were not updated); for simplicity of processing the output of fast-import, we likely want to also print something for successful checkpoints: "git fast-import --checkpoint-success", and checkpoint prints on --cat-blob-fd: [checkpoint failed <ref> LF]* checkpoint complete LF - (b) add a --force option to the "checkpoint" command, bypassing the FF-only check on demand so that an interactive program can retry a checkpoint after reading the state on disk, and maybe eventually give up and force the update. > > --- > > Missing sign-off. Thanks. I will resend with a better explanation and sign-off. On Tue, May 14, 2019 at 10:30 PM Junio C Hamano <gitster@pobox.com> wrote: > > "Eric Rannaud" <e@nanocritical.com> writes: > > > We now keep track of whether branches and tags have been changed by this > > fast-import process since our last checkpoint (or startup). At the next > > checkpoint, only refs and tags that new commands have changed are > > written to disk. > > And when we notice that we have been competing with some other > process and the ref that we wanted to update has already been > updated to a different value, what happens? That should be part of > the description of the new behaviour (aka "fix") written here. > > > --- > > Missing sign-off. > > It sounds like a worthwhile thing to do. > > It seems that Elijah has been active in fast-import/export area in > the last 12 months, so let's borrow a second set of eyes from him. > > Thanks. > > > fast-import.c | 14 ++++++ > > t/t9300-fast-import.sh | 106 ++++++++++++++++++++++++++++++++++------- > > 2 files changed, 104 insertions(+), 16 deletions(-) > > > > > > v2 and v1 were sent in Oct 2018. Only difference since is that the new > > test cases in t9300 don't use sleep to implement the asynchronous dance. > > > > Thanks. > > > > > > diff --git a/fast-import.c b/fast-import.c > > index f38d04fa5851..e9c3ea23ae42 100644 > > --- a/fast-import.c > > +++ b/fast-import.c > > @@ -103,6 +103,7 @@ struct branch { > > uintmax_t num_notes; > > unsigned active : 1; > > unsigned delete : 1; > > + unsigned changed : 1; > > unsigned pack_id : PACK_ID_BITS; > > struct object_id oid; > > }; > > @@ -110,6 +111,7 @@ struct branch { > > struct tag { > > struct tag *next_tag; > > const char *name; > > + unsigned changed : 1; > > unsigned int pack_id; > > struct object_id oid; > > }; > > @@ -581,6 +583,7 @@ static struct branch *new_branch(const char *name) > > b->branch_tree.versions[1].mode = S_IFDIR; > > b->num_notes = 0; > > b->active = 0; > > + b->changed = 0; > > b->pack_id = MAX_PACK_ID; > > branch_table[hc] = b; > > branch_count++; > > @@ -1571,6 +1574,10 @@ static int update_branch(struct branch *b) > > struct object_id old_oid; > > struct strbuf err = STRBUF_INIT; > > > > + if (!b->changed) > > + return 0; > > + b->changed = 0; > > + > > if (is_null_oid(&b->oid)) { > > if (b->delete) > > delete_ref(NULL, b->name, NULL, 0); > > @@ -1636,6 +1643,10 @@ static void dump_tags(void) > > goto cleanup; > > } > > for (t = first_tag; t; t = t->next_tag) { > > + if (!t->changed) > > + continue; > > + t->changed = 0; > > + > > strbuf_reset(&ref_name); > > strbuf_addf(&ref_name, "refs/tags/%s", t->name); > > > > @@ -2679,6 +2690,7 @@ static void parse_new_commit(const char *arg) > > > > if (!store_object(OBJ_COMMIT, &new_data, NULL, &b->oid, next_mark)) > > b->pack_id = pack_id; > > + b->changed = 1; > > b->last_commit = object_count_by_type[OBJ_COMMIT]; > > } > > > > @@ -2763,6 +2775,7 @@ static void parse_new_tag(const char *arg) > > t->pack_id = MAX_PACK_ID; > > else > > t->pack_id = pack_id; > > + t->changed = 1; > > } > > > > static void parse_reset_branch(const char *arg) > > @@ -2783,6 +2796,7 @@ static void parse_reset_branch(const char *arg) > > b = new_branch(arg); > > read_next_command(); > > parse_from(b); > > + b->changed = 1; > > if (command_buf.len > 0) > > unread_command_buf = 1; > > } > > diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh > > index 3668263c4046..12abaebb22b6 100755 > > --- a/t/t9300-fast-import.sh > > +++ b/t/t9300-fast-import.sh > > @@ -3121,14 +3121,23 @@ test_expect_success 'U: validate root delete result' ' > > ### series V (checkpoint) > > ### > > > > -# The commands in input_file should not produce any output on the file > > -# descriptor set with --cat-blob-fd (or stdout if unspecified). > > +# Instructions can be sent from $input_file to background_import() loop via the > > +# fast-import progress command. > > +# > > +# progress do: shell > > +# Parse the next progress line and invoke it as a shell command. > > +# > > +# progress do: checkpoint and stop > > +# Send checkpoint and wait for its completion. > > +# > > +# progress do: stop > > +# Internal instruction. > > # > > # To make sure you're observing the side effects of checkpoint *before* > > # fast-import terminates (and thus writes out its state), check that the > > # fast-import process is still running using background_import_still_running > > # *after* evaluating the test conditions. > > -background_import_then_checkpoint () { > > +background_import () { > > options=$1 > > input_file=$2 > > > > @@ -3153,22 +3162,30 @@ background_import_then_checkpoint () { > > # pipes writing sequence. We want to assume that the write below could > > # block, e.g. if fast-import blocks writing its own output to &9 > > # because there is no reader on &9 yet. > > - ( > > - cat "$input_file" > > - echo "checkpoint" > > - echo "progress checkpoint" > > - ) >&8 & > > + cat "$input_file" >&8 & > > > > error=1 ;# assume the worst > > while read output <&9 > > do > > - if test "$output" = "progress checkpoint" > > + if test "$output" = "progress do: shell" > > + then > > + read output <&9 > > + shell_cmd="$(echo $output |sed -e 's/^progress //')" > > + $shell_cmd > > + elif test "$output" = "progress do: checkpoint and stop" > > + then > > + ( > > + echo "checkpoint" > > + echo "progress do: stop" > > + ) >&8 & > > + elif test "$output" = "progress do: stop" > > then > > error=0 > > break > > + else > > + # otherwise ignore cruft > > + echo >&2 "cruft: $output" > > fi > > - # otherwise ignore cruft > > - echo >&2 "cruft: $output" > > done > > > > if test $error -eq 1 > > @@ -3189,10 +3206,11 @@ test_expect_success PIPE 'V: checkpoint helper does not get stuck with extra out > > cat >input <<-INPUT_END && > > progress foo > > progress bar > > + progress do: checkpoint and stop > > > > INPUT_END > > > > - background_import_then_checkpoint "" input && > > + background_import "" input && > > background_import_still_running > > ' > > > > @@ -3201,9 +3219,11 @@ test_expect_success PIPE 'V: checkpoint updates refs after reset' ' > > reset refs/heads/V > > from refs/heads/U > > > > + progress do: checkpoint and stop > > + > > INPUT_END > > > > - background_import_then_checkpoint "" input && > > + background_import "" input && > > test "$(git rev-parse --verify V)" = "$(git rev-parse --verify U)" && > > background_import_still_running > > ' > > @@ -3216,9 +3236,11 @@ test_expect_success PIPE 'V: checkpoint updates refs and marks after commit' ' > > data 0 > > from refs/heads/U > > > > + progress do: checkpoint and stop > > + > > INPUT_END > > > > - background_import_then_checkpoint "--export-marks=marks.actual" input && > > + background_import "--export-marks=marks.actual" input && > > > > echo ":1 $(git rev-parse --verify V)" >marks.expected && > > > > @@ -3237,9 +3259,11 @@ test_expect_success PIPE 'V: checkpoint updates refs and marks after commit (no > > data 0 > > from refs/heads/U > > > > + progress do: checkpoint and stop > > + > > INPUT_END > > > > - background_import_then_checkpoint "--export-marks=marks.actual" input && > > + background_import "--export-marks=marks.actual" input && > > > > echo ":2 $(git rev-parse --verify V2)" >marks.expected && > > > > @@ -3255,13 +3279,63 @@ test_expect_success PIPE 'V: checkpoint updates tags after tag' ' > > tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE > > data 0 > > > > + progress do: checkpoint and stop > > + > > INPUT_END > > > > - background_import_then_checkpoint "" input && > > + background_import "" input && > > git show-ref -d Vtag && > > background_import_still_running > > ' > > > > +test_expect_success PIPE 'V: checkpoint only resets changed branches' ' > > + cat >input <<-INPUT_END && > > + reset refs/heads/V3 > > + from refs/heads/U > > + > > + checkpoint > > + > > + progress do: shell > > + progress git branch -f V3 V > > + > > + reset refs/heads/V4 > > + from refs/heads/U > > + > > + progress do: checkpoint and stop > > + > > + INPUT_END > > + > > + background_import "--force" input && > > + test "$(git rev-parse --verify V3)" = "$(git rev-parse --verify V)" && > > + background_import_still_running > > +' > > + > > +test_expect_success PIPE 'V: checkpoint only updates changed tags' ' > > + cat >input <<-INPUT_END && > > + tag Vtag2 > > + from refs/heads/U > > + tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE > > + data 0 > > + > > + checkpoint > > + > > + progress do: shell > > + progress git tag -f Vtag2 V > > + > > + tag Vtag3 > > + from refs/heads/U > > + tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE > > + data 0 > > + > > + progress do: checkpoint and stop > > + > > + INPUT_END > > + > > + background_import "" input && > > + test "$(git show-ref -d Vtag2 |tail -1 |awk \{print\ \$1\})" = "$(git rev-parse --verify V)" && > > + background_import_still_running > > +' > > + > > ### > > ### series W (get-mark and empty orphan commits) > > ### -- Eric Rannaud <e@nanocritical.com> Nanocritical, CEO ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v3] fast-import: checkpoint: only write out refs and tags if we changed them 2019-05-15 5:30 ` Junio C Hamano 2019-05-15 6:57 ` Eric Rannaud @ 2019-05-16 0:35 ` Elijah Newren 2019-05-16 5:54 ` Eric Rannaud 1 sibling, 1 reply; 5+ messages in thread From: Elijah Newren @ 2019-05-16 0:35 UTC (permalink / raw) To: Eric Rannaud; +Cc: Git Mailing List, Jeremy Serror, Junio C Hamano On Tue, May 14, 2019 at 10:30 PM Junio C Hamano <gitster@pobox.com> wrote: > > "Eric Rannaud" <e@nanocritical.com> writes: > > > We now keep track of whether branches and tags have been changed by this > > fast-import process since our last checkpoint (or startup). At the next > > checkpoint, only refs and tags that new commands have changed are > > written to disk. > > And when we notice that we have been competing with some other > process and the ref that we wanted to update has already been > updated to a different value, what happens? That should be part of > the description of the new behaviour (aka "fix") written here. > > > --- > > Missing sign-off. > > It sounds like a worthwhile thing to do. > > It seems that Elijah has been active in fast-import/export area in > the last 12 months, so let's borrow a second set of eyes from him. I glanced over the fast-import.c portion of the code and it seemed sane, modulo this question about handling of failure to update branches/tags during a checkpoint. Didn't read the testcase portion of the patch as closely yet, but I think the big question is failures during checkpoints. First, I have to admit to having never used checkpoints, and I'm struggling a little bit to understand the usecase here. For context on my angle, I've been always using --force and 'feature done', which ensures that either all branches and tags successfully update, or fast-import terminates with having made no changes (other than perhaps having written a garbage packfile that can be pruned). I'm particularly worried about the frontend hitting errors and not completing its output. With that in mind, I was surprised to notice that explicit checkpoints update branches and tags; it seemed incompatible with --done to me, though it turns out to only be incompatible with my intended use (no partial updates). In your case, you are clearly interested in partial updates for some purpose. It's easiest to figure out the 'right' behavior for partial updates in conjunction with --force: as you state, if someone else updates some ref after a checkpoint and the input stream continues to change the ref, then tough luck for the external party; we were told to `--force` so we overwrite. But the case without --force is most interesting, as Junio highlighted. I don't think that case has been thought about deeply in the past, for two reasons: (1) the fast-import documentation is somewhat dismissive of checkpoints ("given that a 30 GiB Subversion repository can be loaded into Git through fast-import in about 3 hours, explicit checkpointing may not be necessary") , (2) I think most folks assume the repository isn't going to be interacted with during the fast-import process. However, the whole reason for your patch is because checkpoints are in use *and* people are updating the repo during the fast-import process, so this area that was likely overlooked in the past now is pretty important to get right. Looking at the fast-import code, it appears that update_branch and update_tag both set a global 'failure' flag whenever a branch cannot be updated due to it not being a fast-forward. It prints a "warning" but this is likely because it wants to update whatever branches and tags it can and report issues on any it can't, and then after that's done then exit. Since branches and tags are typically just updated at the very end of the process (again, based on the documentation suggesting checkpoints aren't necessary), the end of program exit can also be thought of as the "exit immediately after failing branch/tag updates". My current suspicion is that the current checkpointing code not checking the 'failure' flag was an oversight. If the user doesn't want branches/tags forcibly updated, and they do want checkpoints when the branches/tags get updated as input comes in, and there is the possibility that external folks modify thing, then I think it makes sense to abort the import once the checkpoint is finished (i.e. update whatever branches/tags we can and then error out); continuing to process the input stream seems wrong, because: * the reason to use checkpoints is because the fast-import process is taking a long time; if it was quick, there'd be no reason to manually checkpoint. Therefore... * the user may be forced to backup and redo the long-lived fast-import process anyway; we've cost them time by not erroring out immediately * we allow errors to accumulate (other branches could have failures to update later in the fast-import process), causing users to have to figure out whether one or many failures were reported and then have to try to track down if they have one or multiple causes * the users are less likely to notice early "warning" without erroring out, but the cause (someone else updating a ref) may be harder to diagnose the longer it has been since the external ref update. That's my impression looking over your change and reading over the relevant bits of fast-import.c, but as I noted before I don't understand the usecase and perhaps there's something important I'm not grasping based on that. It'd be great if you could provide more details to help us determine if we need to update the checkpoint code to appropriately handle errors. Hope that helps, Elijah ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v3] fast-import: checkpoint: only write out refs and tags if we changed them 2019-05-16 0:35 ` Elijah Newren @ 2019-05-16 5:54 ` Eric Rannaud 0 siblings, 0 replies; 5+ messages in thread From: Eric Rannaud @ 2019-05-16 5:54 UTC (permalink / raw) To: Elijah Newren; +Cc: Git Mailing List, Jeremy Serror, Junio C Hamano On Wed, May 15, 2019 at 5:36 PM Elijah Newren <newren@gmail.com> wrote: > with during the fast-import process. However, the whole reason for > your patch is because checkpoints are in use *and* people are updating > the repo during the fast-import process, so this area that was likely > overlooked in the past now is pretty important to get right. A little bit of background on our particular use case: fast-import is used to perform frequent history (re)writing and is a great (and fast) way to modify many refs without having to touch a checkout. For us, an alternative would be to use libgit but this is simpler, multiprocess and non-blocking, and doesn't have any additional dependencies. We use --force: our fast-import frontend independently watches the repo for updates so we are generally aware of external updates to the repo. (And if we race with an external change, we can live with that -- although being able to finely control which ref updates we want to force and which ones we expect to be FF would be great, but that's for another patch.) We would like to use checkpoints to keep fast-import running, which has the advantage of keeping some Git state in memory. Restarting git fast-import every time we want to make a change is doable, but more expensive. With the current behavior, we have to shut down fast-import after every checkpoint (in other words, we can't really use them) and restart when needed, ensuring we don't overwrite refs we didn't intend to change this time. Alternatively, we could maintain an up-to-date (wrt .git/ on disk) mapping ref -> sha1 for all refs we've ever changed, and make sure to precede any checkpoint with a bunch of intended-to-be-no-op reset commands, which effectively let fast-import know about external changes to refs that we do not wish to overwrite. But this patch seemed like a better path forward. > Looking at the fast-import code, it appears that update_branch and > update_tag both set a global 'failure' flag whenever a branch cannot > be updated due to it not being a fast-forward. It prints a "warning" > but this is likely because it wants to update whatever branches and > tags it can and report issues on any it can't, and then after that's > done then exit. Since branches and tags are typically just updated at > the very end of the process (again, based on the documentation > suggesting checkpoints aren't necessary), the end of program exit can > also be thought of as the "exit immediately after failing branch/tag > updates". My current suspicion is that the current checkpointing code > not checking the 'failure' flag was an oversight. If the user doesn't > want branches/tags forcibly updated, and they do want checkpoints when > the branches/tags get updated as input comes in, and there is the > possibility that external folks modify thing, then I think it makes > sense to abort the import once the checkpoint is finished (i.e. update > whatever branches/tags we can and then error out); continuing to > process the input stream seems wrong, because: > * the reason to use checkpoints is because the fast-import process > is taking a long time; if it was quick, there'd be no reason to > manually checkpoint. Therefore... > * the user may be forced to backup and redo the long-lived > fast-import process anyway; we've cost them time by not erroring out > immediately > * we allow errors to accumulate (other branches could have failures > to update later in the fast-import process), causing users to have to > figure out whether one or many failures were reported and then have to > try to track down if they have one or multiple causes > * the users are less likely to notice early "warning" without > erroring out, but the cause (someone else updating a ref) may be > harder to diagnose the longer it has been since the external ref > update. We use --force so I don't have an immediate interest in how checkpoint errors are handled. In a separate response to Junio, I suggested an alternative mechanism to signal checkpoint conflicts on --cat-blob-fd, but I think it would generally be fine if fast-import were to simply exit on such an error. To reliably figure out which updates made it to disk and which didn't, one can obtain expected OIDs with get-mark before initiating a checkpoint, and compare with refs on disk after an error. I'm not sure if such a change should be behind a flag or not. I think anyone running without --force does care if a ref update is rejected; and we're unlikely to break much by erroring out early. But maybe some users out there don't check the exit status of fast-import and haven't noticed that some branch they don't really care about is not being updated successfully while everything else gets updated fine? Thanks for the thoughtful analysis. Eric ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2019-05-16 5:54 UTC | newest] Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-05-10 8:41 [PATCH v3] fast-import: checkpoint: only write out refs and tags if we changed them Eric Rannaud 2019-05-15 5:30 ` Junio C Hamano 2019-05-15 6:57 ` Eric Rannaud 2019-05-16 0:35 ` Elijah Newren 2019-05-16 5:54 ` Eric Rannaud
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).