git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Eric Rannaud <e@nanocritical.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Elijah Newren <newren@gmail.com>,
	git@vger.kernel.org, Jeremy Serror <jeremy.serror@gmail.com>
Subject: Re: [PATCH v3] fast-import: checkpoint: only write out refs and tags if we changed them
Date: Tue, 14 May 2019 23:57:40 -0700	[thread overview]
Message-ID: <CAH_=xoZfx3aLpc+HZQ0FLZnRTkYW1wW7XN0huuOsT9bRJKzOKg@mail.gmail.com> (raw)
In-Reply-To: <xmqq8sv8709a.fsf@gitster-ct.c.googlers.com>

On Tue, May 14, 2019 at 10:30 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Eric Rannaud" <e@nanocritical.com> writes:
>
> > We now keep track of whether branches and tags have been changed by this
> > fast-import process since our last checkpoint (or startup). At the next
> > checkpoint, only refs and tags that new commands have changed are
> > written to disk.
>
> And when we notice that we have been competing with some other
> process and the ref that we wanted to update has already been
> updated to a different value, what happens?  That should be part of
> the description of the new behaviour (aka "fix") written here.

This isn't addressed by this change, that behavior remains the same as
before. This change does not introduce a new failure mode. That is, we
still overwrite external changes if fast-import has itself been
instructed to change the ref since the last checkpoint (but without
--force it must be a fast forward wrt the current state on disk).

With this change applied:

        1> $ git fast-import --force
        1> reset refs/heads/master
        1> from $A
        1> checkpoint
        2> $ git branch -f refs/heads/master $B
        1> reset refs/heads/master
        1> from $C
        1> checkpoint

master points to $C.

I can update the commit message with more details.

We could provide more complete transactional semantics, I suppose,
i.e. only update a ref if:
    - (1) fast-import has been instructed to change it (changed == 1)
since the last checkpoint (this patch),
    - and (2) the ref on disk is identical to the one fast-import
wrote at the previous checkpoint (or is identical to the one loaded
from disk if we never modified it).
Would we also need a command to get fast-import to read some/all refs
from disk and update its internal state? But maybe this is going too
far.

In any case, the sequence above would then fail. But how should we
fail? The same way we do for checkpoints which would result in non-FF
when --force is not supplied? That is: don't update this ref, print a
warning, update other refs, continue processing the input, then
eventually exit with 1.

One problem with the current warning is that it goes to stderr and
there is no reliable way to detect on --cat-blob-fd that a checkpoint
failed. It is typical to use things like:

        checkpoint
        progress checkpoint

and wait to read "progress checkpoint" on --cat-blob-fd to know when a
checkpoint is complete. It works fine if checkpoints cannot fail. For
a long-running fast-import process, an error would not necessarily be
noticed until EOF or "done", and you wouldn't easily be able to tell
which checkpoint actually failed.

If we want to improve things beyond what this patch does, I think that
a reasonable balance would be:

    - (a) improve the existing FF-only transaction by providing an
immediate error on --cat-blob-fd if a checkpoint fails (the error
should specify the list of refs that were not updated); for simplicity
of processing the output of fast-import, we likely want to also print
something for successful checkpoints: "git fast-import
--checkpoint-success", and checkpoint prints on --cat-blob-fd:

        [checkpoint failed <ref> LF]*
        checkpoint complete LF

    - (b) add a --force option to the "checkpoint" command, bypassing
the FF-only check on demand so that an interactive program can retry a
checkpoint after reading the state on disk, and maybe eventually give
up and force the update.


> > ---
>
> Missing sign-off.


Thanks. I will resend with a better explanation and sign-off.


On Tue, May 14, 2019 at 10:30 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Eric Rannaud" <e@nanocritical.com> writes:
>
> > We now keep track of whether branches and tags have been changed by this
> > fast-import process since our last checkpoint (or startup). At the next
> > checkpoint, only refs and tags that new commands have changed are
> > written to disk.
>
> And when we notice that we have been competing with some other
> process and the ref that we wanted to update has already been
> updated to a different value, what happens?  That should be part of
> the description of the new behaviour (aka "fix") written here.
>
> > ---
>
> Missing sign-off.
>
> It sounds like a worthwhile thing to do.
>
> It seems that Elijah has been active in fast-import/export area in
> the last 12 months, so let's borrow a second set of eyes from him.
>
> Thanks.
>
> >  fast-import.c          |  14 ++++++
> >  t/t9300-fast-import.sh | 106 ++++++++++++++++++++++++++++++++++-------
> >  2 files changed, 104 insertions(+), 16 deletions(-)
> >
> >
> > v2 and v1 were sent in Oct 2018. Only difference since is that the new
> > test cases in t9300 don't use sleep to implement the asynchronous dance.
> >
> > Thanks.
> >
> >
> > diff --git a/fast-import.c b/fast-import.c
> > index f38d04fa5851..e9c3ea23ae42 100644
> > --- a/fast-import.c
> > +++ b/fast-import.c
> > @@ -103,6 +103,7 @@ struct branch {
> >       uintmax_t num_notes;
> >       unsigned active : 1;
> >       unsigned delete : 1;
> > +     unsigned changed : 1;
> >       unsigned pack_id : PACK_ID_BITS;
> >       struct object_id oid;
> >  };
> > @@ -110,6 +111,7 @@ struct branch {
> >  struct tag {
> >       struct tag *next_tag;
> >       const char *name;
> > +     unsigned changed : 1;
> >       unsigned int pack_id;
> >       struct object_id oid;
> >  };
> > @@ -581,6 +583,7 @@ static struct branch *new_branch(const char *name)
> >       b->branch_tree.versions[1].mode = S_IFDIR;
> >       b->num_notes = 0;
> >       b->active = 0;
> > +     b->changed = 0;
> >       b->pack_id = MAX_PACK_ID;
> >       branch_table[hc] = b;
> >       branch_count++;
> > @@ -1571,6 +1574,10 @@ static int update_branch(struct branch *b)
> >       struct object_id old_oid;
> >       struct strbuf err = STRBUF_INIT;
> >
> > +     if (!b->changed)
> > +             return 0;
> > +     b->changed = 0;
> > +
> >       if (is_null_oid(&b->oid)) {
> >               if (b->delete)
> >                       delete_ref(NULL, b->name, NULL, 0);
> > @@ -1636,6 +1643,10 @@ static void dump_tags(void)
> >               goto cleanup;
> >       }
> >       for (t = first_tag; t; t = t->next_tag) {
> > +             if (!t->changed)
> > +                     continue;
> > +             t->changed = 0;
> > +
> >               strbuf_reset(&ref_name);
> >               strbuf_addf(&ref_name, "refs/tags/%s", t->name);
> >
> > @@ -2679,6 +2690,7 @@ static void parse_new_commit(const char *arg)
> >
> >       if (!store_object(OBJ_COMMIT, &new_data, NULL, &b->oid, next_mark))
> >               b->pack_id = pack_id;
> > +     b->changed = 1;
> >       b->last_commit = object_count_by_type[OBJ_COMMIT];
> >  }
> >
> > @@ -2763,6 +2775,7 @@ static void parse_new_tag(const char *arg)
> >               t->pack_id = MAX_PACK_ID;
> >       else
> >               t->pack_id = pack_id;
> > +     t->changed = 1;
> >  }
> >
> >  static void parse_reset_branch(const char *arg)
> > @@ -2783,6 +2796,7 @@ static void parse_reset_branch(const char *arg)
> >               b = new_branch(arg);
> >       read_next_command();
> >       parse_from(b);
> > +     b->changed = 1;
> >       if (command_buf.len > 0)
> >               unread_command_buf = 1;
> >  }
> > diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
> > index 3668263c4046..12abaebb22b6 100755
> > --- a/t/t9300-fast-import.sh
> > +++ b/t/t9300-fast-import.sh
> > @@ -3121,14 +3121,23 @@ test_expect_success 'U: validate root delete result' '
> >  ### series V (checkpoint)
> >  ###
> >
> > -# The commands in input_file should not produce any output on the file
> > -# descriptor set with --cat-blob-fd (or stdout if unspecified).
> > +# Instructions can be sent from $input_file to background_import() loop via the
> > +# fast-import progress command.
> > +#
> > +#    progress do: shell
> > +# Parse the next progress line and invoke it as a shell command.
> > +#
> > +#    progress do: checkpoint and stop
> > +# Send checkpoint and wait for its completion.
> > +#
> > +#    progress do: stop
> > +# Internal instruction.
> >  #
> >  # To make sure you're observing the side effects of checkpoint *before*
> >  # fast-import terminates (and thus writes out its state), check that the
> >  # fast-import process is still running using background_import_still_running
> >  # *after* evaluating the test conditions.
> > -background_import_then_checkpoint () {
> > +background_import () {
> >       options=$1
> >       input_file=$2
> >
> > @@ -3153,22 +3162,30 @@ background_import_then_checkpoint () {
> >       # pipes writing sequence. We want to assume that the write below could
> >       # block, e.g. if fast-import blocks writing its own output to &9
> >       # because there is no reader on &9 yet.
> > -     (
> > -             cat "$input_file"
> > -             echo "checkpoint"
> > -             echo "progress checkpoint"
> > -     ) >&8 &
> > +     cat "$input_file" >&8 &
> >
> >       error=1 ;# assume the worst
> >       while read output <&9
> >       do
> > -             if test "$output" = "progress checkpoint"
> > +             if test "$output" = "progress do: shell"
> > +             then
> > +                     read output <&9
> > +                     shell_cmd="$(echo $output |sed -e 's/^progress //')"
> > +                     $shell_cmd
> > +             elif test "$output" = "progress do: checkpoint and stop"
> > +             then
> > +                     (
> > +                             echo "checkpoint"
> > +                             echo "progress do: stop"
> > +                     ) >&8 &
> > +             elif test "$output" = "progress do: stop"
> >               then
> >                       error=0
> >                       break
> > +             else
> > +                     # otherwise ignore cruft
> > +                     echo >&2 "cruft: $output"
> >               fi
> > -             # otherwise ignore cruft
> > -             echo >&2 "cruft: $output"
> >       done
> >
> >       if test $error -eq 1
> > @@ -3189,10 +3206,11 @@ test_expect_success PIPE 'V: checkpoint helper does not get stuck with extra out
> >       cat >input <<-INPUT_END &&
> >       progress foo
> >       progress bar
> > +     progress do: checkpoint and stop
> >
> >       INPUT_END
> >
> > -     background_import_then_checkpoint "" input &&
> > +     background_import "" input &&
> >       background_import_still_running
> >  '
> >
> > @@ -3201,9 +3219,11 @@ test_expect_success PIPE 'V: checkpoint updates refs after reset' '
> >       reset refs/heads/V
> >       from refs/heads/U
> >
> > +     progress do: checkpoint and stop
> > +
> >       INPUT_END
> >
> > -     background_import_then_checkpoint "" input &&
> > +     background_import "" input &&
> >       test "$(git rev-parse --verify V)" = "$(git rev-parse --verify U)" &&
> >       background_import_still_running
> >  '
> > @@ -3216,9 +3236,11 @@ test_expect_success PIPE 'V: checkpoint updates refs and marks after commit' '
> >       data 0
> >       from refs/heads/U
> >
> > +     progress do: checkpoint and stop
> > +
> >       INPUT_END
> >
> > -     background_import_then_checkpoint "--export-marks=marks.actual" input &&
> > +     background_import "--export-marks=marks.actual" input &&
> >
> >       echo ":1 $(git rev-parse --verify V)" >marks.expected &&
> >
> > @@ -3237,9 +3259,11 @@ test_expect_success PIPE 'V: checkpoint updates refs and marks after commit (no
> >       data 0
> >       from refs/heads/U
> >
> > +     progress do: checkpoint and stop
> > +
> >       INPUT_END
> >
> > -     background_import_then_checkpoint "--export-marks=marks.actual" input &&
> > +     background_import "--export-marks=marks.actual" input &&
> >
> >       echo ":2 $(git rev-parse --verify V2)" >marks.expected &&
> >
> > @@ -3255,13 +3279,63 @@ test_expect_success PIPE 'V: checkpoint updates tags after tag' '
> >       tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
> >       data 0
> >
> > +     progress do: checkpoint and stop
> > +
> >       INPUT_END
> >
> > -     background_import_then_checkpoint "" input &&
> > +     background_import "" input &&
> >       git show-ref -d Vtag &&
> >       background_import_still_running
> >  '
> >
> > +test_expect_success PIPE 'V: checkpoint only resets changed branches' '
> > +     cat >input <<-INPUT_END &&
> > +     reset refs/heads/V3
> > +     from refs/heads/U
> > +
> > +     checkpoint
> > +
> > +     progress do: shell
> > +     progress git branch -f V3 V
> > +
> > +     reset refs/heads/V4
> > +     from refs/heads/U
> > +
> > +     progress do: checkpoint and stop
> > +
> > +     INPUT_END
> > +
> > +     background_import "--force" input &&
> > +     test "$(git rev-parse --verify V3)" = "$(git rev-parse --verify V)" &&
> > +     background_import_still_running
> > +'
> > +
> > +test_expect_success PIPE 'V: checkpoint only updates changed tags' '
> > +     cat >input <<-INPUT_END &&
> > +     tag Vtag2
> > +     from refs/heads/U
> > +     tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
> > +     data 0
> > +
> > +     checkpoint
> > +
> > +     progress do: shell
> > +     progress git tag -f Vtag2 V
> > +
> > +     tag Vtag3
> > +     from refs/heads/U
> > +     tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
> > +     data 0
> > +
> > +     progress do: checkpoint and stop
> > +
> > +     INPUT_END
> > +
> > +     background_import "" input &&
> > +     test "$(git show-ref -d Vtag2 |tail -1 |awk \{print\ \$1\})" = "$(git rev-parse --verify V)" &&
> > +     background_import_still_running
> > +'
> > +
> >  ###
> >  ### series W (get-mark and empty orphan commits)
> >  ###



-- 
Eric Rannaud <e@nanocritical.com>
Nanocritical, CEO

  reply	other threads:[~2019-05-15  6:57 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-10  8:41 [PATCH v3] fast-import: checkpoint: only write out refs and tags if we changed them Eric Rannaud
2019-05-15  5:30 ` Junio C Hamano
2019-05-15  6:57   ` Eric Rannaud [this message]
2019-05-16  0:35   ` Elijah Newren
2019-05-16  5:54     ` Eric Rannaud

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAH_=xoZfx3aLpc+HZQ0FLZnRTkYW1wW7XN0huuOsT9bRJKzOKg@mail.gmail.com' \
    --to=e@nanocritical.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jeremy.serror@gmail.com \
    --cc=newren@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).