git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Eric Rannaud <e@nanocritical.com>
To: Elijah Newren <newren@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>,
	Jeremy Serror <jeremy.serror@gmail.com>
Subject: Re: [PATCH v4] fast-import: checkpoint: only write out refs and tags if we changed them
Date: Mon, 27 May 2019 15:45:49 -0700	[thread overview]
Message-ID: <CAH_=xoZ5Lt2nn50cfDbTA-ZVeshi8HZ91Od4btuUXpBGCanGng@mail.gmail.com> (raw)
In-Reply-To: <20190527211245.12292-1-newren@gmail.com>

On Mon, May 27, 2019 at 2:12 PM Elijah Newren <newren@gmail.com> wrote:
> On Fri, May 24, 2019 at 4:10 PM Eric Rannaud <e@nanocritical.com> wrote:
> >
> > With this patch, we keep track of whether a particular branch or tag has
> > been changed by this fast-import process since our last checkpoint (or
> > startup). At the next checkpoint, only refs and tags that new commands
> > have changed are written to disk. To be clear, fast-import still does
> > not update its internal view of branches in response to external
> > changes, but now it avoids interfering with external changes unless
> > there was an explicit command to do so since the last checkpoint.
>
> This patch looks reasonable to me; you can view it one of two basic ways:
>   1) an optimization (only update what needs to be updated)
>   2) a way to have early access to a completed branch
>
> In particular, if the fast-import stream is dealing with unrelated (or
> very divergent) branches and it completes one branch before others,
> then your change allows people to have early access to both read &
> update the completed branches while the import continues working on
> other branches.

That's a good way to describe how people with a more standard use case
than mine can benefit from my patch.


> If you changed your description to sell it this way, it'd be fine
> as-is.  But you use --force and make multiple mentions of concurrent
> updates to branches in external processes during the fast-import
> process.  That kind of description makes it really clear we need to
> tighten up what happens with the checkpoint command when it hits a
> failure (as mentioned in the commentary on V3).  Below is a patch that
> does this.
>
> I think we should either:
>
>   1) update your commit message to sell it without mentioning the
>      concurrent rewrites, and then I'll update my patch to not
>      conflict (we both add new tests at the same location to the same
>      file causing a simple conflict) by building on yours, OR
>
>   2) update your patch to come after mine and add a comment to your
>      commit message about how checkpoint will abort if it hits an
>      error, suggesting that people should only update branches
>      fast-import will not be updating further or they should use
>      --force and deal with their changes being overwritten by
>      fast-import.  Then you can submit our two patches as a series.
>
> Thoughts?

(2) is probably the most logical order to do things. I have a few
comments on your patch, below.



> -- 8< --
> Subject: [PATCH] fast-import: remember to check failure flag when
>  checkpointing
>
> fast-import, when finished, will flush out the current packfile and
> update branches, tags, and marks, returning whether it was successful.
> The point of the 'checkpoint' command was to do all the same things, but
> continue processing the stream of input commands if it was successful.
> Unfortunately, the checkpoint code forgot to check the 'failure' flag to
> see if there was an error in e.g. updating the branches, which meant it
> would also continue if there was a failure.  Fix this oversight.
>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  fast-import.c          |  2 ++
>  t/t9300-fast-import.sh | 42 ++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 44 insertions(+)
>
> diff --git a/fast-import.c b/fast-import.c
> index f38d04fa58..d0e12b03a0 100644
> --- a/fast-import.c
> +++ b/fast-import.c
> @@ -3033,6 +3033,8 @@ static void checkpoint(void)
>         dump_branches();
>         dump_tags();
>         dump_marks();
> +       if (failure)
> +               exit(1);
>  }
>
>  static void parse_checkpoint(void)
> diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
> index 3668263c40..788a543b82 100755
> --- a/t/t9300-fast-import.sh
> +++ b/t/t9300-fast-import.sh
> @@ -3262,6 +3262,48 @@ test_expect_success PIPE 'V: checkpoint updates tags after tag' '
>         background_import_still_running
>  '
>
> +background_fast_import_race () {
> +       options=$1
> +       input_file=$2
> +       exit_status=$3
> +       extra_cmd="$4"
> +
> +       mkfifo V.input
> +       exec 8<>V.input
> +       rm V.input
> +
> +       git fast-import $options <&8 &
> +       echo $! >V.pid &&
> +       $extra_cmd &&
> +
> +       cat $input_file >&8
> +       wait $(cat V.pid)
> +       test $? -eq $exit_status
> +}

I think the test will not fail without the patch (and therefore won't
catch a regression): while checkpoint doesn't currently check the
failure flag, it will be checked when cmd_main() returns so
exit_status will be 1.

You either want to:
- (a) add more independent commands after the checkpoint and check
that they were not run,
- or (b) run with --done, do not include a done command, and check
that fast-import does exit (but it's racy),
- or (c) you can reuse background_import to have a more explicit
sequence of events (in which case improvements to background_import
from my patch would have to be committed first).

Something like: (I did not try the following code)

cat >input <<-INPUT_END &&
commit refs/heads/V4
mark :1
committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
data 0
from refs/heads/U

commit refs/heads/V4_child
mark :1
committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
data 0
from refs/heads/V4^0

checkpoint

reset refs/heads/V5
from refs/heads/U

progress do: shell
progress git branch V5 V4

checkpoint

reset refs/heads/V5
from refs/heads/V4_child

progress do: checkpoint and stop

INPUT_END

background_import "" input &&

# V5 remains equal to V4 as set by the external "git branch" command:
#
# V5 is not updated by the second checkpoint (as resetting V5 to U isn't a
# fast-forward with respect to V4), and V5 is not updated by the reset to
# V4_child after the second checkpoint (which would be a fast-forward with
# respect to V4) as fast-import immediately exits on a failed checkpoint.
test "$(git rev-parse --verify V5)" = "$(git rev-parse --verify V4)" &&

# fast-import is not running anymore (but do kill it if it is):
! kill -9 $(cat V.pid) &> /dev/null &&
# fast-import exited with an error:
test $(wait $(cat V.pid)) -eq 1



> +test_expect_success PIPE 'V: checkpoint fails if refs updated beforehand' '
> +       git checkout --orphan V3 &&
> +       git commit --allow-empty -m initial &&
> +       INITIAL=$(git rev-parse HEAD) &&
> +
> +       cat >input <<-INPUT_END &&
> +       feature done
> +       commit refs/heads/V3
> +       mark :3
> +       committer Me My <self@and.eye> 1234567890 +0123

You likely want to use:
committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE


> +       data 10
> +       generated
> +       from $INITIAL
> +
> +       checkpoint
> +       done
> +       INPUT_END
> +
> +       background_fast_import_race "" input 1 "git commit --allow-empty -m conflicting" &&
> +       background_fast_import_race "--force" input 0 "git commit --allow-empty -m conflicting"
> +'
> +
> +
>  ###
>  ### series W (get-mark and empty orphan commits)
>  ###
> --
> 2.22.0.rc1.dirty
>

  reply	other threads:[~2019-05-27 22:46 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-24 23:09 [PATCH v4] fast-import: checkpoint: only write out refs and tags if we changed them Eric Rannaud
2019-05-25  2:18 ` Eric Rannaud
2019-05-27 21:12   ` Elijah Newren
2019-05-27 22:45     ` Eric Rannaud [this message]
2019-05-27 23:27       ` Elijah Newren
2019-05-28  0:11         ` Eric Rannaud

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAH_=xoZ5Lt2nn50cfDbTA-ZVeshi8HZ91Od4btuUXpBGCanGng@mail.gmail.com' \
    --to=e@nanocritical.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jeremy.serror@gmail.com \
    --cc=newren@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).