git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: phillip.wood@dunelm.org.uk
Cc: Alban Gruin <alban.gruin@gmail.com>,
	Elijah Newren <newren@gmail.com>,
	git@vger.kernel.org
Subject: Re: Comparing rebase --am with --interactive via p3400
Date: Sat, 4 Apr 2020 22:33:44 +0200 (CEST)	[thread overview]
Message-ID: <nycvar.QRO.7.76.6.2004042232140.46@tvgsbejvaqbjf.bet> (raw)
In-Reply-To: <a92f1028-d33b-bd7e-a4d5-f4884faedeed@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 5317 bytes --]

Hi Phillip,

On Wed, 1 Apr 2020, Phillip Wood wrote:

> On 01/04/2020 12:33, Alban Gruin wrote:
> > Hi Johannes,
> >
> > Sorry for the late answer, I was really busy for the last months.
> >
> > Le 31/01/2020 à 22:23, Johannes Schindelin a écrit :
> > > Hi Alban,
> > >   -%<-
> > >
> > > Indeed. I offered these insights in #git-devel (slightly edited):
> > >
> > > This `discard_index()` is in an awfully central location. I am rather
> > > certain that it would cause problems to just remove it.
> > >
> > > Looking at `do_merge()`: it explicitly discards and re-reads the index if
> > > we had to spawn a `git merge` process (which we do if a strategy other
> > > than `recursive` was specified, or if it is an octopus merge). But I am
> > > wary of other code paths that might not be as careful.
> > >
> > > I see that `do_exec()` is similarly careful.
> > >
> >
> > I have to admit that the index is not my area of expertise in git, so
> > sorry if my question is stupid: isn't there a less heavy way to find
> > unstaged or uncommitted changes than discarding and then reloading the
> > index?
> >
> > > One thing I cannot fail to notice: we do not re-read a changed index
> > > after running the `prepare-commit-msg` hook, or for that matter, any other
> > > hook. That could even be an old regression from the conversion of the
> > > interactive rebase to using the sequencer rather than a shell script.
> > >
> > > Further, `reset_merge()` seems to spawn `git reset --merge` without
> > > bothering to re-read the possibly modified index. Its callees are
> > > `rollback_single_pick()`, `skip_single_pick()` and `sequencer_rollback()`,
> > > none of which seem to be careful, either, about checking whether the index
> > > was modified in the meantime.
> > >
> > > Technically, the in-memory index should also be discarded
> > > in `apply_autostash()`, but so far, we do not have any callers of that
> > > function, I don't think, that wants to do anything but release resources
> > > and exit.
> > >
> > > The `run_git_checkout()` function discards, as intended. I
> > > am not quite sure whether it needs to, though, unless the `.git/index`
> > > file _was_ modified (it _is_ possible, after all, to run `git rebase -i
> > > HEAD`, and I do have a use case for that where one of my scripts generates
> > > a todo script, sort of a `git cherry-pick --rebase-merges`, because
> > > `cherry-pick` does not support that mode).
>
> I'm not sure it is worth optimizing the case where .git/index is not changed
> as we only do this once per rebase. In any case I hope that one day we'll stop
> forking git checkout and use the code from builtin/rebase.c to do it
>
> > > The `continue_single_pick()` function spawns a `git
> > > commit` which could potentially modify the index through a hook, but the
> > > first call site does not care and the second one guards against that
> > > (erroring out...).
> > >
> > > My biggest concern is with the `run_git_commit()` function: it does not
> > > re-read a potentially-modified index (think of hooks).
>
> I agree that we should be re-reading the index after forking `git commit` and
> also `git merge`. Most of the time we commit without forking so that should
> not impact the performance too much
>
> > Thank you for your analysis.
> >
> > >
> > > We will need to be very careful with this `discard_index()`, I think, and
> > > in my opinion there is a great opportunity here for cleaning up a little:
> > > rather than discarding and re-reading the in-memory index without seeing
> > > whether the on-disk index has changed at all appears a bit wasteful to me.
> > >
> > > This could be refactored into a function that only discards and re-reads
> > > the index if the mtime of `.git/index` changed. That function could then
> > > also be taught to detect when the in-memory index has unwritten changes:
> > > that would constitute a bug.
> > >
> >
> > Hmm, checking if the mtime of the index to see if it changed isn't racy?
> >   Sub-second changes should happen, and to quote a comment in
> > is_racy_stat(), "nanosecond timestamped files can also be racy" -- even
> > if it should not really happen in the case of rebase…
>
> I don't think relying on the index stat data is a good idea, git defaults to
> one second mtime resolution unless it is built with -DUSE_NSEC and we do way
> more than one commit a second. We tried to rely on stat data to determine when
> to re-read the todo list after an exec and it is broken (both in the design
> because it assumes ns mtime resolution and the implementation because we don't
> update the cached mtime after we rewrite the todo list). There are not that
> many places where we need to re-read the index so I think we should just have
> explicit re-reads where we need them. Hopefully over time we'll stop forking
> other processes and the problem will go away.

Well. Even the 1-second granularity should buy us some performance if we
assume that `same mtime` == `racy`. That should still catch the majority
of the cases where the index was simply not changed, at least in the
`do_exec()` case.

Ciao,
Dscho

>
> Best Wishes
>
> Phillip
>
> > > Ciao,
> > > Dscho
> > >
> >
> > Alban
> >
>

      reply	other threads:[~2020-04-04 20:33 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-01  6:04 Comparing rebase --am with --interactive via p3400 Johannes Schindelin
2019-02-01  7:22 ` Johannes Schindelin
2019-02-01  9:26 ` Elijah Newren
2019-12-27 21:11 ` Alban Gruin
2019-12-27 22:45   ` Elijah Newren
2019-12-29 17:25     ` Alban Gruin
2020-01-02 20:17       ` Johannes Schindelin
2020-01-31 21:23   ` Johannes Schindelin
2020-04-01 11:33     ` Alban Gruin
2020-04-01 14:00       ` Phillip Wood
2020-04-04 20:33         ` Johannes Schindelin [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=nycvar.QRO.7.76.6.2004042232140.46@tvgsbejvaqbjf.bet \
    --to=johannes.schindelin@gmx.de \
    --cc=alban.gruin@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=newren@gmail.com \
    --cc=phillip.wood@dunelm.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).