git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Phillip Wood <phillip.wood123@gmail.com>
To: Alban Gruin <alban.gruin@gmail.com>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: Elijah Newren <newren@gmail.com>, git@vger.kernel.org
Subject: Re: Comparing rebase --am with --interactive via p3400
Date: Wed, 1 Apr 2020 15:00:19 +0100	[thread overview]
Message-ID: <a92f1028-d33b-bd7e-a4d5-f4884faedeed@gmail.com> (raw)
In-Reply-To: <b380123e-dbcd-7e92-67ab-e6cbf0cec061@gmail.com>

Hi Alban and Johannes

On 01/04/2020 12:33, Alban Gruin wrote:
> Hi Johannes,
> 
> Sorry for the late answer, I was really busy for the last months.
> 
> Le 31/01/2020 à 22:23, Johannes Schindelin a écrit :
>> Hi Alban,
>>   -%<-
>>
>> Indeed. I offered these insights in #git-devel (slightly edited):
>>
>> This `discard_index()` is in an awfully central location. I am rather
>> certain that it would cause problems to just remove it.
>>
>> Looking at `do_merge()`: it explicitly discards and re-reads the index if
>> we had to spawn a `git merge` process (which we do if a strategy other
>> than `recursive` was specified, or if it is an octopus merge). But I am
>> wary of other code paths that might not be as careful.
>>
>> I see that `do_exec()` is similarly careful.
>>
> 
> I have to admit that the index is not my area of expertise in git, so
> sorry if my question is stupid: isn't there a less heavy way to find
> unstaged or uncommitted changes than discarding and then reloading the
> index?
> 
>> One thing I cannot fail to notice: we do not re-read a changed index
>> after running the `prepare-commit-msg` hook, or for that matter, any other
>> hook. That could even be an old regression from the conversion of the
>> interactive rebase to using the sequencer rather than a shell script.
>>
>> Further, `reset_merge()` seems to spawn `git reset --merge` without
>> bothering to re-read the possibly modified index. Its callees are
>> `rollback_single_pick()`, `skip_single_pick()` and `sequencer_rollback()`,
>> none of which seem to be careful, either, about checking whether the index
>> was modified in the meantime.
>>
>> Technically, the in-memory index should also be discarded
>> in `apply_autostash()`, but so far, we do not have any callers of that
>> function, I don't think, that wants to do anything but release resources
>> and exit.
>>
>> The `run_git_checkout()` function discards, as intended. I
>> am not quite sure whether it needs to, though, unless the `.git/index`
>> file _was_ modified (it _is_ possible, after all, to run `git rebase -i
>> HEAD`, and I do have a use case for that where one of my scripts generates
>> a todo script, sort of a `git cherry-pick --rebase-merges`, because
>> `cherry-pick` does not support that mode).

I'm not sure it is worth optimizing the case where .git/index is not 
changed as we only do this once per rebase. In any case I hope that one 
day we'll stop forking git checkout and use the code from 
builtin/rebase.c to do it

>> The `continue_single_pick()` function spawns a `git
>> commit` which could potentially modify the index through a hook, but the
>> first call site does not care and the second one guards against that
>> (erroring out...).
>>
>> My biggest concern is with the `run_git_commit()` function: it does not
>> re-read a potentially-modified index (think of hooks).

I agree that we should be re-reading the index after forking `git 
commit` and also `git merge`. Most of the time we commit without forking 
so that should not impact the performance too much

> Thank you for your analysis.
> 
>>
>> We will need to be very careful with this `discard_index()`, I think, and
>> in my opinion there is a great opportunity here for cleaning up a little:
>> rather than discarding and re-reading the in-memory index without seeing
>> whether the on-disk index has changed at all appears a bit wasteful to me.
>>
>> This could be refactored into a function that only discards and re-reads
>> the index if the mtime of `.git/index` changed. That function could then
>> also be taught to detect when the in-memory index has unwritten changes:
>> that would constitute a bug.
>>
> 
> Hmm, checking if the mtime of the index to see if it changed isn't racy?
>   Sub-second changes should happen, and to quote a comment in
> is_racy_stat(), "nanosecond timestamped files can also be racy" -- even
> if it should not really happen in the case of rebase…

I don't think relying on the index stat data is a good idea, git 
defaults to one second mtime resolution unless it is built with 
-DUSE_NSEC and we do way more than one commit a second. We tried to rely 
on stat data to determine when to re-read the todo list after an exec 
and it is broken (both in the design because it assumes ns mtime 
resolution and the implementation because we don't update the cached 
mtime after we rewrite the todo list). There are not that many places 
where we need to re-read the index so I think we should just have 
explicit re-reads where we need them. Hopefully over time we'll stop 
forking other processes and the problem will go away.

Best Wishes

Phillip

>> Ciao,
>> Dscho
>>
> 
> Alban
> 

  reply	other threads:[~2020-04-01 14:00 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-01  6:04 Comparing rebase --am with --interactive via p3400 Johannes Schindelin
2019-02-01  7:22 ` Johannes Schindelin
2019-02-01  9:26 ` Elijah Newren
2019-12-27 21:11 ` Alban Gruin
2019-12-27 22:45   ` Elijah Newren
2019-12-29 17:25     ` Alban Gruin
2020-01-02 20:17       ` Johannes Schindelin
2020-01-31 21:23   ` Johannes Schindelin
2020-04-01 11:33     ` Alban Gruin
2020-04-01 14:00       ` Phillip Wood [this message]
2020-04-04 20:33         ` Johannes Schindelin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a92f1028-d33b-bd7e-a4d5-f4884faedeed@gmail.com \
    --to=phillip.wood123@gmail.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=alban.gruin@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=newren@gmail.com \
    --cc=phillip.wood@dunelm.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).