git@vger.kernel.org mailing list mirror (one of many)
* [PATCH] rebase, cherry-pick, revert: only run from toplevel
@ 2021-08-31  3:03 Elijah Newren via GitGitGadget
  2021-08-31  3:05 ` Elijah Newren
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-08-31  3:03 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Allowing rebase, cherry-pick and revert to run from subdirectories
eventually leads to user confusion.  For example, if the user is
within a directory that was created by one of the patches being
rebased, then the rebase operation could hit a conflict before the
directory is restored, leaving the user in a directory that no
longer exists.  Similarly, cherry-pick and revert could result in
the directory being removed.

Similar to bisect, simply require that these commands be run from the
toplevel to avoid such problems.

Signed-off-by: Elijah Newren <newren@gmail.com>
---

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1083%2Fnewren%2Ftoplevel-sequencing-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1083/newren/toplevel-sequencing-v1
Pull-Request: https://github.com/git/git/pull/1083

 builtin/rebase.c              |  3 +++
 builtin/revert.c              |  6 ++++++
 t/t3404-rebase-interactive.sh | 22 ----------------------
 3 files changed, 9 insertions(+), 22 deletions(-)

diff --git a/builtin/rebase.c b/builtin/rebase.c
index c284a7ace19..7100f0627f6 100644
--- a/builtin/rebase.c
+++ b/builtin/rebase.c
@@ -1430,6 +1430,9 @@ int cmd_rebase(int argc, const char **argv, const char *prefix)
 		usage_with_options(builtin_rebase_usage,
 				   builtin_rebase_options);
 
+	if (prefix)
+		die(_("You need to run this command from the toplevel of the working tree."));
+
 	options.allow_empty_message = 1;
 	git_config(rebase_config, &options);
 	/* options.gpg_sign_opt will be either "-S" or NULL */
diff --git a/builtin/revert.c b/builtin/revert.c
index 237f2f18d4c..9a150dcbdaf 100644
--- a/builtin/revert.c
+++ b/builtin/revert.c
@@ -230,6 +230,9 @@ int cmd_revert(int argc, const char **argv, const char *prefix)
 	struct replay_opts opts = REPLAY_OPTS_INIT;
 	int res;
 
+	if (prefix)
+		die(_("You need to run this command from the toplevel of the working tree."));
+
 	opts.action = REPLAY_REVERT;
 	sequencer_init_config(&opts);
 	res = run_sequencer(argc, argv, &opts);
@@ -243,6 +246,9 @@ int cmd_cherry_pick(int argc, const char **argv, const char *prefix)
 	struct replay_opts opts = REPLAY_OPTS_INIT;
 	int res;
 
+	if (prefix)
+		die(_("You need to run this command from the toplevel of the working tree."));
+
 	opts.action = REPLAY_PICK;
 	sequencer_init_config(&opts);
 	res = run_sequencer(argc, argv, &opts);
diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
index 66bcbbf9528..dd1afb97fca 100755
--- a/t/t3404-rebase-interactive.sh
+++ b/t/t3404-rebase-interactive.sh
@@ -112,28 +112,6 @@ test_expect_success 'rebase -i with the exec command' '
 	rm -f touch-*
 '
 
-test_expect_success 'rebase -i with the exec command runs from tree root' '
-	git checkout primary &&
-	mkdir subdir && (cd subdir &&
-	set_fake_editor &&
-	FAKE_LINES="1 exec_>touch-subdir" \
-		git rebase -i HEAD^
-	) &&
-	test_path_is_file touch-subdir &&
-	rm -fr subdir
-'
-
-test_expect_success 'rebase -i with exec allows git commands in subdirs' '
-	test_when_finished "rm -rf subdir" &&
-	test_when_finished "git rebase --abort ||:" &&
-	git checkout primary &&
-	mkdir subdir && (cd subdir &&
-	set_fake_editor &&
-	FAKE_LINES="1 x_cd_subdir_&&_git_rev-parse_--is-inside-work-tree" \
-		git rebase -i HEAD^
-	)
-'
-
 test_expect_success 'rebase -i sets work tree properly' '
 	test_when_finished "rm -rf subdir" &&
 	test_when_finished "test_might_fail git rebase --abort" &&

base-commit: 6c40894d2466d4e7fddc047a05116aa9d14712ee
-- 
gitgitgadget

* Re: [PATCH] rebase, cherry-pick, revert: only run from toplevel
  2021-08-31  3:03 [PATCH] rebase, cherry-pick, revert: only run from toplevel Elijah Newren via GitGitGadget
@ 2021-08-31  3:05 ` Elijah Newren
  2021-08-31  5:55 ` Johannes Sixt
  2021-08-31  7:01 ` Jeff King
  2 siblings, 0 replies; 13+ messages in thread
From: Elijah Newren @ 2021-08-31  3:05 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: Git Mailing List

On Mon, Aug 30, 2021 at 8:03 PM Elijah Newren via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Elijah Newren <newren@gmail.com>
>
> Allowing rebase, cherry-pick and revert to run from subdirectories
> inevitably leads to eventual user confusion.  For example, if they
> are within a directory that was created by one of the patches being
> rebased, then the rebase operation could hit a conflict before the
> directory is restored leading the user to be running from a directory
> that no longer exists.  Similarly with cherry-pick and revert, those
> operations could result in the directory being removed.
>
> Similar to bisect, simply require that these commands be run from the
> toplevel to avoid such problems.

See also <xmqqv93n7q1v.fsf@gitster.g> and the rest of the thread.

* Re: [PATCH] rebase, cherry-pick, revert: only run from toplevel
  2021-08-31  3:03 [PATCH] rebase, cherry-pick, revert: only run from toplevel Elijah Newren via GitGitGadget
  2021-08-31  3:05 ` Elijah Newren
@ 2021-08-31  5:55 ` Johannes Sixt
  2021-08-31  7:01 ` Jeff King
  2 siblings, 0 replies; 13+ messages in thread
From: Johannes Sixt @ 2021-08-31  5:55 UTC (permalink / raw)
  To: Elijah Newren; +Cc: git, Elijah Newren via GitGitGadget

Am 31.08.21 um 05:03 schrieb Elijah Newren via GitGitGadget:
> From: Elijah Newren <newren@gmail.com>
> 
> Allowing rebase, cherry-pick and revert to run from subdirectories
> inevitably leads to eventual user confusion.  For example, if they
> are within a directory that was created by one of the patches being
> rebased, then the rebase operation could hit a conflict before the
> directory is restored leading the user to be running from a directory
> that no longer exists.  Similarly with cherry-pick and revert, those
> operations could result in the directory being removed.
> 
> Similar to bisect, simply require that these commands be run from the
> toplevel to avoid such problems.

I am not a friend of this change. I understand the motivation behind it.
But most of the time, rebase and cherry-pick are run on one's own code,
where directories do not appear and disappear at random, and the newly
enforced condition becomes awkward.

One of my use-cases is that I operate git-rebase from an untracked build
directory inside the repository. Having to pass -C .. all the time
strikes the wrong balance, IMO.

Things are slightly different for git-bisect (at least for me), because
oftentimes it is operated on foreign code, where you may not know which
directories come and go. But even that is a weak argument to force the
command to the top-level of the repository.

Perhaps it is sufficient to force git-pull to the top-level (if it isn't
already).

-- Hannes

* Re: [PATCH] rebase, cherry-pick, revert: only run from toplevel
  2021-08-31  3:03 [PATCH] rebase, cherry-pick, revert: only run from toplevel Elijah Newren via GitGitGadget
  2021-08-31  3:05 ` Elijah Newren
  2021-08-31  5:55 ` Johannes Sixt
@ 2021-08-31  7:01 ` Jeff King
  2021-08-31 20:14   ` Elijah Newren
  2 siblings, 1 reply; 13+ messages in thread
From: Jeff King @ 2021-08-31  7:01 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren

On Tue, Aug 31, 2021 at 03:03:50AM +0000, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>
> 
> Allowing rebase, cherry-pick and revert to run from subdirectories
> inevitably leads to eventual user confusion.  For example, if they
> are within a directory that was created by one of the patches being
> rebased, then the rebase operation could hit a conflict before the
> directory is restored leading the user to be running from a directory
> that no longer exists.  Similarly with cherry-pick and revert, those
> operations could result in the directory being removed.
> 
> Similar to bisect, simply require that these commands be run from the
> toplevel to avoid such problems.

IMHO this is too draconian. You are occasionally helping people who are
in a directory which goes away over the course of the operation. But you
are hurting everyone who _isn't_ in that situation, and who needlessly
has to re-issue their command after doing a "cd".

I think we'd be much better served to do even a rudimentary analysis of
whether the operation will be a problem. E.g., if we taught the checkout
code to error out when the cwd is going to disappear, then:

  - we'd protect the user from confusion during regular sight-seeing via
    "git checkout v0.99" and so forth

  - we'd protect the most common cases for git-rebase (your patches
    introduce "subdir/", but it is not yet in the parent directory). We
    wouldn't preemptively avoid a rebase where subdir/ disappears and
    then reappears in the middle of the series. We could find such a
    case by iterating over the patches, but IMHO it's not worth the
    computation.

  - we could likewise protect git-bisect, making it more reasonable to
    loosen its current restriction

  - we might want to teach similar logic to sequencer operations, so
    that applying a patch would likewise error-out. That would protect
    cherry-pick and revert, but also make the "subdir/ disappears
    mid-patch-series" case pretty nice: the specific patch that deletes
    it would fail to apply, and then you could "cd .. && git rebase
    --continue".

    I suspect that the "oops, we're going to delete cwd" code would end
    up in unpack-trees anyway, which means that both checkout and all of
    these sequencer operations would use the same code.

Now I have spent zero time looking into actually coding this, so it may
turn out to be much trickier than I am suggesting. But this seems like a
much more fruitful direction, where we can protect users in cases where
they benefit (and give them sensible and actionable error messages),
without bothering people in the majority of cases where their cwd
doesn't go away.
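
To make the idea concrete, here is a minimal sketch of the kind of test
the removal code would need: given a directory that is about to be
removed and the "prefix" the command was started with, would the removal
take the starting directory with it? This is an illustration only, not
code from git or from any patch in this thread. The function name is
hypothetical, and it assumes that "prefix" is NULL at the toplevel (as in
the patch above) and otherwise ends in a slash, e.g. "subdir/".

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if removing the directory `dir` would
 * also remove the directory the command was started from.  Both paths
 * are relative to the top of the working tree.  `prefix` is assumed to
 * be NULL at the toplevel and "subdir/"-style (trailing slash) below it.
 */
static int removal_would_delete_cwd(const char *dir, const char *prefix)
{
	size_t dir_len, prefix_len;

	assert(dir);
	if (!prefix || !*prefix)
		return 0;	/* started at the toplevel, which never goes away */

	dir_len = strlen(dir);
	while (dir_len && dir[dir_len - 1] == '/')
		dir_len--;	/* ignore trailing slashes on dir */
	if (!dir_len)
		return 0;

	prefix_len = strlen(prefix);
	if (prefix_len < dir_len || strncmp(dir, prefix, dir_len))
		return 0;
	/* dir matches the start of prefix; require a component boundary */
	return prefix[dir_len] == '/' || prefix[dir_len] == '\0';
}
```

A caller could use such a check to produce an actionable error ("cd to
the toplevel and retry") only in the cases where the cwd is actually at
risk, rather than refusing every invocation from a subdirectory.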

-Peff

* Re: [PATCH] rebase, cherry-pick, revert: only run from toplevel
  2021-08-31  7:01 ` Jeff King
@ 2021-08-31 20:14   ` Elijah Newren
  2021-09-01  2:55     ` Taylor Blau
  2021-09-01  5:29     ` Junio C Hamano
  0 siblings, 2 replies; 13+ messages in thread
From: Elijah Newren @ 2021-08-31 20:14 UTC (permalink / raw)
  To: Jeff King; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Tue, Aug 31, 2021 at 12:01 AM Jeff King <peff@peff.net> wrote:
>
> On Tue, Aug 31, 2021 at 03:03:50AM +0000, Elijah Newren via GitGitGadget wrote:
>
> > From: Elijah Newren <newren@gmail.com>
> >
> > Allowing rebase, cherry-pick and revert to run from subdirectories
> > inevitably leads to eventual user confusion.  For example, if they
> > are within a directory that was created by one of the patches being
> > rebased, then the rebase operation could hit a conflict before the
> > directory is restored leading the user to be running from a directory
> > that no longer exists.  Similarly with cherry-pick and revert, those
> > operations could result in the directory being removed.
> >
> > Similar to bisect, simply require that these commands be run from the
> > toplevel to avoid such problems.
>
> IMHO this is too draconian. You are occasionally helping people who are
> in a directory which goes away over the course of the operation. But you
> are hurting everyone who _isn't_ in that situation, and who needlessly
> has to re-issue their command after doing a "cd".
>
> I think we'd be much better served to do even a rudimentary analysis of
> whether the operation will be a problem. E.g., if we taught the checkout
> code to error out when the cwd is going to disappear, then:
>
>   - we'd protect the user from confusion during regular sight-seeing via
>     "git checkout v0.99" and so forth
>
>   - we'd protect the most common cases for git-rebase (your patches
>     introduce "subdir/", but it is not yet in the parent directory). We
>     wouldn't preemptively avoid a rebase where subdir/ disappears and
>     then reappears in the middle of the series. We could find such a
>     case by iterating over the patches, but IMHO it's not worth the
>     computation.
>
>   - we could likewise protect git-bisect, making it more reasonable to
>     loosen its current restriction
>
>   - we might want to teach similar logic to sequencer operations, so
>     that applying a patch would likewise error-out. That would protect
>     cherry-pick and revert, but also make the "subdir/ disappears
>     mid-patch-series" case pretty nice: the specific patch that deletes
>     it would fail to apply, and then you could "cd .. && git rebase
>     --continue".
>
>     I suspect that the "oops, we're going to delete cwd" code would end
>     up in unpack-trees anyway, which means that both checkout and all of
>     these sequencer operations would use the same code.

Well, sequencer uses the merge machinery, and merge-ort creates a
toplevel merge tree that includes conflict markers in files and
whatnot, and then only updates the working tree via checkout().  So,
yeah, it really should be the same codepath.

> Now I have spent zero time looking into actually coding this, so it may
> turn out to be much trickier than I am suggesting. But this seems like a
> much more fruitful direction, where we can protect users in cases where
> they benefit (and give them sensible and actionable error messages),
> without bothering people in the majority of cases where their cwd
> doesn't go away.

Ooh, this sounds intriguing to me...but what if we changed that rule
slightly and just decided to never make the cwd go away?  Currently,
the checkout code removes directories if they have no tracked or
untracked or ignored files left, i.e. if they're empty.  What if we
decide to only have remove_scheduled_dirs() remove directories that
are empty AND they are not the current working directory?

Granted, that could leave a directory in the way of files from other
revisions, but the same was already true in the analogous situation
where there were leftover untracked files in some directory.  Since
all our code paths already have to handle directories leftover due to
untracked files, perhaps avoiding removing the current working
directory would just automatically work for us?  (I think it'd handle
all of checkout, bisect, rebase, cherry-pick, revert, and merge.)
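
As a rough sketch of that rule (hypothetical code, not the actual
remove_scheduled_dirs() and not taken from any posted patch), the
per-directory step could look something like the following. The helper
name and the "start_dir" parameter are assumptions made for illustration;
both paths are assumed to be absolute and already normalized, so a plain
string comparison is enough.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to remove `dir` unless it is `start_dir`,
 * the directory the command was started from.
 * Returns 1 if the directory was removed, 0 if it was left in place.
 */
static int rmdir_unless_cwd(const char *dir, const char *start_dir)
{
	assert(dir);
	if (start_dir && !strcmp(dir, start_dir))
		return 0;	/* never pull the cwd out from under the user */
	/*
	 * rmdir() only removes empty directories; it fails when untracked
	 * or ignored files remain, so those are left alone too.
	 */
	return rmdir(dir) == 0;
}
```

With a helper like this, a leftover cwd is treated just like a directory
that still holds untracked files: it simply stays where it is.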

* Re: [PATCH] rebase, cherry-pick, revert: only run from toplevel
  2021-08-31 20:14   ` Elijah Newren
@ 2021-09-01  2:55     ` Taylor Blau
  2021-09-01  4:43       ` Elijah Newren
  2021-09-01  5:29     ` Junio C Hamano
  1 sibling, 1 reply; 13+ messages in thread
From: Taylor Blau @ 2021-09-01  2:55 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Jeff King, Elijah Newren via GitGitGadget, Git Mailing List

On Tue, Aug 31, 2021 at 01:14:55PM -0700, Elijah Newren wrote:
> > Now I have spent zero time looking into actually coding this, so it may
> > turn out to be much trickier than I am suggesting. But this seems like a
> > much more fruitful direction, where we can protect users in cases where
> > they benefit (and give them sensible and actionable error messages),
> > without bothering people in the majority of cases where their cwd
> > doesn't go away.
>
> Ooh, this sounds intriguing to me...but what if we changed that rule
> slightly and just decided to never make the cwd go away?  Currently,
> the checkout code removes directories if they have no tracked or
> untracked or ignored files left, i.e. if they're empty.  What if we
> decide to only have remove_scheduled_dirs() remove directories that
> are empty AND they are not the current working directory?

Hmm. My first thought after reading this is that it would cause
surprising behavior for anybody who had 'git add --all' in their 'rebase
-x' script. But would it?

I.e., imagine somebody doing an in-place sed in a rebase and then `git
add --all`-ing the result at each point in history. If the directory
they were in ever went away, then the *next* revision would add that
directory right back.

That behavior seems somewhat surprising to me, or at least I could
imagine it being surprising to users.

Another thought is what should happen when the current directory goes
away and then comes back as a file? We wouldn't be able to checkout that
file, I don't think, so it might be a dead end.

Thanks,
Taylor

* Re: [PATCH] rebase, cherry-pick, revert: only run from toplevel
  2021-09-01  2:55     ` Taylor Blau
@ 2021-09-01  4:43       ` Elijah Newren
  2021-09-01  4:59         ` Taylor Blau
  0 siblings, 1 reply; 13+ messages in thread
From: Elijah Newren @ 2021-09-01  4:43 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Jeff King, Elijah Newren via GitGitGadget, Git Mailing List

On Tue, Aug 31, 2021 at 7:55 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> On Tue, Aug 31, 2021 at 01:14:55PM -0700, Elijah Newren wrote:
> > > Now I have spent zero time looking into actually coding this, so it may
> > > turn out to be much trickier than I am suggesting. But this seems like a
> > > much more fruitful direction, where we can protect users in cases where
> > > they benefit (and give them sensible and actionable error messages),
> > > without bothering people in the majority of cases where their cwd
> > > doesn't go away.
> >
> > Ooh, this sounds intriguing to me...but what if we changed that rule
> > slightly and just decided to never make the cwd go away?  Currently,
> > the checkout code removes directories if they have no tracked or
> > untracked or ignored files left, i.e. if they're empty.  What if we
> > decide to only have remove_scheduled_dirs() remove directories that
> > are empty AND they are not the current working directory?
>
> Hmm. My first thought after reading this is that it would cause
> surprising behavior for anybody who had 'git add --all' in their 'rebase
> -x' script. But would it?
>
> I.e., imagine somebody doing an in-place sed in a rebase and then `git
> add --all`-ing the result at each point in history. If the directory
> they were in ever went away, then the *next* revision would add that
> directory right back.
>
> That behavior seems somewhat surprising to me, or at least I could
> imagine it being surprising to users.

I'm not following.  `git add --all` doesn't add empty directories, so
I don't see how my proposed change would cause any problems in such a
case; nothing would be added back.

> Another thought is what should happen when the current directory goes
> away and then comes back as a file? We wouldn't be able to checkout that
> file, I don't think, so it might be a dead end.

I'm not following this either.  Peff's original suggestion was to
error out only when we knew it could cause problems, in particular
when the working directory would be removed.  Here I've shifted the
way the problem is viewed by just not removing the working directory,
but the end result is the same -- it errors out when the removal was
needed.  Given that erroring out is exactly what we wanted for a case
like this, why does that make it a dead end?

* Re: [PATCH] rebase, cherry-pick, revert: only run from toplevel
  2021-09-01  4:43       ` Elijah Newren
@ 2021-09-01  4:59         ` Taylor Blau
  2021-09-01  6:48           ` Elijah Newren
  0 siblings, 1 reply; 13+ messages in thread
From: Taylor Blau @ 2021-09-01  4:59 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Taylor Blau, Jeff King, Elijah Newren via GitGitGadget,
	Git Mailing List

On Tue, Aug 31, 2021 at 09:43:15PM -0700, Elijah Newren wrote:
> On Tue, Aug 31, 2021 at 7:55 PM Taylor Blau <me@ttaylorr.com> wrote:
> >
> > On Tue, Aug 31, 2021 at 01:14:55PM -0700, Elijah Newren wrote:
> > > > Now I have spent zero time looking into actually coding this, so it may
> > > > turn out to be much trickier than I am suggesting. But this seems like a
> > > > much more fruitful direction, where we can protect users in cases where
> > > > they benefit (and give them sensible and actionable error messages),
> > > > without bothering people in the majority of cases where their cwd
> > > > doesn't go away.
> > >
> > > Ooh, this sounds intriguing to me...but what if we changed that rule
> > > slightly and just decided to never make the cwd go away?  Currently,
> > > the checkout code removes directories if they have no tracked or
> > > untracked or ignored files left, i.e. if they're empty.  What if we
> > > decide to only have remove_scheduled_dirs() remove directories that
> > > are empty AND they are not the current working directory?
> >
> > Hmm. My first thought after reading this is that it would cause
> > surprising behavior for anybody who had 'git add --all' in their 'rebase
> > -x' script. But would it?
> >
> > I.e., imagine somebody doing an in-place sed in a rebase and then `git
> > add --all`-ing the result at each point in history. If the directory
> > they were in ever went away, then the *next* revision would add that
> > directory right back.
> >
> > That behavior seems somewhat surprising to me, or at least I could
> > imagine it being surprising to users.
>
> I'm not following.  `git add --all` doesn't add empty directories, so
> I don't see how my proposed change would cause any problems in such a
> case; nothing would be added back.

Ahh, it was I who wasn't following. You were proposing to leave the
directories in place but empty. Agreed that there wouldn't be any
problems with that.

> > Another thought is what should happen when the current directory goes
> > away and then comes back as a file? We wouldn't be able to checkout that
> > file, I don't think, so it might be a dead end.
>
> I'm not following this either.  Peff's original suggestion was to
> error out only when we knew it could cause problems, in particular
> when the working directory would be removed.  Here I've shifted the
> way the problem is viewed by just not removing the working directory,
> but the end result is the same -- it errors out when the removal was
> needed.  Given that erroring out is exactly what we wanted for a case
> like this, why does that make it a dead end?

The way you shifted the problem makes it possible for us to discover
that only right before we're about to fail, right? In other words, if
you're doing a rebase then you're potentially leaving a lot of wasted
work on the table if you realize halfway through the operation that you
couldn't complete it.

Even though I think that's *not* what you want for rebase, it ironically
*is* what you might want for bisect, since the path we'll take isn't
known ahead of time. So even if there are some paths that would result
in a directory -> file conversion in the cwd, we don't need to fail the
operation ahead of time if the bisection doesn't actually take that
path.

On the other hand, it does still leave a lot of work on the table if the
bisection does eventually want to change the current working directory
into a file, depending on where that change happens.

Maybe those cases are pretty niche in practice. I have to imagine that
they are. If my guess is right, then I think your approach makes sense.

Thanks,
Taylor

* Re: [PATCH] rebase, cherry-pick, revert: only run from toplevel
  2021-08-31 20:14   ` Elijah Newren
  2021-09-01  2:55     ` Taylor Blau
@ 2021-09-01  5:29     ` Junio C Hamano
  2021-09-01  6:08       ` Elijah Newren
  1 sibling, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2021-09-01  5:29 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Jeff King, Elijah Newren via GitGitGadget, Git Mailing List

Elijah Newren <newren@gmail.com> writes:

> Ooh, this sounds intriguing to me...but what if we changed that rule
> slightly and just decided to never make the cwd go away?  Currently,
> the checkout code removes directories if they have no tracked or
> untracked or ignored files left, i.e. if they're empty.  What if we
> decide to only have remove_scheduled_dirs() remove directories that
> are empty AND they are not the current working directory?

Is that generally doable?  What would we do when the directory the
subcommand was started from (or one of its parent directories) is
not just missing but has to be a file in the revision the subcommand
is trying to checkout?

* Re: [PATCH] rebase, cherry-pick, revert: only run from toplevel
  2021-09-01  5:29     ` Junio C Hamano
@ 2021-09-01  6:08       ` Elijah Newren
  2021-09-01  6:30         ` Jeff King
  0 siblings, 1 reply; 13+ messages in thread
From: Elijah Newren @ 2021-09-01  6:08 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Elijah Newren via GitGitGadget, Git Mailing List

On Tue, Aug 31, 2021 at 10:29 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
>
> > Ooh, this sounds intriguing to me...but what if we changed that rule
> > slightly and just decided to never make the cwd go away?  Currently,
> > the checkout code removes directories if they have no tracked or
> > untracked or ignored files left, i.e. if they're empty.  What if we
> > decide to only have remove_scheduled_dirs() remove directories that
> > are empty AND they are not the current working directory?
>
> Is that generally doable?  What would we do when the directory the
> subcommand was started from (or one of its parent directories) is
> not just missing but has to be a file in the revision the subcommand
> is trying to checkout?

The same problem (an untracked directory is in the way) already exists
and has to be handled by all relevant subcommands, right?

In particular, if the current working directory only has untracked
files in it, then the directory cannot be removed.  That will prevent
us from checking out the revision we want, so we have to throw an
error.

So my idea just piggy backs on that, resulting in the same error also
being shown when the current working directory has 0 untracked files
within it.

Since the whole thread started from, "maybe we should throw an error
instead of continuing if it would result in the current working
directory getting deleted", I believe this idea does exactly what we
were looking for...and nicely tailors the new error cases to precisely
the situations we wanted them for -- when the current working
directory would have been removed by the old code.

* Re: [PATCH] rebase, cherry-pick, revert: only run from toplevel
  2021-09-01  6:08       ` Elijah Newren
@ 2021-09-01  6:30         ` Jeff King
  0 siblings, 0 replies; 13+ messages in thread
From: Jeff King @ 2021-09-01  6:30 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Junio C Hamano, Elijah Newren via GitGitGadget, Git Mailing List

On Tue, Aug 31, 2021 at 11:08:35PM -0700, Elijah Newren wrote:

> On Tue, Aug 31, 2021 at 10:29 PM Junio C Hamano <gitster@pobox.com> wrote:
> >
> > Elijah Newren <newren@gmail.com> writes:
> >
> > > Ooh, this sounds intriguing to me...but what if we changed that rule
> > > slightly and just decided to never make the cwd go away?  Currently,
> > > the checkout code removes directories if they have no tracked or
> > > untracked or ignored files left, i.e. if they're empty.  What if we
> > > decide to only have remove_scheduled_dirs() remove directories that
> > > are empty AND they are not the current working directory?
> >
> > Is that generally doable?  What would we do when the directory the
> > subcommand was started from (or one of its parent directories) is
> > not just missing but has to be a file in the revision the subcommand
> > is trying to checkout?
> 
> The same problem (an untracked directory is in the way) already exists
> and has to be handled by all relevant subcommands, right?
> 
> In particular, if the current working directory only has untracked
> files in it, then the directory cannot be removed.  That will prevent
> us from checking out the revision we want, so we have to throw an
> error.
> 
> So my idea just piggy backs on that, resulting in the same error also
> being shown when the current working directory has 0 untracked files
> within it.
> 
> Since the whole thread started from, "maybe we should throw an error
> instead of continuing if it would result in the current working
> directory getting deleted", I believe this idea does exactly what we
> were looking for...and nicely tailors the new error cases to precisely
> the situations we wanted them for -- when the current working
> directory would have been removed by the old code.

Yeah, this is definitely what I had in mind for my original suggestion:
extending the "do not delete directory if it contains an untracked file"
rule to the cwd. I guess I was thinking of it as an error, but it really
_isn't_ an error in most cases, as long as we don't need to recreate
another entry there (and if we do, then as you say, the existing error
paths should handle it).
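
Spelled out as a decision table, the rule would be roughly the
following. This is only a sketch to pin down the cases discussed here;
the enum, the function and its three flags are all hypothetical names,
not anything in unpack-trees.

```c
/* Hypothetical outcome for a directory whose tracked files are all gone. */
enum dir_action {
	DIR_REMOVE,	/* empty and not the cwd: delete it, as today */
	DIR_KEEP,	/* leave it in place; nothing else needs the path */
	DIR_CONFLICT	/* another entry must be written at this path: error */
};

/*
 * `is_cwd`: the directory is where the command was started.
 * `has_untracked`: untracked or ignored files remain inside it.
 * `path_needed`: the target revision wants some other entry (e.g. a
 * file) at this path.
 */
static enum dir_action decide_dir_action(int is_cwd, int has_untracked,
					  int path_needed)
{
	if (!is_cwd && !has_untracked)
		return DIR_REMOVE;
	return path_needed ? DIR_CONFLICT : DIR_KEEP;
}
```

The point of the sketch is that "is the cwd" and "has untracked files"
land in the same branch, so the cwd case needs no error path of its own.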

It does leave crufty directories behind in the working tree that Git
would not then delete (because it only cleans up directories after it
removes their contents, and there are no contents left to remove).
That's probably not a big deal, though. If you are moving around history
with things like rebase or bisect, you'd likely move back to a working
tree with the directory anyway. And it's not like you can't end up with
such directories through other means (and "git clean -d" takes care of
them).

So I do like the approach.

-Peff

PS I'm assuming this is all taking the cwd of the process based on the
   original "prefix" (since Git processes generally move up to the
   top-level of the working tree). But that still leaves the problem for
   _other_ processes. E.g., if I am working in two terminals, one could
   be in "subdir/" and I could issue a bisect or rebase command from the
   other at the top-level, which would put one of them into the
   confusing state. You could strengthen the rule to "is _any_ process
   using this as a cwd", but it's hard to do so portably. And I know
   I've been annoyed by similar such safety valves before ("why can't I
   delete this? Argh, some process I don't care about is squatting on
   it").

* Re: [PATCH] rebase, cherry-pick, revert: only run from toplevel
  2021-09-01  4:59         ` Taylor Blau
@ 2021-09-01  6:48           ` Elijah Newren
  0 siblings, 0 replies; 13+ messages in thread
From: Elijah Newren @ 2021-09-01  6:48 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Jeff King, Elijah Newren via GitGitGadget, Git Mailing List

On Tue, Aug 31, 2021 at 9:59 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> On Tue, Aug 31, 2021 at 09:43:15PM -0700, Elijah Newren wrote:
> > On Tue, Aug 31, 2021 at 7:55 PM Taylor Blau <me@ttaylorr.com> wrote:
> > >
> > > On Tue, Aug 31, 2021 at 01:14:55PM -0700, Elijah Newren wrote:
> > > > > Now I have spent zero time looking into actually coding this, so it may
> > > > > turn out to be much trickier than I am suggesting. But this seems like a
> > > > > much more fruitful direction, where we can protect users in cases where
> > > > > they benefit (and give them sensible and actionable error messages),
> > > > > without bothering people in the majority of cases where their cwd
> > > > > doesn't go away.
> > > >
> > > > Ooh, this sounds intriguing to me...but what if we changed that rule
> > > > slightly and just decided to never make the cwd go away?  Currently,
> > > > the checkout code removes directories if they have no tracked or
> > > > untracked or ignored files left, i.e. if they're empty.  What if we
> > > > decide to only have remove_scheduled_dirs() remove directories that
> > > > are empty AND they are not the current working directory?
> > >
> > > Hmm. My first thought after reading this is that it would cause
> > > surprising behavior for anybody who had 'git add --all' in their 'rebase
> > > -x' script. But would it?
> > >
> > > I.e., imagine somebody doing an in-place sed in a rebase and then `git
> > > add --all`-ing the result at each point in history. If the directory
> > > they were in ever went away, then the *next* revision would add that
> > > directory right back.
> > >
> > > That behavior seems somewhat surprising to me, or at least I could
> > > imagine it being surprising to users.
> >
> > I'm not following.  `git add --all` doesn't add empty directories, so
> > I don't see how my proposed change would cause any problems in such a
> > case; nothing would be added back.
>
> Ahh, it was I who wasn't following. You were proposing to leave the
> directories in place but empty. Agreed that there wouldn't be any
> problems with that.
>
> > > Another thought is what should happen when the current directory goes
> > > away and then comes back as a file? We wouldn't be able to checkout that
> > > file, I don't think, so it might be a dead end.
> >
> > I'm not following this either.  Peff's original suggestion was to
> > error out only when we knew it could cause problems, in particular
> > when the working directory would be removed.  Here I've shifted the
> > way the problem is viewed by just not removing the working directory,
> > but the end result is the same -- it errors out when the removal was
> > needed.  Given that erroring out is exactly what we wanted for a case
> > like this, why does that make it a dead end?
>
> The way you shifted the problem makes it possible for us to discover
> that only right before we're about to fail, right? In other words, if
> you're doing a rebase then you're potentially leaving a lot of wasted
> work on the table if you realize halfway through the operation that you
> couldn't complete it.

Sure, it shifts things towards failing later.  However, for
correctness reasons, merge-ort has already fundamentally shifted
everything working-directory related towards failing later.

merge-recursive (the old algorithm), via unpack-trees, tried to do
checks for "locally modified files in the way" and "untracked
files/directories in the way" early on in the process.  However, such
checks were unaware of rename detection logic and thus had both false
positives and false negatives in terms of throwing errors and saying
the merge could not proceed.  Those false positives and negatives were
intrinsic to the design of merge-recursive; they'll never be fixed.
The only correct fix is to fundamentally change the design -- do the
entire merge operation without touching the working tree or index, and
only after you have a new resulting merge tree, then do a checkout
operation to switch from the old tree to the new merge result tree;
only at that final checkout step do you fail if there are locally
modified files or untracked files/directories in the way.  That's what
merge-ort does.
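
As a toy illustration of that ordering (not merge-ort itself), the
Python sketch below models trees as dicts from path to content, merges
entirely in memory, and only consults the working tree in a final
checkout step.  The function names, the dict representation, and the
absence of rename detection and directory handling are all simplifying
assumptions made for this example.

```python
def merge_trees(base, ours, theirs):
    """Toy three-way merge of {path: content} dicts, done purely in memory.

    Returns (merged_tree, conflicted_paths).  Conflicted paths are simply
    left out of the merged tree in this sketch.
    """
    merged = {}
    conflicts = []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o      # both sides agree (or both deleted the path)
        elif o == b:
            result = t      # only "theirs" changed the path
        elif t == b:
            result = o      # only "ours" changed the path
        else:
            conflicts.append(path)
            continue
        if result is not None:
            merged[path] = result
    return merged, conflicts


def checkout(old_tree, new_tree, untracked):
    """Switch from old_tree to new_tree.

    This is the only step that looks at the working tree, so it is the
    only step that can fail because an untracked path is in the way.
    """
    in_the_way = sorted(p for p in new_tree
                        if p not in old_tree and p in untracked)
    if in_the_way:
        raise RuntimeError("untracked files in the way: "
                           + ", ".join(in_the_way))
    return dict(new_tree)
```

In this model merge_trees() never fails because of working-tree state;
any "in the way" error surfaces only from checkout(), after the merge
result is already known.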

merge-ort is also written to allow a whole rebase to be done
"in-memory" and only update the working tree and index after we have
the final end-tree.  test-tool fast-rebase already works this way.  In
such a case, we'd only notice an untracked directory being in the way
at the point we hit a conflict or when we got to the end of the
rebase.  So again, erroring out at the end of the operation because a
directory with >= 1 untracked files is in the way isn't a new issue at
all.  The only thing that is new is that we'd also error out if the
directory in the way had 0 untracked files in it but was the current
working directory.
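
For illustration only, here is a minimal Python sketch of the proposed
rule: remove a directory only if it is empty AND it is not the current
working directory.  Git's real remove_scheduled_dirs() is written in C
and is not reproduced here; the helper name remove_dir_unless_cwd and
its return strings are hypothetical.

```python
import os


def remove_dir_unless_cwd(path):
    """Remove `path` only if it is empty and is not the process's cwd.

    Hypothetical helper for illustration.  Returns "removed",
    "kept-cwd", or "kept-nonempty".
    """
    if os.listdir(path):
        # Something is still in there (tracked, untracked, or ignored),
        # so the directory stays, as it already does today.
        return "kept-nonempty"
    if os.path.samefile(path, os.getcwd()):
        # The new part of the rule: never make the cwd go away.
        return "kept-cwd"
    os.rmdir(path)
    return "removed"
```

A caller that later needs to create a file at that path would still hit
the existing "something is in the way" error, since the empty directory
was left in place.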

However, I disagree with the "a lot of wasted work" part; rebase is
hundreds or thousands of times faster than it used to be, as you
highlighted in your blog post.  ;-)  Also, as per my --remerge-diff
thread, merge-ort can remerge 33.5 merge commits/second in linux.git.
And that's just merges; rebases are faster than merges because (a) the
intersection (rather than union) of changes on both sides of history
tends to be much smaller for rebases than merges, due to the fact that
individual commits tend to only modify a handful of files (and
merge-ort optimizations work very well when the intersection of files
changed on both sides is small), and (b) we can cache rename detection
results for the upstream side of history in rebases.

> Even though I think that's *not* what you want for rebase, it ironically

I think it's totally what we want for rebase; it's perfectly aligned
with the merge-ort correctness changes to move things to
fail-at-the-end for working-directory-related problems.

> *is* what you might want for bisect, since the path we'll take isn't
> known ahead of time. So even if there are some paths that would result
> in a directory -> file conversion in the cwd, we don't need to fail the
> operation ahead of time if the bisection doesn't actually take that
> path.
>
> On the other hand, it does still leave a lot of work on the table if the
> bisection does eventually want to change the current working directory
> into a file, depending on where that change happens.
>
> Maybe those cases are pretty niche in practice. I have to imagine that
> they are. If my guess is right, then I think your approach makes sense.

:-)

* Re: [PATCH] rebase, cherry-pick, revert: only run from toplevel
@ 2021-11-26  7:31 Leon Dingman
  0 siblings, 0 replies; 13+ messages in thread
From: Leon Dingman @ 2021-11-26  7:31 UTC (permalink / raw)
  To: peff; +Cc: git, gitgitgadget, gitster, newren

Fu

Done

Thread overview: 13+ messages
2021-08-31  3:03 [PATCH] rebase, cherry-pick, revert: only run from toplevel Elijah Newren via GitGitGadget
2021-08-31  3:05 ` Elijah Newren
2021-08-31  5:55 ` Johannes Sixt
2021-08-31  7:01 ` Jeff King
2021-08-31 20:14   ` Elijah Newren
2021-09-01  2:55     ` Taylor Blau
2021-09-01  4:43       ` Elijah Newren
2021-09-01  4:59         ` Taylor Blau
2021-09-01  6:48           ` Elijah Newren
2021-09-01  5:29     ` Junio C Hamano
2021-09-01  6:08       ` Elijah Newren
2021-09-01  6:30         ` Jeff King
  -- strict thread matches above, loose matches on Subject: below --
2021-11-26  7:31 Leon Dingman
