git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* Re: BUG: git clean -d cannot remove files from read-only directories
@ 2020-02-20 18:55 Adam Milazzo
  2020-02-20 19:45 ` Elijah Newren
  0 siblings, 1 reply; 10+ messages in thread
From: Adam Milazzo @ 2020-02-20 18:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git@vger.kernel.org

> It is how UNIX-like filesystem works, isn't it?

Sure, but it doesn't have to be how "git clean -d" works. In particular, "git clean -ffd" could be more forceful, or there could be another option or a third level of force beyond the current two levels.

It really comes back to this question: "How can I avoid the failure, given that I am running 'git clean' from a script and not interactively?" The only answer I could find that's not unreasonable is to parse the text output of 'git status -s' to find the untracked directories and then run 'chmod -R u+w' on each of those directories before running 'git clean -d'." I can do that, but I still think a "force" (or other) flag that really forces the cleanup would be preferable, especially given that this isn't a completely idiosyncratic scenario but one that will happen more and more as go modules are adopted (unless go is changed to stop putting them in read-only directories).

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: BUG: git clean -d cannot remove files from read-only directories
  2020-02-20 18:55 BUG: git clean -d cannot remove files from read-only directories Adam Milazzo
@ 2020-02-20 19:45 ` Elijah Newren
  2020-02-20 20:05   ` [EXTERNAL] " Adam Milazzo
  0 siblings, 1 reply; 10+ messages in thread
From: Elijah Newren @ 2020-02-20 19:45 UTC (permalink / raw)
  To: Adam Milazzo; +Cc: Junio C Hamano, git@vger.kernel.org

On Thu, Feb 20, 2020 at 10:58 AM Adam Milazzo
<Adam.Milazzo@microsoft.com> wrote:
>
> > It is how UNIX-like filesystem works, isn't it?
>
> Sure, but it doesn't have to be how "git clean -d" works. In particular, "git clean -ffd" could be more forceful, or there could be another option or a third level of force beyond the current two levels.
>
> It really comes back to this question: "How can I avoid the failure, given that I am running 'git clean' from a script and not interactively?" The only answer I could find that's not unreasonable is to parse the text output of 'git status -s' to find the untracked directories and then run 'chmod -R u+w' on each of those directories before running 'git clean -d'." I can do that, but I still think a "force" (or other) flag that really forces the cleanup would be preferable, especially given that this isn't a completely idiosyncratic scenario but one that will happen more and more as go modules are adopted (unless go is changed to stop putting them in read-only directories).


Tying permission override to e.g -ffd would be a really bad idea.
People would start adopting that for the permission override reason,
then someone is going to accidentally nuke a git submodule with
unpushed changes.

Tying it to -fd doesn't have that particular problem, but I'm worried
a bit about a slippery slope if we do it.  If we say that "git clean
-fd" should check and modify directory permission bits, should it also
check and modify any necessary ACLs?  What if the user in question
doesn't have perms, but the user in question is in sudoers -- should
git try to see if they can run chown or chattr under sudo too?

Also, another way to look at this; currently 'git clean -fd' behaves
the same on untracked directories as 'rm -rf' does and I think that's
a good model for how to behave.  Why should they be different?

$ git clean -fd unwritable-dir/
warning: failed to remove unwritable-dir/a: Permission denied
warning: failed to remove unwritable-dir/b: Permission denied
warning: failed to remove unwritable-dir/c: Permission denied
$ rm -rf unwritable-dir/
rm: cannot remove 'unwritable-dir/a': Permission denied
rm: cannot remove 'unwritable-dir/b': Permission denied
rm: cannot remove 'unwritable-dir/c': Permission denied

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [EXTERNAL] Re: BUG: git clean -d cannot remove files from read-only directories
  2020-02-20 19:45 ` Elijah Newren
@ 2020-02-20 20:05   ` Adam Milazzo
  2020-02-20 22:26     ` Junio C Hamano
  2020-02-20 23:13     ` Elijah Newren
  0 siblings, 2 replies; 10+ messages in thread
From: Adam Milazzo @ 2020-02-20 20:05 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Junio C Hamano, git@vger.kernel.org

> Tying permission override to e.g -ffd would be a really bad idea. People would start adopting that for the permission override reason, then someone is going to accidentally nuke a git submodule with unpushed changes.

Fair enough. I'm not arguing that it must necessarily be any kind of overloading of -f. It could just as well be a separate option. (Personally I'd say overloading -f to make "-f -f" mean "also delete untracked git repositories" was the bad idea, since that is not the obvious and intuitive meaning of "--force", whereas "try harder to delete files you would normally delete" _is_ the intuitive meaning, but the history can't be changed.)

> Also, another way to look at this; currently 'git clean -fd' behaves the same on untracked directories as 'rm -rf' does and I think that's a good model for how to behave.  Why should they be different?

I would ask "why should they be the same?", because there's no obvious reason why behavior of "git clean", whose purpose is to put a repository back to a pristine state, should always behave the same way as "rm", whose purpose is to delete a specified set of files, and more to the point there is no reason why they should have no option to behave differently. I am not arguing for the default behavior of "git clean" to change, since that could be seen as a breaking change. Also, they are already different. "git clean" will remove a read-only file. "rm" will not (without confirmation), although "rm -f" will.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [EXTERNAL] Re: BUG: git clean -d cannot remove files from read-only directories
  2020-02-20 20:05   ` [EXTERNAL] " Adam Milazzo
@ 2020-02-20 22:26     ` Junio C Hamano
  2020-02-20 23:52       ` Adam Milazzo
  2020-02-20 23:13     ` Elijah Newren
  1 sibling, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2020-02-20 22:26 UTC (permalink / raw)
  To: Adam Milazzo; +Cc: Elijah Newren, git@vger.kernel.org

Adam Milazzo <Adam.Milazzo@microsoft.com> writes:

>> Also, another way to look at this; currently 'git clean -fd'
>> behaves the same on untracked directories as 'rm -rf' does and I
>> think that's a good model for how to behave.  Why should they be
>> different?
>
> I would ask "why should they be the same?"

Simply because that is how users would expect how the world works
(iow, model things after what they are already familiar with).

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [EXTERNAL] Re: BUG: git clean -d cannot remove files from read-only directories
  2020-02-20 20:05   ` [EXTERNAL] " Adam Milazzo
  2020-02-20 22:26     ` Junio C Hamano
@ 2020-02-20 23:13     ` Elijah Newren
  2020-02-21  1:48       ` Junio C Hamano
  1 sibling, 1 reply; 10+ messages in thread
From: Elijah Newren @ 2020-02-20 23:13 UTC (permalink / raw)
  To: Adam Milazzo; +Cc: Junio C Hamano, git@vger.kernel.org

On Thu, Feb 20, 2020 at 12:05 PM Adam Milazzo
<Adam.Milazzo@microsoft.com> wrote:
>
> > Tying permission override to e.g -ffd would be a really bad idea. People would start adopting that for the permission override reason, then someone is going to accidentally nuke a git submodule with unpushed changes.
>
> Fair enough. I'm not arguing that it must necessarily be any kind of overloading of -f. It could just as well be a separate option. (Personally I'd say overloading -f to make "-f -f" mean "also delete untracked git repositories" was the bad idea, since that is not the obvious and intuitive meaning of "--force", whereas "try harder to delete files you would normally delete" _is_ the intuitive meaning, but the history can't be changed.)

Yeah, arguing to change existing behavior (as you started with by
labelling the thread with "BUG") will probably get you a lot of
pushback, but let me address your new angle of suggesting a new
option...

> > Also, another way to look at this; currently 'git clean -fd' behaves the same on untracked directories as 'rm -rf' does and I think that's a good model for how to behave.  Why should they be different?
>
> I would ask "why should they be the same?",

I think it matches user expectation of how the filesystem works.  But
if we did want to implement a new flag, there's a question of what it
should be and what it should do.  Your "put the repository back to a
pristine state" isn't enough to really go on from an implementation.
Clearly you don't mean that it should also do a 'reset --hard' --
despite the fact that running a reset fits this definition, but even
in terms of untracked files you've still left out the "slippery slope"
angle from my previous response in what you've addressed here.  Let's
say we did implement a new flag that said we should override standard
permissions.  In such a case, shouldn't ACLs also be overridden?  Can
we handle that in any semi-portable fashion?  Assuming we do attempt
to override ACLs, do we destroy portability of git (I don't know a
whole lot about ACLs, so I don't know, but I thought they varied a lot
between systems) -- and initiate a never-ending treadmill of needs to
handle more kinds of ACLs?  If we don't try to overrule ACLs, doesn't
that defeat your whole argument that "git clean" (with various levels
of forcing) is about trying to do as much as possible to put the
repository in a clean state?  Ignoring ACLs for a minute, what if we
attempt to override the permissions by marking directories as
writable, but find out the user isn't the owner of the directory and
thus can't change its permissions?  Do we try harder by attempting to
invoke chmod under sudo to see if we can override the permissions?  At
what point do we give up?  What's the clear line...and why does that
line not just get drawn at not changing permissions at all?

>  because there's no obvious reason why behavior of "git clean", whose purpose is to put a repository back to a pristine state, should always behave the same way as "rm", whose purpose is to delete a specified set of files, and more to the point there is no reason why they should have no option to behave differently. I am not arguing for the default behavior of "git clean" to change, since that could be seen as a breaking change. Also, they are already different. "git clean" will remove a read-only file. "rm" will not (without confirmation), although "rm -f" will.

Here you've changed the goalposts slightly.  I didn't compare "git
clean" to "rm", I compared "git clean -fd" to "rm -rf" on untracked
directories.

However, you are right that even those two aren't the same; "rm -rf"
will recurse into a git submodule and delete it while "git clean -fd"
won't.  In other words, "git clean -fd" avoids bringing the repository
back to a pristine state in a certain circumstance; it's actually more
cautious than rm.


I'm sympathetic to "golang does a dumb thing I don't like" (the exact
situation you hilight has bothered me before too), but it's not clear
at all to me where or how git might help you cope with that.  To me,
golang did a dumb and you have to manually undo the dumb.  If you want
something smarter on the git side instead of the golang side, all the
stuff I mentioned above are things that would need to be addressed in
any possible solution that I'm just not seeing.


Hope that helps,
Elijah

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [EXTERNAL] Re: BUG: git clean -d cannot remove files from read-only directories
  2020-02-20 22:26     ` Junio C Hamano
@ 2020-02-20 23:52       ` Adam Milazzo
  2020-02-21  0:42         ` Elijah Newren
  0 siblings, 1 reply; 10+ messages in thread
From: Adam Milazzo @ 2020-02-20 23:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Elijah Newren, git@vger.kernel.org

> Simply because that is how users would expect how the world works (iow, model things after what they are already familiar with).

This seems to be an avoidance of my actual arguments about 1) the purpose of "git clean" and what behavior best matches it, and 2) the violation of the general principle that if a tool invoked programmatically can fail, then there should be a reasonable way for users to avoid the failure if possible. But my response is:

First, there is no obvious choice for what other tool to model "git clean" on, even assuming that it should be so modeled. This goes back to the purpose of "git clean". Is it just a recursive delete? Or it bringing the directory tree back to a certain state? I'd argue the latter, and if we want point to existing tools I'd point to rsync, which has no problem deleting files from read-only directories if it's needed to bring a directory tree to the desired state. It doesn't even give a warning about it.

Second, I doubt anybody here actually knows (i.e. has data demonstrating) that users expect 'git clean' to behave like 'rm'. Also, I am a user, and it is not what _I_ expect. (And since some people here seem keen to dismiss what I say based on an assumption of ignorance, I've been programming for 30 years, using GNU/Linux, BSD, and other UNIX-like systems for almost 20 years, and using various source control systems for about as long. Not that that should carry any intrinsic weight in this discussion.)

Comparing to "rm" again, there is an easy way for users of "rm" to avoid the error. Simply replace "rm -rf X" with "chmod -R u+w X; rm -rf X". What is the comparable workaround with "git clean"? There is none that I'm aware of, and that's perhaps the main reason why it would be useful for "git clean" to be able to handle it. If there is a reasonable workaround, what is it? The best simple workarounds I've been able to come up with are:

* For "git clean -fd": git status -s -uall | grep -E '^\?\?.*/$' | cut -c 4- | xargs -r chmod -R u+w; git clean -fd
* For "git clean -fdx": git status -s -uall --ignored | grep -E '^\?\?.*/$' | cut -c 4- | xargs -r chmod -R u+w; git clean -fdx
* For "git clean -fX": ??
* For "git clean -f": ??

These are not reliable because there are various conditions where they fail (including ours), so I'm not sure they are viable approaches except in certain special cases. It's possible to handle all the possibilities with custom scripting, but the workarounds would become quite complex.

So I ask again, if "git clean" won't have any option to handle it like rsync does, what is the workaround that can be placed in a script to get the same behavior? And if there is no reasonable workaround, perhaps it is a useful feature to have "git clean" try a little harder to delete the files, or have an option to do so?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [EXTERNAL] Re: BUG: git clean -d cannot remove files from read-only directories
  2020-02-20 23:52       ` Adam Milazzo
@ 2020-02-21  0:42         ` Elijah Newren
  0 siblings, 0 replies; 10+ messages in thread
From: Elijah Newren @ 2020-02-21  0:42 UTC (permalink / raw)
  To: Adam Milazzo; +Cc: Junio C Hamano, git@vger.kernel.org

Hi Adam,

On Thu, Feb 20, 2020 at 3:52 PM Adam Milazzo <Adam.Milazzo@microsoft.com> wrote:
>
> > Simply because that is how users would expect how the world works (iow, model things after what they are already familiar with).
>
> This seems to be an avoidance of my actual arguments about 1) the purpose of "git clean" and what behavior best matches it, and 2) the violation of the general principle that if a tool invoked programmatically can fail, then there should be a reasonable way for users to avoid the failure if possible. But my response is:
>
> First, there is no obvious choice for what other tool to model "git clean" on, even assuming that it should be so modeled. This goes back to the purpose of "git clean". Is it just a recursive delete? Or it bringing the directory tree back to a certain state? I'd argue the latter, and if we want point to existing tools I'd point to rsync, which has no problem deleting files from read-only directories if it's needed to bring a directory tree to the desired state. It doesn't even give a warning about it.

I'm very sympathetic to the fact that "git clean" behavior might not
be optimal or even well defined[1][2][3].  I've recently done work in
the area, including even changing the existing behavior of some
commands, based on arguments about what should be correct behavior.
Some of that work languished for a year and a half, despite fixing
known bugs, because there were edge cases where I couldn't tell what
correct behavior was and no one else seemed to be able to answer
either.

[1] https://lore.kernel.org/git/20190917163504.14566-1-newren@gmail.com/
[2] https://lore.kernel.org/git/pull.676.v5.git.git.1576790906.gitgitgadget@gmail.com/
[3] https://lore.kernel.org/git/pull.692.v3.git.git.1579206117.gitgitgadget@gmail.com/

If you want to do something similar, I think you need to provide good
rationale for not only what you are trying to achieve, but explain the
edge cases and how to address probable future similar requests in a
way that make sense to someone who might try to implement it.  If I
were to try to implement your suggestion, I'm saying that your
descriptions are not at all clear to me in terms of how I should
handle edge cases and future related improvements that folks ask for,
not even if this is implemented as a new option.

For example, you talk about bringing the directory tree back to a
certain state -- does that mean git clean should also run 'git reset
--hard'?  I need a more precise model/description...

> Second, I doubt anybody here actually knows (i.e. has data demonstrating) that users expect 'git clean' to behave like 'rm'. Also, I am a user, and it is not what _I_ expect. (And since some people here seem keen to dismiss what I say based on an assumption of ignorance, I've been programming for 30 years, using GNU/Linux, BSD, and other UNIX-like systems for almost 20 years, and using various source control systems for about as long. Not that that should carry any intrinsic weight in this discussion.)
>
> Comparing to "rm" again, there is an easy way for users of "rm" to avoid the error. Simply replace "rm -rf X" with "chmod -R u+w X; rm -rf X". What is the comparable workaround with "git clean"? There is none that I'm aware of, and that's perhaps the main reason why it would be useful for "git clean" to be able to handle it. If there is a reasonable workaround, what is it? The best simple workarounds I've been able to come up with are:
>
> * For "git clean -fd": git status -s -uall | grep -E '^\?\?.*/$' | cut -c 4- | xargs -r chmod -R u+w; git clean -fd
> * For "git clean -fdx": git status -s -uall --ignored | grep -E '^\?\?.*/$' | cut -c 4- | xargs -r chmod -R u+w; git clean -fdx
> * For "git clean -fX": ??
> * For "git clean -f": ??

For every single case, why not just "chmod -R u+w $toplevel_dir"
followed by the git clean command in question, much like you did with
rm?

> These are not reliable because there are various conditions where they fail (including ours), so I'm not sure they are viable approaches except in certain special cases. It's possible to handle all the possibilities with custom scripting, but the workarounds would become quite complex.
>
> So I ask again, if "git clean" won't have any option to handle it like rsync does, what is the workaround that can be placed in a script to get the same behavior? And if there is no reasonable workaround, perhaps it is a useful feature to have "git clean" try a little harder to delete the files, or have an option to do so?

I think the single simple recursive chmod I mentioned above is a
reasonable workaround.

But since you are bringing up rsync as a comparison point...rsync also
affects ACLs, xattrs, devices and other special files, etc.  So, how
much harder is a little harder?  Is there a good mental model for that
being the right amount of harder, or do we just keep extending it
every time the command fails to clean something that users think we
could have wiped out?


Hope that helps,
Elijah

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [EXTERNAL] Re: BUG: git clean -d cannot remove files from read-only directories
  2020-02-20 23:13     ` Elijah Newren
@ 2020-02-21  1:48       ` Junio C Hamano
  0 siblings, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2020-02-21  1:48 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Adam Milazzo, git@vger.kernel.org

Elijah Newren <newren@gmail.com> writes:

> should be and what it should do.  Your "put the repository back to a
> pristine state" isn't enough to really go on from an implementation.

Yeah, where does this "pristine state" come from?  Do we need to fix
the "git clean" documentation to reduce such a confusion (I didn't
check to see if the current text is prone to be misunderstood as
such yet).  "git clean" is to remove crufts, isn't it?  The
definition of "cruft" being untracked junk that are marked
expendable (aka "ignored")?  And it is not "remove crurfts no matter
what".

> angle from my previous response in what you've addressed here.  Let's
> say we did implement a new flag that said we should override standard
> permissions.  In such a case, shouldn't ACLs also be overridden?  ...
> what point do we give up?  What's the clear line...and why does that
> line not just get drawn at not changing permissions at all?

Yup.  Elsewhere in the thread you gave "rm -fr" not removing an
unwritable subdirectory, and that shows where others before us drew
the line that has been accepted by the end users.

> I'm sympathetic to "golang does a dumb thing I don't like" (the exact
> situation you hilight has bothered me before too), but it's not clear
> at all to me where or how git might help you cope with that.  To me,
> golang did a dumb and you have to manually undo the dumb.  If you want
> something smarter on the git side instead of the golang side, all the
> stuff I mentioned above are things that would need to be addressed in
> any possible solution that I'm just not seeing.

Amen.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [EXTERNAL] Re: BUG: git clean -d cannot remove files from read-only directories
  2020-02-21  1:08 ` Elijah Newren
@ 2020-02-21  3:06   ` Adam Milazzo
  2020-02-21  5:38     ` Jeff King
  0 siblings, 1 reply; 10+ messages in thread
From: Adam Milazzo @ 2020-02-21  3:06 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Junio C Hamano, git@vger.kernel.org

> Hehe, but you left an opening for your turn around to be turned around...  Git does not actually track standard Unix file permissions.
> It _only_ tracks the executable bit (and not a separate one for user & group & other, but just an overall is-executable or not).  Of particular note for this discussion is that it does not track the writable bit.  So, wouldn't it be in line with the way Git already behaves to not touch the writable permission?  :-)

Well, you got me there. In various places it will show file modes like 100644 or 100755 and it'll include permission changes in diffs, but it does seem to be faking it (in that it only ever shows 644 or 755 regardless of the actual permissions)! That furthermore implies that Git will always check out files with only one of two permissions (either 666 or 777, filtered through the umask), and that means I don't in fact have to worry about chmod -R u+w at the root of the repository messing up the permissions of tracked files.

That being the case, my proposed feature is not really necessary - perhaps nice to have in some edge cases, but probably not worth the implementation effort.

I wonder now why Git makes an effort of appearing to represent a full set of Unix file permissions. Perhaps the idea was to leave the door open to storing them in the future...

> This is some good food for thought, especially the bit about "[if] you are going to delete the parent directory anwyay".  Hmm...

Well, personally I'd prefer it to do it regardless of whether the parent directory is going to be deleted, so it doesn't matter where the untracked file is, but again I withdraw the feature request given that Git doesn't actually store any writable permission bits so there's no problem if I forcibly rewrite them throughout the whole repository.

And with that I apologize for anyone's time that I wasted.

-- Adam

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [EXTERNAL] Re: BUG: git clean -d cannot remove files from read-only directories
  2020-02-21  3:06   ` [EXTERNAL] " Adam Milazzo
@ 2020-02-21  5:38     ` Jeff King
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff King @ 2020-02-21  5:38 UTC (permalink / raw)
  To: Adam Milazzo; +Cc: Elijah Newren, Junio C Hamano, git@vger.kernel.org

On Fri, Feb 21, 2020 at 03:06:13AM +0000, Adam Milazzo wrote:

> I wonder now why Git makes an effort of appearing to represent a full
> set of Unix file permissions. Perhaps the idea was to leave the door
> open to storing them in the future...

I think to some degree it's the other way around. The very earliest
versions of Git did track the full mode, but that turned out to be
annoying (you'd record useless changes caused by umasks of different
committers). It was changed in e44794706e (Be much more liberal about
the file mode bits., 2005-04-16), which is quite early on. :)

We do use the mode bits for other things, too. Subtrees are 040000,
symlinks are 120000, and submodule "gitlinks" are 160000.

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-02-21  5:38 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-20 18:55 BUG: git clean -d cannot remove files from read-only directories Adam Milazzo
2020-02-20 19:45 ` Elijah Newren
2020-02-20 20:05   ` [EXTERNAL] " Adam Milazzo
2020-02-20 22:26     ` Junio C Hamano
2020-02-20 23:52       ` Adam Milazzo
2020-02-21  0:42         ` Elijah Newren
2020-02-20 23:13     ` Elijah Newren
2020-02-21  1:48       ` Junio C Hamano
  -- strict thread matches above, loose matches on Subject: below --
2020-02-21  0:52 Adam Milazzo
2020-02-21  1:08 ` Elijah Newren
2020-02-21  3:06   ` [EXTERNAL] " Adam Milazzo
2020-02-21  5:38     ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).