git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* BUG: git clean -d cannot remove files from read-only directories
@ 2020-02-20 18:27 Adam Milazzo
  2020-02-20 18:32 ` Junio C Hamano
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Adam Milazzo @ 2020-02-20 18:27 UTC (permalink / raw)
  To: git@vger.kernel.org

Repro steps:
1. git init 
2. mkdir d
3. touch d/a
4. chmod -w d
5. git clean -fd

Actual result:
Git doesn't remove anything, saying "warning: failed to remove d/a".

Expected result:
Git should remove the subdirectory 'd' along with its contents. Note that git can remove a read-only file (touch b; chmod -w b; git clean -f) with no problem.

Environment:
git version 2.7.4 on GNU/Linux

Why this is important:
* This has a significant impact in real scenarios: the go language, when using modules, will download referenced modules and place them in read-only directories inside a cache directory, potentially inside the git repository. These cached modules can't be cleaned up by git clean. (Furthermore, git clean then returns a failure status code, which causes our build system to fail. I don't want to ignore the failure code.)

* There is no simple and reliable way to find out which directories I would need to chmod u+w in order to allow git to delete their contents. I'd have to effectively replicate git's algorithm for determining tracked files. "git clean" is invoked at the root of the repository and the repository contains many projects worked on by different teams, so there's no fixed list of things I can manually chmod to make "git clean" work reliably.

* It seems inconsistent for "git clean" to be able to remove read-only files but not files from read-only directories.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: BUG: git clean -d cannot remove files from read-only directories
  2020-02-20 18:27 Adam Milazzo
@ 2020-02-20 18:32 ` Junio C Hamano
  2020-02-20 18:47   ` Junio C Hamano
  2020-02-20 18:46 ` Daniel Knittl-Frank
  2020-02-21  1:45 ` brian m. carlson
  2 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2020-02-20 18:32 UTC (permalink / raw)
  To: Adam Milazzo; +Cc: git@vger.kernel.org

Adam Milazzo <Adam.Milazzo@microsoft.com> writes:

> Repro steps:
> 1. git init 
> 2. mkdir d
> 3. touch d/a
> 4. chmod -w d
> 5. git clean -fd
>
> Actual result:
> Git doesn't remove anything, saying "warning: failed to remove d/a".
>
> Expected result:
> Git should remove the subdirectory 'd' along with its
> contents. Note that git can remove a read-only file (touch b;
> chmod -w b; git clean -f) with no problem.

It is how UNIX-like filesystem works, isn't it?

As long as a directory D is writable/executable, any regular file
inside it can be removed regardless of the perm-bits of the file.

	mkdir d
	touch d/a
	chmod -w d
	rm d/a

would not let you remove the file d/a from d/, exactly because you
cannot modify d/ (it is not writable).

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: BUG: git clean -d cannot remove files from read-only directories
  2020-02-20 18:27 Adam Milazzo
  2020-02-20 18:32 ` Junio C Hamano
@ 2020-02-20 18:46 ` Daniel Knittl-Frank
  2020-02-21  1:45 ` brian m. carlson
  2 siblings, 0 replies; 16+ messages in thread
From: Daniel Knittl-Frank @ 2020-02-20 18:46 UTC (permalink / raw)
  To: Adam Milazzo; +Cc: git@vger.kernel.org

Hi Adam,

On Thu, Feb 20, 2020 at 7:27 PM Adam Milazzo <Adam.Milazzo@microsoft.com> wrote:
>
> Repro steps:
> 1. git init
> 2. mkdir d
> 3. touch d/a
> 4. chmod -w d
> 5. git clean -fd
>
> Actual result:
> Git doesn't remove anything, saying "warning: failed to remove d/a".
>
>
> Expected result:
> Git should remove the subdirectory 'd' along with its contents. Note that git can remove a read-only file (touch b; chmod -w b; git clean -f) with no problem.

this is not a limitation of Git, but of how Linux Filesystems work.
When you delete a file, you do not modify the file, but you change the
directory which contains the file. In other words, if you have a
directory which is write protected (you do not have the write
permission), you cannot add, remove, or rename files. You may however
still modify the content existing files.

Your repro will yield the same result when executed without Git, i.e.
plain shell commands:

$ mkdir d ; touch d/a ; chmod a-w d
$ rm -rf d
rm: cannot remove 'd/a': Permission denied

If the write permission were removed before touching the file, this
would already fail:

$ mkdir d ; chmod a-w d
$ touch d/a
touch: cannot touch 'd/a': Permission denied

In summary: this is expected and I doubt Git can do much in such a
case. After all, the directory is marked as read-only, so why should
Git be able to write it? :)


> […]
>
> * It seems inconsistent for "git clean" to be able to remove read-only files but not files from read-only directories.

Again, deleting a file will modify its containing directory, not the
file itself. Making a file read-only does not protect it from being
renamed or deleted, that's what directory permissions are for.

- Daniel

-- 
typed with http://neo-layout.org

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: BUG: git clean -d cannot remove files from read-only directories
  2020-02-20 18:32 ` Junio C Hamano
@ 2020-02-20 18:47   ` Junio C Hamano
  0 siblings, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2020-02-20 18:47 UTC (permalink / raw)
  To: Adam Milazzo; +Cc: git@vger.kernel.org

Junio C Hamano <gitster@pobox.com> writes:

> Adam Milazzo <Adam.Milazzo@microsoft.com> writes:
>
>> Repro steps:
>> 1. git init 
>> 2. mkdir d
>> 3. touch d/a
>> 4. chmod -w d
>> 5. git clean -fd
>>
>> Actual result:
>> Git doesn't remove anything, saying "warning: failed to remove d/a".
>>
>> Expected result:
>> Git should remove the subdirectory 'd' along with its
>> contents. Note that git can remove a read-only file (touch b;
>> chmod -w b; git clean -f) with no problem.
>
> It is how UNIX-like filesystem works, isn't it?
>
> As long as a directory D is writable/executable, any regular file
> inside it can be removed regardless of the perm-bits of the file.
>
> 	mkdir d
> 	touch d/a
> 	chmod -w d
> 	rm d/a
>
> would not let you remove the file d/a from d/, exactly because you
> cannot modify d/ (it is not writable).

I realize that the above is probably a bit too terse for those who
are not familiar with the UNIX/POSIX-like filesystems.

A directory is a record of what files (and subdirectories) are in
it.  If you add a file to or remove a file from a directory, you
are updating that record---so you need to be able to write to the
directory.  You do not have to be able to write (or even to read for
that matter) to the file you are adding or removing to the directory,
because adding or removing a file from a directory involves only
updating the record kept in the directory about what files and
subdirectories are in that directory (hence you need to be able to
write to the directory---but perm bits on the files you are adding
or removing do not play a role in this operation).

A collorary is that modifying an existing file in a directory does
not have to change what is recorded in the directory---what files
are in the directory does not change.  You only need to be able to
write the file, so you can edit a file in a read-only directory.  

Note that some editors, however, implement an "edit" as "rename the
original file as a backup, create a new copy of a file, and edit
that new copy".  With such an editor, you cannot "edit" a writable
file in a read-only directory and the reason would by now be obvious
to you once you understand the explanation above.  Both renaming the
original and creating the new copy would be updating what files are
in the directory, requiring you to be able to write to the directory.

Hope this helps.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: BUG: git clean -d cannot remove files from read-only directories
@ 2020-02-20 18:55 Adam Milazzo
  2020-02-20 19:45 ` Elijah Newren
  0 siblings, 1 reply; 16+ messages in thread
From: Adam Milazzo @ 2020-02-20 18:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git@vger.kernel.org

> It is how UNIX-like filesystem works, isn't it?

Sure, but it doesn't have to be how "git clean -d" works. In particular, "git clean -ffd" could be more forceful, or there could be another option or a third level of force beyond the current two levels.

It really comes back to this question: "How can I avoid the failure, given that I am running 'git clean' from a script and not interactively?" The only answer I could find that's not unreasonable is to parse the text output of 'git status -s' to find the untracked directories and then run 'chmod -R u+w' on each of those directories before running 'git clean -d'." I can do that, but I still think a "force" (or other) flag that really forces the cleanup would be preferable, especially given that this isn't a completely idiosyncratic scenario but one that will happen more and more as go modules are adopted (unless go is changed to stop putting them in read-only directories).

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: BUG: git clean -d cannot remove files from read-only directories
@ 2020-02-20 19:29 Adam Milazzo
  0 siblings, 0 replies; 16+ messages in thread
From: Adam Milazzo @ 2020-02-20 19:29 UTC (permalink / raw)
  To: Junio C Hamano, Daniel Knittl-Frank; +Cc: git@vger.kernel.org

Thank you for your explanations. I don't think "git clean" must necessarily mirror the behavior of the underlying filesystem and must be free of any options that implement a different behavior.

What is the purpose of "git clean"? Broadly, it's "Perform a mass delete of untracked (or, with -X, ignored) files to put my repository back to a clean state". I would argue that paying too much respect to the permission bits within untracked directories that we're planning to delete anyway goes against this purpose. Consider: why pay such respect? If the answer is "to protect the user from accidental deletions", we already have interactive and dry run modes to do that, and at a logical level, one might expect a read-only file to also be protected. (For example, "rm" will prompt for confirmation before deleting a read-only file, though "rm -f" will not.) If the answer is "because that's how UNIX file systems works", then I'd wonder why we should be a slave to that. (If the unlink syscall returns EPERM or whatever, you can check if the permissions on the parent directory may be the cause and then change them and retry if 'git clean' would be deleting the parent anyway.) In short, when a user says to go ahead with the mass delete, and especially when they say to "force it", why not do so?

-----Original Message-----
From: Junio C Hamano <gitster@pobox.com> 
Sent: Thursday, February 20, 2020 10:48 AM
To: Adam Milazzo <Adam.Milazzo@microsoft.com>
Cc: git@vger.kernel.org
Subject: [EXTERNAL] Re: BUG: git clean -d cannot remove files from read-only directories

Junio C Hamano <gitster@pobox.com> writes:

> Adam Milazzo <Adam.Milazzo@microsoft.com> writes:
>
>> Repro steps:
>> 1. git init
>> 2. mkdir d
>> 3. touch d/a
>> 4. chmod -w d
>> 5. git clean -fd
>>
>> Actual result:
>> Git doesn't remove anything, saying "warning: failed to remove d/a".
>>
>> Expected result:
>> Git should remove the subdirectory 'd' along with its contents. Note 
>> that git can remove a read-only file (touch b; chmod -w b; git clean 
>> -f) with no problem.
>
> It is how UNIX-like filesystem works, isn't it?
>
> As long as a directory D is writable/executable, any regular file 
> inside it can be removed regardless of the perm-bits of the file.
>
> 	mkdir d
> 	touch d/a
> 	chmod -w d
> 	rm d/a
>
> would not let you remove the file d/a from d/, exactly because you 
> cannot modify d/ (it is not writable).

I realize that the above is probably a bit too terse for those who are not familiar with the UNIX/POSIX-like filesystems.

A directory is a record of what files (and subdirectories) are in it.  If you add a file to or remove a file from a directory, you are updating that record---so you need to be able to write to the directory.  You do not have to be able to write (or even to read for that matter) to the file you are adding or removing to the directory, because adding or removing a file from a directory involves only updating the record kept in the directory about what files and subdirectories are in that directory (hence you need to be able to write to the directory---but perm bits on the files you are adding or removing do not play a role in this operation).

A collorary is that modifying an existing file in a directory does not have to change what is recorded in the directory---what files are in the directory does not change.  You only need to be able to write the file, so you can edit a file in a read-only directory.  

Note that some editors, however, implement an "edit" as "rename the original file as a backup, create a new copy of a file, and edit that new copy".  With such an editor, you cannot "edit" a writable file in a read-only directory and the reason would by now be obvious to you once you understand the explanation above.  Both renaming the original and creating the new copy would be updating what files are in the directory, requiring you to be able to write to the directory.

Hope this helps.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: BUG: git clean -d cannot remove files from read-only directories
  2020-02-20 18:55 BUG: git clean -d cannot remove files from read-only directories Adam Milazzo
@ 2020-02-20 19:45 ` Elijah Newren
  2020-02-20 20:05   ` [EXTERNAL] " Adam Milazzo
  0 siblings, 1 reply; 16+ messages in thread
From: Elijah Newren @ 2020-02-20 19:45 UTC (permalink / raw)
  To: Adam Milazzo; +Cc: Junio C Hamano, git@vger.kernel.org

On Thu, Feb 20, 2020 at 10:58 AM Adam Milazzo
<Adam.Milazzo@microsoft.com> wrote:
>
> > It is how UNIX-like filesystem works, isn't it?
>
> Sure, but it doesn't have to be how "git clean -d" works. In particular, "git clean -ffd" could be more forceful, or there could be another option or a third level of force beyond the current two levels.
>
> It really comes back to this question: "How can I avoid the failure, given that I am running 'git clean' from a script and not interactively?" The only answer I could find that's not unreasonable is to parse the text output of 'git status -s' to find the untracked directories and then run 'chmod -R u+w' on each of those directories before running 'git clean -d'." I can do that, but I still think a "force" (or other) flag that really forces the cleanup would be preferable, especially given that this isn't a completely idiosyncratic scenario but one that will happen more and more as go modules are adopted (unless go is changed to stop putting them in read-only directories).


Tying permission override to e.g -ffd would be a really bad idea.
People would start adopting that for the permission override reason,
then someone is going to accidentally nuke a git submodule with
unpushed changes.

Tying it to -fd doesn't have that particular problem, but I'm worried
a bit about a slippery slope if we do it.  If we say that "git clean
-fd" should check and modify directory permission bits, should it also
check and modify any necessary ACLs?  What if the user in question
doesn't have perms, but the user in question is in sudoers -- should
git try to see if they can run chown or chattr under sudo too?

Also, another way to look at this; currently 'git clean -fd' behaves
the same on untracked directories as 'rm -rf' does and I think that's
a good model for how to behave.  Why should they be different?

$ git clean -fd unwritable-dir/
warning: failed to remove unwritable-dir/a: Permission denied
warning: failed to remove unwritable-dir/b: Permission denied
warning: failed to remove unwritable-dir/c: Permission denied
$ rm -rf unwritable-dir/
rm: cannot remove 'unwritable-dir/a': Permission denied
rm: cannot remove 'unwritable-dir/b': Permission denied
rm: cannot remove 'unwritable-dir/c': Permission denied

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [EXTERNAL] Re: BUG: git clean -d cannot remove files from read-only directories
  2020-02-20 19:45 ` Elijah Newren
@ 2020-02-20 20:05   ` Adam Milazzo
  2020-02-20 22:26     ` Junio C Hamano
  2020-02-20 23:13     ` Elijah Newren
  0 siblings, 2 replies; 16+ messages in thread
From: Adam Milazzo @ 2020-02-20 20:05 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Junio C Hamano, git@vger.kernel.org

> Tying permission override to e.g -ffd would be a really bad idea. People would start adopting that for the permission override reason, then someone is going to accidentally nuke a git submodule with unpushed changes.

Fair enough. I'm not arguing that it must necessarily be any kind of overloading of -f. It could just as well be a separate option. (Personally I'd say overloading -f to make "-f -f" mean "also delete untracked git repositories" was the bad idea, since that is not the obvious and intuitive meaning of "--force", whereas "try harder to delete files you would normally delete" _is_ the intuitive meaning, but the history can't be changed.)

> Also, another way to look at this; currently 'git clean -fd' behaves the same on untracked directories as 'rm -rf' does and I think that's a good model for how to behave.  Why should they be different?

I would ask "why should they be the same?", because there's no obvious reason why behavior of "git clean", whose purpose is to put a repository back to a pristine state, should always behave the same way as "rm", whose purpose is to delete a specified set of files, and more to the point there is no reason why they should have no option to behave differently. I am not arguing for the default behavior of "git clean" to change, since that could be seen as a breaking change. Also, they are already different. "git clean" will remove a read-only file. "rm" will not (without confirmation), although "rm -f" will.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [EXTERNAL] Re: BUG: git clean -d cannot remove files from read-only directories
  2020-02-20 20:05   ` [EXTERNAL] " Adam Milazzo
@ 2020-02-20 22:26     ` Junio C Hamano
  2020-02-20 23:52       ` Adam Milazzo
  2020-02-20 23:13     ` Elijah Newren
  1 sibling, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2020-02-20 22:26 UTC (permalink / raw)
  To: Adam Milazzo; +Cc: Elijah Newren, git@vger.kernel.org

Adam Milazzo <Adam.Milazzo@microsoft.com> writes:

>> Also, another way to look at this; currently 'git clean -fd'
>> behaves the same on untracked directories as 'rm -rf' does and I
>> think that's a good model for how to behave.  Why should they be
>> different?
>
> I would ask "why should they be the same?"

Simply because that is how users would expect how the world works
(iow, model things after what they are already familiar with).

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [EXTERNAL] Re: BUG: git clean -d cannot remove files from read-only directories
  2020-02-20 20:05   ` [EXTERNAL] " Adam Milazzo
  2020-02-20 22:26     ` Junio C Hamano
@ 2020-02-20 23:13     ` Elijah Newren
  2020-02-21  1:48       ` Junio C Hamano
  1 sibling, 1 reply; 16+ messages in thread
From: Elijah Newren @ 2020-02-20 23:13 UTC (permalink / raw)
  To: Adam Milazzo; +Cc: Junio C Hamano, git@vger.kernel.org

On Thu, Feb 20, 2020 at 12:05 PM Adam Milazzo
<Adam.Milazzo@microsoft.com> wrote:
>
> > Tying permission override to e.g -ffd would be a really bad idea. People would start adopting that for the permission override reason, then someone is going to accidentally nuke a git submodule with unpushed changes.
>
> Fair enough. I'm not arguing that it must necessarily be any kind of overloading of -f. It could just as well be a separate option. (Personally I'd say overloading -f to make "-f -f" mean "also delete untracked git repositories" was the bad idea, since that is not the obvious and intuitive meaning of "--force", whereas "try harder to delete files you would normally delete" _is_ the intuitive meaning, but the history can't be changed.)

Yeah, arguing to change existing behavior (as you started with by
labelling the thread with "BUG") will probably get you a lot of
pushback, but let me address your new angle of suggesting a new
option...

> > Also, another way to look at this; currently 'git clean -fd' behaves the same on untracked directories as 'rm -rf' does and I think that's a good model for how to behave.  Why should they be different?
>
> I would ask "why should they be the same?",

I think it matches user expectation of how the filesystem works.  But
if we did want to implement a new flag, there's a question of what it
should be and what it should do.  Your "put the repository back to a
pristine state" isn't enough to really go on from an implementation.
Clearly you don't mean that it should also do a 'reset --hard' --
despite the fact that running a reset fits this definition, but even
in terms of untracked files you've still left out the "slippery slope"
angle from my previous response in what you've addressed here.  Let's
say we did implement a new flag that said we should override standard
permissions.  In such a case, shouldn't ACLs also be overridden?  Can
we handle that in any semi-portable fashion?  Assuming we do attempt
to override ACLs, do we destroy portability of git (I don't know a
whole lot about ACLs, so I don't know, but I thought they varied a lot
between systems) -- and initiate a never-ending treadmill of needs to
handle more kinds of ACLs?  If we don't try to overrule ACLs, doesn't
that defeat your whole argument that "git clean" (with various levels
of forcing) is about trying to do as much as possible to put the
repository in a clean state?  Ignoring ACLs for a minute, what if we
attempt to override the permissions by marking directories as
writable, but find out the user isn't the owner of the directory and
thus can't change its permissions?  Do we try harder by attempting to
invoke chmod under sudo to see if we can override the permissions?  At
what point do we give up?  What's the clear line...and why does that
line not just get drawn at not changing permissions at all?

>  because there's no obvious reason why behavior of "git clean", whose purpose is to put a repository back to a pristine state, should always behave the same way as "rm", whose purpose is to delete a specified set of files, and more to the point there is no reason why they should have no option to behave differently. I am not arguing for the default behavior of "git clean" to change, since that could be seen as a breaking change. Also, they are already different. "git clean" will remove a read-only file. "rm" will not (without confirmation), although "rm -f" will.

Here you've changed the goalposts slightly.  I didn't compare "git
clean" to "rm", I compared "git clean -fd" to "rm -rf" on untracked
directories.

However, you are right that even those two aren't the same; "rm -rf"
will recurse into a git submodule and delete it while "git clean -fd"
won't.  In other words, "git clean -fd" avoids bringing the repository
back to a pristine state in a certain circumstance; it's actually more
cautious than rm.


I'm sympathetic to "golang does a dumb thing I don't like" (the exact
situation you hilight has bothered me before too), but it's not clear
at all to me where or how git might help you cope with that.  To me,
golang did a dumb and you have to manually undo the dumb.  If you want
something smarter on the git side instead of the golang side, all the
stuff I mentioned above are things that would need to be addressed in
any possible solution that I'm just not seeing.


Hope that helps,
Elijah

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [EXTERNAL] Re: BUG: git clean -d cannot remove files from read-only directories
  2020-02-20 22:26     ` Junio C Hamano
@ 2020-02-20 23:52       ` Adam Milazzo
  2020-02-21  0:42         ` Elijah Newren
  0 siblings, 1 reply; 16+ messages in thread
From: Adam Milazzo @ 2020-02-20 23:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Elijah Newren, git@vger.kernel.org

> Simply because that is how users would expect how the world works (iow, model things after what they are already familiar with).

This seems to be an avoidance of my actual arguments about 1) the purpose of "git clean" and what behavior best matches it, and 2) the violation of the general principle that if a tool invoked programmatically can fail, then there should be a reasonable way for users to avoid the failure if possible. But my response is:

First, there is no obvious choice for what other tool to model "git clean" on, even assuming that it should be so modeled. This goes back to the purpose of "git clean". Is it just a recursive delete? Or it bringing the directory tree back to a certain state? I'd argue the latter, and if we want point to existing tools I'd point to rsync, which has no problem deleting files from read-only directories if it's needed to bring a directory tree to the desired state. It doesn't even give a warning about it.

Second, I doubt anybody here actually knows (i.e. has data demonstrating) that users expect 'git clean' to behave like 'rm'. Also, I am a user, and it is not what _I_ expect. (And since some people here seem keen to dismiss what I say based on an assumption of ignorance, I've been programming for 30 years, using GNU/Linux, BSD, and other UNIX-like systems for almost 20 years, and using various source control systems for about as long. Not that that should carry any intrinsic weight in this discussion.)

Comparing to "rm" again, there is an easy way for users of "rm" to avoid the error. Simply replace "rm -rf X" with "chmod -R u+w X; rm -rf X". What is the comparable workaround with "git clean"? There is none that I'm aware of, and that's perhaps the main reason why it would be useful for "git clean" to be able to handle it. If there is a reasonable workaround, what is it? The best simple workarounds I've been able to come up with are:

* For "git clean -fd": git status -s -uall | grep -E '^\?\?.*/$' | cut -c 4- | xargs -r chmod -R u+w; git clean -fd
* For "git clean -fdx": git status -s -uall --ignored | grep -E '^\?\?.*/$' | cut -c 4- | xargs -r chmod -R u+w; git clean -fdx
* For "git clean -fX": ??
* For "git clean -f": ??

These are not reliable because there are various conditions where they fail (including ours), so I'm not sure they are viable approaches except in certain special cases. It's possible to handle all the possibilities with custom scripting, but the workarounds would become quite complex.

So I ask again, if "git clean" won't have any option to handle it like rsync does, what is the workaround that can be placed in a script to get the same behavior? And if there is no reasonable workaround, perhaps it is a useful feature to have "git clean" try a little harder to delete the files, or have an option to do so?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [EXTERNAL] Re: BUG: git clean -d cannot remove files from read-only directories
  2020-02-20 23:52       ` Adam Milazzo
@ 2020-02-21  0:42         ` Elijah Newren
  0 siblings, 0 replies; 16+ messages in thread
From: Elijah Newren @ 2020-02-21  0:42 UTC (permalink / raw)
  To: Adam Milazzo; +Cc: Junio C Hamano, git@vger.kernel.org

Hi Adam,

On Thu, Feb 20, 2020 at 3:52 PM Adam Milazzo <Adam.Milazzo@microsoft.com> wrote:
>
> > Simply because that is how users would expect how the world works (iow, model things after what they are already familiar with).
>
> This seems to be an avoidance of my actual arguments about 1) the purpose of "git clean" and what behavior best matches it, and 2) the violation of the general principle that if a tool invoked programmatically can fail, then there should be a reasonable way for users to avoid the failure if possible. But my response is:
>
> First, there is no obvious choice for what other tool to model "git clean" on, even assuming that it should be so modeled. This goes back to the purpose of "git clean". Is it just a recursive delete? Or it bringing the directory tree back to a certain state? I'd argue the latter, and if we want point to existing tools I'd point to rsync, which has no problem deleting files from read-only directories if it's needed to bring a directory tree to the desired state. It doesn't even give a warning about it.

I'm very sympathetic to the fact that "git clean" behavior might not
be optimal or even well defined[1][2][3].  I've recently done work in
the area, including even changing the existing behavior of some
commands, based on arguments about what should be correct behavior.
Some of that work languished for a year and a half, despite fixing
known bugs, because there were edge cases where I couldn't tell what
correct behavior was and no one else seemed to be able to answer
either.

[1] https://lore.kernel.org/git/20190917163504.14566-1-newren@gmail.com/
[2] https://lore.kernel.org/git/pull.676.v5.git.git.1576790906.gitgitgadget@gmail.com/
[3] https://lore.kernel.org/git/pull.692.v3.git.git.1579206117.gitgitgadget@gmail.com/

If you want to do something similar, I think you need to provide good
rationale for not only what you are trying to achieve, but explain the
edge cases and how to address probable future similar requests in a
way that make sense to someone who might try to implement it.  If I
were to try to implement your suggestion, I'm saying that your
descriptions are not at all clear to me in terms of how I should
handle edge cases and future related improvements that folks ask for,
not even if this is implemented as a new option.

For example, you talk about bringing the directory tree back to a
certain state -- does that mean git clean should also run 'git reset
--hard'?  I need a more precise model/description...

> Second, I doubt anybody here actually knows (i.e. has data demonstrating) that users expect 'git clean' to behave like 'rm'. Also, I am a user, and it is not what _I_ expect. (And since some people here seem keen to dismiss what I say based on an assumption of ignorance, I've been programming for 30 years, using GNU/Linux, BSD, and other UNIX-like systems for almost 20 years, and using various source control systems for about as long. Not that that should carry any intrinsic weight in this discussion.)
>
> Comparing to "rm" again, there is an easy way for users of "rm" to avoid the error. Simply replace "rm -rf X" with "chmod -R u+w X; rm -rf X". What is the comparable workaround with "git clean"? There is none that I'm aware of, and that's perhaps the main reason why it would be useful for "git clean" to be able to handle it. If there is a reasonable workaround, what is it? The best simple workarounds I've been able to come up with are:
>
> * For "git clean -fd": git status -s -uall | grep -E '^\?\?.*/$' | cut -c 4- | xargs -r chmod -R u+w; git clean -fd
> * For "git clean -fdx": git status -s -uall --ignored | grep -E '^\?\?.*/$' | cut -c 4- | xargs -r chmod -R u+w; git clean -fdx
> * For "git clean -fX": ??
> * For "git clean -f": ??

For every single case, why not just "chmod -R u+w $toplevel_dir"
followed by the git clean command in question, much like you did with
rm?

> These are not reliable because there are various conditions where they fail (including ours), so I'm not sure they are viable approaches except in certain special cases. It's possible to handle all the possibilities with custom scripting, but the workarounds would become quite complex.
>
> So I ask again, if "git clean" won't have any option to handle it like rsync does, what is the workaround that can be placed in a script to get the same behavior? And if there is no reasonable workaround, perhaps it is a useful feature to have "git clean" try a little harder to delete the files, or have an option to do so?

I think the single simple recursive chmod I mentioned above is a
reasonable workaround.

But since you are bringing up rsync as a comparison point...rsync also
affects ACLs, xattrs, devices and other special files, etc.  So, how
much harder is a little harder?  Is there a good mental model for that
being the right amount of harder, or do we just keep extending it
every time the command fails to clean something that users think we
could have wiped out?


Hope that helps,
Elijah

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: BUG: git clean -d cannot remove files from read-only directories
@ 2020-02-21  0:52 Adam Milazzo
  2020-02-21  1:08 ` Elijah Newren
  0 siblings, 1 reply; 16+ messages in thread
From: Adam Milazzo @ 2020-02-21  0:52 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Junio C Hamano, git@vger.kernel.org

The fragmenting and duplication of this discussion via email is unfortunate, but I'm not sure what can be done about that.

> Yeah, arguing to change existing behavior (as you started with by labelling the thread with "BUG") will probably get you a lot of pushback, but let me address your new angle of suggesting a new option...

Sorry if the subject prefix annoyed people, but I wasn't trying to. There's no template that I could find for bug reports... sorry, feature suggestions. :-)

I appreciate your responding to my arguments here and you make some good objections. I'm not sure at this point that they're insuperable, but let me try to respond.

> Let's say we did implement a new flag that said we should override standard permissions. In such a case, shouldn't ACLs also be overridden?

Maybe, and probably ideally. I suppose it depends in part on the difficulty of implementation. I don't know how other tools, like rsync (which has no problem deleting files from read-only directories), handle ACLs. But I would say that if rsync can do it then Git can do it.

> Can we handle that in any semi-portable fashion? Assuming we do attempt to override ACLs, do we destroy portability of git?

I don't know. I know there are POSIX ACLs, but I'd be surprised if even the POSIX layer on Windows implements them, for example. I've never used a Unix-like system with ACLs. I'd be tempted to say again "do what rsync does", but I don't actually know what that is and don't have an easy way to test it out. I think Unix-like systems with ACLs are very rare, much less than 1% of installations, but of course Windows is ACL-based.

Let me turn your questions around. Git knows about, tracks, and stores standard Unix file permissions. It manipulates them and cares about them. What about ACLs? Does Git apply the same versioning to ACLs as it does to files? If the answer is "no", as I expect, doesn't that mean Git's portability is already "destroyed", or nonexistent? And that therefore it's wrong for Git to even try to maintain Unix file permissions since they don't apply to all operating systems? I could put it another way: since Git knows and cares about and already manipulates Unix file permissions but (presumably) not ACLs, it is not inconsistent with the current design to continue doing exactly that - supporting Unix permissions but not ACLs - in new features.

> If we don't try to overrule ACLs, doesn't that defeat your whole argument that "git clean" (with various levels of forcing) is about trying to do as much as possible to put the repository in a clean state?

I think the discussion of "would it be a good feature?" is fairly independent from the question of "is it easy to implement?". The question of what "git clean" is about is separate from the question of what limitations we might have due to implementation difficulties. You've probably seen the "Known limitations" sections of many man pages. If you can solve 90% of a problem with 10% of the effort, it's usually reasonable to do so. I.e. "don't let perfection be the enemy of good" and all that. And I suspect explicit ACL support is missing from Git generally, so having another feature without explicit ACL support is not a showstopper.

> what if we attempt to override the permissions by marking directories as writable, but find out the user isn't the owner of the directory and thus can't change its permissions?

Well, then you fail. If the user isn't the owner of the thing being deleted, that's a different ballgame.

> Do we try harder by attempting to invoke chmod under sudo to see if we can override the permissions?

Certainly not. :-)  The user can sudo git if they have permission to do such a thing.

> At what point do we give up? What's the clear line...and why does that line not just get drawn at not changing permissions at all?

I would say that first it should be determined whether the feature is useful. If not, then the line should get drawn at not changing permissions. But if it would be useful, then the line should be initially drawn at wherever gives the most "bang for the buck", the 80/20 case, etc. If it turns out that it's an unreasonable effort even to get a minimally useful implementation, then the stance of the Git team should be "It'd be nice to have, but if you want it you'll have to write it yourself and send us a patch." Which is, I think, how more esoteric features like ACLs should be handled, if they are to be handled at all.

To be specific, I propose that if the unlink or rmdir system call returns EPERM, and you are going to delete the parent directory anyway, then you stat the parent directory, see if it's missing write permissions, and if so, try to chmod u+w and retry the deletion of the child. And if any of those steps fail, you print an error message exactly like today. Alternately, and preferably, you do it even if you won't delete the parent directory, but in that case you have to put the permissions back when you're done. I think both of those are simple to implement, but I don't know about Git's implementation.

> Here you've changed the goalposts slightly.  I didn't compare "git clean" to "rm", I compared "git clean -fd" to "rm -rf" on untracked directories.

True, but I only realized that after sending the email.

> In other words, "git clean -fd" avoids bringing the repository back to a pristine state in a certain circumstance; it's actually more cautious than rm.

Which is the main trouble with it. It's very difficult to know exactly what it's going to want to delete, hence the difficulty of fixing the problem myself (where myself is a script invoking "git clean") with any kind of workaround outside Git, and hence my request for it to be handled inside Git.

But I come back to rsync, which I think is perhaps spiritually closer to "git clean" than rm is. It has no problem syncing files even if it has to twiddle some permission bits to do it. And that's convenient.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: BUG: git clean -d cannot remove files from read-only directories
  2020-02-21  0:52 Adam Milazzo
@ 2020-02-21  1:08 ` Elijah Newren
  0 siblings, 0 replies; 16+ messages in thread
From: Elijah Newren @ 2020-02-21  1:08 UTC (permalink / raw)
  To: Adam Milazzo; +Cc: Junio C Hamano, git@vger.kernel.org

On Thu, Feb 20, 2020 at 4:52 PM Adam Milazzo <Adam.Milazzo@microsoft.com> wrote:
>
> The fragmenting and duplication of this discussion via email is unfortunate, but I'm not sure what can be done about that.
>
> > Yeah, arguing to change existing behavior (as you started with by labelling the thread with "BUG") will probably get you a lot of pushback, but let me address your new angle of suggesting a new option...
>
> Sorry if the subject prefix annoyed people, but I wasn't trying to. There's no template that I could find for bug reports... sorry, feature suggestions. :-)
>
> I appreciate your responding to my arguments here and you make some good objections. I'm not sure at this point that they're insuperable, but let me try to respond.
>
> > Let's say we did implement a new flag that said we should override standard permissions. In such a case, shouldn't ACLs also be overridden?
>
> Maybe, and probably ideally. I suppose it depends in part on the difficulty of implementation. I don't know how other tools, like rsync (which has no problem deleting files from read-only directories), handle ACLs. But I would say that if rsync can do it then Git can do it.
>
> > Can we handle that in any semi-portable fashion? Assuming we do attempt to override ACLs, do we destroy portability of git?
>
> I don't know. I know there are POSIX ACLs, but I'd be surprised if even the POSIX layer on Windows implements them, for example. I've never used a Unix-like system with ACLs. I'd be tempted to say again "do what rsync does", but I don't actually know what that is and don't have an easy way to test it out. I think Unix-like systems with ACLs are very rare, much less than 1% of installations, but of course Windows is ACL-based.

Yeah, I know almost nothing about ACLs so I don't know the answer to
the questions either...

> Let me turn your questions around. Git knows about, tracks, and stores standard Unix file permissions. It manipulates them and cares about them. What about ACLs? Does Git apply the same versioning to ACLs as it does to files? If the answer is "no", as I expect, doesn't that mean Git's portability is already "destroyed", or nonexistent? And that therefore it's wrong for Git to even try to maintain Unix file permissions since they don't apply to all operating systems? I could put it another way: since Git knows and cares about and already manipulates Unix file permissions but (presumably) not ACLs, it is not inconsistent with the current design to continue doing exactly that - supporting Unix permissions but not ACLs - in new features.

Hehe, but you left an opening for your turn around to be turned
around...  Git does not actually track standard Unix file permissions.
It _only_ tracks the executable bit (and not a separate one for user &
group & other, but just an overall is-executable or not).  Of
particular note for this discussion is that it does not track the
writable bit.  So, wouldn't it be in line with the way Git already
behaves to not touch the writable permission?  :-)

> > If we don't try to overrule ACLs, doesn't that defeat your whole argument that "git clean" (with various levels of forcing) is about trying to do as much as possible to put the repository in a clean state?
>
> I think the discussion of "would it be a good feature?" is fairly independent from the question of "is it easy to implement?". The question of what "git clean" is about is separate from the question of what limitations we might have due to implementation difficulties. You've probably seen the "Known limitations" sections of many man pages. If you can solve 90% of a problem with 10% of the effort, it's usually reasonable to do so. I.e. "don't let perfection be the enemy of good" and all that. And I suspect explicit ACL support is missing from Git generally, so having another feature without explicit ACL support is not a showstopper.

Ok, fair point.

> > what if we attempt to override the permissions by marking directories as writable, but find out the user isn't the owner of the directory and thus can't change its permissions?
>
> Well, then you fail. If the user isn't the owner of the thing being deleted, that's a different ballgame.
>
> > Do we try harder by attempting to invoke chmod under sudo to see if we can override the permissions?
>
> Certainly not. :-)  The user can sudo git if they have permission to do such a thing.
>
> > At what point do we give up? What's the clear line...and why does that line not just get drawn at not changing permissions at all?
>
> I would say that first it should be determined whether the feature is useful. If not, then the line should get drawn at not changing permissions. But if it would be useful, then the line should be initially drawn at wherever gives the most "bang for the buck", the 80/20 case, etc. If it turns out that it's an unreasonable effort even to get a minimally useful implementation, then the stance of the Git team should be "It'd be nice to have, but if you want it you'll have to write it yourself and send us a patch." Which is, I think, how more esoteric features like ACLs should be handled, if they are to be handled at all.
>
> To be specific, I propose that if the unlink or rmdir system call returns EPERM, and you are going to delete the parent directory anyway, then you stat the parent directory, see if it's missing write permissions, and if so, try to chmod u+w and retry the deletion of the child. And if any of those steps fail, you print an error message exactly like today. Alternately, and preferably, you do it even if you won't delete the parent directory, but in that case you have to put the permissions back when you're done. I think both of those are simple to implement, but I don't know about Git's implementation.

This is some good food for thought, especially the bit about "[if] you
are going to delete the parent directory anwyay".  Hmm...


> > Here you've changed the goalposts slightly.  I didn't compare "git clean" to "rm", I compared "git clean -fd" to "rm -rf" on untracked directories.
>
> True, but I only realized that after sending the email.
>
> > In other words, "git clean -fd" avoids bringing the repository back to a pristine state in a certain circumstance; it's actually more cautious than rm.
>
> Which is the main trouble with it. It's very difficult to know exactly what it's going to want to delete, hence the difficulty of fixing the problem myself (where myself is a script invoking "git clean") with any kind of workaround outside Git, and hence my request for it to be handled inside Git.
>
> But I come back to rsync, which I think is perhaps spiritually closer to "git clean" than rm is. It has no problem syncing files even if it has to twiddle some permission bits to do it. And that's convenient.

Yeah, the comparison to rsync is interesting.  I do worry a bit about
what can of worms it opens and whether we really want to do all this
work just to avoid a simple toplevel recursive chmod from the user,
but you do raise some interesting points.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: BUG: git clean -d cannot remove files from read-only directories
  2020-02-20 18:27 Adam Milazzo
  2020-02-20 18:32 ` Junio C Hamano
  2020-02-20 18:46 ` Daniel Knittl-Frank
@ 2020-02-21  1:45 ` brian m. carlson
  2 siblings, 0 replies; 16+ messages in thread
From: brian m. carlson @ 2020-02-21  1:45 UTC (permalink / raw)
  To: Adam Milazzo; +Cc: git@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 1988 bytes --]

On 2020-02-20 at 18:27:22, Adam Milazzo wrote:
> Repro steps:
> 1. git init
> 2. mkdir d
> 3. touch d/a
> 4. chmod -w d
> 5. git clean -fd
> 
> Actual result:
> Git doesn't remove anything, saying "warning: failed to remove d/a".
> 
> Expected result:
> Git should remove the subdirectory 'd' along with its contents. Note that git can remove a read-only file (touch b; chmod -w b; git clean -f) with no problem.

I don't believe git should depart from rm in this regard.  I believe
that in general, Unix has stood the test of time, and when in doubt, we
should behave as Unix utilities do.

Note that other utilities (such as Perl) which have implemented
different behavior (usually for compatibility with Windows) have found
themselves with security vulnerabilities in that behavior.  That seems
like a prudent reason not to replicate it.

> Why this is important:
> * This has a significant impact in real scenarios: the go language,
>   when using modules, will download referenced modules and place them
>   in read-only directories inside a cache directory, potentially
>   inside the git repository. These cached modules can't be cleaned up
>   by git clean. (Furthermore, git clean then returns a failure status
>   code, which causes our build system to fail. I don't want to ignore
>   the failure code.)

This sounds like the real problem.  Why is Go placing data in cache
directories that are read-only?  Cache directories are explicitly
ephemeral and should be able to be destroyed at any time.  You'll
probably find more luck convincing the Go maintainers that their caches
should be temporary than you will us that git clean should be your
automatic destroyer of data.  Rust, for example, doesn't do this.

You could also just move your cache directories into $TMPDIR/go-cache
and then do "chmod -R u+w $TMPDIR/go-cache/* && rm -fr $TMPDIR/go-cache/*".
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [EXTERNAL] Re: BUG: git clean -d cannot remove files from read-only directories
  2020-02-20 23:13     ` Elijah Newren
@ 2020-02-21  1:48       ` Junio C Hamano
  0 siblings, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2020-02-21  1:48 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Adam Milazzo, git@vger.kernel.org

Elijah Newren <newren@gmail.com> writes:

> should be and what it should do.  Your "put the repository back to a
> pristine state" isn't enough to really go on from an implementation.

Yeah, where does this "pristine state" come from?  Do we need to fix
the "git clean" documentation to reduce such a confusion (I didn't
check to see if the current text is prone to be misunderstood as
such yet).  "git clean" is to remove crufts, isn't it?  The
definition of "cruft" being untracked junk that are marked
expendable (aka "ignored")?  And it is not "remove crurfts no matter
what".

> angle from my previous response in what you've addressed here.  Let's
> say we did implement a new flag that said we should override standard
> permissions.  In such a case, shouldn't ACLs also be overridden?  ...
> what point do we give up?  What's the clear line...and why does that
> line not just get drawn at not changing permissions at all?

Yup.  Elsewhere in the thread you gave "rm -fr" not removing an
unwritable subdirectory, and that shows where others before us drew
the line that has been accepted by the end users.

> I'm sympathetic to "golang does a dumb thing I don't like" (the exact
> situation you hilight has bothered me before too), but it's not clear
> at all to me where or how git might help you cope with that.  To me,
> golang did a dumb and you have to manually undo the dumb.  If you want
> something smarter on the git side instead of the golang side, all the
> stuff I mentioned above are things that would need to be addressed in
> any possible solution that I'm just not seeing.

Amen.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2020-02-21  1:48 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-20 18:55 BUG: git clean -d cannot remove files from read-only directories Adam Milazzo
2020-02-20 19:45 ` Elijah Newren
2020-02-20 20:05   ` [EXTERNAL] " Adam Milazzo
2020-02-20 22:26     ` Junio C Hamano
2020-02-20 23:52       ` Adam Milazzo
2020-02-21  0:42         ` Elijah Newren
2020-02-20 23:13     ` Elijah Newren
2020-02-21  1:48       ` Junio C Hamano
  -- strict thread matches above, loose matches on Subject: below --
2020-02-21  0:52 Adam Milazzo
2020-02-21  1:08 ` Elijah Newren
2020-02-20 19:29 Adam Milazzo
2020-02-20 18:27 Adam Milazzo
2020-02-20 18:32 ` Junio C Hamano
2020-02-20 18:47   ` Junio C Hamano
2020-02-20 18:46 ` Daniel Knittl-Frank
2020-02-21  1:45 ` brian m. carlson

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).